Sometimes the most impressive tools solve a problem you never knew you had. In the case of QueryTree, a new data analysis tool, that problem is something most people never question: spreadsheets.
For all the shiny-shiny copy-and-paste-click-and-drag-ness in new journalism tools, most data digging comes back to at least some simple spreadsheet work, and that represents a significant hurdle for many journalists used to working with simpler tools.
While interface design has undergone generations of improvement on the web, spreadsheet software interfaces have remained largely unchanged for decades.
So why did no one think to do this before?
You only need 10 choices
QueryTree is a drag-and-drop spreadsheet analysis tool. On the left are 10 boxes that you can drag onto the main canvas: Table and File allow you to directly input or upload data respectively. Sort and Filter do just that, and Select allows you to reorder your data so that the most relevant columns come first, and irrelevant ones are left out entirely.
Group performs the same function as pivot tables in traditional spreadsheet packages: it aggregates your data so you can, for example, see the total amount of payments to one company rather than having to add them manually.
Join and Append do two things that aren’t easy in Excel at all: join datasets with a common field (such as an area code – so you might for example join a crime dataset with one on deprivation); and append a new dataset to an old one (where, for example, data might be published monthly and you want to look over more than one month’s worth)
Finally there are two visualisation options: Line and Bar chart. These are pretty rudimentary at the moment, but there are plans to expand both types and functionality (such as sharing and embedding).
Given the feature bloat of Excel it’s striking to see just how little functionality data analysis can be boiled down to – and in fact, you could easily reduce them further to just 7 by dropping the visualisation and direct data entry.
Join the dots
To analyse a piece of data you just drag each box onto the canvas as you need it: so, for example, you might drag ‘File’ on to upload your data, then ‘Sort’ next to it in order to sort it (as soon as one box is dragged near another it is automatically joined).
Your options on what to sort appear as soon as you place the box, and a panel below the canvas shows you the results. To the right of that you can change or delete the sorting options.
All of this not only makes it simpler than having to remember where filter or sort is on the dozens of menus in Excel, but it’s also faster than using menus in the first place – and that’s crucial for journalists up against a deadline.
Every new box you add creates a new view over the data: you could ‘Group’ council spending by recipient, then ‘Sort’ from largest amount to smallest, then ‘Filter’ for those companies with a particular word in their name.
What’s more, each option you drag onto the canvas is a separate view over the data: so you can switch between looking at your data sorted by one field, to looking at it filtered by another, to looking at it aggregated by ‘Group’.
The process is much quicker and more intuitive than switching between sheets on an ever-expanding (and -slower) spreadsheet.
Plug and play
But the real time-saver is in the way you can plug new datasets into previous data flows.
The image above, for example, shows how you might filter, sort, group and visualise one dataset on GPs’ expenses.
To do the same on an updated dataset you would just need to drag that new data onto the canvas, join it to the processes that had been used last time, select the connection to the old data and delete it.
…But still in beta
QueryTree is a hugely promising tool – a potential Excel-killer – but key to its usefulness will be how fast the service is in practice, not just as a workflow on paper. If their servers can cope with what could be a huge demand, and what could be huge files, then this could be a very powerful tool indeed.
Meanwhile, as you’d expect with a tool that’s still being tested, there are bugs such as stumbling over CSVs with pound signs in their names, and small areas in need of tweaking.
Cleverly, the tool allows you to work with data before it’s been fully loaded – but doesn’t yet tell you when it has. There are plans to change this, to improve error handling (telling you what’s gone wrong), and to add XML and JSON import options to the current CSV-only. And the proposed pricing structure may change too.
Although the tool is currently being tested in private beta it’s well worth signing up for and helping them improve it.
UPDATE (Feb 5 2013): To answer the question posed in the headline, no, I don’t think this is an Excel killer.