Sometimes the most impressive tools solve a problem you never knew you had. In the case of QueryTree, a new data analysis tool, that problem is something most people never question: spreadsheets.
For all the shiny-shiny copy-and-paste-click-and-drag-ness in new journalism tools, most data digging comes back to at least some simple spreadsheet work, and that represents a significant hurdle for many journalists used to working with simpler tools.
While interface design has undergone generations of improvement on the web, spreadsheet software interfaces have remained largely unchanged for decades.
So why did no one think to do this before?
You only need 10 choices
QueryTree is a drag-and-drop spreadsheet analysis tool. On the left are 10 boxes that you can drag onto the main canvas: Table and File allow you to directly input or upload data respectively. Sort and Filter do just that, and Select allows you to reorder your data so that the most relevant columns come first, and irrelevant ones are left out entirely.
Group performs the same function as pivot tables in traditional spreadsheet packages: it aggregates your data so you can, for example, see the total amount of payments to one company rather than having to add them manually.
Join and Append do two things that aren’t easy in Excel at all: join datasets with a common field (such as an area code – so you might for example join a crime dataset with one on deprivation); and append a new dataset to an old one (where, for example, data might be published monthly and you want to look over more than one month’s worth)
Finally there are two visualisation options: Line and Bar chart. These are pretty rudimentary at the moment, but there are plans to expand both types and functionality (such as sharing and embedding).
Given the feature bloat of Excel it’s striking to see just how little functionality data analysis can be boiled down to – and in fact, you could easily reduce them further to just 7 by dropping the visualisation and direct data entry.
Join the dots
To analyse a piece of data you just drag each box onto the canvas as you need it: so, for example, you might drag ‘File’ on to upload your data, then ‘Sort’ next to it in order to sort it (as soon as one box is dragged near another it is automatically joined).
Your options on what to sort appear as soon as you place the box, and a panel below the canvas shows you the results. To the right of that you can change or delete the sorting options.
All of this not only makes it simpler than having to remember where filter or sort is on the dozens of menus in Excel, but it’s also faster than using menus in the first place – and that’s crucial for journalists up against a deadline.
Every new box you add creates a new view over the data: you could ‘Group’ council spending by recipient, then ‘Sort’ from largest amount to smallest, then ‘Filter’ for those companies with a particular word in their name.
What’s more, each option you drag onto the canvas is a separate view over the data: so you can switch between looking at your data sorted by one field, to looking at it filtered by another, to looking at it aggregated by ‘Group’.
The process is much quicker and more intuitive than switching between sheets on an ever-expanding (and -slower) spreadsheet.
Plug and play
But the real time-saver is in the way you can plug new datasets into previous data flows.
The image above, for example, shows how you might filter, sort, group and visualise one dataset on GPs’ expenses.
To do the same on an updated dataset you would just need to drag that new data onto the canvas, join it to the processes that had been used last time, select the connection to the old data and delete it.
…But still in beta
QueryTree is a hugely promising tool – a potential Excel-killer – but key to its usefulness will be how fast the service is in practice, not just as a workflow on paper. If their servers can cope with what could be a huge demand, and what could be huge files, then this could be a very powerful tool indeed.
Meanwhile, as you’d expect with a tool that’s still being tested, there are bugs such as stumbling over CSVs with pound signs in their names, and small areas in need of tweaking.
Cleverly, the tool allows you to work with data before it’s been fully loaded – but doesn’t yet tell you when it has. There are plans to change this, to improve error handling (telling you what’s gone wrong), and to add XML and JSON import options to the current CSV-only. And the proposed pricing structure may change too.
Although the tool is currently being tested in private beta it’s well worth signing up for and helping them improve it.
UPDATE (Feb 5 2013): To answer the question posed in the headline, no, I don’t think this is an Excel killer.
Recently participated in a data visualisation hackfest. I found that one of the biggest challenges is that data is presented in some pretty different ways:
– CSV/.xls files for download
– Google spreadsheet
– REST API returning JSON or XML
– Database with a SQL endpoint (eg, a PHPmyadmin interface)
– Downloadable database
Each of those is potentially a perfectly reasonable way of accessing data, but having the tools and platforms to cope with all of them – let alone mashing up several at once – is tough. If there was a middleware that could solve part of that problem (as it sounds like this may do in future), that’d be great.
Given the explosion of APIs I strongly suspect someone will eventually produce a plug-and-play tool for using them. At the moment Open (formerly Google) Refine is good for that, but the trend towards specific tools for specific tasks continues.
The answer to the question “Is this an Excel killer” is probably a resounding no. Why?
“QueryTree is a drag-and-drop spreadsheet analysis tool.” Which is to say “QueryTree is a drag-and-drop Excel analysis tool.”
Journalists will continue to request and receive data in the formats sources use. For the foreseable future, based on the growth curves of their nearest open-source competitors – that means Excel will continue to be a major player and likely market leader – which means we’ll still need to read xls and xlsx spreadsheets. Even if another feature-rich spreadsheet replaced Excel, tools to expedite analysis of the feature rich spreadsheet wouldn’t replace the value of those features to users outside the newsroom.
OpenOffice Calc was poised to be an Excel competitor, but still lacks acceptance in enterprise or government contexts. LibreOffice may eclipse OpenOffice because Libre is taking the lead in translating the more-open XML formats Microsoft adopted — docx and xlsx. But neither are close to replacing Excel, and the ability to manage spreadsheet data in other formats won’t address the many reasons businesses need spreadsheets. The ability to analyze data using point-and-click, drag-and-drop tools may expedite some journalistic tasks, but it won’t end the need for journalistic organizations as a whole to comprehend and work with emerging data formats.
Even setting aside the world of restful API’s, JSON and XML, just to make sense of spreadsheets sometimes requires tools beyond those that expedite analysis of one spreadsheet. Spreadsheets only scale so far. Analytic tools that don’t expand scalability of data contained in spreadsheets don’t address scalability requirements. For example, to compare changes in monthly payroll, each of which comprises single large spreadsheet, we may want to upload a CSV-flattened spreadsheet that into a SQL database, where we can tease out month-to-month or year-to-year trends, or problematically compare payrolls with data from other spreadsheets – budgets, or lists of contract invoices for example.
Drag and drop tools can be very useful in a journalistic context and in a business context. Suggestions that drag-and-drop tools will “kill” not only the need for an organization to leverage more advanced technology, but may actually “kill” the more advanced technologies may not be as useful. The suggestion may be less than useful if it discourages journalists or news organizations from continuing to expand technological capacity. A better question is whether this new drag-and-drop tool will complement available spreadsheet technology
I agree with everything you say. Despite the regrettably provocative headline, I don’t think QueryTree is an Excel killer.
What I do think it does is provide an easier entry point into data analysis which will hopefully lead more journalists into Excel functions, SQL, and other points on a learning curve which may just be a little less steep.
Pingback: Daily links 02/06/2013 at Sarah Hartley
Pingback: #Tip of the day for journalists: Try spreadsheet analysis tool QueryTree | Editors Blog | Journalism.co.uk
Pingback: #Tip of the day for journalists: Try spreadsheet analysis tool QueryTree | My Blog
Pingback: Paletleme Amirliği – Şubat 2013 « Emrah Göker'in İstifhanesi
Pingback: Press Coverage For QueryTree’s Launch