This post by Tony Hirst should be recommended reading for every journalist interested in the potential of computers for reporting.
Why? Because it shows you how you can use Google spreadsheets to interrogate data as if it was a database; and because it demonstrates the importance of news organisations releasing data to their users.
Put aside any intimidation you might feel at the mention of APIs and query languages. What it boils down to is this: you can alter the web address of a Google spreadsheet to filter the data and find the story.
Simple as that.
Hirst uses the example of the spreadsheet of MPs expenses recently released by The Guardian (they’ve also published Lords expenses). By altering the URLs this is what he generates (I’m quoting his bullet points):
- the names of people who have claimed the maximum additional costs allowance (£23,083): fetch just columns B, C and I where the value in column I is 23083: select B,C,I where I=23083 (column I is the additional costs allowance column);
- How many people did claim the maximum additional costs allowance? Select the people who claimed the maximum amount (23083) and count them: select count(I) where I=23083
- So which people did not claim the maximum additional costs allowance? Display the people who did not claim total additional allowances of 23083: select B,C,I where I!=23083 (using <> for ‘not equals’ also works); NB here’s a more refined take on that query: select B,C,I where (I!=23083 and I>=0) order by I
- search for the name, party (column D) and constituency (column E) of people whose first name is Jane or is recorded as John (rather than “Mr John”, or “Rt Hon John”): select B,C,D,E where (C contains ‘Joan’ or C matches ‘John’)
- only show the people who have claimed less than £100,000 in total allowances : select * where F<100000
- what is the total amount of expenses claimed? Fetch the summed total of entries in column I (i.e. the total expenses claimed by everyone): select sum(I)
- So how many MPs are there? Count the number of rows in an arbitrary column: select count(I)
- Find the average amount claimed by the MPs: select sum(I)/count(I)
- Find out how much has been claimed by each party (column D): select D,sum(I) where I>=0 group by D (Setting I>0 just ensures there is something in the column)
- For each party, find out how much (on average) each party member claims:select%20D,sum(I)/count(I)%20where%20I%3E=0%20group%20by%20D”>select D,sum(I)/count(I) where I>=0 group by D
OK, you need to know the words to use (and if you have a link to an easy reference for these let me know*), but this is still a lot easier than using programming languages and databases.
As I say, this also illustrates the importance of publishing raw data so users can interrogate it in their own ways, which is precisely what The Guardian’s Data Store has been doing, meaning that people like Tony can create interfaces like this.
*Tony has very generously created this page which helps you formulate your search – and generates the URL. If you were working on a different spreadsheet you could just replace the spreadsheet URL and change any column references accordingly.
UPDATE: Tony also has a version which allows you to pick from Guardian datasets.
Pingback: Making it a Little Easier to Use Google Spreadsheets as a Database (Hopefully!) « OUseful.Info, the blog…
Pingback: The golden age of computer-assisted reporting is at hand » Nieman Journalism Lab
Pingback: The golden age of data journalism?
Pingback: links for 2009-05-21 «
Curious problem with all of the links in the comments section.
Thanks – I recently updated my version of wordpress so I wonder if one of the plugins is not working probably: my guess being the dofollow one.
Pingback: links for 2009-05-22 « Sarah Hartley
The mention of Google Spreadsheets here is a little misleading – you could equally do the same sort of analysis using an offline tool like Microsoft Excel or Calc (the free OpenOffice equivalent).
The point remains though that most journalists could do with help in getting to grips with this; I’ve had colleagues asking this week precisely about how they can use spreadsheets to cut through the swathes of data around expenses.
Pingback: links for 2009-05-25 « David Black
But spreadsheets are just databases. Its just another label?
Whether you use google, microsoft, Zoho or any other out there.
And of course you can sort the fields to generate the ‘useful’ data. This should not be ‘news’.
James, Calli – the ‘news’ here it seems to me is that you can alter the URL in ways that filter information in more sophisticated ways than just using ‘sort’. It’s that that makes this more like a database you can interrogate using combinations of factors than a simple spreadsheet
Pingback: Every news organisation should have a Datastore | Online Journalism Blog
This is interesting indeed.
I tried with my own spreadsheet,
but I always got this message:
“Oops, an error occured.
Reason: Access denied
Description: Access denied”
Anyone can view and edit the spreadsheet via the key I used in this example, how come I still get access denied ?
The problem is that don’t work with private spreadsheet only public. Any idea to work with provate spreadsheet and make queries like this. Will be great !
After reading your blog, I was curious to search if there is any app that really uses Google spreadsheet as database and check this out, there is a entire time tracker using Google Spreadsheet as database. http://screeperzone.com/2009/06/05/activity-tracker-plus-track-all-your-life-activities-with-just-a-single-click/.
How would I pass a variable to the url? Is there a way to mask the URL?
Probably best to ask that question on Tony’s original post, but I’ll ask him too.
@miek What do you mean exactly? what do you want to do?
IF you just want to bookmark queries into a spreadsheet, then the google viz api is accessed via a URL; eg http://ouseful.open.ac.uk/datastore/coiwebsites.php?run=true&gqc=A,B,EL%2BEN%2BEP%2BER%2BET&gqw=&gqo= has links to the spreadsheet query:
@James Goffin :
The author here means that Google Spreadsheets can be used as Databases in the cloud. You just update the data and create some dynamic app that receives the updated data and displays it on the users screen, be it a mobile, tablet or desktop PC.
Thanks for the post. Here’s a post which shows how to convert your excel to web in minutes http://blog.caspio.com/integration/convert-ms-excel-to-web/
Good question. I don’t know – any ideas?
Wriote moгe, thats all I hаve to saу. Literally, it seemns аs thоugh you relied οn the video tо make youг point.
You сlearly know what youre talkinjg abοut, whyy throw аωay уour intelligence оn just powting videos tо your weblog ωhen уou cοuld be giviing uus sokething
informative tο read?