Jun 23, 2009
June 23rd, 2009 by Paul Bradshaw
The Telegraph have finally published their MPs’ expenses data online – and it’s worth the wait. Here are some initial thoughts and reactions:
- Firstly, they’ve made user behaviour an editorial feature. In plain English: they’re showing the most searched-for MPs and constituencies, which is not only potentially interesting in itself, but also makes it easier for the majority of users who are making those searches (i.e. they can access it with a click rather than by typing)
- There’s also a table for most expensive MPs. As this is going to remain static, it would be good to see a dedicated page with more information – in the same way the paper did in its weekend supplement.
- The results page for a particular MP has a search engine-friendly URL. Very often, database-generated pages have poor search engine optimisation, partly because the URLs are full of digits and symbols, and partly because they are dynamically generated. This appears to avoid both problems – the URL for the second home allowance of Khalid Mahmood MP, for example, is http://parliament.telegraph.co.uk/mpsexpenses/second-home/Khalid-Mahmood/mp-11087
- The uncensored expenses files themselves are embedded using Issuu. This seems a strange choice as it doesn’t allow users to tag or comment – and the email/embed option is disabled for “secret documents”
- There’s some nice subtle animation on the second home part of expenses, and clear visualisation on other parts.
- The MP Details page is intelligently related both to the Telegraph site (related articles) and the wider web, with the facility to easily email that MP, go to their Wikipedia entry, and ‘bookmark’.
- Joy of joys, you can also download the MPs expenses spreadsheet from here (on Google Docs) – although this is for all MPs rather than the one being viewed. Curiously, while viewing you can see who else is viewing and even (as I did) attempt to chat (no, they didn’t chat back).
I’ll most likely update this post later as I get some details from behind the curtain.
And there are more general thoughts around the online treatment of expenses generally which I’ll try to blog at another point.
May 19, 2009
May 19th, 2009 by Paul Bradshaw
This post by Tony Hirst should be recommended reading for every journalist interested in the potential of computers for reporting.
Why? Because it shows you how you can use Google spreadsheets to interrogate data as if it was a database; and because it demonstrates the importance of news organisations releasing data to their users.
Put aside any intimidation you might feel at the mention of APIs and query languages. What it boils down to is this: you can alter the web address of a Google spreadsheet to filter the data and find the story.
Simple as that.
Hirst uses the example of the spreadsheet of MPs expenses recently released by The Guardian (they’ve also published Lords expenses). By altering the URLs this is what he generates (I’m quoting his bullet points):
- the names of people who have claimed the maximum additional costs allowance (£23,083): fetch just columns B, C and I where the value in column I is 23083: select B,C,I where I=23083 (column I is the additional costs allowance column);
- How many people did claim the maximum additional costs allowance? Select the people who claimed the maximum amount (23083) and count them: select count(I) where I=23083
- So which people did not claim the maximum additional costs allowance? Display the people who did not claim total additional allowances of 23083: select B,C,I where I!=23083 (using <> for ‘not equals’ also works); NB here’s a more refined take on that query: select B,C,I where (I!=23083 and I>=0) order by I
- search for the name, party (column D) and constituency (column E) of people whose first name is Jane or is recorded as John (rather than “Mr John”, or “Rt Hon John”): select B,C,D,E where (C contains ‘Joan’ or C matches ‘John’)
- only show the people who have claimed less than £100,000 in total allowances : select * where F<100000
- what is the total amount of expenses claimed? Fetch the summed total of entries in column I (i.e. the total expenses claimed by everyone): select sum(I)
- So how many MPs are there? Count the number of rows in an arbitrary column: select count(I)
- Find the average amount claimed by the MPs: select sum(I)/count(I)
- Find out how much has been claimed by each party (column D): select D,sum(I) where I>=0 group by D (Setting I>0 just ensures there is something in the column)
- For each party, find out how much (on average) each party member claims:select%20D,sum(I)/count(I)%20where%20I%3E=0%20group%20by%20D”>select D,sum(I)/count(I) where I>=0 group by D
OK, you need to know the words to use (and if you have a link to an easy reference for these let me know*), but this is still a lot easier than using programming languages and databases.
As I say, this also illustrates the importance of publishing raw data so users can interrogate it in their own ways, which is precisely what The Guardian’s Data Store has been doing, meaning that people like Tony can create interfaces like this.
Wonderful.
*Tony has very generously created this page which helps you formulate your search – and generates the URL. If you were working on a different spreadsheet you could just replace the spreadsheet URL and change any column references accordingly.
UPDATE: Tony also has a version which allows you to pick from Guardian datasets.