<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Online Journalism Blog &#187; spreadsheets</title>
	<atom:link href="http://onlinejournalismblog.com/tag/spreadsheets/feed/" rel="self" type="application/rss+xml" />
	<link>http://onlinejournalismblog.com</link>
	<description>A conversation.</description>
	<lastBuildDate>Thu, 24 May 2012 08:39:26 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
<cloud domain='onlinejournalismblog.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
		<item>
		<title>Getting full addresses for data from an FOI response (using APIs)</title>
		<link>http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/</link>
		<comments>http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/#comments</comments>
		<pubDate>Fri, 18 Mar 2011 13:00:29 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[online journalism]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[concatenate]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[foi]]></category>
		<category><![CDATA[google refine]]></category>
		<category><![CDATA[grel]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[sean mcgrath]]></category>
		<category><![CDATA[spreadsheets]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=13647</guid>
		<description><![CDATA[Here&#8217;s an example of how APIs can be useful to journalists when they need to combine two sets of data. I recently spoke to Lincoln investigative journalism student Sean McGrath who had obtained some information via FOI that he needed to combine with other data to answer a question (sorry to be so cryptic). He [...]]]></description>
			<content:encoded><![CDATA[
<p><a rel="attachment wp-att-13687" href="http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/heatfullcolour1-2/"><img class="aligncenter size-thumbnail wp-image-13687" src="http://ojb.journallocal.co.uk/files/2011/03/heatfullcolour11-400x426.jpg" alt="Heat Map" width="400" height="426" /></a></p>
<p>Here&#8217;s an example of how APIs can be useful to journalists when they need to combine two sets of data.</p>
<p>I recently spoke to Lincoln investigative journalism student <a href="http://www.seanmcgrath.co.uk/" onclick="urchinTracker('/outgoing/www.seanmcgrath.co.uk/?referer=');">Sean McGrath</a> who had obtained some information via FOI that he needed to combine with other data to answer a question (sorry to be so cryptic).</p>
<p>He had spent 3 days cleaning up the data and manually adding postcodes to it. This seemed a good example of where using an API might cut down the work considerably, so in this post I explain how you can make a start on the same problem in less than an hour using Excel, Google Refine and the Google Maps API.</p>
<h2>Step 1: Get the data in the right format to work with an API</h2>
<p>APIs can do all sorts of things, but one of the things they do which is particularly useful for journalists is <em>answer questions</em>.<span id="more-13647"></span></p>
<p>If we give the <strong>Google Maps API</strong> an address, for example, it will give us all sorts of information in return, such as latitude and longitude, postcode, and so on (I&#8217;ll explain how to do this later in the post). That&#8217;s what we&#8217;re going to use here &#8211; but we might use other APIs or datasets instead.</p>
<p>Sean&#8217;s <a href="https://spreadsheets.google.com/pub?hl=en_GB&amp;hl=en_GB&amp;key=0ApTo6f5Yj1iJdDkzS3NHRl9rQ0Y5ODNhQzZSS1R5Rmc&amp;output=html" onclick="urchinTracker('/outgoing/spreadsheets.google.com/pub?hl=en_GB_amp_hl=en_GB_amp_key=0ApTo6f5Yj1iJdDkzS3NHRl9rQ0Y5ODNhQzZSS1R5Rmc_amp_output=html&amp;referer=');">spreadsheet</a> had one column for school names and another for its town or city &#8211; but we needed those details to be together so we had a complete &#8216;address&#8217;. In order to do that we needed to open the spreadsheet in Excel and create a new column that combined the two.</p>
<p>I created the new column on the left (column A) and typed the following into cell A3:</p>
<pre>=CONCATENATE(B3, ", ", I3)</pre>
<p>This copies the value in cell B3, puts a comma and space after it (&#8220;, &#8221;), and then copies whatever is in I3. In other words, it combines the two cells to create a full address.</p>
<p>To copy the formula down the whole spreadsheet for all the other rows I used my favourite ever shortcut: I held down CTRL and clicked on the + in the bottom right corner of that cell.</p>
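<p>If you would rather do this step outside Excel, the same idea can be sketched in a few lines of Python. This is only an illustration: the function name and the two rows are made up, standing in for columns B and I of the spreadsheet.</p>

```python
def full_address(school, town):
    """Join a school name and its town with a comma and a space."""
    return school + ", " + town

# Made-up rows standing in for columns B and I of the spreadsheet
rows = [
    ("Example Primary School", "Lincoln"),
    ("Sample Academy", "Grantham"),
]

# Equivalent of copying the formula down every row
addresses = [full_address(school, town) for school, town in rows]
print(addresses)
```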
<p>Now it&#8217;s ready to use in Google Refine.</p>
<h2>Step 2: Using Google Refine to ask Google Maps API a question</h2>
<p>Now open the spreadsheet in Google Refine.</p>
<p>In Refine click on the arrow at the top of column A (the one you created using =CONCATENATE) and select <strong>Edit Column &gt; Add Column by fetching URLs</strong>.</p>
<p>A new window will appear with a code box. Type this:</p>
<pre>"http://maps.googleapis.com/maps/api/geocode/json?sensor=false&amp;address=" + escape(value, "url")</pre>
<p>That creates a URL by adding the address in column A (&#8216;value&#8217;) to the Google Maps API URL, with escape(value, "url") encoding spaces and punctuation so that the address is safe to put in a URL. The URL is the spreadsheet &#8216;asking&#8217; the Google Maps API for all the information it has about the address &#8211; and asking for that information in a format called JSON (note &#8216;json&#8217; in the URL).</p>
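<p>To see what that expression produces, here is a rough Python equivalent. It is a sketch rather than what Refine does internally: the function name is my own, the endpoint and the sensor parameter are copied from the expression above, and the service may have changed its requirements since this was written.</p>

```python
from urllib.parse import urlencode

def geocode_url(address):
    """Build a geocoding request URL for one address (endpoint taken from the post)."""
    # urlencode escapes spaces and commas so the address is safe inside a URL
    query = urlencode({"sensor": "false", "address": address})
    return "http://maps.googleapis.com/maps/api/geocode/json?" + query

# Made-up address in the same 'school, town' shape as column A
print(geocode_url("Example Primary School, Lincoln"))
```

<p>Each row in the spreadsheet would get its own URL like this, which is exactly what Refine fetches for you.</p>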
<p>You can see all this being done in <a href="http://www.youtube.com/watch?v=m5ER2qRH1OQ" onclick="urchinTracker('/outgoing/www.youtube.com/watch?v=m5ER2qRH1OQ&amp;referer=');">Google Refine&#8217;s own video</a>:</p>
<p><object width="500" height="306"><param name="movie" value="http://www.youtube.com/v/5tsyz3ibYzk?version=3"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/5tsyz3ibYzk?version=3" type="application/x-shockwave-flash" width="500" height="306" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>Give your new column a name, and click OK. You&#8217;ll see that the new column contains a raft of code &#8211; JSON &#8211; about each school. This contains all that geographical information &#8211; but we still need to extract an address from that.*</p>
<h2>Step 3: Using Google Refine to extract the address from the Google Maps data</h2>
<p>Create a further column based on the one you&#8217;ve just created by clicking the arrow at the top and selecting <strong>Edit Column &gt; Add Column based on this column</strong>.</p>
<p>We need to write some more code. This took a bit of trial and error but here&#8217;s what I ended up with:</p>
<pre>value.parseJson().results[0].formatted_address</pre>
<p>&#8216;value&#8217; is the value in the column we&#8217;re basing this new column on. parseJson() reads the JSON code so that we can pick items out of it. If you look in the JSON you&#8217;ll see there&#8217;s a part called &#8216;results&#8217;; [0] takes the first result in that list, and within it is a part called &#8216;formatted_address&#8217;, which has what we need.</p>
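<p>The same lookup can be sketched in Python. The sample response below is made up and trimmed to the parts mentioned above, and the function name is my own; unlike the one-line expression, it returns nothing rather than failing when an address has no results.</p>

```python
import json

# Made-up response, trimmed to the fields discussed above
sample = '{"results": [{"formatted_address": "1 Example Road, Lincoln LN1 1AA, UK"}], "status": "OK"}'

def formatted_address(raw):
    """Return the first result's formatted_address, or None if there are no results."""
    results = json.loads(raw).get("results", [])
    if not results:
        return None
    return results[0].get("formatted_address")

print(formatted_address(sample))
```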
<p>Now we have a new column with the full address &#8211; including postcode.</p>
<h2>Step 4: Using Excel to split the address up again</h2>
<p>We can now export this (<strong>Export</strong> is in the upper right corner of Google Refine) as a spreadsheet and open it up in Excel again.</p>
<p>To split that address column into its parts select it and then select <strong>Data &gt; Text to columns</strong> to split that address into separate items, with postcodes in their own. (There are other ways you could do this, for example extracting the last 5 characters of each cell instead).</p>
<p>Alternatively, you could get the postcode from the JSON directly, with a different line of code at Step 3 (if you work this out let me know) &#8211; or you could extract the lat/long as detailed in the video and use the Postcodes API at <a href="http://www.uk-postcodes.com/api.php" onclick="urchinTracker('/outgoing/www.uk-postcodes.com/api.php?referer=');">http://www.uk-postcodes.com/api.php</a> to get the postcode from that. As always there are various ways to crack the nut.</p>
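<p>For the first of those alternatives, here is one way it might work, sketched in Python rather than GREL. It assumes the geocoding JSON lists each part of the address under &#8216;address_components&#8217;, with the postcode marked by the type &#8216;postal_code&#8217;; that is my understanding of the response format, so check it against your own JSON column. The sample data and function name are made up.</p>

```python
import json

# Made-up response containing only the fields this sketch relies on
sample = json.dumps({
    "results": [{
        "address_components": [
            {"long_name": "Lincoln", "types": ["postal_town"]},
            {"long_name": "LN1 1AA", "types": ["postal_code"]},
        ],
    }],
})

def postcode_from_geocode(raw):
    """Return the postcode from the first result, or None if it is missing."""
    results = json.loads(raw).get("results", [])
    if not results:
        return None
    for component in results[0].get("address_components", []):
        # Assumption: the postcode component is tagged with the type 'postal_code'
        if "postal_code" in component.get("types", []):
            return component.get("long_name")
    return None

print(postcode_from_geocode(sample))
```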
<p>Note: I haven&#8217;t &#8216;learned&#8217; JSON or GREL or any other language in this &#8211; just done a bit of searching (it took around an hour) to find the code that I needed and adapted it with educated guesswork.</p>
<p>*Problems 1 and 2: not all addresses return results from Google Maps because we haven&#8217;t given it enough detail. Also, there&#8217;s a 2500 limit on &#8216;free&#8217; calls to their API &#8211; and we have 5000+ records, so almost 2000 are returned &#8216;LIMIT EXCEEDED&#8217;. A possible solution to the latter would be to split this into 2 spreadsheets and then merge the results later. A possible solution to the former may be to find &#8211; or create by scraping &#8211; another dataset that has more address information (<a href="http://www.programmableweb.com/mashup/uk-schools" onclick="urchinTracker('/outgoing/www.programmableweb.com/mashup/uk-schools?referer=');">for example this one</a>).</p>
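<p>Splitting the records to stay under the limit is easy to sketch in Python. The 2,500 figure is the limit mentioned above; the function name and the placeholder rows are made up. Note that with more than 5,000 records a 2,500 cap gives three batches rather than two.</p>

```python
def batches(rows, size=2500):
    """Split rows into consecutive lists of at most `size` items."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]

# Placeholder rows standing in for the 5000+ records
records = ["school %d" % n for n in range(5200)]
parts = batches(records)
print([len(p) for p in parts])
```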
<p>FROM THE COMMENTS: <a href="http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/#comment-297261">Chip Oglesby suggests some other workarounds</a>, including doing it all in Refine and using the Yahoo Maps API for half of the calls.</p>
<p>UPDATE: Tony Hirst follows on from this and <a href="http://blog.ouseful.info/2011/04/08/a-first-quick-viz-of-uk-university-fees/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/04/08/a-first-quick-viz-of-uk-university-fees/?referer=');">finds other solutions to some of the problems outlined</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Adding geographical information to a spreadsheet based on postcodes – Google Refine and APIs</title>
		<link>http://onlinejournalismblog.com/2010/12/16/adding-geographical-information-to-a-spreadsheet-based-on-postcodes-google-refine-and-apis/</link>
		<comments>http://onlinejournalismblog.com/2010/12/16/adding-geographical-information-to-a-spreadsheet-based-on-postcodes-google-refine-and-apis/#comments</comments>
		<pubDate>Thu, 16 Dec 2010 08:59:00 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[google refine]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[postcodes]]></category>
		<category><![CDATA[spreadsheets]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=12093</guid>
		<description><![CDATA[If you have a spreadsheet containing geographical data such as postcodes you may want to know what constituency they are in, or convert them to local authority. That was a question that Bill Thompson asked on Twitter this week &#8211; and this is how I used Google Refine to do that: adding extra columns to [...]]]></description>
			<content:encoded><![CDATA[
<p>If you have a spreadsheet containing geographical data such as postcodes you may want to know what constituency they are in, or convert them to local authority. That was a question that Bill Thompson asked on Twitter this week &#8211; and this is how I used Google Refine to do that: adding extra columns to a spreadsheet with geographic information.</p>
<p>You can <a href="http://www.screencast.com/users/paulbradshaw/folders/Jing/media/b2c5c0d1-21ce-40a0-ad7a-1f67bba7d2e1" onclick="urchinTracker('/outgoing/www.screencast.com/users/paulbradshaw/folders/Jing/media/b2c5c0d1-21ce-40a0-ad7a-1f67bba7d2e1?referer=');">watch a video tutorial of this here</a>.</p>
<h3>1. Find a website that gives information based on a postcode</h3>
<p>First, I needed to find an API which would return a page of information on any postcode in JSON&#8230;</p>
<p>If that sounds like double-dutch, don&#8217;t worry, try this instead.</p>
<p><em>Translation</em>: First, I needed either of these websites: <a href="http://www.uk-postcodes.com/" onclick="urchinTracker('/outgoing/www.uk-postcodes.com/?referer=');">http://www.uk-postcodes.com/</a> or <a href="http://mapit.mysociety.org/" onclick="urchinTracker('/outgoing/mapit.mysociety.org/?referer=');">http://mapit.mysociety.org/</a></p>
<p>Both of these will generate a page giving you details about any given postcode. The formatting of these pages is consistent, e.g.</p>
<ul>
<li><a href="http://www.uk-postcodes.com/postcode/B422SU.json" onclick="urchinTracker('/outgoing/www.uk-postcodes.com/postcode/B422SU.json?referer=');">http://www.uk-postcodes.com/postcode/B422SU.json</a></li>
<li><a href="http://mapit.mysociety.org/postcode/B42%202SU" onclick="urchinTracker('/outgoing/mapit.mysociety.org/postcode/B42_202SU?referer=');">http://mapit.mysociety.org/postcode/B42%202SU</a></li>
</ul>
<p>(The first removes the space between the two parts of the postcode, and adds .json; the second replaces the space with %20 &#8211; although I&#8217;m <a href="http://twitter.com/dracos/statuses/14442954189967360" onclick="urchinTracker('/outgoing/twitter.com/dracos/statuses/14442954189967360?referer=');">told by Matthew Somerville that it will work with spaces and postcodes without spaces</a>)</p>
<p>This information will be important when we start to use Google Refine&#8230;</p>
<h3>2. Create a new column that has text in the same format as the webpages you want to fetch</h3>
<p>In Google Refine click on the arrow at the top of your postcode column and <a href="http://excelnotes.posterous.com/grel-remove-spaces-from-a-column" onclick="urchinTracker('/outgoing/excelnotes.posterous.com/grel-remove-spaces-from-a-column?referer=');">follow the instructions here</a> to create a new column which has the same postcode information, but with no spaces. To replace the space with %20 instead you would replace the expression with:</p>
<blockquote>
<pre>value.split(" ").join("%20")</pre>
</blockquote>
<p>Let&#8217;s name this column &#8216;SpacesRemoved&#8217; and click OK.</p>
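<p>The two ways of reformatting the postcode come down to splitting on the space and joining the pieces back together. Here is a small Python sketch of both, using the example postcode from the links above; the function names are my own.</p>

```python
def remove_spaces(postcode):
    """Format for uk-postcodes.com: both parts run together."""
    return "".join(postcode.split(" "))

def encode_spaces(postcode):
    """Format for mapit.mysociety.org: the space becomes %20."""
    return "%20".join(postcode.split(" "))

print(remove_spaces("B42 2SU"))
print(encode_spaces("B42 2SU"))
```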
<p>Now that we&#8217;ve got postcodes in the same format as the webpages above, we can start to fetch a bunch of code giving us extra information on those postcodes.</p>
<h3>3. Write some code that goes to a webpage and fetches information about each postcode</h3>
<p>In Google Refine click on the arrow at the top of your &#8216;SpacesRemoved&#8217; column and create a new column by selecting <em>&#8216;Edit column&#8217; &gt; &#8216;Add column by fetching URLs&#8230;&#8217;</em></p>
<p>You can <a href="http://code.google.com/p/google-refine/wiki/FetchingURLsFromWebServices" onclick="urchinTracker('/outgoing/code.google.com/p/google-refine/wiki/FetchingURLsFromWebServices?referer=');">read more about this functionality here</a>.</p>
<p>This time you will type the expression:</p>
<blockquote>
<pre>"http://www.uk-postcodes.com/postcode/"+value+".json"</pre>
</blockquote>
<p>That basically creates a URL that inserts &#8216;value&#8217; (the value in the previous column) where you want it.</p>
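<p>Written out in Python, the same URL building looks like this. It only builds the address and does not fetch anything; the function name is my own.</p>

```python
def postcode_json_url(spaces_removed):
    """Insert a postcode (spaces already removed) into the uk-postcodes.com URL pattern."""
    return "http://www.uk-postcodes.com/postcode/" + spaces_removed + ".json"

print(postcode_json_url("B422SU"))
```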
<p>Call this column &#8216;JSON for postcode&#8217; and click OK.</p>
<p>Each cell will now be filled with the results of that webpage. This might take a while.</p>
<h3>4. Write some code that pulls out a specific piece of information from that</h3>
<p>In Google Refine click on the arrow at the top of your &#8216;JSON for postcode&#8217; column and create a new column by selecting <em>&#8216;Edit column&#8217; &gt; &#8216;Add column based on this column&#8230;&#8217;</em></p>
<p>Write the following expression:</p>
<blockquote>
<pre>value.parseJson()["administrative"]["district"]["title"]</pre>
</blockquote>
<p>Look at the preview as you type this and you&#8217;ll see information become more specific as you add each term in square brackets.</p>
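<p>Each term in square brackets steps one level deeper into the JSON. Here is the same drill-down sketched in Python, using a made-up sample that contains only the keys named in the expression above; the function name is my own.</p>

```python
import json

# Made-up sample with only the keys used in the expression above
sample = '{"postcode": "B42 2SU", "administrative": {"district": {"title": "Example City Council"}}}'

def council_name(raw):
    """Step through administrative, then district, then title; None if any level is missing."""
    data = json.loads(raw)
    return data.get("administrative", {}).get("district", {}).get("title")

print(council_name(sample))
```

<p>Swapping the key names is how you would adapt this for other pieces of information in the same JSON.</p>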
<p>Call this &#8216;Council&#8217; and click OK.</p>
<p>This column will now be populated with the council names for each postcode. You can repeat this process for other information, adapting the expression for different pieces of information such as constituency, easting and northing, and so on.</p>
<h3>5. Export as a standard spreadsheet</h3>
<p>Click <em>Export</em> in the top right corner and save your spreadsheet in the format you prefer. You can then upload this to Google Docs and share it publicly.</p>
<h3>Other possibilities</h3>
<p>Although this post is about postcode data you can use the same principles to add information based on any data that you can find an API for. For example if you had a column of charities you could use the Open Charities API to pull further details (<a href="http://opencharities.org/info/about" onclick="urchinTracker('/outgoing/opencharities.org/info/about?referer=');">http://opencharities.org/info/about</a>). For local authority data you could pull from the OpenlyLocal API (<a href="http://openlylocal.com/info/api" onclick="urchinTracker('/outgoing/openlylocal.com/info/api?referer=');">http://openlylocal.com/info/api</a>).</p>
<p>If you know of other similarly useful APIs let me know. </p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2010/12/16/adding-geographical-information-to-a-spreadsheet-based-on-postcodes-google-refine-and-apis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using Google Spreadsheets as a database (no, it really is very interesting, honest)</title>
		<link>http://onlinejournalismblog.com/2009/05/19/using-google-spreadsheets-as-a-database-no-it-really-is-very-interesting-honest/</link>
		<comments>http://onlinejournalismblog.com/2009/05/19/using-google-spreadsheets-as-a-database-no-it-really-is-very-interesting-honest/#comments</comments>
		<pubDate>Tue, 19 May 2009 15:05:24 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[Blogging]]></category>
		<category><![CDATA[computer aided reporting]]></category>
		<category><![CDATA[data store]]></category>
		<category><![CDATA[google docs]]></category>
		<category><![CDATA[Guardian]]></category>
		<category><![CDATA[spreadsheets]]></category>
		<category><![CDATA[tony hirst]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=2708</guid>
		<description><![CDATA[This post by Tony Hirst should be recommended reading for every journalist interested in the potential of computers for reporting. Why? Because it shows you how you can use Google spreadsheets to interrogate data as if it was a database; and because it demonstrates the importance of news organisations releasing data to their users. Put [...]]]></description>
			<content:encoded><![CDATA[
<p><a href="http://ouseful.wordpress.com/2009/05/18/using-google-spreadsheets-as-a-databace-with-the-google-visualisation-api-query-language/" onclick="urchinTracker('/outgoing/ouseful.wordpress.com/2009/05/18/using-google-spreadsheets-as-a-databace-with-the-google-visualisation-api-query-language/?referer=');">This post by Tony Hirst</a> should be recommended reading for every journalist interested in the potential of computers for reporting.</p>
<p>Why? Because it shows you how you can use Google spreadsheets to interrogate data as if it was a database; and because it demonstrates the importance of news organisations releasing data to their users.</p>
<p>Put aside any intimidation you might feel at the mention of APIs and query languages. What it boils down to is this: <strong>you can alter the web address of a Google spreadsheet to filter the data and find the story.</strong></p>
<p>Simple as that. </p>
<p>Hirst uses the example of the <a href="http://spreadsheets.google.com/ccc?key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/ccc?key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">spreadsheet of MPs expenses</a> recently <a href="http://www.guardian.co.uk/news/datablog/2009/may/15/mps-expenses-houseofcommons" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog/2009/may/15/mps-expenses-houseofcommons?referer=');">released by The Guardian</a> (they&#8217;ve also published <a href="http://www.guardian.co.uk/news/datablog/2009/may/15/lordreform-mps-expenses" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog/2009/may/15/lordreform-mps-expenses?referer=');">Lords expenses</a>). By altering the URLs this is what he generates (I&#8217;m quoting his bullet points):</p>
<ul>
<li>the names of people who have claimed the maximum additional costs allowance (£23,083): fetch just columns B, C and I where the value in column I is 23083: <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20B,C,I%20where%20I=23083&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20B_C_I_20where_20I=23083_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select B,C,I where I=23083</a> (column I is the additional costs allowance column);</li>
<li>How many people did claim the maximum additional costs allowance? Select the people who claimed the maximum amount (23083) and count them: <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20count(I)%20where%20I=23083&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20count_I_20where_20I=23083_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select count(I) where I=23083</a></li>
<li>So which people <em>did not</em> claim the maximum additional costs allowance? Display the people who did not claim total additional allowances of 23083: <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20B,C,I%20where%20I!=23083&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20B_C_I_20where_20I_=23083_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select B,C,I where I!=23083</a> (using &lt;&gt; for ‘not equals’ also works); NB here’s a more refined take on that query: <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20B,C,I%20where%20(I!=23083%20and%20I%3E=0)%20order%20by%20I&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20B_C_I_20where_20_I_=23083_20and_20I_3E=0_20order_20by_20I_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select B,C,I where (I!=23083 and I&gt;=0) order by I</a></li>
<li>search for the name, party (column D) and constituency (column E) of people whose first name is <em>Joan</em> or is recorded as <em>John</em> (rather than “Mr John”, or “Rt Hon John”): <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20B,C,D,E%20where%20(C%20contains%20'Joan'%20or%20C%20matches%20'John')&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20B_C_D_E_20where_20_C_20contains_20_Joan_20or_20C_20matches_20_John_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select B,C,D,E where (C contains ‘Joan’ or C matches ‘John’)</a></li>
<li>only show the people who have claimed less than £100,000 in total allowances : <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20*%20where%20F%3C100000&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20_20where_20F_3C100000_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select * where F&lt;100000</a></li>
<li>what is the total amount of expenses claimed? Fetch the summed total of entries in column I (i.e. the total expenses claimed by everyone): <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20sum(I)&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20sum_I_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select sum(I)</a></li>
<li>So how many MPs are there? Count the number of rows in an arbitrary column: <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20count(I)&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20count_I_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select count(I)</a></li>
<li>Find the average amount claimed by the MPs: <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20sum(I)/count(I)&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20sum_I_/count_I_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select sum(I)/count(I)</a></li>
<li>Find out how much has been claimed by each party (column D): <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20D,sum(I)%20where%20I%3E=0%20group%20by%20D&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20D_sum_I_20where_20I_3E=0_20group_20by_20D_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select D,sum(I) where I&gt;=0 group by D</a> (Setting I&gt;=0 just ensures there is something in the column)</li>
<li>For each party, find out how much (on average) each party member claims: <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20D,sum(I)/count(I)%20where%20I%3E=0%20group%20by%20D&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20D_sum_I_/count_I_20where_20I_3E=0_20group_20by_20D_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select D,sum(I)/count(I) where I&gt;=0 group by D</a></li>
</ul>
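<p>All of the links above follow one pattern: the same address, with the query text and the spreadsheet key slotted in. Here is a short Python sketch of that pattern. The function name is my own; the address, the &#8216;tqx&#8217;, &#8216;tq&#8217; and &#8216;key&#8217; parameters and the spreadsheet key are copied from the links above.</p>

```python
from urllib.parse import quote, urlencode

def query_url(key, query):
    """Build a query URL in the same shape as the links above."""
    params = {"tqx": "out:html", "tq": query, "key": key}
    # Leave the punctuation used in the queries readable; spaces become %20
    return "http://spreadsheets.google.com/tq?" + urlencode(
        params, quote_via=quote, safe=":,()=!*'"
    )

print(query_url("phNtm3LmDZEObQ2itmSqHIA", "select B,C,I where I=23083"))
```

<p>Changing the query text, or the key to point at a different spreadsheet, is all it takes to ask a different question.</p>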
<p>OK, you need to know the words to use (and <strong>if you have a link to an easy reference for these let me know*</strong>), but this is still a lot easier than using programming languages and databases.</p>
<p>As I say, this also illustrates the importance of publishing raw data so users can interrogate it in their own ways, which is precisely what <a href="http://www.guardian.co.uk/data-store" onclick="urchinTracker('/outgoing/www.guardian.co.uk/data-store?referer=');">The Guardian&#8217;s Data Store</a> has been doing, meaning that people like Tony can <a href="http://ouseful.open.ac.uk/mpExpensesSearch.html" onclick="urchinTracker('/outgoing/ouseful.open.ac.uk/mpExpensesSearch.html?referer=');">create interfaces like this</a>.</p>
<p>Wonderful.</p>
<p>*Tony has very generously created <a href="http://ouseful.open.ac.uk/datastore/gspreadsheetdb2.php" onclick="urchinTracker('/outgoing/ouseful.open.ac.uk/datastore/gspreadsheetdb2.php?referer=');">this page which helps you formulate your search &#8211; and generates the URL</a>. If you were working on a different spreadsheet you could just replace the spreadsheet URL and change any column references accordingly.</p>
<p>UPDATE: Tony also has <a href="http://ouseful.open.ac.uk/datastore/gspreadsheetdb4.php" onclick="urchinTracker('/outgoing/ouseful.open.ac.uk/datastore/gspreadsheetdb4.php?referer=');">a version which allows you to pick from Guardian datasets</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2009/05/19/using-google-spreadsheets-as-a-database-no-it-really-is-very-interesting-honest/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
	</channel>
</rss>

