<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Online Journalism Blog &#187; data</title>
	<atom:link href="http://onlinejournalismblog.com/tag/data/feed/" rel="self" type="application/rss+xml" />
	<link>http://onlinejournalismblog.com</link>
	<description>A conversation.</description>
	<lastBuildDate>Sat, 11 Feb 2012 12:06:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<cloud domain='onlinejournalismblog.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
		<item>
		<title>Social Interest Positioning – Visualising Facebook Friends’ Likes With Data Grabbed Using Google Refine</title>
		<link>http://blog.ouseful.info/2012/01/04/social-interest-positioning-visualising-facebook-friends-likes/</link>
		<comments>http://blog.ouseful.info/2012/01/04/social-interest-positioning-visualising-facebook-friends-likes/#comments</comments>
		<pubDate>Wed, 04 Jan 2012 11:06:45 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[analytics]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[google refine]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=6729</guid>
		<description><![CDATA[What do my Facebook friends have in common in terms of the things they have Liked, or in terms of their music or movie preferences? (And does this say anything about me?!) Here&#8217;s a recipe for visualising that data&#8230; After discovering via Martin Hawksey that the recent (December, 2011) 2.5 release of Google Refine allows [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#38;blog=325417&#38;post=6729&#38;subd=ouseful&#38;ref=&#38;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>What do my Facebook friends have in common in terms of the things they have Liked, or in terms of their music or movie preferences? (And does this say anything about me?!) Here&#8217;s a recipe for visualising that data&#8230;</p>
<p>After discovering <a href="http://mashe.hawksey.info/2012/01/free-and-rebuild-the-tweets-export-twapperkeeper-archives-using-google-refine/" onclick="urchinTracker('/outgoing/mashe.hawksey.info/2012/01/free-and-rebuild-the-tweets-export-twapperkeeper-archives-using-google-refine/?referer=');">via Martin Hawksey</a> that the recent (December, 2011) <a href="http://code.google.com/p/google-refine/wiki/ChangesFor2p5" onclick="urchinTracker('/outgoing/code.google.com/p/google-refine/wiki/ChangesFor2p5?referer=');">2.5 release of Google Refine</a> allows you to import JSON and XML feeds to bootstrap a new project, I wondered whether it would be able to pull in data from the Facebook API if I was logged in to Facebook (Google Refine does run in the browser after all&#8230;)</p>
<p>Looking through the <a href="http://developers.facebook.com/docs/reference/api/" onclick="urchinTracker('/outgoing/developers.facebook.com/docs/reference/api/?referer=');">Facebook API documentation</a> whilst logged in to Facebook, it&#8217;s easy enough to find exemplar links to things like your friends list (<em>https://graph.facebook.com/<strong>me</strong>/friends?access_token=A_LONG_JUMBLE_OF_LETTERS</em>) or the list of likes someone has made (<em>https://graph.facebook.com/<strong>me</strong>/likes?access_token=A_LONG_JUMBLE_OF_LETTERS</em>); replacing <em>me</em> with the Facebook ID of one of your friends should pull down a list of their friends, or likes, etc.</p>
<p>(Note that validity of the access token is time limited, so you can&#8217;t grab a copy of the access token and hope to use the same one day after day.)</p>
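<p>As a rough illustration of how those URLs are put together, here is a minimal Python sketch. The helper name <em>graph_url</em> and the numeric friend ID are made up for this example; the URL pattern is the one quoted above, and the token is a placeholder rather than a working value.</p>

```python
from urllib.parse import quote, urlencode

GRAPH_BASE = "https://graph.facebook.com"


def graph_url(user_id, connection, access_token):
    """Build a URL of the form .../USER/CONNECTION?access_token=TOKEN."""
    path = "/".join([quote(str(user_id), safe=""), quote(connection, safe="")])
    return GRAPH_BASE + "/" + path + "?" + urlencode({"access_token": access_token})


token = "A_LONG_JUMBLE_OF_LETTERS"  # placeholder, not a real token
print(graph_url("me", "friends", token))
print(graph_url("1234567890", "likes", token))  # 1234567890 is a made-up ID
```

<p>Because the token expires, anything built this way would need a fresh token each session.</p>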
<p>Grabbing the link to your friends on Facebook is simply a case of opening a new project, choosing to <em>get the data from a Web Address</em>, and then pasting in the friends list URL:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6633972841/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6633972841/?referer=');"><img src="http://farm8.staticflickr.com/7025/6633972841_5909bf4a46.jpg" width="500" height="152" alt="Google Refine - import Facebook friends list" /></a></p>
<p>Click on next, and Google Refine will download the data, which you can then parse as a JSON file, and from which you can identify individual record types:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6633985963/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6633985963/?referer=');"><img src="http://farm8.staticflickr.com/7141/6633985963_8669f64d93.jpg" width="500" height="498" alt="Google Refine - import Facebook friends" /></a></p>
<p>If you click the highlighted selection, you should see the data that will be used to create your project:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6633996055/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6633996055/?referer=');"><img src="http://farm8.staticflickr.com/7145/6633996055_c739a3309c.jpg" width="500" height="158" alt="Google Refine - click to view the data" /></a></p>
<p>You can now click on <em>Create Project</em> to start working on the data &#8211; the first thing I do is tidy up the column names:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6634000149/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6634000149/?referer=');"><img src="http://farm8.staticflickr.com/7165/6634000149_d6f09e94d2.jpg" width="500" height="442" alt="Google Refine - rename columns" /></a></p>
<p>We can now work some magic &#8211; such as pulling in the Likes our friends have made. To do this, we need to create the URL for each friend&#8217;s Likes using their Facebook ID, and then pull the data down. We can use Google Refine to harvest this data for us by creating a new column containing the data pulled in from a URL built around the value of each cell in another column:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6634005421/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6634005421/?referer=');"><img src="http://farm8.staticflickr.com/7016/6634005421_54812070ff.jpg" width="500" height="465" alt="Google Refine - new column from URL" /></a></p>
<p>The Likes URL has the form <em>https://graph.facebook.com/<strong>me</strong>/likes?access_token=A_LONG_JUMBLE_OF_LETTERS</em> which we&#8217;ll tinker with as follows:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6634026767/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6634026767/?referer=');"><img src="http://farm8.staticflickr.com/7152/6634026767_972c2bcf71.jpg" width="500" height="388" alt="Google Refine - crafting URLs for new column creation" /></a></p>
<p>The throttle control tells Refine how long to wait between calls. I set this to 500ms (that is, half a second), so it takes a few minutes to pull in my couple of hundred or so friends (I don&#8217;t use Facebook a lot;-). I&#8217;m not sure what rate the Facebook API is happy with: if you hit it too fast (i.e. set the throttle delay too low), you may find the Facebook API stops returning data to you for a cooling down period&#8230;</p>
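<p>Outside Refine, the same throttling idea can be sketched in a few lines of Python. This is only a sketch: <em>fetch_throttled</em> and <em>fake_fetch</em> are hypothetical names, the URLs use placeholder IDs and a placeholder token, and the fetch function is passed in so that no real request is made here.</p>

```python
import time


def fetch_throttled(urls, fetch, delay_seconds=0.5, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing delay_seconds between calls."""
    results = []
    for url in urls:
        if results:
            # Pause before every call except the first one
            sleep(delay_seconds)
        results.append(fetch(url))
    return results


def fake_fetch(url):
    # Stand-in for a real HTTP request so the sketch runs offline
    return '{"data": []}'


urls = ["https://graph.facebook.com/%s/likes?access_token=TOKEN" % friend_id
        for friend_id in ("111", "222")]
pages = fetch_throttled(urls, fake_fetch, delay_seconds=0.5)
print(len(pages), "responses collected")
```

<p>Swapping <em>fake_fetch</em> for a function that really downloads each URL would give something roughly comparable to the Refine column fetch, though what delay the API tolerates remains a guess.</p>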
<p>Having imported the data, you should find a new column:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6634040291/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6634040291/?referer=');"><img src="http://farm8.staticflickr.com/7175/6634040291_0c306768b5.jpg" width="500" height="226" alt="Google Refine - new data imported" /></a></p>
<p>At this point, it is possible to generate a new column from each of the records/Likes in the imported data&#8230; in theory (or maybe not..). I found this caused Refine to hang though, so instead I exported the data using the default <em>Templating&#8230;</em> export format, which produces some sort of JSON output&#8230;</p>
<p>I then used this Python script to generate a two column data file where each row contained a (new) unique identifier for each friend and the name of one of their likes:</p>
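<p><pre class="brush: python;">import csv
import json

# The JSON file produced by Refine's Templating export
fn = 'my-fb-friends-likes.txt'

with open(fn, 'r', encoding='utf-8') as f:
    data = json.load(f)

with open('fbliketest.csv', 'w', newline='') as out:
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    # Number the friends from 1 so each gets a new, anonymous identifier
    for friend_id, d in enumerate(data['rows'], start=1):
        # 'interests' is the column name containing the Likes data
        interests = json.loads(d['interests'])
        for i in interests['data']:
            print(friend_id, i['name'], i['category'])
            # Drop any non-ASCII characters from the name, as before
            name = i['name'].encode('ascii', 'ignore').decode('ascii')
            writer.writerow([str(friend_id), name])
</pre></p>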
<p>[I think this R script, in answer to a related @mhawksey Stack Overflow question, also does the trick: <a href="http://stackoverflow.com/questions/9105597/r-building-a-list-from-matching-values-in-a-data-frame" onclick="urchinTracker('/outgoing/stackoverflow.com/questions/9105597/r-building-a-list-from-matching-values-in-a-data-frame?referer=');">R: Building a list from matching values in a data.frame</a>]</p>
<p>I could then import this data into Gephi and use it to generate a network diagram of what they commonly liked:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6629870307/" title="Sketching common likes amongst my facebook friends by psychemedia, on Flickr" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6629870307/?referer=');"><img src="http://farm8.staticflickr.com/7007/6629870307_fc9a948788_z.jpg" width="640" height="444" alt="Sketching common likes amongst my facebook friends"></a></p>
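<p>One way to prepare such a file for Gephi, sketched here under some assumptions: the rows are (friend id, like name) pairs as in the two column file described above, only likes shared by at least two friends are kept (a filter that is not part of the original recipe), and the output uses <em>Source</em> and <em>Target</em> column headings, which Gephi&#8217;s spreadsheet importer should recognise as an edge list. The function names and sample rows are invented for illustration.</p>

```python
import csv
import io
from collections import Counter


def shared_like_edges(rows, min_friends=2):
    """Return (friend_id, like) pairs for likes held by at least min_friends friends."""
    pairs = sorted(set((friend, like) for friend, like in rows))
    like_counts = Counter(like for _, like in pairs)
    return [(friend, like) for friend, like in pairs
            if like_counts[like] >= min_friends]


def write_edge_list(edges, out):
    """Write the pairs as a CSV edge list with Source/Target headings."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)


# Made-up rows standing in for the contents of fbliketest.csv
sample = [("1", "Gephi"), ("2", "Gephi"), ("2", "Google Refine")]
buffer = io.StringIO()
write_edge_list(shared_like_edges(sample), buffer)
print(buffer.getvalue())
```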
<p>Rather than returning Likes, I could equally have pulled back lists of the movies, music or books they like, their own friends lists (permissions settings allowing), etc etc, and then generated friends&#8217; interest maps on that basis.</p>
<p>[See also: <a href="http://blog.ouseful.info/2010/04/16/getting-started-with-gephi-network-visualisation-app-my-facebook-network-part-i/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2010/04/16/getting-started-with-gephi-network-visualisation-app-my-facebook-network-part-i/?referer=');">Getting Started With The Gephi Network Visualisation App – My Facebook Network, Part I</a> and <a href="http://blog.ouseful.info/2011/10/16/so-where-am-i-socially-situated-on-google/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/10/16/so-where-am-i-socially-situated-on-google/?referer=');">how to visualise Google+ networks</a>]</p>
<p>PS dropping out of Google Refine and into a Python script is a bit clunky, I have to admit. What would be nice would be to be able to do something like a &#8220;create new rows with new column from column&#8221; pattern that would let you set up an iterator through the contents of each of the cells in the column you want to generate the new column from, and for each pass of the iterator: 1) duplicate the original data row to create a new row; 2) add a new column; 3) populate the cell with the contents of the current iteration state. Or something like that&#8230;</p>
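<p>For what it is worth, the pattern described in this PS can be sketched in Python as a small generator. The names (<em>explode_rows</em>, <em>like_names</em>) and the sample row are invented for illustration; this is not a Google Refine feature, just the three steps written out.</p>

```python
import json


def explode_rows(rows, source_column, new_column, split_items):
    """Yield one copy of each row per item found in row[source_column]."""
    for row in rows:
        for item in split_items(row[source_column]):
            new_row = dict(row)         # 1) duplicate the original data row
            new_row[new_column] = item  # 2) add the new column, 3) populate it
            yield new_row


def like_names(cell):
    """Pull the name of each Like out of a JSON cell."""
    likes = json.loads(cell)
    return [like["name"] for like in likes["data"]]


# A made-up row with the same shape as the fetched Likes column
rows = [{"friend": "1",
         "interests": '{"data": [{"name": "Gephi"}, {"name": "R"}]}'}]
for new_row in explode_rows(rows, "interests", "like", like_names):
    print(new_row["friend"], new_row["like"])
```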
<p>PPS Related to the PS request, there is a sort of related feature in the 2.5 release of Google Refine that lets you merge data from across rows with a common key into a newly shaped data set: <a href="http://code.google.com/p/google-refine/source/detail?r=2356" onclick="urchinTracker('/outgoing/code.google.com/p/google-refine/source/detail?r=2356&amp;referer=');">Key/value Columnize</a>. Seeing this, it got me wondering what a fusion of Google Refine and RStudio might be like (or even just R support within Google Refine?)</p>
<p>PPPS <a href="http://translate.google.com/translate?hl=en&amp;sl=fi&amp;u=http://verkostoanatomia.wordpress.com/2012/02/10/kansanedustajat-facebookissa-kuka-on-kenenkin-kaveri/" onclick="urchinTracker('/outgoing/translate.google.com/translate?hl=en_amp_sl=fi_amp_u=http_//verkostoanatomia.wordpress.com/2012/02/10/kansanedustajat-facebookissa-kuka-on-kenenkin-kaveri/&amp;referer=');">this could be interesting</a> &#8211; looks like you can test to see if a friendship exists given two Facebook user IDs.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2012/01/04/social-interest-positioning-visualising-facebook-friends-likes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://farm8.staticflickr.com/7007/6629870307_fc9a948788_z.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7175/6634040291_0c306768b5.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7152/6634026767_972c2bcf71.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7016/6634005421_54812070ff.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7165/6634000149_d6f09e94d2.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7145/6633996055_c739a3309c.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7141/6633985963_8669f64d93.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7025/6633972841_5909bf4a46.jpg" length="" type="" />
		</item>
		<item>
		<title>Mapping the New Year Honours List – Where Did the Honours Go?</title>
		<link>http://blog.ouseful.info/2012/01/02/mapping-the-new-year-honours-list/</link>
		<comments>http://blog.ouseful.info/2012/01/02/mapping-the-new-year-honours-list/#comments</comments>
		<pubDate>Mon, 02 Jan 2012 18:13:22 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[Tinkering]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=6698</guid>
		<description><![CDATA[When I get a chance, I&#8217;ll post a (not totally unsympathetic) response to Milo Yiannopoulos&#8217; post The pitiful cult of ‘data journalism’, but in the meantime, here&#8217;s a view over some data that was released a couple of days ago &#8211; a map of where the New Year Honours went [link] [Hmm... so WordPress.com doesn't seem [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#38;blog=325417&#38;post=6698&#38;subd=ouseful&#38;ref=&#38;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>When I get a chance, I&#8217;ll post a (not totally unsympathetic) response to Milo Yiannopoulos&#8217; post <a href="http://www.blottr.com/columnist/onenero/pitiful-cult-data-journalism" onclick="urchinTracker('/outgoing/www.blottr.com/columnist/onenero/pitiful-cult-data-journalism?referer=');">The pitiful cult of ‘data journalism’</a>, but in the meantime, here&#8217;s a view over some data that was released a couple of days ago &#8211; a map of where the New Year Honours went [<a href="https://www.google.com/fusiontables/DataSource?snapid=S350326TBDt" onclick="urchinTracker('/outgoing/www.google.com/fusiontables/DataSource?snapid=S350326TBDt&amp;referer=');">link</a>]</p>
<p><a href="https://www.google.com/fusiontables/DataSource?snapid=S350326TBDt" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.google.com/fusiontables/DataSource?snapid=S350326TBDt&amp;referer=');"><img src="http://farm8.staticflickr.com/7148/6621303687_5946960f26.jpg" width="500" height="341" alt="New Year Honours map" /></a></p>
<p>[Hmm... so WordPress.com doesn't seem to want to let me embed a Google Fusion Table map iframe, and Google Maps (which is embeddable) just shows an empty folder when I try to view the Fusion Table KML... (the Fusion Table export KML doesn't seem to include lat/lng data either?) Maybe I need to explore some hosting elsewhere this year...]</p>
<p>Note that I wouldn&#8217;t make the claim that this represents an example of data journalism. It&#8217;s a sketch map showing which parts of the country various recipients of honours this time round presumably live in. Just by posting the map, I&#8217;m not reporting any particular story. Instead, I&#8217;m trying to find a way of looking at the data to see whether or not there may be any interesting stories that are suggested by viewing the data in this way.</p>
<p>There was a small element of work involved in generating the map view, though&#8230; Working backwards, when I used Google Fusion tables to geocode the locations of the honoured, some of the points were incorrectly located:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6621116429/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6621116429/?referer=');"><img src="http://farm8.staticflickr.com/7018/6621116429_f50ff723b8.jpg" width="500" height="203" alt="Google Fusion Tables - correcting fault geocoding" /></a></p>
<p>(It would be nice to be able to force a locale to the geocoder, maybe telling it to use <em>maps.google.co.uk</em> as the base, rather than (presumably) <em>maps.google.com</em>?)</p>
<p>The approach I took to tidying these was rather clunky, first going into the table view and filtering on the mispositioned locations:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6621124887/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6621124887/?referer=');"><img src="http://farm8.staticflickr.com/7156/6621124887_25fc8b0dfd.jpg" width="500" height="252" alt="Google Fusion Tables - correcting geocoding errors" /></a></p>
<p>Then correcting them:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6621131063/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6621131063/?referer=');"><img src="http://farm8.staticflickr.com/7173/6621131063_0a412e8577.jpg" width="500" height="198" alt="Google Fusion Table, Correct Geocode errors" /></a></p>
<p>What would be really handy would be if Google Fusion Tables let you see a tabular view of data within a particular map view &#8211; so for example, if I could zoom in to the US map and then get a tabular view of the records displayed on that particular local map view&#8230; (If it does already support this and I just missed it, please let me know via the comments..;-)</p>
<p>So how did I get the data into Google Fusion Tables? The <a href="http://www.direct.gov.uk/prod_consum_dg/groups/dg_digitalassets/@dg/@en/documents/digitalasset/dg_200711.pdf" onclick="urchinTracker('/outgoing/www.direct.gov.uk/prod_consum_dg/groups/dg_digitalassets/_dg/_en/documents/digitalasset/dg_200711.pdf?referer=');">original data was posted as a PDF</a> on the DirectGov website (<a href="http://www.direct.gov.uk/en/Nl1/Newsroom/DG_200708" onclick="urchinTracker('/outgoing/www.direct.gov.uk/en/Nl1/Newsroom/DG_200708?referer=');">New Year Honours List 2012 &#8211; in detail</a>)&#8230;:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6621672799/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6621672799/?referer=');"><img src="http://farm8.staticflickr.com/7162/6621672799_cfd59df250.jpg" width="500" height="309" alt="New Year Honours data" /></a></p>
<p>&#8230;so I used Scraperwiki to <a href="https://views.scraperwiki.com/run/pdf-to-html-preview-1/?url=http://www.direct.gov.uk/prod_consum_dg/groups/dg_digitalassets/@dg/@en/documents/digitalasset/dg_200711.pdf" onclick="urchinTracker('/outgoing/views.scraperwiki.com/run/pdf-to-html-preview-1/?url=http_//www.direct.gov.uk/prod_consum_dg/groups/dg_digitalassets/_dg/_en/documents/digitalasset/dg_200711.pdf&amp;referer=');">preview</a> and read through the PDF and extract the honours list data (my scraper is a little clunky and doesn&#8217;t pull out 100% of the data, missing the occasional name and contribution details when it&#8217;s split over several lines; but <em>I think</em> it does a reasonable enough job for now, particularly as I am currently more interested in focussing on the possible high level process for extracting and manipulating the data, rather than the correctness of it&#8230;!;-)</p>
<p>Here&#8217;s the scraper (feel free to improve upon it&#8230;.:-): <a href="https://scraperwiki.com/scrapers/new_year_honours_2012/" onclick="urchinTracker('/outgoing/scraperwiki.com/scrapers/new_year_honours_2012/?referer=');">Scraperwiki: New Year Honours 2012</a></p>
<p>I then did a little bit of tweaking in Google Refine, normalising some of the facets and crudely attempting to separate out each person&#8217;s role and the contribution for which the award was made.</p>
<p>For example, in the case of <em>Dr Glenis Carole Basiro DAVEY</em>, given column data of the form &#8220;<em>The Open University, Science Faculty and Health Education and Training Programme, Africa. For services to Higher and Health Education.</em>&#8220;, we can use the following expressions to generate new sub-columns:</p>
<p>- <em>value.match(/.*(For .*)/)[0]</em> to pull out things like &#8220;For services to Higher and Health Education.&#8221;<br />
- <em>value.match(/(.*)For .*/)[0]</em> to pull out things like &#8220;The Open University, Science Faculty and Health Education and Training Programme, Africa.&#8221;</p>
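<p>A rough Python equivalent of the two expressions, as a sketch: it uses a single regular expression with two capture groups rather than two separate matches, and it strips the trailing space from the first part. The function name <em>split_citation</em> is an assumption made for this example.</p>

```python
import re


def split_citation(value):
    """Split 'ROLE. For CONTRIBUTION.' into (role, contribution)."""
    # The first group is greedy, so the split happens at the last "For "
    match = re.match(r"(.*)(For .*)", value)
    if match is None:
        # No "For ..." clause found: treat the whole cell as the role
        return value.strip(), ""
    return match.group(1).strip(), match.group(2)


cell = ("The Open University, Science Faculty and Health Education and "
        "Training Programme, Africa. For services to Higher and Health Education.")
role, contribution = split_citation(cell)
print(role)
print(contribution)
```

<p>Cells that mention &#8220;For &#8221; more than once, or not at all, would need checking by hand, much as with the original expressions.</p>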
<p>I also ran each person&#8217;s record through Reuters Open Calais service using Google Refine&#8217;s ability to <a href="http://code.google.com/p/google-refine/wiki/FetchingURLsFromWebServices" onclick="urchinTracker('/outgoing/code.google.com/p/google-refine/wiki/FetchingURLsFromWebServices?referer=');">augment data with data from a URL</a> (&#8220;Add column by fetching URLs&#8221;), pulling the data back as JSON. Here&#8217;s the URL format I used (polling once every 500ms in order to stay within the max. 4 calls per second threshold mandated by the API):</p>
<p><pre class="brush: jscript;">&quot;http://api.opencalais.com/enlighten/rest/?licenseID=MY_LICENSE_KEY&amp;content=&quot; + escape(value,'url') + &quot;&amp;paramsXML=%3Cc%3Aparams%20xmlns%3Ac%3D%22http%3A%2F%2Fs.opencalais.com%2F1%2Fpred%2F%22%20xmlns%3Ardf%3D%22http%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%22%3E%20%20%3Cc%3AprocessingDirectives%20c%3AcontentType%3D%22TEXT%2FRAW%22%20c%3AoutputFormat%3D%22Application%2FJSON%22%20%20%3E%20%20%3C%2Fc%3AprocessingDirectives%3E%20%20%3Cc%3AuserDirectives%3E%20%20%3C%2Fc%3AuserDirectives%3E%20%20%3Cc%3AexternalMetadata%3E%20%20%3C%2Fc%3AexternalMetadata%3E%20%20%3C%2Fc%3Aparams%3E&quot;</pre></p>
<p>Unpicking this a little:</p>
<p>- <em>licenseID</em> is set to my license key value<br />
- <em>content</em> is the URL escaped version of the text I wanted to process (in this case, I created a new column from the name column that also pulled in data from a second column (the contribution column). The GREL formula I used to join the columns took the form: <tt>value+', '+cells["contribution"].value</tt>)<br />
- <em>paramsXML</em> is the <a href="http://www.albionresearch.com/misc/urlencode.php" onclick="urchinTracker('/outgoing/www.albionresearch.com/misc/urlencode.php?referer=');">URL encoded version</a> of the following parameters, which set the content encoding for the result to be JSON (the default is XML):</p>
<p><pre class="brush: xml;">&lt;c:params xmlns:c=&quot;http://s.opencalais.com/1/pred/&quot; xmlns:rdf=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;&gt;
&lt;c:processingDirectives c:contentType=&quot;TEXT/RAW&quot; c:outputFormat=&quot;Application/JSON&quot;  &gt;
&lt;/c:processingDirectives&gt;
&lt;c:userDirectives&gt;
&lt;/c:userDirectives&gt;
&lt;c:externalMetadata&gt;
&lt;/c:externalMetadata&gt;
&lt;/c:params&gt;</pre></p>
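<p>The same URL can be assembled programmatically rather than by pasting in a pre-encoded string. The following Python sketch is an illustration only: the function names are hypothetical, the parameter block is generated with ElementTree (so it is serialised slightly differently from the XML above and omits the unused <em>rdf</em> namespace), and whether the service accepts that variant is an assumption that has not been checked against the API.</p>

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode

CALAIS_NS = "http://s.opencalais.com/1/pred/"
CALAIS_ENDPOINT = "http://api.opencalais.com/enlighten/rest/"


def params_xml():
    """Build the processing directives that ask for JSON output."""
    ET.register_namespace("c", CALAIS_NS)
    root = ET.Element("{%s}params" % CALAIS_NS)
    ET.SubElement(root, "{%s}processingDirectives" % CALAIS_NS, {
        "{%s}contentType" % CALAIS_NS: "TEXT/RAW",
        "{%s}outputFormat" % CALAIS_NS: "Application/JSON",
    })
    ET.SubElement(root, "{%s}userDirectives" % CALAIS_NS)
    ET.SubElement(root, "{%s}externalMetadata" % CALAIS_NS)
    return ET.tostring(root, encoding="unicode")


def calais_url(license_key, text):
    """URL-encode the three query parameters and append them to the endpoint."""
    query = urlencode(
        {"licenseID": license_key, "content": text, "paramsXML": params_xml()},
        quote_via=quote,  # percent-encode spaces as %20, as in the URL above
    )
    return CALAIS_ENDPOINT + "?" + query


# MY_LICENSE_KEY is a placeholder; the text mimics the joined name and contribution
print(calais_url("MY_LICENSE_KEY", "Dr Glenis Carole Basiro DAVEY, For services to Higher and Health Education."))
```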
<p>So much for process &#8211; now where are the stories? That&#8217;s left, for now, as an exercise for the reader. An obvious starting point is just to see who received honours in your locale. Remember, Google Fusion Tables lets you generate all sorts of filtered views, so it&#8217;s not too hard to map where the MBEs vs OBEs are based, for example, or have a stab at where awards relating to services to Higher Education went. Some awards also have a high correspondence with a particular location, as for example in the case of Enfield&#8230;</p>
<p>If you do generate any interesting views from the <a href="https://www.google.com/fusiontables/DataSource?dsrcid=2539311" onclick="urchinTracker('/outgoing/www.google.com/fusiontables/DataSource?dsrcid=2539311&amp;referer=');">New Year Honours 2012 Fusion Table</a>, please post a link in the comments. And if you find a problem with/fix for the data or the scraper, please post that info in a comment too:-)</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/ouseful.wordpress.com/6698/" onclick="urchinTracker('/outgoing/feeds.wordpress.com/1.0/gocomments/ouseful.wordpress.com/6698/?referer=');"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/ouseful.wordpress.com/6698/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/ouseful.wordpress.com/6698/" onclick="urchinTracker('/outgoing/feeds.wordpress.com/1.0/godelicious/ouseful.wordpress.com/6698/?referer=');"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/ouseful.wordpress.com/6698/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/ouseful.wordpress.com/6698/" onclick="urchinTracker('/outgoing/feeds.wordpress.com/1.0/gofacebook/ouseful.wordpress.com/6698/?referer=');"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/ouseful.wordpress.com/6698/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/ouseful.wordpress.com/6698/" onclick="urchinTracker('/outgoing/feeds.wordpress.com/1.0/gotwitter/ouseful.wordpress.com/6698/?referer=');"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/ouseful.wordpress.com/6698/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/ouseful.wordpress.com/6698/" onclick="urchinTracker('/outgoing/feeds.wordpress.com/1.0/gostumble/ouseful.wordpress.com/6698/?referer=');"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/ouseful.wordpress.com/6698/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/ouseful.wordpress.com/6698/" onclick="urchinTracker('/outgoing/feeds.wordpress.com/1.0/godigg/ouseful.wordpress.com/6698/?referer=');"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/ouseful.wordpress.com/6698/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/ouseful.wordpress.com/6698/" 
onclick="urchinTracker('/outgoing/feeds.wordpress.com/1.0/goreddit/ouseful.wordpress.com/6698/?referer=');"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/ouseful.wordpress.com/6698/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&amp;blog=325417&amp;post=6698&amp;subd=ouseful&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2012/01/02/mapping-the-new-year-honours-list/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://farm8.staticflickr.com/7148/6621303687_5946960f26.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7018/6621116429_f50ff723b8.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7156/6621124887_25fc8b0dfd.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7173/6621131063_0a412e8577.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7162/6621672799_cfd59df250.jpg" length="" type="" />
		</item>
		<item>
		<title>More Dabblings With Local Sentencing Data</title>
		<link>http://blog.ouseful.info/2011/12/01/more-dabblings-with-local-sentencing-data/</link>
		<comments>http://blog.ouseful.info/2011/12/01/more-dabblings-with-local-sentencing-data/#comments</comments>
		<pubDate>Thu, 01 Dec 2011 16:39:19 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[Rstats]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=6547</guid>
		<description><![CDATA[In Accessing and Visualising Sentencing Data for Local Courts I posted a couple of quick ways in to playing with Ministry of Justice sentencing data for the period July 2010-June 2011 at the local court level. At the end of the post, I wondered about how to wrangle the data in R so that I [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#38;blog=325417&#38;post=6547&#38;subd=ouseful&#38;ref=&#38;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://blog.ouseful.info/2011/11/29/accessing-and-visualising-sentencing-data-for-local-courts/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/11/29/accessing-and-visualising-sentencing-data-for-local-courts/?referer=');">Accessing and Visualising Sentencing Data for Local Courts</a> I posted a couple of quick ways in to playing with Ministry of Justice sentencing data for the period July 2010-June 2011 at the local court level. At the end of the post, I wondered about how to wrangle the data in R so that I could look at percentage-wise comparisons between different factors (age, gender) and offence type, and mentioned that I&#8217;d posted a related question to the Cross Validated/Stats Exchange site (<a>Casting multidimensional data in R into a data frame</a>).</p>
<p>Courtesy of <a href="http://stats.stackexchange.com/users/696/chase" onclick="urchinTracker('/outgoing/stats.stackexchange.com/users/696/chase?referer=');">Chase</a>, I have <a href="http://stats.stackexchange.com/a/19133/5189" onclick="urchinTracker('/outgoing/stats.stackexchange.com/a/19133/5189?referer=');">an answer</a>:-) So let&#8217;s see how it plays out&#8230;</p>
<p>To start, let&#8217;s just load the Isle of Wight court sentencing data into RStudio:</p>
<p><tt>require(ggplot2)<br />
require(reshape2)<br />
require(plyr) #ddply, used below, comes from plyr<br />
iw = read.csv("http://dl.dropbox.com/u/1156404/wightCrimRecords.csv")</tt></p>
<p>Now we&#8217;re going to shape the data so that we can plot the percentage of each offence type by gender (limited to Male and Female options):</p>
<p><tt>iw.m = melt(iw, id.vars = "sex", measure.vars = "Offence_type")<br />
iw.sex = ddply(iw.m, "sex", function(x) as.data.frame(prop.table(table(x$value))))<br />
ggplot(subset(iw.sex,sex=='Female'|sex=='Male')) + geom_bar(aes(x=Var1,y=Freq)) + facet_wrap(~sex)+ opts(axis.text.x=theme_text(angle=-90)) + xlab('Offence Type')</tt></p>
<p>Here&#8217;s the result:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6436665317/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6436665317/?referer=');"><img src="http://farm8.staticflickr.com/7030/6436665317_3609e244a0.jpg" width="500" height="369" alt="Splitting down offences by percentage and gender" /></a></p>
<p>We can also process the data over a couple of variables. So for example, we can look to see how recorded sentences for females break down by offence type and age range, displaying, for each offence type on its own, the proportion of its recorded sentences that falls into each age range:</p>
<p><tt>iw.m2 = melt(iw, id.vars = c("sex","Offence_type" ), measure.vars = "AGE")<br />
iw.off=ddply(iw.m2, c("sex","Offence_type"), function(x) as.data.frame(prop.table(table(x$value))))<br />
ggplot(subset(iw.off,sex=='Female')) + geom_bar(aes(x=Var1,y=Freq)) + facet_wrap(~Offence_type) + opts(axis.text.x=theme_text(angle=-90)) + xlab('Age Range (Female)')</tt></p>
<p><a href="http://www.flickr.com/photos/psychemedia/6436716301/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6436716301/?referer=');"><img src="http://farm8.staticflickr.com/7142/6436716301_3434e5dcd1.jpg" width="500" height="364" alt="Offence type broken down by age and gender" /></a></p>
<p>Note that this graphic may actually be a little misleading because percentage based reports don&#8217;t play well with small numbers&#8230;: whilst there are multiple Driving Offences recorded, there are only two Burglaries, so the statistical distribution of convicted female burglars is based on a population of size two&#8230; A count would be a better way of showing this.</p>
<p>PS I was hoping to be able to just transmute the variables and generate a raft of other charts, but I seem to be getting an error, maybe because some rows are missing? So: anyone know where I&#8217;m supposed to post R library bug reports?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2011/12/01/more-dabblings-with-local-sentencing-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://farm8.staticflickr.com/7030/6436665317_3609e244a0.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7142/6436716301_3434e5dcd1.jpg" length="" type="" />
		</item>
		<item>
		<title>Accessing and Visualising Sentencing Data for Local Courts</title>
		<link>http://blog.ouseful.info/2011/11/29/accessing-and-visualising-sentencing-data-for-local-courts/</link>
		<comments>http://blog.ouseful.info/2011/11/29/accessing-and-visualising-sentencing-data-for-local-courts/#comments</comments>
		<pubDate>Tue, 29 Nov 2011 13:20:22 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[DDJ]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[opendata]]></category>
		<category><![CDATA[policy]]></category>
		<category><![CDATA[Rstats]]></category>
		<category><![CDATA[Uncourse]]></category>
		<category><![CDATA[ventnorblog]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=6513</guid>
		<description><![CDATA[A recent provisional data release from the Ministry of Justice contains sentencing data from English(?) courts, at the offence level, for the period July 2010-June 2011: &#8220;Published for the first time every sentence handed down at each court in the country between July 2010 and June 2011, along with the age and ethnicity of each [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#38;blog=325417&#38;post=6513&#38;subd=ouseful&#38;ref=&#38;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>A recent provisional data release from the Ministry of Justice contains sentencing data from English(?) courts, at the offence level, for the period July 2010-June 2011: &#8220;Published for the first time every sentence handed down at each court in the country between July 2010 and June 2011, along with the age and ethnicity of each offender.&#8221; <a href="http://www.justice.gov.uk/publications/statistics-and-data/criminal-justice/criminal-justice-statistics.htm" onclick="urchinTracker('/outgoing/www.justice.gov.uk/publications/statistics-and-data/criminal-justice/criminal-justice-statistics.htm?referer=');">Criminal Justice Statistics in England and Wales</a> [<a href="http://www.justice.gov.uk/downloads/publications/statistics-and-data/criminal-justice-stats/recordlevel.zip" onclick="urchinTracker('/outgoing/www.justice.gov.uk/downloads/publications/statistics-and-data/criminal-justice-stats/recordlevel.zip?referer=');">data</a>]</p>
<p>In this post, I&#8217;ll describe a couple of ways of working with the data to produce some simple graphical summaries of the data using Google Fusion Tables and R&#8230;</p>
<p>&#8230;but first, a couple of observations:</p>
<p>- the web page subheading is &#8220;Quarterly update of statistics on criminal offences dealt with by the criminal justice system in England and Wales.&#8221;, but the sidebar includes the link to the 12 month set of sentencing data;<br />
- the URL of the sentencing data is <em>http://www.justice.gov.uk/downloads/publications/statistics-and-data/criminal-justice-stats/recordlevel.zip</em>, which does not contain a time reference, although the data is time bound. What URL will be used if data for the period 7/11-6/12 is released in the same way next year?</p>
<p>The data is presented as a zipped CSV file, 5.4MB in the zipped form, and 134.1MB in the unzipped form.</p>
<p>The unzipped CSV file is too large to upload to a Google Spreadsheet or a Google Fusion Table, which are two of the tools I use for treating large CSV files as a database, so here are a couple of ways of getting in to the data using tools I have to hand&#8230;</p>
<h3>Unix Command Line Tools</h3>
<p>I&#8217;m on a Mac, so like Linux users I have ready access to a console and several common Unix command-line tools that are ideally suited to wrangling text files (on Windows, I suspect you need to install something like <a href="http://www.cygwin.com/" onclick="urchinTracker('/outgoing/www.cygwin.com/?referer=');">Cygwin</a>; a search for <em>windows unix utilities</em> should turn up other alternatives too).</p>
<p>In <a href="http://blog.ouseful.info/2011/06/04/playing-with-large-ish-csv-files-and-using-them-as-a-database-edina-openurl-logs/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/06/04/playing-with-large-ish-csv-files-and-using-them-as-a-database-edina-openurl-logs/?referer=');">Playing With Large (ish) CSV Files, and Using Them as a Database from the Command Line: EDINA OpenURL Logs</a> and <a href="http://blog.ouseful.info/2011/06/03/postcards-from-a-text-processing-excursion/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/06/03/postcards-from-a-text-processing-excursion/?referer=');">Postcards from a Text Processing Excursion</a> I give a couple of examples of how to get started with some of the Unix utilities, which we can crib from in this case. So for example, after unzipping the <em>recordlevel.csv</em> document I can look at the first 10 rows by opening a console window, changing directory to the directory the file is in, and running the following command:</p>
<p><tt>head recordlevel.csv</tt></p>
<p>Or I can pull out rows that contain a reference to the Isle of Wight using something like this command:</p>
<p><tt>grep -i wight recordlevel.csv &gt; recordsContainingWight.csv</tt></p>
<p>(The <em>-i</em> reads: &#8220;ignoring case&#8221;; <em>grep</em> is a command that identifies rows that contain the search term (<em>wight</em> in this case). The <em>&gt; recordsContainingWight.csv</em> says &#8220;send the result to the file <em>recordsContainingWight.csv</em>&#8221;.)</p>
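<p>One wrinkle with the <em>grep</em> approach: the header row only survives if it happens to mention <em>wight</em>, so the extracted file may well arrive without its column names. Here&#8217;s a minimal sketch of one way round that, assuming the first line of the file is the header. The <em>sample_recordlevel.csv</em> file, its three columns and its rows are all made up for illustration, not taken from the real data:</p>
<pre>
```shell
# Build a tiny stand-in for recordlevel.csv (made-up rows, hypothetical columns)
cat > sample_recordlevel.csv <<'EOF'
court,sex,Offence_type
Isle of Wight Magistrates,Male,Theft
Leeds Crown Court,Female,Fraud
ISLE OF WIGHT Crown Court,Female,Burglary
EOF

# Keep the header line, then append the case-insensitive matches
# found in the remaining lines
{
  head -n 1 sample_recordlevel.csv
  tail -n +2 sample_recordlevel.csv | grep -i wight
} > sample_wight.csv

cat sample_wight.csv
```
</pre>
<p>Swapping <em>sample_recordlevel.csv</em> for the real file should give an extract that still carries its column names when it is uploaded elsewhere.</p>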
<p>Having extracted rows that contain a reference to the Isle of Wight into a new file, I can upload this smaller file to a Google Spreadsheet, or to a Google Fusion Table, such as this one: <a href="https://www.google.com/fusiontables/DataSource?docid=1EXsGx9xbd0MoHqVli2VDm_rCXGhDND06_ann2-g" onclick="urchinTracker('/outgoing/www.google.com/fusiontables/DataSource?docid=1EXsGx9xbd0MoHqVli2VDm_rCXGhDND06_ann2-g&amp;referer=');">Isle of Wight Sentencing Fusion table</a>.</p>
<p><a href="https://www.google.com/fusiontables/DataSource?docid=1EXsGx9xbd0MoHqVli2VDm_rCXGhDND06_ann2-g" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.google.com/fusiontables/DataSource?docid=1EXsGx9xbd0MoHqVli2VDm_rCXGhDND06_ann2-g&amp;referer=');"><img src="http://farm8.staticflickr.com/7155/6424288071_f2f97ef62a.jpg" width="500" height="173" alt="Isle of Wight sentencing data" /></a></p>
<p>Once in the fusion table, we can start to explore the data. So for example, we can aggregate the data around different values in a given column and then visualise the result (aggregate and filter options are available from the View menu; visualisation types are available from the Visualize menu):</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6424302465/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6424302465/?referer=');"><img src="http://farm8.staticflickr.com/7010/6424302465_3c6789b102.jpg" width="500" height="367" alt="Visualising data in google fusion tables" /></a></p>
<p>We can also introduce filters to allow us to explore subsets of the data. For example, here are the offences committed by females aged 35+:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6424314293/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6424314293/?referer=');"><img src="http://farm8.staticflickr.com/7026/6424314293_5aeb7b051c.jpg" width="500" height="365" alt="Data exploration in Google Fusion tables" /></a></p>
<p>Looking at data from a single court may be of passing local interest, but the real data journalism is more likely to be focussed around finding mismatches between sentencing behaviour across different courts. (Hmm, unless we can get data on who passed sentences at a local level, and look to see if there are differences there?) That said, at a local level we could try to look for outliers maybe? As far as making comparisons go, we do have Court and Force columns, so it would be possible to compare Force against force and within a Force area, Court with Court?</p>
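<p>As a first step towards that sort of court-by-court comparison, here&#8217;s a rough sketch of counting rows per court from the command line. It assumes a plain comma-separated file with no quoted commas inside fields (which may well not hold for the real file) and a column headed <em>court</em>; the <em>sample_courts.csv</em> file and everything in it are invented for illustration:</p>
<pre>
```shell
# Made-up sample rows; the real file has many more columns
cat > sample_courts.csv <<'EOF'
court,force,Offence_type
Court A,Force X,Theft
Court B,Force X,Fraud
Court A,Force X,Burglary
Court C,Force Y,Theft
Court A,Force X,Theft
EOF

# Find the column whose header matches "col", then tally the values
# seen in that column. Assumes no quoted commas inside fields.
awk -F, -v col="court" '
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
  c       { n[$c]++ }
  END     { for (k in n) print n[k] "," k }
' sample_courts.csv | sort -t, -k1,1nr -k2,2 > court_counts.csv

cat court_counts.csv
```
</pre>
<p>The result lists each court with its row count, largest first. Changing the <em>col</em> value should give the same sort of tally for any other column, which might be a quick way of spotting courts or forces worth a closer look.</p>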
<h3>R/RStudio</h3>
<p>If you really want to start working the data, then R may be the way to go&#8230; I use <a href="http://rstudio.org" onclick="urchinTracker('/outgoing/rstudio.org?referer=');">RStudio</a> to work with R, so it&#8217;s a simple matter to just import the whole of the recordlevel.csv dataset.</p>
<p>Once the data is loaded in, I can use a regular expression to pull out the subset of the data corresponding once again to sentencing on the Isle of Wight (I apply the regular expression to the contents of the <em>court</em> column):</p>
<p><tt>recordlevel &lt;- read.csv(&quot;~/data/recordlevel.csv&quot;)<br />
iw=subset(recordlevel,grepl(&quot;wight&quot;,court,ignore.case=TRUE))</tt></p>
<p>We can then start to produce simple statistical charts based on the data. For example, a bar plot of the sentencing numbers by age group:</p>
<p><tt>age=table(iw$AGE)<br />
barplot(age, main="IW: Sentencing by Age", xlab="Age Range")</tt></p>
<p><a href="http://www.flickr.com/photos/psychemedia/6424364545/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6424364545/?referer=');"><img src="http://farm8.staticflickr.com/7006/6424364545_8e191c961b.jpg" width="492" height="323" alt="R - bar plot" /></a></p>
<p>We can also start to look at combinations of factors. For example, how do offence types vary with age?</p>
<p><tt>ageOffence=table(iw$AGE, iw$Offence_type)<br />
barplot(ageOffence,beside=T,las=3,cex.names=0.5,main="Isle of Wight Sentences", xlab=NULL, legend = rownames(ageOffence))</tt></p>
<p><a href="http://www.flickr.com/photos/psychemedia/6424509085/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6424509085/?referer=');"><img src="http://farm8.staticflickr.com/7167/6424509085_5aff09c8cc.jpg" width="500" height="371" alt="R barplot - offences on IW" /></a></p>
<p>If we remove the <em>beside=T</em> argument, we can produce a stacked bar chart:</p>
<p><tt>barplot(ageOffence,las=3,cex.names=0.5,main="Isle of Wight Sentences", xlab=NULL, legend = rownames(ageOffence))</tt></p>
<p><a href="http://www.flickr.com/photos/psychemedia/6424528771/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6424528771/?referer=');"><img src="http://farm7.staticflickr.com/6051/6424528771_d230e757a9.jpg" width="488" height="366" alt="R - stacked bar chart" /></a></p>
<p>If we import the <em>ggplot2</em> library, we have even more flexibility over the presentation of the graph, as well as what we can do with this sort of chart type. So for example, here&#8217;s a simple plot of the number of offences per offence type:</p>
<p><tt>require(ggplot2)<br />
#You may need to install ggplot2 as a library if it isn't already installed<br />
ggplot(iw, aes(factor(Offence_type)))+ geom_bar() + opts(axis.text.x=theme_text(angle=-90))+xlab('Offence Type')</tt></p>
<p><a href="http://www.flickr.com/photos/psychemedia/6424669473/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6424669473/?referer=');"><img src="http://farm8.staticflickr.com/7167/6424669473_626b1894b3.jpg" width="500" height="363" alt="GGPlot2 in R" /></a></p>
<p>Alternatively, we can break down offence types by age:</p>
<p><tt>ggplot(iw, aes(AGE))+ geom_bar() +facet_wrap(~Offence_type)</tt></p>
<p><a href="http://www.flickr.com/photos/psychemedia/6424691165/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6424691165/?referer=');"><img src="http://farm8.staticflickr.com/7015/6424691165_fec30ffec8.jpg" width="500" height="365" alt="ggplot facet barplot" /></a></p>
<p>We can bring a bit of colour into a stacked plot that also displays the gender split on each offence:</p>
<p><tt>ggplot(iw, aes(AGE,fill=sex))+geom_bar() +facet_wrap(~Offence_type)</tt></p>
<p><a href="http://www.flickr.com/photos/psychemedia/6424704421/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6424704421/?referer=');"><img src="http://farm8.staticflickr.com/7010/6424704421_8b2955f049.jpg" width="500" height="357" alt="ggplot with stacked factor" /></a></p>
<p>One thing I&#8217;m not sure how to do is rip the data apart in a ggplot context so that we can display percentage breakdowns, so we could compare the percentage breakdown by offence type on sentences awarded to males vs. females, for example? If you do know how to do that, please post a comment below ;-)</p>
<p>PS Here&#8217;s an easy way of getting started with ggplot&#8230; use the online hosted version at <a href="http://www.yeroon.net/ggplot2/" onclick="urchinTracker('/outgoing/www.yeroon.net/ggplot2/?referer=');">http://www.yeroon.net/ggplot2/</a> using this data set: <a href="http://dl.dropbox.com/u/1156404/wightCrimRecords.csv" onclick="urchinTracker('/outgoing/dl.dropbox.com/u/1156404/wightCrimRecords.csv?referer=');">wightCrimRecords.csv</a>; download the file to your computer then upload it as shown below:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6424754043/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6424754043/?referer=');"><img src="http://farm8.staticflickr.com/7030/6424754043_6a35dce4c5.jpg" width="500" height="377" alt="yeroon.net/ggplot2" /></a></p>
<p>PPS I got a little way towards identifying percentage breakdowns using a crib from <a href="http://blog.ouseful.info/2011/11/29/accessing-and-visualising-sentencing-data-for-local-courts/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/11/29/accessing-and-visualising-sentencing-data-for-local-courts/?referer=');">here</a>. The following command:<br />
<tt>iwp=tapply(iw$Offence_type,iw$sex,function(x){prop.table(table(x))})</tt><br />
generates a (multidimensional) array for the responseVar (Offence) about the groupVar (sex). I don&#8217;t know how to generate a single data frame from this, but we can create separate ones for each sex as follows:<br />
<tt>iwpMale=data.frame(iwp['Male'])<br />
iwpFemale=data.frame(iwp['Female'])</tt><br />
We can then plot these percentages using constructions of the form:<br />
<tt>ggplot(iwpMale)+geom_bar(aes(x=Male.x,y=Male.Freq))</tt><br />
What I haven&#8217;t worked out how to do is elegantly map from the multidimensional array to a single data.frame? If you know how, please add a comment below&#8230;(I also <a href="http://stats.stackexchange.com/questions/19110/casting-multidimensional-data-in-r-into-a-dataframe" onclick="urchinTracker('/outgoing/stats.stackexchange.com/questions/19110/casting-multidimensional-data-in-r-into-a-dataframe?referer=');">posted a question on Cross Validated</a>, the stats bit of Stack Exchange&#8230;)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2011/11/29/accessing-and-visualising-sentencing-data-for-local-courts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://farm8.staticflickr.com/7026/6424314293_5aeb7b051c.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7006/6424364545_8e191c961b.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7167/6424509085_5aff09c8cc.jpg" length="" type="" />
<enclosure url="http://farm7.staticflickr.com/6051/6424528771_d230e757a9.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7167/6424669473_626b1894b3.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7015/6424691165_fec30ffec8.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7010/6424704421_8b2955f049.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7155/6424288071_f2f97ef62a.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7030/6424754043_6a35dce4c5.jpg" length="" type="" />
<enclosure url="http://farm8.staticflickr.com/7010/6424302465_3c6789b102.jpg" length="" type="" />
		</item>
		<item>
		<title>Sports Data Journalism and “Datatainment”</title>
		<link>http://blog.ouseful.info/2011/11/04/sports-data-journalism-and-datatainment/</link>
		<comments>http://blog.ouseful.info/2011/11/04/sports-data-journalism-and-datatainment/#comments</comments>
		<pubDate>Fri, 04 Nov 2011 12:25:07 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=6389</guid>
		<description><![CDATA[Over the last couple of years, you&#8217;ve probably noticed that data has become a Big Thing in commerce (Big Data for business advantage) as well as in the openness/transparency community, with governments and the media joining the party particularly in the context of the latter. But if you&#8217;re looking to develop data journalism skills, it&#8217;s [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#38;blog=325417&#38;post=6389&#38;subd=ouseful&#38;ref=&#38;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Over the last couple of years, you&#8217;ve probably noticed that <em>data</em> has become a Big Thing in commerce (<a href="http://blog.ouseful.info/2008/11/06/the-tesco-data-business-notes-on-scoring-points/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2008/11/06/the-tesco-data-business-notes-on-scoring-points/?referer=');">Big Data for business advantage</a>) as well as in the openness/transparency community, with governments and the media joining the party particularly in the context of the latter. But if you&#8217;re looking to develop data journalism skills, it&#8217;s probably also worth remembering the area of sports journalism, and the wealth of data produced around sporting events.</p>
<p>Part of the attraction of developing learning activities around sports data is that there&#8217;s a good chance that it&#8217;ll keep on delivering&#8230; If you develop a way of analysing or displaying sports data that pulls out interesting features or story elements from a set of sports data, you should be able to keep on using it&#8230; To set the scene, here&#8217;s an example: <a href="http://drivenbydata.wordpress.com/2011/03/25/data-journalism-in-sports/" onclick="urchinTracker('/outgoing/drivenbydata.wordpress.com/2011/03/25/data-journalism-in-sports/?referer=');">Driven By Data: Data Journalism in Sports</a>. For a peek at my own fumblings, I&#8217;ve started exploring the automatic creation of <a href="http://f1datajunkie.posterous.com" onclick="urchinTracker('/outgoing/f1datajunkie.posterous.com?referer=');">F1DataJunkie Stats Graphics reports</a> (still a lot to be done, but it&#8217;s a start&#8230;)</p>
<p>In the extreme case, you might be able to generate story outlines, or even canned prose&#8230; For example, in certain computer games in the sports genre, you might find you&#8217;re playing a game along to a &#8220;live commentary&#8221;, generated from the data being produced by the game. Automatic commentary generation is a form of sports journalism. And automated article generation is already here, as @RobbieAllen describes in <a href="http://radar.oreilly.com/2011/11/automated-writing-software.html" onclick="urchinTracker('/outgoing/radar.oreilly.com/2011/11/automated-writing-software.html?referer=');">How I automated my writing career</a>, a brief overview of <a href="http://automatedinsights.com/" onclick="urchinTracker('/outgoing/automatedinsights.com/?referer=');">Automated Insights</a>, a company that specialises in computer generated visualisations and prose.</p>
<p>See also: <a href="http://www.springerlink.com/content/07t047477289q577/" onclick="urchinTracker('/outgoing/www.springerlink.com/content/07t047477289q577/?referer=');">Automated Storytelling in Sports: A Rich Domain to Be Explored</a>, <a href="http://www.cs.york.ac.uk/gidy/articles/AISB2009zheng.pdf" onclick="urchinTracker('/outgoing/www.cs.york.ac.uk/gidy/articles/AISB2009zheng.pdf?referer=');">Automated Event Recognition for Football Commentary Generation</a>, <a href="http://www.cs.gmu.edu/~sean/papers/commentator.pdf" onclick="urchinTracker('/outgoing/www.cs.gmu.edu/_sean/papers/commentator.pdf?referer=');">Three RoboCup Simulation League Commentator Systems</a>, and so on&#8230;</p>
<p>Getting hold of data is always an issue, of course, but I suspect that many larger newsrooms will take a subscription to the <a href="http://www.pressassociation.com/sport/data.html" onclick="urchinTracker('/outgoing/www.pressassociation.com/sport/data.html?referer=');">Press Association sports data</a> feeds, for example&#8230;</p>
<p>Anyway, as an exercise, here&#8217;s some data to start with, from the Guardian datastore: <a href="http://www.guardian.co.uk/news/datablog/2011/nov/02/premier-league-top-goals-scorers" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog/2011/nov/02/premier-league-top-goals-scorers?referer=');">Premier League&#8217;s top scorers: who is scoring the most goals?</a> Is there a correlation with age, perhaps? (Where would you find the age data&#8230;?)</p>
<p>As well as sports reporting, I think we&#8217;re also likely to see an increase in what Head of Digital at Manchester City FC, Richard Ayers, refers to as <em>datatainment</em>: &#8220;where you use data as the primary source of entertainment. You might choose to make the visualisation of raw data entertaining or perhaps use data visualisation as part of the process of entertainment – but there’s definitely a strong editorial control which is focussed on entertaining the audience rather than exposing data.&#8221; (<a href="http://scaryredhair.wordpress.com/2011/08/17/data-entertainment-you-need-datatainment/" onclick="urchinTracker('/outgoing/scaryredhair.wordpress.com/2011/08/17/data-entertainment-you-need-datatainment/?referer=');">Data? Entertainment? You need Datatainment</a> and <a href="http://scaryredhair.wordpress.com/2011/08/21/defining-data-visualisation-data-journalism-data-entertainment/" onclick="urchinTracker('/outgoing/scaryredhair.wordpress.com/2011/08/21/defining-data-visualisation-data-journalism-data-entertainment/?referer=');">Defining Data Visualisation, Data Journalism &amp; Data Entertainment</a>).</p>
<p>Devices such as <a href="http://www.fanvision.com/" onclick="urchinTracker('/outgoing/www.fanvision.com/?referer=');">FanVision</a> already blend video and audio streams with data feeds, for example, more and more sports have &#8220;live stats apps&#8221; associated with them, and it&#8217;s not hard to imagine the data crunching that goes on under the hood in things like <a href="http://www.sportav.co.uk/video_analysis_optiplay.html" onclick="urchinTracker('/outgoing/www.sportav.co.uk/video_analysis_optiplay.html?referer=');">Optiplay</a> making an appearance on sports analysis and review sites?</p>
<p>I also think that the &#8220;data as entertainment&#8221; line might work well as a second screen activity. Things like the F1 Live Timing app already demonstrate this:</p>
<span style="text-align:center; display: block;"><a href="http://blog.ouseful.info/2011/11/04/sports-data-journalism-and-datatainment/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/11/04/sports-data-journalism-and-datatainment/?referer=');"><img src="http://img.youtube.com/vi/wqRySs0Uo2U/2.jpg" alt="" /></a></span>
<p>On the other hand, there&#8217;s an opportunity for data focussed sites that go into deep analysis for the hardcore fan. Again looking at Formula One, the <a href="http://intelligentf1.wordpress.com/about/" onclick="urchinTracker('/outgoing/intelligentf1.wordpress.com/about/?referer=');">Intelligent F1</a> blog features a data-powered model developed by <a href="http://intelligentf1.wordpress.com/the-inventor/" onclick="urchinTracker('/outgoing/intelligentf1.wordpress.com/the-inventor/?referer=');">a rocket scientist</a> that provides engagement around a particular race over an extended period, from predicting Sunday race behaviour based on Friday practice data and previous outings, through analysis of practice and qualifying data, to a detailed series of post-race analyses. (Complement this with technical analyses applied to the cars on the <a href="http://scarbsf1.wordpress.com/" onclick="urchinTracker('/outgoing/scarbsf1.wordpress.com/?referer=');">Scarbs F1</a> blog, and you have the ultimate F1 geek&#8217;s paradise! ;-)</p>
<p>PS This also caught my eye: <a href="http://www.slideshare.net/lizrutledge/thesis-midterm-presentation-9802146" onclick="urchinTracker('/outgoing/www.slideshare.net/lizrutledge/thesis-midterm-presentation-9802146?referer=');">Gametime [Assistant]: Girls&#8217; Lacrosse Game Data</a>, which steps through the design of a &#8220;datatainment&#8221; app&#8230;</p>
<p>PPS As the Lacrosse app suggests, collecting data can also improve engagement with a live event. For example, see my own doodlings around a motorsport lapcharting app (<a href="http://blog.ouseful.info/2011/04/30/thoughts-on-a-couple-of-lap-charting-apps/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/04/30/thoughts-on-a-couple-of-lap-charting-apps/?referer=');">Thoughts on a Couple of Possible Lap Charting Apps</a>, <a href="https://gist.github.com/1013385" onclick="urchinTracker('/outgoing/gist.github.com/1013385?referer=');">initial code experiment</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2011/11/04/sports-data-journalism-and-datatainment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="" length="" type="" />
		</item>
		<item>
		<title>How Might Data Journalists Show Their Working? Sweave</title>
		<link>http://blog.ouseful.info/2011/11/01/how-might-data-journalists-show-their-working-sweave/</link>
		<comments>http://blog.ouseful.info/2011/11/01/how-might-data-journalists-show-their-working-sweave/#comments</comments>
		<pubDate>Tue, 01 Nov 2011 11:04:27 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[DDJ]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[Rstats]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=6378</guid>
		<description><![CDATA[If part of the role of data journalism is to make transparent the justification behind claims that are, or aren&#8217;t, backed up by data, there&#8217;s good reason to suppose that the journalists should be able to back up their own data-based claims with evidence about how they made use of the data. Posting links to [...]]]></description>
			<content:encoded><![CDATA[<p>If part of the role of data journalism is to make transparent the justification behind claims that are, or aren&#8217;t, backed up by data, there&#8217;s good reason to suppose that the journalists should be able to back up their own data-based claims with evidence about how they made use of the data. Posting links to raw data helps to a certain extent &#8211; at least third parties can then explore the data themselves and check the claims the press are making &#8211; but you could also argue that the journalists should also make their notes available regarding how they worked the data. (The same is true in public reports, where summary statistics and charts are included in a report, along with a link to the raw data, but no transparency in how the summary reports/charts were actually produced from the data.)</p>
<p>In <a href="http://blog.ouseful.info/2011/10/31/power-tools-for-aspiring-data-journalists-r/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/10/31/power-tools-for-aspiring-data-journalists-r/?referer=');">Power Tools for Aspiring Data Journalists: R</a>, I explored how we might use the R statistical programming language to replicate a chart that appeared in one of Ben Goldacre&#8217;s Bad Science columns. I included code snippets in the post, along with the figures they generated. But is there a way of getting even closer to the source, as it were, and produce documents that essentially generate their output from some sort of &#8220;source code&#8221;?</p>
<p>For example, take this <a href="http://www.slideshare.net/psychemedia/example-sweavefunnelplot" onclick="urchinTracker('/outgoing/www.slideshare.net/psychemedia/example-sweavefunnelplot?referer=');">view of my working</a> relating to the production of the funnel chart described in Goldacre&#8217;s column:</p>
<iframe src='http://www.slideshare.net/slideshow/embed_code/9974663' width='700' height='574'></iframe>
<p>You can find the actual &#8220;source code&#8221; for that document here: <a href="https://gist.github.com/1330309" onclick="urchinTracker('/outgoing/gist.github.com/1330309?referer=');">bowel cancer funnel plot working notes</a>. If you load it into something like <a href="http://rstudio.org" onclick="urchinTracker('/outgoing/rstudio.org?referer=');">RStudio</a>, you can &#8220;run&#8221; the code and generate your own PDF from it.</p>
<p>The &#8220;source&#8221; of the document includes both text and R code. When the Sweave document is processed, the R code contained within the document is executed and the results are also included in the document. The charts shown in the report are generated directly from the code included in the document, using data pulled in to the document from a source referenced within the document. If the source data is changed, or the R code is changed, what&#8217;s contained in the output document will change as well.</p>
<p>This sort of workflow will be familiar to many experimental scientists, but I wonder: is it something that data journalists have considered, at least as a way of keeping working notes about data related projects they are working on?</p>
<p>PS as well as Sweave, see <a href="http://dexy.it" onclick="urchinTracker('/outgoing/dexy.it?referer=');">dexy.it</a>, which generalises the Sweave approach to allow you to create self-documenting software/code. Educators, also take note&#8230;;-)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2011/11/01/how-might-data-journalists-show-their-working-sweave/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="" length="" type="" />
		</item>
		<item>
		<title>Power Tools for Aspiring Data Journalists: Funnel Plots in R</title>
		<link>http://blog.ouseful.info/2011/10/31/power-tools-for-aspiring-data-journalists-r/</link>
		<comments>http://blog.ouseful.info/2011/10/31/power-tools-for-aspiring-data-journalists-r/#comments</comments>
		<pubDate>Mon, 31 Oct 2011 13:32:50 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[replication]]></category>
		<category><![CDATA[Rstats]]></category>
		<category><![CDATA[stats]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=6361</guid>
		<description><![CDATA[Picking up on Paul Bradshaw&#8217;s post A quick exercise for aspiring data journalists which hints at how you can use Google Spreadsheets to grab &#8211; and explore &#8211; a mortality dataset highlighted by Ben Goldacre in DIY statistical analysis: experience the thrill of touching real data, I thought I&#8217;d describe a quick way of analysing [...]]]></description>
			<content:encoded><![CDATA[<p>Picking up on Paul Bradshaw&#8217;s post <a href="http://onlinejournalismblog.com/2011/10/31/a-quick-exercise-for-aspiring-data-journalists/">A quick exercise for aspiring data journalists</a> which hints at how you can use Google Spreadsheets to grab &#8211; and explore &#8211; a mortality dataset highlighted by Ben Goldacre in <a href="http://www.guardian.co.uk/commentisfree/2011/oct/28/bad-science-diy-data-analysis" onclick="urchinTracker('/outgoing/www.guardian.co.uk/commentisfree/2011/oct/28/bad-science-diy-data-analysis?referer=');">DIY statistical analysis: experience the thrill of touching real data</a>, I thought I&#8217;d describe a quick way of analysing the data using R, a very powerful statistical programming environment that should probably be part of your toolkit if you ever want to get round to doing some serious stats, and have a go at reproducing the analysis using a bit of judicious websearching and some cut-and-paste action&#8230;</p>
<p>R is an open-source, cross-platform environment that allows you to do programming-like things with stats, as well as producing a wide range of graphical statistics (stats visualisations) as if by magic. (Which is to say, it can be terrifying to try to get your head round&#8230; but once you&#8217;ve grasped a few key concepts, it becomes a really powerful tool&#8230; At least, that&#8217;s what I&#8217;m hoping as I struggle to learn how to use it myself!)</p>
<p>I&#8217;ve been using <a href="http://rstudio.org" onclick="urchinTracker('/outgoing/rstudio.org?referer=');">RStudio</a> to work with R because a) it&#8217;s free and works cross-platform, b) it can be run as a service and accessed via the web (though I haven&#8217;t tried that yet, and the hosted option still hasn&#8217;t appeared, either&#8230;), and c) it offers a structured environment for managing R projects.</p>
<p>So, to get started. Paul describes a dataset posted as an HTML table by Ben Goldacre that is used to generate the dots on this graph:</p>
<p><img alt="" src="http://static.guim.co.uk/sys-images/Admin/BkFill/Default_image_group/2011/10/28/1319816474482/bowel-cancer-mortality-ra-007.jpg" title="Bowel Cancer Mortality Rates" class="alignnone" width="460" height="276" /></p>
<p>The lines come from a probabilistic model that helps us see the likely spread of death rates given a particular population size.</p>
<p>If we want to do stats on the data, then we could, as Paul suggests, pull the data into a spreadsheet and then work from there&#8230; Or, we could pull it directly into R, at which point all manner of voodoo stats capabilities become available to us.</p>
<p>As with the <em>=importHTML</em> formula in Google spreadsheets, R has a way of scraping data from an HTML table anywhere on the public web:</p>
<p><tt>#First, we need to load in the XML library that contains the scraper function<br />
library(XML)<br />
#Scrape the table<br />
cancerdata=data.frame( readHTMLTable( 'http://www.guardian.co.uk/commentisfree/2011/oct/28/bad-science-diy-data-analysis', which=1, header=c('Area','Rate','Population','Number')))</tt></p>
<p>The format is simple: <tt>readHTMLTable(url,which=TABLENUMBER)</tt> (TABLENUMBER is used to extract the N&#8217;th table in the page.) The <tt>header</tt> part labels the columns (the data pulled in from the HTML table itself contains all sorts of clutter).</p>
<p>We can inspect the data we&#8217;ve imported as follows:</p>
<p><tt>#Look at the whole table<br />
cancerdata<br />
#Look at the column headers<br />
names(cancerdata)<br />
#Look at the first few rows (head() shows 6 by default)<br />
head(cancerdata)<br />
#Look at the last few rows (tail() also shows 6 by default)<br />
tail(cancerdata)<br />
#What sort of datatype is in the Number column?<br />
class(cancerdata$Number)<br />
</tt></p>
<p>The last line &#8211; <tt>class(cancerdata$Number)</tt> &#8211; identifies the data as type &#8216;factor&#8217;. In order to do stats and plot graphs, we need the Number, Rate and Population columns to contain actual numbers&#8230; (Factors organise data according to categories; when the table is loaded in, the data is loaded in as strings of characters; rather than seeing each number as a number, it&#8217;s identified as a category.)</p>
<p><tt>#Convert the numerical columns to a numeric datatype<br />
cancerdata$Rate=as.numeric(levels(cancerdata$Rate)[as.integer(cancerdata$Rate)])<br />
cancerdata$Population=as.numeric(levels(cancerdata$Population)[as.integer(cancerdata$Population)])<br />
cancerdata$Number=as.numeric(levels(cancerdata$Number)[as.integer(cancerdata$Number)])<br />
#Just check it worked&#8230;<br />
class(cancerdata$Number)<br />
head(cancerdata)<br />
</tt></p>
<p>We can now plot the data:</p>
<p><tt>#Plot the Number of deaths by the Population<br />
plot(Number ~ Population,data=cancerdata)</tt></p>
<p>If we want to, we can add a title:<br />
<tt>#Add a title to the plot<br />
plot(Number ~ Population,data=cancerdata, main='Bowel Cancer Occurrence by Population')</tt></p>
<p>We can also tweak the axis labels:</p>
<p><tt>plot(Number ~ Population,data=cancerdata, main='Bowel Cancer Occurrence by Population',ylab='Number of deaths')</tt></p>
<p><a href="http://ouseful.files.wordpress.com/2011/10/cancerdatasimple.png" onclick="urchinTracker('/outgoing/ouseful.files.wordpress.com/2011/10/cancerdatasimple.png?referer=');"><img src="http://ouseful.files.wordpress.com/2011/10/cancerdatasimple.png?w=700" alt="" title="cancerdataSimple"   class="alignnone size-full wp-image-6364" /></a></p>
<p>The <tt>plot</tt> command is great for generating quick charts. If we want a bit more control over the charts we produce, the <em>ggplot2</em> library is the way to go. (<em>ggplot2</em> isn't part of the standard R bundle, so you'll need to install the package yourself if you haven't already installed it. In RStudio, find the <em>Packages</em> tab, click <em>Install Packages</em>, search for <em>ggplot2</em> and then install it, along with its dependencies...):</p>
<p><tt>require(ggplot2)<br />
#opts() has been removed from recent ggplot2 releases; ggtitle() sets the title instead<br />
ggplot(cancerdata)+geom_point(aes(x=Population,y=Number))+ggtitle('Bowel Cancer Data')+ylab('Number of Deaths')</tt></p>
<p><a href="http://ouseful.files.wordpress.com/2011/10/cancerdata_ggplot.png" onclick="urchinTracker('/outgoing/ouseful.files.wordpress.com/2011/10/cancerdata_ggplot.png?referer=');"><img src="http://ouseful.files.wordpress.com/2011/10/cancerdata_ggplot.png?w=700" alt="" title="cancerdata_ggplot"   class="alignnone size-full wp-image-6365" /></a></p>
<p>Doing a bit of searching for the "funnel plot" chart type used to display the data in Goldacre's article, I came across a post on Cross Validated, the Stack Exchange site dedicated to statistics related Q&amp;A: <a href="http://stats.stackexchange.com/questions/5195/how-to-draw-funnel-plot-using-ggplot2-in-r" onclick="urchinTracker('/outgoing/stats.stackexchange.com/questions/5195/how-to-draw-funnel-plot-using-ggplot2-in-r?referer=');">How to draw funnel plot using ggplot2 in R?</a></p>
<p>The meta-analysis answer seemed to produce a similar chart type, so I had a go at cribbing the code... This is a dangerous thing to do, and I can't guarantee that the analysis is the same type of analysis as the one Goldacre refers to... but what I'm trying to do is show (quickly) that R provides a very powerful stats analysis environment and could probably do the sort of analysis you want in the hands of someone who knows how to drive it, and also knows what stats methods can be appropriately applied for any given data set...</p>
<p>Anyway - here's something resembling the Goldacre plot, using the cribbed code which has confidence limits at the 95% and 99.9% levels. Note that I needed to do a couple of things:</p>
<p>1) work out what values to use where! I did this by looking at the ggplot code to see what was plotted. p was on the y-axis and should be used to present the death rate. The data provides this as a rate per 100,000, so we need to divide by 100,000 to make it a rate in the range 0..1. The x-axis is the population.</p>
<p><tt>#TH: funnel plot code from:<br />
#TH: http://stats.stackexchange.com/questions/5195/how-to-draw-funnel-plot-using-ggplot2-in-r/5210#5210<br />
#TH: Use our cancerdata<br />
number=cancerdata$Population<br />
#TH: The rate is given as a 'per 100,000' value, so normalise it<br />
p=cancerdata$Rate/100000</p>
<p>p.se &lt;- sqrt((p*(1-p)) / (number))<br />
df &lt;- data.frame(p, number, p.se)</p>
<p>## common effect (fixed effect model)<br />
p.fem &lt;- weighted.mean(p, 1/p.se^2)</p>
<p>## lower and upper limits for 95% and 99.9% CI, based on FEM estimator<br />
#TH: I&#039;m going to alter the spacing of the samples used to generate the curves<br />
number.seq &lt;- seq(1000, max(number), 1000)<br />
number.ll95 &lt;- p.fem - 1.96 * sqrt((p.fem*(1-p.fem)) / (number.seq))<br />
number.ul95 &lt;- p.fem + 1.96 * sqrt((p.fem*(1-p.fem)) / (number.seq))<br />
number.ll999 &lt;- p.fem - 3.29 * sqrt((p.fem*(1-p.fem)) / (number.seq))<br />
number.ul999 &lt;- p.fem + 3.29 * sqrt((p.fem*(1-p.fem)) / (number.seq))<br />
dfCI &lt;- data.frame(number.ll95, number.ul95, number.ll999, number.ul999, number.seq, p.fem)</p>
<p>## draw plot<br />
#TH: note that we need to tweak the limits of the y-axis<br />
fp &lt;- ggplot(aes(x = number, y = p), data = df) +<br />
    geom_point(shape = 1) +<br />
    geom_line(aes(x = number.seq, y = number.ll95), data = dfCI) +<br />
    geom_line(aes(x = number.seq, y = number.ul95), data = dfCI) +<br />
    geom_line(aes(x = number.seq, y = number.ll999), linetype = 2, data = dfCI) +<br />
    geom_line(aes(x = number.seq, y = number.ul999), linetype = 2, data = dfCI) +<br />
    geom_hline(aes(yintercept = p.fem), data = dfCI) +<br />
    scale_y_continuous(limits = c(0,0.0004)) +<br />
  xlab(&quot;number&quot;) + ylab(&quot;p&quot;) + theme_bw() </p>
<p>fp</tt></p>
<p><a href="http://ouseful.files.wordpress.com/2011/10/cancerdatafunnelplot.png" onclick="urchinTracker('/outgoing/ouseful.files.wordpress.com/2011/10/cancerdatafunnelplot.png?referer=');"><img src="http://ouseful.files.wordpress.com/2011/10/cancerdatafunnelplot.png?w=700" alt="" title="cancerdataFunnelplot"   class="alignnone size-full wp-image-6367" /></a></p>
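<p>For reference, here is my reading of what the cribbed code computes, written out as formulas. The symbols are my own shorthand rather than anything taken from the Cross Validated answer: p_i is the death rate for area i expressed as a proportion, n_i is that area's population, \hat{p} is the weighted mean rate (<tt>p.fem</tt> in the code), and n runs over the population values used to draw the curves (<tt>number.seq</tt>). Treat it as a sketch of the calculation, not a claim that this is the model behind Goldacre's chart.</p>

```latex
% Standard error of each area's rate, and the inverse-variance weighted mean
\[
\mathrm{SE}(p_i) = \sqrt{\frac{p_i\,(1 - p_i)}{n_i}}, \qquad
\hat{p} = \frac{\sum_i w_i\, p_i}{\sum_i w_i}, \quad
w_i = \frac{1}{\mathrm{SE}(p_i)^2}
\]
% Funnel limits drawn around the weighted mean for a population of size n
\[
\hat{p} \pm z \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}, \qquad
z = 1.96 \ \text{(95\%)}, \quad z = 3.29 \ \text{(99.9\%)}
\]
```

<p>The limits narrow as n grows, which is what gives the plot its funnel shape: small areas can wander a long way from the average rate by chance alone, large ones can't.</p>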
<p>As I said above, it can be quite dangerous just pinching other folks' stats code if you aren't a statistician and don't really know whether you have actually replicated someone else's analysis or done something completely different... (this is a situation I often find myself in!); which is why I think we need to encourage folk who release statistical reports to not only release their data, but also show their working, including the code they used to generate any summary tables or charts that appear in those reports.</p>
<p>In addition, it's worth noting that cribbing other folk's code and analyses and applying them to your own data may lead to a nonsense result because some stats analyses only work if the data has the right sort of distribution... So be aware of that, always post your own working somewhere, and if someone then points out that it's nonsense, you'll hopefully be able to learn from it...</p>
<p>Given those caveats, what I hope to have done is raise awareness of what R can be used to do (including pulling data into a stats computing environment via an HTML table screenscrape) and also produced some sort of recipe we could take to a statistician to say: is this the sort of thing Ben Goldacre was talking about? And if not, why not?</p>
<p>[If I've made any huge - or even minor - blunders in the above, please let me know... There's always a risk in cutting and pasting things that look like they produce the sort of thing you're interested in, but may actually be doing something completely different!]</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2011/10/31/power-tools-for-aspiring-data-journalists-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://ouseful.files.wordpress.com/2011/10/cancerdata_ggplot.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/10/cancerdatafunnelplot.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/10/cancerdatasimple.png" length="" type="" />
<enclosure url="" length="" type="" />
<enclosure url="http://static.guim.co.uk/sys-images/Admin/BkFill/Default_image_group/2011/10/28/1319816474482/bowel-cancer-mortality-ra-007.jpg" length="" type="" />
		</item>
		<item>
		<title>Active Lobbying Through Meetings with UK Government Ministers</title>
		<link>http://blog.ouseful.info/2011/10/17/active-lobbying-through-meetings-with-uk-government-ministers/</link>
		<comments>http://blog.ouseful.info/2011/10/17/active-lobbying-through-meetings-with-uk-government-ministers/#comments</comments>
		<pubDate>Mon, 17 Oct 2011 12:57:53 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[DDJ]]></category>
		<category><![CDATA[lobbying]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[visualisation]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=6320</guid>
		<description><![CDATA[In a move that seemed to upset collectors of UK ministerial meeting data, @whoslobbying, on grounds of wasted effort, the Guardian datastore published a spreadsheet last night containing data relating to ministerial meetings between May 2010 and March 2011. (The first release of the spreadsheet actually omitted the column containing who the meeting was with, [...]]]></description>
			<content:encoded><![CDATA[<p>In a move that seemed to upset collectors of UK ministerial meeting data, @whoslobbying, on grounds of wasted effort, the Guardian datastore published a spreadsheet last night containing <a href="http://www.guardian.co.uk/news/datablog/2011/oct/16/links-government-data-business-data" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog/2011/oct/16/links-government-data-business-data?referer=');">data relating to ministerial meetings between May 2010 and March 2011</a>.</p>
<p>(The first release of the spreadsheet actually omitted the column containing who the meeting was with, but that seems to be fixed now&#8230; There are, however, still plenty of character encoding issues (apostrophes, accented characters, some sort of em-dash, etc) that might cripple some plug and play tools.)</p>
<p>Looking over the data, we can use it as the basis for a network diagram in which the actors (Ministers and lobbyists) are nodes and the edges represent meetings between Ministers and lobbyists. There is one slight complication: where a meeting involves a Minister and several lobbyists, we ideally need to split the lobbyists out into their own nodes.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6253752000/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6253752000/?referer=');"><img src="http://farm7.static.flickr.com/6213/6253752000_e4550e8129.jpg" width="500" height="458" alt="UK gov meetings spreadsheet" /></a></p>
<p>This probably provides an ideal opportunity to have a play with the Stanford Data Wrangler and try forcing these separate lobbyists onto separate rows, but I didn&#8217;t allow myself much time for the tinkering (and the requisite learning!), so I resorted to a Python script to read in the data file and split out the different lobbyists. (I also did an iterative step, cleaning the downloaded CSV file in a text editor by replacing nasty characters that caused the script to choke.) You can find the script <a href="https://gist.github.com/1292500" onclick="urchinTracker('/outgoing/gist.github.com/1292500?referer=');">here</a> (note that it makes use of the <a href="http://networkx.lanl.gov/" onclick="urchinTracker('/outgoing/networkx.lanl.gov/?referer=');">networkx</a> network analysis library, which you&#8217;ll need to install if you want to run the script).</p>
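<p>To give a flavour of the splitting step, here is a minimal Python sketch. It is not the script linked above. It assumes, hypothetically, that the spreadsheet has <em>Minister</em> and <em>Organisation</em> columns and that several organisations sharing a cell are separated by commas or semicolons; the sample rows are invented and the function names are my own.</p>

```python
import csv
import io
import re
from collections import Counter

# Invented sample rows; the real spreadsheet's column names and
# separators may well differ (both are assumptions for this sketch).
SAMPLE_CSV = """Minister,Organisation
Minister A,"Org X, Org Y; Org Z"
Minister B,Org X
Minister A,Org X
"""


def split_organisations(cell):
    """Split a cell naming several organisations into a list of clean names."""
    parts = re.split(r"[,;]", cell)
    return [part.strip() for part in parts if part.strip()]


def meeting_edges(csv_text):
    """Count meetings for each (minister, organisation) pair."""
    edges = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        minister = row["Minister"].strip()
        for organisation in split_organisations(row["Organisation"]):
            edges[(minister, organisation)] += 1
    return edges


edges = meeting_edges(SAMPLE_CSV)
for (minister, organisation), weight in sorted(edges.items()):
    print(minister, "->", organisation, "weight", weight)
```

<p>Each key in <tt>edges</tt> is one directed Minister-to-organisation link, and the count is the number of meetings between that pair, which could serve as an edge weight.</p>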
<p>The script generates a directed graph with links from Ministers to lobbyists and dumps it to a GraphML file (<a href="http://dl.dropbox.com/u/1156404/mtgs.graphml.zip" onclick="urchinTracker('/outgoing/dl.dropbox.com/u/1156404/mtgs.graphml.zip?referer=');">available here</a>) that can be loaded directly into Gephi. Here&#8217;s a view &#8211; using Gephi &#8211; of the heart of the network. If we filter the graph to show nodes that met with at least five different Ministers&#8230;</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6253273513/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6253273513/?referer=');"><img src="http://farm7.static.flickr.com/6163/6253273513_8efd7d46fd.jpg" width="258" height="500" alt="Gephi - k-core filter" /></a></p>
<p>we can get a view into the heart of the UK lobbying network:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6253188589/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6253188589/?referer=');"><img src="http://farm7.static.flickr.com/6157/6253188589_027a9c9807.jpg" width="500" height="393" alt="Active Lobbiests" /></a></p>
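<p>I did the filtering in Gephi, but the underlying idea (keep only the organisations that met at least some number of different Ministers) can be sketched in a few lines of Python. This is a plain threshold on the number of distinct Ministers met, not a reimplementation of Gephi's filter; the edge list is invented and <tt>well_connected</tt> is a hypothetical helper name.</p>

```python
from collections import defaultdict

# Invented (minister, organisation) pairs standing in for the graph's edges.
EDGES = [
    ("Minister A", "Org X"),
    ("Minister B", "Org X"),
    ("Minister C", "Org X"),
    ("Minister A", "Org Y"),
    ("Minister A", "Org Y"),  # a repeat meeting with the same Minister
]


def well_connected(edges, min_ministers):
    """Return organisations that met at least min_ministers different Ministers."""
    ministers_met = defaultdict(set)
    for minister, organisation in edges:
        # A set ignores repeat meetings with the same Minister.
        ministers_met[organisation].add(minister)
    return sorted(
        organisation
        for organisation, met in ministers_met.items()
        if len(met) >= min_ministers
    )


print(well_connected(EDGES, 3))
```

<p>With the threshold set to five, this would correspond to the "at least five different Ministers" view described above.</p>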
<p>I sized the lobbyist nodes according to eigenvector centrality, which gives an indication of how well connected they are in the network.</p>
<p>One of the nice things about Gephi is that it allows for interactive exploration of a graph. For example, I can hover over a lobbyist node &#8211; <em>Barclays</em> in this case &#8211; to see which Ministers were met:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6253809962/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6253809962/?referer=');"><img src="http://farm7.static.flickr.com/6115/6253809962_93dc99b73c.jpg" width="500" height="346" alt="Bankers connect..." /></a></p>
<p>Alternatively, we can see which of the well-connected lobbyists met with the Minister for Welfare Reform:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6253287007/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6253287007/?referer=');"><img src="http://farm7.static.flickr.com/6094/6253287007_f1cba29c9e.jpg" width="500" height="394" alt="Welfare meetings..." /></a></p>
<p>Looking over the data, we also see how some Ministers are inconsistently referenced within the original dataset:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6253840012/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6253840012/?referer=');"><img src="http://farm7.static.flickr.com/6152/6253840012_a581de1c5b.jpg" width="500" height="138" alt="Multiple mentions" /></a></p>
<p>Note that different representations of the same name are likely to have met similar lobbyists, so the force directed layout I used will tend to place those nodes in similar locations. Which is to say &#8211; we may be able to use <em>visual</em> tools to help us identify fractured representations of the same individual. (Note that multiple meetings between the same parties can be visualised using the thickness of the edges, which are weighted according to the number of times the edge is described in the GraphML file&#8230;)</p>
<p>Unifying the different representations of the same individual is something that Google Refine could help with via its <a href="http://www.propublica.org/nerds/item/using-google-refine-for-data-cleaning" onclick="urchinTracker('/outgoing/www.propublica.org/nerds/item/using-google-refine-for-data-cleaning?referer=');">various clustering tools</a>, although it would be nice if the Datastore folk addressed this at source (or at least, as part of an ongoing data quality enhancement process&#8230;;-)</p>
<p>I guess we could also try reconciling company names against universal company identifiers, for example by using <a href="http://vimeo.com/17924204" onclick="urchinTracker('/outgoing/vimeo.com/17924204?referer=');">Google Refine&#8217;s reconciliation service and the Open Corporates database</a>? Hmmm, which makes me wonder: do MySociety, or Public Whip, offer an MP or Ministerial position reconciliation service that works with Google Refine?</p>
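<p>For what it is worth, the edge weighting described above amounts to counting repeated edges. A minimal sketch, using invented meeting records:</p>

```python
from collections import Counter


def edge_weights(pairs):
    """Count how many times each (minister, lobbyist) edge is described."""
    return Counter(pairs)


# Hypothetical records: Minister A met Barclays twice, Minister B once.
meetings = [
    ("Minister A", "Barclays"),
    ("Minister A", "Barclays"),
    ("Minister B", "Barclays"),
]
weights = edge_weights(meetings)
print(weights[("Minister A", "Barclays")])
```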
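<p>To give a flavour of how key collision clustering works, here is a simplified take on the fingerprint idea: normalise each name to a key, then group names that share a key. It is a sketch of the general technique rather than Google Refine&#8217;s actual implementation (which does more normalisation), and the example names are made up.</p>

```python
import string
from collections import defaultdict


def fingerprint(name):
    """Build a normalised key: lowercase, no punctuation, sorted unique tokens."""
    cleaned = name.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(names):
    """Group names whose fingerprints collide; keep groups with 2+ variants."""
    groups = defaultdict(set)
    for name in names:
        groups[fingerprint(name)].add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Invented variants of one name, plus an unrelated name.
names = ["Rt Hon. John Smith MP", "john smith mp rt hon", "Jane Doe"]
clusters = cluster(names)
print(clusters)
```

<p>Names that differ by more than case, punctuation or word order (abbreviated titles, say) would need a fuzzier method.</p>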
<p>A couple of things I haven&#8217;t done: represented the department (which could be done via a node attribute, maybe, at least for the Ministers); represented actual meetings, and what I guess we might term co-lobbying behaviour, where several organisations are in the same meeting.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2011/10/17/active-lobbying-through-meetings-with-uk-government-ministers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://farm7.static.flickr.com/6213/6253752000_e4550e8129.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6152/6253840012_a581de1c5b.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6094/6253287007_f1cba29c9e.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6157/6253188589_027a9c9807.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6115/6253809962_93dc99b73c.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6163/6253273513_8efd7d46fd.jpg" length="" type="" />
		</item>
		<item>
		<title>Data Journalists Engaging in Co-Innovation…</title>
		<link>http://blog.ouseful.info/2011/09/13/data-journalists-engaging-in-co-innovation/</link>
		<comments>http://blog.ouseful.info/2011/09/13/data-journalists-engaging-in-co-innovation/#comments</comments>
		<pubDate>Tue, 13 Sep 2011 14:46:57 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[BBC]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[datastore]]></category>
		<category><![CDATA[DDJ]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=6200</guid>
		<description><![CDATA[You may or may not have noticed that the Boundary Commission released their take on proposed parliamentary constituency boundaries today. They could have released the data &#8211; as data &#8211; in the form of shape files that can be rendered at the click of a button in things like Google Maps&#8230; but they didn&#8217;t&#8230; [The [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#38;blog=325417&#38;post=6200&#38;subd=ouseful&#38;ref=&#38;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>You may or may not have noticed that the <a href="http://consultation.boundarycommissionforengland.independent.gov.uk/whats-proposed/" onclick="urchinTracker('/outgoing/consultation.boundarycommissionforengland.independent.gov.uk/whats-proposed/?referer=');">Boundary Commission released their take on proposed parliamentary constituency boundaries</a> today.</p>
<p>They could have released the data &#8211; as data &#8211; in the form of shape files that can be rendered at the click of a button in things like Google Maps&#8230; but they didn&#8217;t&#8230; [<a href="http://blogs.telegraph.co.uk/news/sebastianpayne/100104727/the-one-thing-the-boundary-commission-quango-forgot-to-produce-a-map/" onclick="urchinTracker('/outgoing/blogs.telegraph.co.uk/news/sebastianpayne/100104727/the-one-thing-the-boundary-commission-quango-forgot-to-produce-a-map/?referer=');">The one thing the Boundary Commission quango forgot to produce: a map</a>] (There are issues with publishing the actual shapefiles, of course. For one thing, the boundaries may yet change &#8211; and if the original shapefiles are left hanging around, people may start to draw on these now incorrect sources of data once the boundaries are fixed. But that&#8217;s a minor issue&#8230;)</p>
<p>Instead, you have to download a series of hefty PDFs, one per region, to get a flavour of the boundary changes. Drawing a direct comparison with the current boundaries is not possible.</p>
<p>The make-up of the actual constituencies appears to be based on their member wards, data which is provided in a series of spreadsheets, one per region, each containing several sheets describing the ward makeup of each new constituency for the counties in the corresponding region.</p>
<p>It didn&#8217;t take long for the data junkies to get on the case though. The first map I saw was on the Guardian Datastore, reusing work by <a href="http://www.shef.ac.uk/trp/staff/alasdair_rae" onclick="urchinTracker('/outgoing/www.shef.ac.uk/trp/staff/alasdair_rae?referer=');">University of Sheffield academic</a> <a href="http://undertheraedar.blogspot.com/" onclick="urchinTracker('/outgoing/undertheraedar.blogspot.com/?referer=');">Alasdair Rae</a>, apparently created using Google Fusion Tables (though I haven&#8217;t seen a recipe published anywhere? Or a link to the KML file that I saw Guardian Datablog editor Simon Rogers/@smfrogers <a href="https://twitter.com/#!/smfrogers/statuses/113550197140885504" onclick="urchinTracker('/outgoing/twitter.com/_/smfrogers/statuses/113550197140885504?referer=');">tweet about</a>?)</p>
<p>[I knew I should have grabbed a screen shot of the original map...:-(]</p>
<p>It appears that Conrad Quilty-Harper (@coneee) over at the Telegraph then got on the case, and came up with a comparative map drawing on Rae&#8217;s work as published on the Datablog, showing the current boundaries compared to the proposed changes, and which ties the maps together so the zoom level and focus are matched across the maps (<a href="http://www.telegraph.co.uk/news/politics/8759664/MPs-constituencies-boundary-changes-mapped.html" onclick="urchinTracker('/outgoing/www.telegraph.co.uk/news/politics/8759664/MPs-constituencies-boundary-changes-mapped.html?referer=');">MPs&#8217; constituencies: boundary changes mapped</a>):</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6144102144/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6144102144/?referer=');"><img src="http://farm7.static.flickr.com/6188/6144102144_dabef50fc8.jpg" width="419" height="500" alt="Telegraph side by side map comparison" /></a></p>
<p>Interestingly, I was alerted to this map by Simon tweeting that he liked the Telegraph map so much, they&#8217;d reused the idea (and maybe even the code?) on the Guardian site. Here&#8217;s a snapshot of the conversation between these two data journalists over the course of the day (reverse chronological order):</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6144116028/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6144116028/?referer=');"><img src="http://farm7.static.flickr.com/6159/6144116028_0437b07dee.jpg" width="231" height="500" alt="Datajournalists in co-operative bootstrapping mode" /></a></p>
<p>Here&#8217;s the handshake&#8230;</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6144140530/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6144140530/?referer=');"><img src="http://farm7.static.flickr.com/6153/6144140530_d80740b877.jpg" width="500" height="275" alt="Collaborative co-evolution" /></a></p>
<p>I absolutely love this&#8230; and what&#8217;s more, it happened over the course of four or five hours, with a couple of technology/knowledge transfers along the way, as well as evolution in the way both news agencies communicated the information compared to the way the Boundary Commission released it. (If I was evil, I&#8217;d try to FOI the Boundary Commission to see how much time, effort and expense went into their communication effort around the proposed changes, and would then try to guesstimate how much the Guardian and Telegraph teams put into it as a comparison&#8230;)</p>
<p>At the time of writing (15.30), the BBC have no data driven take on this story&#8230;</p>
<p>And out of interest, I also wondered whether Sheffield U had a take&#8230;</p>
<p><a href="http://www.sheffield.ac.uk/mediacentre/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.sheffield.ac.uk/mediacentre/?referer=');"><img src="http://farm7.static.flickr.com/6201/6144166068_75a42871e6.jpg" width="500" height="361" alt="Sheffiled u media site" /></a></p>
<p>Maybe not&#8230;</p>
<p>PS By the by, the <a href="http://datadrivenjournalism.net" onclick="urchinTracker('/outgoing/datadrivenjournalism.net?referer=');">DataDrivenJournalism.net</a> website relaunched today. I&#8217;m honoured to be on the editorial board, along with @paulbradshaw @nicolaskb @mirkolorenz @smfrogers and @stiles, and looking forward to seeing how we can start to drive interest, engagement and skills development in, as well as analysis and (re)use of, and commentary on, public open data through the data journalism route&#8230;</p>
<p>PPS if you&#8217;re into data journalism, you may also be interested in <a href="http://getthedata.org" onclick="urchinTracker('/outgoing/getthedata.org?referer=');">GetTheData.org</a>, a question and answer site in the model of Stack Overflow, with an emphasis on Q&amp;A around how to find, access, and make use of open and public datasets. </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2011/09/13/data-journalists-engaging-in-co-innovation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://blog.ouseful.info" length="0" type="" />
<enclosure url="http://farm7.static.flickr.com/6153/6144140530_d80740b877.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6188/6144102144_dabef50fc8.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6159/6144116028_0437b07dee.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6201/6144166068_75a42871e6.jpg" length="" type="" />
		</item>
		<item>
		<title>Creating Thematic Maps Based on UK Constituency Boundaries in Google Fusion Tables</title>
		<link>http://blog.ouseful.info/2011/09/13/creating-thematic-maps-based-on-uk-constituency-boundaries-in-google-fusion-tables/</link>
		<comments>http://blog.ouseful.info/2011/09/13/creating-thematic-maps-based-on-uk-constituency-boundaries-in-google-fusion-tables/#comments</comments>
		<pubDate>Tue, 13 Sep 2011 10:16:39 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[DDJ]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=6197</guid>
		<description><![CDATA[I don&#8217;t have time to chase this just now, but it could be handy&#8230; Over the last few months, several maps that Alasdair Rae (University of Sheffield) generated with Google Fusion Tables have been appearing on the Guardian Datablog, including one today showing the UK&#8217;s new Parliamentary constituency boundaries. Looking at Alasdair&#8217;s fusion table for English [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#38;blog=325417&#38;post=6197&#38;subd=ouseful&#38;ref=&#38;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I don&#8217;t have time to chase this just now, but it could be handy&#8230; Over the last few months, several maps that <a href="http://undertheraedar.blogspot.com/" onclick="urchinTracker('/outgoing/undertheraedar.blogspot.com/?referer=');">Alasdair Rae</a> (University of Sheffield) generated with Google Fusion Tables have been appearing on the Guardian Datablog, including one today showing the <a href="http://www.guardian.co.uk/news/datablog/interactive/2011/sep/13/boundary-changes-constituency-map" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog/interactive/2011/sep/13/boundary-changes-constituency-map?referer=');">UK&#8217;s new Parliamentary constituency boundaries</a>.</p>
<p>Looking at Alasdair&#8217;s <a href="http://www.google.com/fusiontables/DataSource?dsrcid=628653" onclick="urchinTracker('/outgoing/www.google.com/fusiontables/DataSource?dsrcid=628653&amp;referer=');">fusion table for English Indices of Deprivation 2010</a>, we can see that it contains various output area codes as well as KML geometry shape files that can be used to draw the boundaries on a map.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6143058459/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6143058459/?referer=');"><img src="http://farm7.static.flickr.com/6196/6143058459_06bc9e5601.jpg" width="500" height="188" alt="Google fusion table - UK boundaries" /></a></p>
<p>On the to do list, then, is to build a set of fusion tables that we can use to generate maps from datatables containing particular sorts of output area code. Because it&#8217;s easy to join two fusion tables by a common column, we&#8217;d then have a simple Google Fusion Tables recipe for thematic maps:</p>
<p>1) get data containing output area or constituency codes;<br />
2) join with the appropriate mapping fusion table to annotate original data with appropriate shape files;<br />
3) generate map&#8230;</p>
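<p>Step 2 of that recipe is just a join on a shared code column. As a local stand-in for what the Fusion Tables merge would do (this is not the Fusion Tables API), here is a minimal sketch; the column names <code>code</code> and <code>geometry</code>, the area codes and the placeholder geometry string are all assumptions made for the example.</p>

```python
def join_on_code(data_rows, boundary_rows, key="code"):
    """Annotate each data row with the geometry that shares its area code.

    Rows whose code has no matching boundary are dropped, as in an inner join.
    """
    shapes = {row[key]: row["geometry"] for row in boundary_rows}
    joined = []
    for row in data_rows:
        if row[key] in shapes:
            joined.append({**row, "geometry": shapes[row[key]]})
    return joined


# Hypothetical data table and boundary table sharing a "code" column.
data = [{"code": "E001", "value": 12.5}, {"code": "E999", "value": 3.0}]
boundaries = [{"code": "E001", "geometry": "KML-POLYGON-PLACEHOLDER"}]
mapped = join_on_code(data, boundaries)
print(mapped)
```

<p>Rows that fall out of the join (E999 here) are a handy check that the data and the boundary table really do use the same coding scheme.</p>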
<p>I wonder &#8211; have Alasdair or anyone from the Guardian Datablog/Datastore team already published such a tutorial?</p>
<p>PS Ah, here&#8217;s one example tutorial: <a href="http://www.peteraldhous.com/CAR/Making_a_thematic_map_with_Google_Fusion_Tables.pdf" onclick="urchinTracker('/outgoing/www.peteraldhous.com/CAR/Making_a_thematic_map_with_Google_Fusion_Tables.pdf?referer=');">Peter Aldhous: Thematic Maps with Google Fusion Tables [PDF]</a></p>
<p>PPS for constituency boundary shapefiles as KML see <a href="http://www.google.com/fusiontables/DataSource?dsrcid=1574396" onclick="urchinTracker('/outgoing/www.google.com/fusiontables/DataSource?dsrcid=1574396&amp;referer=');">http://www.google.com/fusiontables/DataSource?dsrcid=1574396</a> or the Guardian Datastore&#8217;s <a href="http://www.google.com/fusiontables/exporttable?query=select+col0%3E%3E1+from+1474106+&amp;o=kmllink&amp;g=col0%3E%3E1" onclick="urchinTracker('/outgoing/www.google.com/fusiontables/exporttable?query=select+col0_3E_3E1+from+1474106+_amp_o=kmllink_amp_g=col0_3E_3E1&amp;referer=');">http://www.google.com/fusiontables/exporttable?query=select+col0%3E%3E1+from+1474106+&amp;o=kmllink&amp;g=col0%3E%3E1</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2011/09/13/creating-thematic-maps-based-on-uk-constituency-boundaries-in-google-fusion-tables/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://farm7.static.flickr.com/6196/6143058459_06bc9e5601.jpg" length="" type="" />
		</item>
	</channel>
</rss>

