<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Online Journalism Blog &#187; visualisation</title>
	<atom:link href="http://onlinejournalismblog.com/tag/visualisation/feed/" rel="self" type="application/rss+xml" />
	<link>http://onlinejournalismblog.com</link>
	<description>A conversation.</description>
	<lastBuildDate>Sat, 11 Feb 2012 12:06:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<cloud domain='onlinejournalismblog.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
		<item>
		<title>Word cloud or bar chart?</title>
		<link>http://onlinejournalismblog.com/2012/01/27/word-cloud-or-bar-chart/</link>
		<comments>http://onlinejournalismblog.com/2012/01/27/word-cloud-or-bar-chart/#comments</comments>
		<pubDate>Fri, 27 Jan 2012 07:54:46 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[newspapers]]></category>
		<category><![CDATA[bar charts]]></category>
		<category><![CDATA[New York Times]]></category>
		<category><![CDATA[tagxedo]]></category>
		<category><![CDATA[visualisation]]></category>
		<category><![CDATA[word clouds]]></category>
		<category><![CDATA[wordle]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15743</guid>
		<description><![CDATA[One of the easiest ways to get someone started on data visualisation is to introduce them to word clouds (it also demonstrates neatly how not all data is numerical). Using tools like Wordle and Tagxedo, you can paste in a major speech and see it visualised within a minute or so. But is a word cloud the best way of<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2012/01/27/word-cloud-or-bar-chart/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/01/Choice-words1.png"><img class="alignnone  wp-image-15744" src="http://onlinejournalismblog.com/wp-content/uploads/2012/01/Choice-words1.png" alt="Bar charts preferred over word clouds" width="430" height="328" /></a></p>
<p>One of the easiest ways to get someone started on data visualisation is to introduce them to word clouds (it also demonstrates neatly how not all data is numerical).</p>
<p>Using tools like Wordle and Tagxedo, you can paste in a major speech and see it visualised within a minute or so.</p>
<p>But is a word cloud the best way of visualising speeches? The New York Times appear to think otherwise. Their <a href="http://www.nytimes.com/interactive/2012/01/24/us/politics/0124-words.html" onclick="urchinTracker('/outgoing/www.nytimes.com/interactive/2012/01/24/us/politics/0124-words.html?referer=');">visualisation</a> (above) comparing President Obama&#8217;s State of the Union address and speeches by Republican presidential candidates chooses to use something far less fashionable: the bar chart.</p>
<p>Why did they choose a bar chart? The key is the purpose of the chart: <strong>comparison</strong>. If your objective is to capture the spirit of a speech, or its key themes, then a word cloud can still work well, if you clean the data (see <a href="http://www.nytimes.com/interactive/2009/01/17/washington/20090117_ADDRESSES.html" onclick="urchinTracker('/outgoing/www.nytimes.com/interactive/2009/01/17/washington/20090117_ADDRESSES.html?referer=');">this interactive example that appeared on the New York Times in 2009</a>).</p>
<p>But if you want to compare it to speeches of others &#8211; and particularly if you want to compare on specific issues such as employment or tax &#8211; then bar charts are a better choice. Compare, for example, <a href="http://www.readwriteweb.com/archives/tag_clouds_of_obamas_inaugural_speech_compared_to_bushs.php" onclick="urchinTracker('/outgoing/www.readwriteweb.com/archives/tag_clouds_of_obamas_inaugural_speech_compared_to_bushs.php?referer=');">ReadWriteWeb&#8217;s comparison of inaugural speeches</a>, and how effective that is compared to the bar charts.</p>
<p>In short, don&#8217;t always reach for the obvious chart type &#8211; and be clear what you&#8217;re trying to communicate.</p>
<p>UPDATE: <a href="http://www.niemanlab.org/2011/10/word-clouds-considered-harmful/" onclick="urchinTracker('/outgoing/www.niemanlab.org/2011/10/word-clouds-considered-harmful/?referer=');">More criticism of word clouds by New York Times software architect here</a> (<a href="https://twitter.com/#!/harrietebailey/statuses/162885114030858240" onclick="urchinTracker('/outgoing/twitter.com/_/harrietebailey/statuses/162885114030858240?referer=');">via Harriet Bailey</a>)</p>
<div class="wp-caption alignnone" style="width: 437px"><a href="http://www.readwriteweb.com/archives/tag_clouds_of_obamas_inaugural_speech_compared_to_bushs.php" onclick="urchinTracker('/outgoing/www.readwriteweb.com/archives/tag_clouds_of_obamas_inaugural_speech_compared_to_bushs.php?referer=');"><img src="http://rww.readwriteweb.netdna-cdn.com/images/obamaonblack.jpg" alt="Obama inaugural speech word cloud by ReadWriteWeb" width="427" height="239" /></a><p class="wp-caption-text">Obama inaugural speech word cloud by ReadWriteWeb</p></div>
<p><a href="http://flowingdata.com/2012/01/24/words-used-in-sotu-and-republican-presidential-candidates-in-debates/" onclick="urchinTracker('/outgoing/flowingdata.com/2012/01/24/words-used-in-sotu-and-republican-presidential-candidates-in-debates/?referer=');"><em>via Flowing Data</em></a></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/01/27/word-cloud-or-bar-chart/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Active Lobbying Through Meetings with UK Government Ministers</title>
		<link>http://blog.ouseful.info/2011/10/17/active-lobbying-through-meetings-with-uk-government-ministers/</link>
		<comments>http://blog.ouseful.info/2011/10/17/active-lobbying-through-meetings-with-uk-government-ministers/#comments</comments>
		<pubDate>Mon, 17 Oct 2011 12:57:53 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[DDJ]]></category>
		<category><![CDATA[lobbying]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[visualisation]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=6320</guid>
		<description><![CDATA[In a move that seemed to upset collectors of UK ministerial meeting data, @whoslobbying, on grounds of wasted effort, the Guardian datastore published a spreadsheet last night containing data relating to ministerial meetings between May 2010 and March 2011. (The first release of the spreadsheet actually omitted the column containing who the meeting was with, [...]]]></description>
			<content:encoded><![CDATA[<p>In a move that seemed to upset collectors of UK ministerial meeting data, @whoslobbying, on grounds of wasted effort, the Guardian datastore published a spreadsheet last night containing <a href="http://www.guardian.co.uk/news/datablog/2011/oct/16/links-government-data-business-data" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog/2011/oct/16/links-government-data-business-data?referer=');">data relating to ministerial meetings between May 2010 and March 2011</a>.</p>
<p>(The first release of the spreadsheet actually omitted the column containing who the meeting was with, but that seems to be fixed now&#8230; There are, however, still plenty of character encoding issues (apostrophes, accented characters, some sort of em-dash, etc) that might cripple some plug and play tools.)</p>
<p>Looking over the data, we can use it as the basis for a network diagram with actors (Ministers and lobbyists) as nodes and edges representing meetings between Ministers and lobbyists. There is one slight complication: where a meeting involves a Minister and several lobbyists, we ideally need to split the lobbyists out into their own nodes.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6253752000/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6253752000/?referer=');"><img src="http://farm7.static.flickr.com/6213/6253752000_e4550e8129.jpg" width="500" height="458" alt="UK gov meetings spreadsheet" /></a></p>
<p>This probably provides an ideal opportunity to have a play with the Stanford Data Wrangler and try forcing these separate lobbyists onto separate rows, but I didn&#8217;t allow myself much time for the tinkering (and the requisite learning!), so I resorted to a Python script to read in the data file and split out the different lobbyists. (I also did an iterative step, cleaning the downloaded CSV file in a text editor by replacing nasty characters that caused the script to choke.) You can find the script <a href="https://gist.github.com/1292500" onclick="urchinTracker('/outgoing/gist.github.com/1292500?referer=');">here</a> (note that it makes use of the <a href="http://networkx.lanl.gov/" onclick="urchinTracker('/outgoing/networkx.lanl.gov/?referer=');">networkx</a> network analysis library, which you&#8217;ll need to install if you want to run the script).</p>
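<p>As a rough illustration of the splitting step described above, here is a minimal sketch rather than the script linked above. The column names (<code>Minister</code>, <code>Organisation</code>), the comma separator, the sample rows, the output filename and the idea of counting repeat meetings as an edge <code>weight</code> are all assumptions made for the example.</p>

```python
import csv
import io

import networkx as nx

# Hypothetical sample rows; the real spreadsheet's column names may differ.
SAMPLE_CSV = """Minister,Organisation
Minister A,"Barclays, Example Bank"
Minister B,Barclays
Minister A,Barclays
"""


def build_meeting_graph(csv_text, separator=","):
    """Build a directed Minister to lobbyist graph, one node per lobbyist."""
    graph = nx.DiGraph()
    for row in csv.DictReader(io.StringIO(csv_text)):
        minister = row["Minister"].strip()
        # Split a multi-lobbyist cell into individual names, dropping blanks.
        names = [name.strip() for name in row["Organisation"].split(separator)]
        for lobbyist in filter(None, names):
            if graph.has_edge(minister, lobbyist):
                # Assumption: repeat meetings are counted as an edge weight.
                graph[minister][lobbyist]["weight"] += 1
            else:
                graph.add_edge(minister, lobbyist, weight=1)
    return graph


graph = build_meeting_graph(SAMPLE_CSV)
nx.write_graphml(graph, "mtgs_sketch.graphml")  # hypothetical output filename
```

<p>Splitting on a plain comma is only safe if no organisation name contains one, so a real dataset would probably need a more careful separator rule.</p>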
<p>The script generates a directed graph with links from Ministers to lobbyists and dumps it to a GraphML file (<a href="http://dl.dropbox.com/u/1156404/mtgs.graphml.zip" onclick="urchinTracker('/outgoing/dl.dropbox.com/u/1156404/mtgs.graphml.zip?referer=');">available here</a>) that can be loaded directly into Gephi. Here&#8217;s a view &#8211; using Gephi &#8211; of the heart of the network. If we filter the graph to show nodes that met with at least five different Ministers&#8230;</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6253273513/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6253273513/?referer=');"><img src="http://farm7.static.flickr.com/6163/6253273513_8efd7d46fd.jpg" width="258" height="500" alt="Gephi - k-core filter" /></a></p>
<p>we can get a view into the heart of the UK lobbying network:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6253188589/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6253188589/?referer=');"><img src="http://farm7.static.flickr.com/6157/6253188589_027a9c9807.jpg" width="500" height="393" alt="Active Lobbyists" /></a></p>
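<p>The same kind of filter can be approximated outside Gephi. The sketch below assumes a simple directed graph with edges running from Ministers to lobbyists, so that a lobbyist&#8217;s in-degree is the number of different Ministers met. The node names and the threshold constant are hypothetical, and a plain degree threshold is not quite the same thing as Gephi&#8217;s k-core filter, which prunes the graph repeatedly.</p>

```python
import networkx as nx

# Hypothetical toy data: edges run from Minister to lobbyist.
graph = nx.DiGraph()
for i in range(1, 6):
    graph.add_edge(f"Minister {i}", "Barclays")
graph.add_edge("Minister 1", "Example Ltd")

MIN_MINISTERS = 5  # threshold taken from the "at least five" filter in the text

# In a simple DiGraph, in-degree equals the number of distinct Ministers met.
well_connected = [node for node, deg in graph.in_degree() if deg >= MIN_MINISTERS]

# Keep those lobbyists plus the Ministers they met.
keep = set(well_connected)
for lobbyist in well_connected:
    keep.update(graph.predecessors(lobbyist))
core = graph.subgraph(keep).copy()
```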
<p>I sized the lobbyist nodes according to eigenvector centrality, which gives an indication of how well connected they are in the network.</p>
<p>One of the nice things about Gephi is that it allows for interactive exploration of a graph. For example, I can hover over a lobbyist node &#8211; <em>Barclays</em> in this case &#8211; to see which Ministers were met:</p>
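<p>For anyone wanting to reproduce that measure in code, here is a small sketch using networkx rather than Gephi. It treats meetings as undirected edges, which is an assumption made so that the calculation converges on this kind of two-sided graph, and the toy edges are hypothetical. Gephi&#8217;s own scores may be scaled differently.</p>

```python
import networkx as nx

# Hypothetical toy data: undirected Minister and lobbyist meetings.
graph = nx.Graph()
graph.add_edges_from([
    ("Minister 1", "Barclays"),
    ("Minister 2", "Barclays"),
    ("Minister 3", "Barclays"),
    ("Minister 1", "Example Ltd"),
])

# A node scores highly when it is connected to other high-scoring nodes.
centrality = nx.eigenvector_centrality(graph, max_iter=1000)

for node, score in sorted(centrality.items(), key=lambda item: item[1], reverse=True):
    print(f"{node}: {score:.3f}")
```

<p>In this toy graph the organisation that met three Ministers scores higher than the one that met a single Minister.</p>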
<p><a href="http://www.flickr.com/photos/psychemedia/6253809962/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6253809962/?referer=');"><img src="http://farm7.static.flickr.com/6115/6253809962_93dc99b73c.jpg" width="500" height="346" alt="Bankers connect..." /></a></p>
<p>Alternatively, we can see which of the well-connected organisations met with the Minister for Welfare Reform:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6253287007/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6253287007/?referer=');"><img src="http://farm7.static.flickr.com/6094/6253287007_f1cba29c9e.jpg" width="500" height="394" alt="Welfare meetings..." /></a></p>
<p>Looking over the data, we also see how some Ministers are inconsistently referenced within the original dataset:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6253840012/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6253840012/?referer=');"><img src="http://farm7.static.flickr.com/6152/6253840012_a581de1c5b.jpg" width="500" height="138" alt="Multiple mentions" /></a></p>
<p>Note that different representations of the same name are likely to have met similar lobbyists, so the force-directed layout I used will tend to place their nodes in similar locations. Which is to say &#8211; we may be able to use <em>visual</em> tools to help us identify fractured representations of the same individual. (Note that multiple meetings between the same parties can be visualised using the thickness of the edges, which are weighted according to the number of times the edge is described in the GraphML file&#8230;)</p>
<p>Unifying the different representations of the same individual is something that Google Refine could help us tidy up with its <a href="http://www.propublica.org/nerds/item/using-google-refine-for-data-cleaning" onclick="urchinTracker('/outgoing/www.propublica.org/nerds/item/using-google-refine-for-data-cleaning?referer=');">various clustering tools</a>, although it would be nice if the Datastore folk addressed this at source (or at least, as part of an ongoing data quality enhancement process&#8230;;-)</p>
<p>I guess we could also try reconciling company names against universal company identifiers, for example by using <a href="http://vimeo.com/17924204" onclick="urchinTracker('/outgoing/vimeo.com/17924204?referer=');">Google Refine&#8217;s reconciliation service and the Open Corporates database</a>? Hmmm, which makes me wonder: do MySociety, or Public Whip, offer an MP or Ministerial position reconciliation service that works with Google Refine?</p>
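<p>To give a feel for what that kind of clustering does, here is a rough sketch of a key-collision approach: each name is reduced to a normalised key, and names that share a key are grouped. It is an approximation written for illustration, not Google Refine&#8217;s implementation, and the name variants are hypothetical rather than taken from the spreadsheet.</p>

```python
import re
from collections import defaultdict


def fingerprint(name):
    """Key a name by its lowercased, de-punctuated, sorted unique tokens."""
    cleaned = re.sub(r"[^\w\s]", " ", name.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def cluster_names(names):
    """Group names sharing a key; return only groups with 2+ distinct variants."""
    groups = defaultdict(set)
    for name in names:
        groups[fingerprint(name)].add(name)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) >= 2}


# Hypothetical variants, not taken from the Guardian spreadsheet.
names = [
    "Minister for Example",
    "minister for example ",
    "Example, Minister for",
    "Another Minister",
]
clusters = cluster_names(names)
```

<p>A key like this only catches differences in case, punctuation, spacing and word order. Variants with added titles or abbreviations would need fuzzier matching.</p>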
<p>A couple of things I haven&#8217;t done: represented the department (which could be done via a node attribute, maybe, at least for the Ministers); represented actual meetings, and what I guess we might term co-lobbying behaviour, where several organisations are in the same meeting.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2011/10/17/active-lobbying-through-meetings-with-uk-government-ministers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://farm7.static.flickr.com/6213/6253752000_e4550e8129.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6152/6253840012_a581de1c5b.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6094/6253287007_f1cba29c9e.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6157/6253188589_027a9c9807.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6115/6253809962_93dc99b73c.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6163/6253273513_8efd7d46fd.jpg" length="" type="" />
		</item>
		<item>
		<title>SFTW: 9 data journalism tools</title>
		<link>http://onlinejournalismblog.com/2011/08/19/sftw-9-data-journalism-tools/</link>
		<comments>http://onlinejournalismblog.com/2011/08/19/sftw-9-data-journalism-tools/#comments</comments>
		<pubDate>Fri, 19 Aug 2011 10:26:18 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[buzzdata]]></category>
		<category><![CDATA[cleaning]]></category>
		<category><![CDATA[data wrangler]]></category>
		<category><![CDATA[datamarket]]></category>
		<category><![CDATA[google news scraper]]></category>
		<category><![CDATA[impure]]></category>
		<category><![CDATA[junar]]></category>
		<category><![CDATA[metadata extraction tool]]></category>
		<category><![CDATA[roambi]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[Something for the weekend]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[visualisation]]></category>
		<category><![CDATA[zanran]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15048</guid>
		<description><![CDATA[There have been quite a few tools springing up over the past few months that I&#8217;ve not had time to blog about, so here&#8217;s a roundup post on all of them &#8211; a bumper Something For The Weekend (let me know how you find these). 1. Junar &#8211; for scraping websites and sharing data Junar presents a much easier way<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/08/19/sftw-9-data-journalism-tools/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p>There have been quite a few tools springing up over the past few months that I&#8217;ve not had time to blog about, so here&#8217;s a roundup post on all of them &#8211; a bumper <a href="http://onlinejournalismblog.com/tag/something-for-the-weekend/">Something For The Weekend</a> (let me know how you find these).</p>
<h2>1. Junar &#8211; for scraping websites and sharing data</h2>
<p><a href="http://www.junar.com/" onclick="urchinTracker('/outgoing/www.junar.com/?referer=');">Junar</a> presents a much easier way to scrape data from online tables with its &#8216;<a href="http://www.junar.com/datastreams/create" onclick="urchinTracker('/outgoing/www.junar.com/datastreams/create?referer=');">Collect Data</a>&#8217; tool &#8211; and the team behind it tell me they have plans to build functionality allowing users to scrape linked pages, as well as the ability to scrape PDFs.<span id="more-15048"></span></p>
<h2>2. BuzzData &#8211; for sharing data</h2>
<p><a href="http://buzzdata.com/" onclick="urchinTracker('/outgoing/buzzdata.com/?referer=');">BuzzData</a> is a platform for sharing data &#8211; essentially a social network where you can follow other data journalists or datasets, tag and license your data, and &#8211; importantly &#8211; add visualisations, articles and attachments. When someone else builds on your data, it tells you, which is nice.</p>
<h2>3. DataMarket &#8211; for finding data</h2>
<p><a href="http://datamarket.com/" onclick="urchinTracker('/outgoing/datamarket.com/?referer=');">DataMarket</a> is exactly what it says on the tin: a market for data from organisations including the UN, BP, Eurostat, the IMF, USGS, and various other acronyms. You can access the data for free, or pay for extra functionality such as exporting to Excel.</p>
<h2>4. Google News Scraper &#8211; for grabbing data on news coverage</h2>
<p><a href="https://tools.issuecrawler.net/beta/googleNews/" onclick="urchinTracker('/outgoing/tools.issuecrawler.net/beta/googleNews/?referer=');">This scraper</a> will allow you to gather data on coverage of a particular issue, event or person. It only gathers the teaser text, but the country data may be useful if you want to map coverage, while the URLs can provide a starting point for further scraping experiments.</p>
<h2>5. Metadata extraction tool &#8211; a first step for searching document dumps?</h2>
<p><a href="http://meta-extractor.sourceforge.net/" onclick="urchinTracker('/outgoing/meta-extractor.sourceforge.net/?referer=');">This</a> is aimed at file preservation activities, but it has a few possible applications for journalists. Firstly, it has a Windows interface for exploring the metadata of a bunch of files, making it possible to sort in different ways to more quickly look for information you&#8217;re seeking. Secondly, the generation of an XML file will give some structure which could allow you to, for example, plot your documents on a timeline, spotting patterns or outliers.</p>
<h2>6. Roambi &#8211; data visualisation on your iPhone</h2>
<p>Sadly, <a href="http://www.roambi.com/" onclick="urchinTracker('/outgoing/www.roambi.com/?referer=');">it&#8217;s only <em>your</em> iPhone</a>, not anyone else&#8217;s, so this is more if you&#8217;re on the move but want to go through some private data visualisations which might hide a story.</p>
<h2>7. Data Wrangler &#8211; web-based data cleaning tool</h2>
<p><a href="http://vis.stanford.edu/wrangler/" onclick="urchinTracker('/outgoing/vis.stanford.edu/wrangler/?referer=');">This</a> looks pretty powerful, if not pretty full stop. Here&#8217;s a video:</p>
<p>http://vimeo.com/19185801</p>
<h2>8. Impure &#8211; visual programming language</h2>
<p>From the About page:</p>
<p>&#8220;<a href="http://www.impure.com/" onclick="urchinTracker('/outgoing/www.impure.com/?referer=');">Impure</a> is a visual programming language aimed to gather, process and visualize information. With impure is possible to obtain information from very different sources; from user owned data to diverse feeds in internet, including social media data, real time or historical financial information, images, news, search queries and many more. Impure is a tool to be in touch with data around internet, to deeply understand it. Within a modular logic interface you can quickly link information to operators, controls and visualization methods, bringing all the power of the comprehension of information and knowledge to the not programmers that want to work with information in a professional way.&#8221;</p>
<h2>9. Zanran &#8211; PDF/spreadsheet/table search engine</h2>
<p><a href="http://www.zanran.com/q/" onclick="urchinTracker('/outgoing/www.zanran.com/q/?referer=');">This looks a very useful tool</a> for narrowing down searches to PDFs, spreadsheets, and tables within webpages (the advanced search allows further narrowing by filetype, date, server location and site). Clever stuff behind it &#8211; particularly in the way it looks at images and decides if they&#8217;re charts. The site says they plan to add Word documents and PowerPoint presentations soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/08/19/sftw-9-data-journalism-tools/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>INFOGRAPHIC: UK riots &#8211; Gauging the Columnists Blame Game</title>
		<link>http://onlinejournalismblog.com/2011/08/12/visualisation-uk-riots-gauging-the-columnists-blame-game/</link>
		<comments>http://onlinejournalismblog.com/2011/08/12/visualisation-uk-riots-gauging-the-columnists-blame-game/#comments</comments>
		<pubDate>Fri, 12 Aug 2011 10:42:19 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[columnists]]></category>
		<category><![CDATA[gauge]]></category>
		<category><![CDATA[liberal conspiracy]]></category>
		<category><![CDATA[uk riots]]></category>
		<category><![CDATA[visualisation]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15049</guid>
		<description><![CDATA[Here&#8217;s a quick experiment in data visualisation to provide an instant insight into a story on how the blame game is being played by columnists. The data is taken from a Liberal Conspiracy blog post &#8211; I&#8217;ve transferred that into a spreadsheet with limited categories and used the Gauges gadget to visualise the totals. A screengrab is below &#8211; but<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/08/12/visualisation-uk-riots-gauging-the-columnists-blame-game/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p>Here&#8217;s a quick experiment in data visualisation to provide an instant insight into a story on how the blame game is being played by columnists.</p>
<p>The data is taken from a <a href="http://liberalconspiracy.org/2011/08/12/whos-to-blame-for-riots-play-the-right-wing-bingo/#comment-303039" onclick="urchinTracker('/outgoing/liberalconspiracy.org/2011/08/12/whos-to-blame-for-riots-play-the-right-wing-bingo/_comment-303039?referer=');">Liberal Conspiracy blog post</a> &#8211; I&#8217;ve transferred that into <a href="https://spreadsheets.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdDJya2Z5bkNvVjgzbDB4UlItSFpQcXc&amp;hl=en_GB" onclick="urchinTracker('/outgoing/spreadsheets.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdDJya2Z5bkNvVjgzbDB4UlItSFpQcXc_amp_hl=en_GB&amp;referer=');">a spreadsheet</a> with limited categories and used the Gauges gadget to visualise the totals.</p>
<p>A screengrab is below &#8211; but there is also an embed code that provides a gauge that will be updated whenever a new columnist is added. <a href="https://spreadsheets.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdDJya2Z5bkNvVjgzbDB4UlItSFpQcXc&amp;hl=en_GB" onclick="urchinTracker('/outgoing/spreadsheets.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdDJya2Z5bkNvVjgzbDB4UlItSFpQcXc_amp_hl=en_GB&amp;referer=');">See the spreadsheet for both the gauge and the raw data</a>.</p>
<div id="attachment_15052" class="wp-caption alignnone" style="width: 467px"><a href="http://onlinejournalismblog.com/wp-content/uploads/2011/08/ColumnistBlameGameGauge_Riots.png"><img class="size-full wp-image-15052" title="ColumnistBlameGameGauge_Riots" src="http://onlinejournalismblog.com/wp-content/uploads/2011/08/ColumnistBlameGameGauge_Riots.png" alt="Columnist Blame Game Gauge - UK Riots" width="457" height="347" /></a><p class="wp-caption-text">Columnist Blame Game Gauge</p></div>
<p><script src="https://docs.google.com/gpub?url=http%3A%2F%2F0ktjprp9lkdpl36tkqilnfo7sutd589j-ss-opensocial.googleusercontent.com%2Fgadgets%2Fifr%3Fup_title%3DGauging%2520the%2520Columnist%2520Blame%2520Game%26up_minvalue%3D0%26up_maxvalue%3D6%26up_greenrange%26up_yellowrange%26up_redrange%26up_minorticks%3D2%26up__table_query_url%3Dhttps%253A%252F%252Fdocs.google.com%252Fspreadsheet%252Ftq%253Frange%253DA11%25253AK12%2526gid%253D0%2526key%253D0ApTo6f5Yj1iJdDJya2Z5bkNvVjgzbDB4UlItSFpQcXc%2526pub%253D1%26url%3Dhttp%253A%252F%252Fwww.google.com%252Fig%252Fmodules%252Fgauge.xml%26spreadsheets%3Dspreadsheets&amp;height=753&amp;width=1280"></script></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/08/12/visualisation-uk-riots-gauging-the-columnists-blame-game/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>In Spanish: The inverted pyramid of data journalism part 2</title>
		<link>http://onlinejournalismblog.com/2011/07/14/in-spanish-the-inverted-pyramid-of-data-journalism-part-2/</link>
		<comments>http://onlinejournalismblog.com/2011/07/14/in-spanish-the-inverted-pyramid-of-data-journalism-part-2/#comments</comments>
		<pubDate>Thu, 14 Jul 2011 15:35:15 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[6 formas diferentes de comunicar en periodismo de datos]]></category>
		<category><![CDATA[inverted pyramid of data journalism]]></category>
		<category><![CDATA[La Pirámide Invertida del Periodismo de Datos]]></category>
		<category><![CDATA[Mauro Accurso]]></category>
		<category><![CDATA[spanish]]></category>
		<category><![CDATA[translation]]></category>
		<category><![CDATA[visualisation]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=14914</guid>
		<description><![CDATA[Mauro Accurso has followed up his rapid translation of last week&#8217;s inverted pyramid of data journalism with a Spanish version of part 2: the 6 C&#8217;s of communicating data journalism. It&#8217;s copied in full below. La semana pasada les traduje la primera parte de La Pirámide Invertida del Periodismo de Datos de Paul Bradshaw que prometió extender en el aspecto de<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/07/14/in-spanish-the-inverted-pyramid-of-data-journalism-part-2/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p><em><strong>Mauro Accurso</strong> has followed up his <a href="http://onlinejournalismblog.com/2011/07/08/the-inverted-pyramid-of-data-journalism-in-spanish/">rapid translation</a> of last week&#8217;s <a href="http://onlinejournalismblog.com/2011/07/07/the-inverted-pyramid-of-data-journalism/">inverted pyramid of data journalism</a> with a <a href="http://tejiendo-redes.com/2011/07/13/formas-de-comunicar-informacion-la-piramide-invertida-del-periodismo-de-datos-p2/" onclick="urchinTracker('/outgoing/tejiendo-redes.com/2011/07/13/formas-de-comunicar-informacion-la-piramide-invertida-del-periodismo-de-datos-p2/?referer=');">Spanish version</a> of <a href="http://onlinejournalismblog.com/2011/07/13/the-inverted-pyramid-of-data-journalism-part-2-6-ways-of-communicating-data-journalism/">part 2: the 6 C&#8217;s of communicating data journalism</a>. It&#8217;s copied in full below.</em></p>
<p><em>La semana pasada les traduje la primera parte de </em><a href="http://tejiendo-redes.com/2011/07/07/la-piramide-invertida-del-periodismo-de-datos/" target="_blank" onclick="urchinTracker('/outgoing/tejiendo-redes.com/2011/07/07/la-piramide-invertida-del-periodismo-de-datos/?referer=');">La Pirámide Invertida del Periodismo de Datos</a> de <a href="http://onlinejournalismblog.com/2011/07/07/the-inverted-pyramid-of-data-journalism/" target="_blank">Paul Bradshaw</a> que prometió extender en el aspecto de comunicación del extenso proceso que significa el <strong><a href="http://tejiendo-redes.com/?s=periodismo+de+datos" target="_blank" onclick="urchinTracker('/outgoing/tejiendo-redes.com/?s=periodismo+de+datos&amp;referer=');">periodismo de datos</a></strong>.</p>
<p><a href="http://maccur.files.wordpress.com/2011/07/comunicar-periodismo-de-datos.png" onclick="urchinTracker('/outgoing/maccur.files.wordpress.com/2011/07/comunicar-periodismo-de-datos.png?referer=');"><img title="comunicar periodismo de datos" src="http://maccur.files.wordpress.com/2011/07/comunicar-periodismo-de-datos.png?w=365&amp;h=436" alt="comunicar periodismo de datos" width="365" height="436" /></a>En esta segunda parte Paul recorre <strong>6 formas diferentes de comunicar en periodismo de datos</strong> que pueden ver en el cuadro de arriba y al final encontrarán un gráfico que resume toda la teoría (la cual está en desarrollo todavía y Bradshaw pide aportes, comentarios y sugerencias):</p>
<p><span id="more-14914"></span></p>
<p>“El periodismo de datos moderno ha crecido junto con un gran aumento en visualización y esto puede llevarnos algunas veces a dejar de lado diferentes formas de contar historias que involucren grandes números. La intención de lo siguiente es funcionar como un manual para asegurar que todas las opciones sean consideradas:</p>
<h1>1. VISUALIZACIÓN</h1>
<p>La visualización es la forma más rápida de comunicar los resultados del periodismo de datos: herramientas gratuitas como Google Docs lo permiten con un sólo click y herramientas más poderosas como Many Eyes sólo requieren que el usuario pegue la data cruda y seleccione de un grupo de opciones de visualización.</p>
<p>Pero facilidad no es igual a efectividad. El surgimiento de cuadros-basura demuestra que la visualización no es inmune al <a href="http://en.wikipedia.org/wiki/Churnalism" target="_blank" onclick="urchinTracker('/outgoing/en.wikipedia.org/wiki/Churnalism?referer=');">churnalism</a> o al espectáculo sin profundidad. Hay una rica historia de visualizaciones en gráfica que se mantiene relevante para la generación de las infografías online: enfocarse en no más de 4 puntos de datos, evitar el 3D y asegurarse que el gráfico es autosuficiente son sólo algunas.</p>
<p>No es un proceso simple pero, sin embargo, la visualización tiene una gran ventaja que hace que ese esfuerzo valga la pena: puede hacer que la comunicación sea increíblemente efectiva. Puede proveer de un método de distribución de tu contenido que no puede ser igualado por otros tipos de comunicación listados acá.</p>
<p>Pero su mayor fortaleza es también su mayor debilidad: la naturaleza instantánea de las infografías también significa que las personas a menudo no pasan demasiado tiempo mirándolas. Las hace muy efectivas para la distribución pero no para el engagement, así que es importante pensar estratégicamente acerca de 1) asegurarse que la imagen contenga un enlace a la fuente; y 2) asegurarse que haya algo más en la fuente cuando la gente llegue.</p>
<h1>2. NARRACIÓN</h1>
<p>Un artículo tradicional puede luchar para contener la clase de números que el periodismo de datos suele recorrer, pero aún así provee una forma accesible para que las personas entiendan la historia, si está hecho bien.</p>
<p>Como con la visualización, menos suele ser más. Pero también, como en la mayoría de la narrativa, necesitas pensar en el significado y tus objetivos en comunicar esos números.</p>
<p>Las cifras abstractas pueden ser impresionantes, pero sin sentido e inútiles. ¿Qué significa que 10 millones hayan sido gastados en algo? ¿Eso es más o menos que lo usual? ¿Más o menos que algo similar? Traten de bajar los montos a cantidades manejables: las sumas por persona o por día, por ejemplo. Finalmente, usen la edición para enfocarse en las cuestiones principales y asegúrense de enlazar al conjunto.</p>
<h1>3. COMUNICACIÓN SOCIAL</h1>
<p>La comunicación es un arte social y el éxito de infografías a través de medios sociales es un testamento de eso. Pero no son sólo las infografías que son sociales, la información también lo es. The Guardian ha demostrado eso de forma exitosa con la rica comunidad del <a href="http://www.guardian.co.uk/news/datablog" target="_blank" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog?referer=');">Data Blog</a> y alrededor de su API. Iniciativas de Crowdsourcing con el objetivo de recolectar data también pueden brindar una dimensión social a la información (ejemplos que remarca Paul: proyecto “<a href="http://mps-expenses.guardian.co.uk/" target="_blank" onclick="urchinTracker('/outgoing/mps-expenses.guardian.co.uk/?referer=');">Investigate your MP’s expenses</a>” del Guardian que liberó más de 450 mil documentos para que los revisen los usuarios y cuando <a href="http://www.guardian.co.uk/technology/blog/poll/2010/jan/26/apple-tablet-crowdsource-specifications" target="_blank" onclick="urchinTracker('/outgoing/www.guardian.co.uk/technology/blog/poll/2010/jan/26/apple-tablet-crowdsource-specifications?referer=');">hicieron un crowdsource de las supuestas especificaciones del iPad</a>). Hay otros ejemplos también, especialmente cuando no hay otra forma de conseguir la información.</p>
<p>La conectividad de la web ofrece nuevas oportunidades para presentar al periodismo de datos en una forma social. La aplicación de ProPublica que provee resultados basados en tu perfil de Facebook (escuelas a las que fueron; amigos que usaron la aplicación) es un ejemplo de como el periodismo de datos puede aprovechar la data social y, al mismo tiempo, como comunicar los resultados del periodismo de datos puede ser orientado alrededor de dinámicas sociales usando elementos como concursos, compartir, competiciones, campañas y colaboración. Estamos recién en el comienzo de este aspecto del periodismo online.</p>
<h1>4. HUMANIZAR</h1>
<p>Los programas de noticias a menudo utilizan casos de estudio para tratar el problema de presentar historias basadas en números en televisión o radio. Si los tiempos de espera en hospitales han aumentado, hablan con alguien que ha tenido que esperar un montón de tiempo por una operación. En otras palabras, humaniza los números.</p>
<p>Más recientemente, el crecimiento de gráficos en movimiento generados por computadora ha bajado la presión de cierta forma, ya que los presentadores pueden utilizar animaciones poderosas para ilustrar una historia.</p>
<p>Pero una vez más, surge el punto de hacer historias relevantes para las personas. Como escribí en “<a href="http://onlinejournalismblog.com/2010/12/07/wikileaks-cablegate/" target="_blank"><em>One ambassador’s embarrassment is a tragedy, 15,000 civilian deaths is a statistic</em></a>” (resumen del post en español: <a href="http://www.uberbin.net/archivos/medios/periodismo-de-datos-y-filtraciones-masivas-cuando-la-muerte-es-una-estadistica.php" target="_blank" onclick="urchinTracker('/outgoing/www.uberbin.net/archivos/medios/periodismo-de-datos-y-filtraciones-masivas-cuando-la-muerte-es-una-estadistica.php?referer=');">periodismo de datos y filtraciones masivas – cuando la muerte es una estadística</a>): cuando te mueves más allá de escalas que podamos manejar a un nivel humano, luchas para enganchar a la gente en el tema que estás cubriendo, no importa cuán impresionante sea el gráfico.</p>
<p>Así que después de estar enterrado en información abstracta necesitamos recordar que salir y grabar una entrevista con una persona cuya vida haya sido afectada por la data puede hacer una gran diferencia para propulsar nuestra historia.</p>
<h1>5. PERSONALIZAR</h1>
<p>Uno de los grandes cambios del periodismo online es que abre toda clase de posibilidades alrededor de la interactividad. En cuanto al periodismo de datos eso significa que los usuarios pueden, potencialmente, controlar qué información es presentada a ellos en varias entradas.</p>
<p>Hay algunas formas relativamente bien establecidas de esto. Por ejemplo, cuando un gobierno presenta su último presupuesto, los sitios web de noticias muchas veces invitan al usuario a ingresar sus propios detalles y averiguar cómo el presupuesto los afecta. Una variante reciente de esto fueron <a href="http://www.bbc.co.uk/news/10373060" target="_blank" onclick="urchinTracker('/outgoing/www.bbc.co.uk/news/10373060?referer=');">esos sitios interactivos donde</a> invitaban al usuario a tomar sus propias decisiones de cómo recortarían el déficit (la <a href="http://www.ft.com/cms/s/0/abe91fdc-4e08-11df-b437-00144feab49a.html#axzz1Ru7KsxRG" target="_blank" onclick="urchinTracker('/outgoing/www.ft.com/cms/s/0/abe91fdc-4e08-11df-b437-00144feab49a.html_axzz1Ru7KsxRG?referer=');">versión del Financial Times</a> llevó eso más allá agregando estrategias de partidos y políticas).</p>
<p>Otra forma común es personalización geográfica: el usuario es invitado a entrar su código postal u otra información geográfica para descubrir cómo un tema en particular está resultando en su lugar de residencia. Una tercera es simplemente “tus intereses”, como demostraron los acercamientos de <a href="http://www.bivingsreport.com/2011/it-takes-a-website-of-millions-popvox-and-the-modern-congress/?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+TheBivingsReport+%28The+Bivings+Report%29" target="_blank" onclick="urchinTracker('/outgoing/www.bivingsreport.com/2011/it-takes-a-website-of-millions-popvox-and-the-modern-congress/?utm_source=feedburner_amp_utm_medium=feed_amp_utm_campaign=Feed_3A+TheBivingsReport+_28The+Bivings+Report_29&amp;referer=');">Popvox</a> a engagement político y el <a href="http://www.latimes.com/news/newsmatch/" target="_blank" onclick="urchinTracker('/outgoing/www.latimes.com/news/newsmatch/?referer=');">Newsmatch de LA Times</a>.</p>
<p>Mientras más y más data personal está en manos de sitios de terceros, las posibilidades de personalización se expanden. El ejemplo de ProPublica de arriba demuestra cómo la información de perfil de Facebook puede ser usada para personalizar automáticamente la experiencia de una historia. Y existen varias <a href="http://m.themediabriefing.com/article/2011-07-11/how-autotrader-proves-the-location-based-mobile-business-model-works?utm_source=newsletter&amp;utm_medium=email&amp;utm_campaign=consumer-mags" target="_blank" onclick="urchinTracker('/outgoing/m.themediabriefing.com/article/2011-07-11/how-autotrader-proves-the-location-based-mobile-business-model-works?utm_source=newsletter_amp_utm_medium=email_amp_utm_campaign=consumer-mags&amp;referer=');">aplicaciones que ofrecen</a> presentar información basada en localización vía GPS.</p>
<p>Esto también indica que puede haber varias formas en las cuales la personalización y estrategias sociales pueden combinarse. Las noticias personalizadas pueden, de muchas maneras, ser usadas como una expresión de nuestra identidad: acá es donde vivo, de esta forma me afecta, en esto estoy interesado. El <a href="http://www.readwriteweb.com/archives/how_media_will_relate_to_facebook_in_the_future.php" target="_blank" onclick="urchinTracker('/outgoing/www.readwriteweb.com/archives/how_media_will_relate_to_facebook_in_the_future.php?referer=');">COO de Facebook predijo</a> que todos los medios van a ser personalizados en 3-5 años; está claro que eso es algo donde las redes sociales nos van a llevar.</p>
<h1>6. UTILIZAR</h1>
<p>La forma más compleja de comunicar los resultados del periodismo de datos es crear algún tipo de herramienta basada en la información. Las calculadoras son opciones populares, así como herramientas con GPS, pero hay un montón de amplitud para aplicaciones más complejas mientras más información está disponible del publisher y el usuario.</p>
<p>Una vez más, hay un entrecruzamiento acá con personalización, pero es posible proveer utilidad sin personalización. Y muy a menudo, la complejidad y barrera consiguiente con respecto a los competidores presenta también oportunidades comerciales.</p>
<p>En Reed Business Information, por ejemplo, su modelo está orientado hacia este tipo de utilidad: atraer usuarios en varios puntos de la cadena de comunicación (actualizaciones online, revistas impresas, noticias móviles) y direccionarlos hacia el punto donde están más cercanos a una decisión de compra. La idea es que mientras más cerca tu información está de su acción, más valiosa es para el usuario.</p>
<p>Crear utilidad de la información es ahora relativamente costoso, pero esos costos están bajando como resultado de la competencia y la estandarización.</p>
<h1>UN MEDIO PARA EXPLORAR</h1>
<p>Lo que todo lo anterior hace evidente es que hay áreas enteras de periodismo online que todavía faltan ser adecuadamente exploradas, y de hecho en la mayoría todavía falta establecer convenciones claras o ideas de buenas prácticas. Esto trata de ser un resumen de aquellas convenciones que están surgiendo pero sería genial agregar más. Mientras tanto acá tienen ambas partes del modelo juntas”:</p>
<p><img src="http://maccur.files.wordpress.com/2011/07/modelo-periodismo-de-datos.png?w=540&amp;h=410" alt="" /></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/07/14/in-spanish-the-inverted-pyramid-of-data-journalism-part-2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>6 ways of communicating data journalism (The inverted pyramid of data journalism part 2)</title>
		<link>http://onlinejournalismblog.com/2011/07/13/the-inverted-pyramid-of-data-journalism-part-2-6-ways-of-communicating-data-journalism/</link>
		<comments>http://onlinejournalismblog.com/2011/07/13/the-inverted-pyramid-of-data-journalism-part-2-6-ways-of-communicating-data-journalism/#comments</comments>
		<pubDate>Wed, 13 Jul 2011 14:00:33 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[case studies]]></category>
		<category><![CDATA[customisation]]></category>
		<category><![CDATA[personalisation]]></category>
		<category><![CDATA[propublica]]></category>
		<category><![CDATA[reed]]></category>
		<category><![CDATA[utility]]></category>
		<category><![CDATA[visualisation]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=14854</guid>
		<description><![CDATA[Last week I published an inverted pyramid of data journalism which attempted to map processes from initial compilation of data through cleaning, contextualising, and combining that. The final stage &#8211; communication &#8211; needed a post of its own, so here it is. UPDATE: Now in Spanish too. Below is a diagram illustrating 6 different types of communication in data journalism.<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/07/13/the-inverted-pyramid-of-data-journalism-part-2-6-ways-of-communicating-data-journalism/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p>Last week I <a href="http://onlinejournalismblog.com/2011/07/07/the-inverted-pyramid-of-data-journalism/">published an inverted pyramid of data journalism</a> which attempted to map the process from the initial compilation of data through cleaning, contextualising and combining it. The final stage &#8211; communication &#8211; needed a post of its own, so here it is.</p>
<p><em>UPDATE: <a href="http://onlinejournalismblog.com/2011/07/14/in-spanish-the-inverted-pyramid-of-data-journalism-part-2/">Now in Spanish too</a>.</em></p>
<p>Below is a diagram illustrating 6 different types of communication in data journalism. (I may have overlooked others, so please let me know if that&#8217;s the case.)</p>
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2011/07/DataJournalism_Communicate1.gif"><img class="alignnone size-full wp-image-14892" title="Data Journalism Communicate" src="http://onlinejournalismblog.com/wp-content/uploads/2011/07/DataJournalism_Communicate1.gif" alt="Communicate: visualise, narrate, socialise, humanise, personalise, utilise" width="485" height="604" /></a></p>
<p>Modern data journalism has grown up alongside an <a href="http://www.web-strategist.com/blog/2011/07/07/infographics-are-useful-but-they-must-evolve/?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+WebStrategyByJeremiah+%28Web+Strategy+by+Jeremiah%29" onclick="urchinTracker('/outgoing/www.web-strategist.com/blog/2011/07/07/infographics-are-useful-but-they-must-evolve/?utm_source=feedburner_amp_utm_medium=feed_amp_utm_campaign=Feed_3A+WebStrategyByJeremiah+_28Web+Strategy+by+Jeremiah_29&amp;referer=');">enormous growth in visualisation</a>, and this can sometimes lead us to overlook different ways of telling stories involving big numbers. The intention of the following is to act as a primer for ensuring all options are considered.<br />
<span id="more-14854"></span></p>
<h2>1. Visualisation</h2>
<p>Visualisation is the quickest way to communicate the results of data journalism: free tools such as Google Docs allow it with a single click; more powerful tools like Many Eyes only require the user to paste their raw data and select from a range of visualisation options.</p>
<p>But ease does not equal effectiveness. The rise of <a href="http://krugman.blogs.nytimes.com/2011/05/12/chartjunk/" onclick="urchinTracker('/outgoing/krugman.blogs.nytimes.com/2011/05/12/chartjunk/?referer=');">chartjunk</a> illustrates that visualisation is not immune to churnalism or spectacle without insight.</p>
<p>There is a rich history of print visualisation which remains relevant to the generation of online infographics: focusing on no more than 4 data points, avoiding 3D, and ensuring the graphic is self-sufficient are just some of its lessons.</p>
<p><a href="http://junkcharts.typepad.com/junk_charts/2010/05/junk-charts-talk.html" onclick="urchinTracker('/outgoing/junkcharts.typepad.com/junk_charts/2010/05/junk-charts-talk.html?referer=');">Kaiser Fung&#8217;s trifecta</a> is one useful reference point for ensuring a visualisation is effective, and <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/senators_and_he.html" onclick="urchinTracker('/outgoing/www.stat.columbia.edu/_cook/movabletype/archives/2009/11/senators_and_he.html?referer=');">this explanation of how a chart was transformed into something that could be used in a newspaper</a> is also instructive (<a href="http://junkcharts.typepad.com/junk_charts/2009/11/worthy-of-the-times.html" onclick="urchinTracker('/outgoing/junkcharts.typepad.com/junk_charts/2009/11/worthy-of-the-times.html?referer=');">summarised by Kaiser Fung here</a>).</p>
<p>In short: it&#8217;s not a simple process.</p>
<p>Visualisation has one major advantage which makes that effort worthwhile, however: it can make communication incredibly effective. And it can provide a method of distributing your content which cannot be matched by the other types of communication listed here.</p>
<p>But its major strength is also its main weakness: the instant nature of infographics means that people often do not spend much time looking at them. That makes them very effective for distribution, but not for engagement, so it is worth thinking strategically about 1) making sure the image contains a link back to its source; and 2) making sure that there is something more at the source when people arrive.</p>
<h2>2. Narration</h2>
<p>A traditional article can struggle to contain the sort of numbers that data journalism tends to turn up, but it still provides an accessible way for people to understand the story &#8211; if done well.</p>
<p>There are <a href="http://www.press.uchicago.edu/ucp/books/book/chicago/C/bo3636131.html" onclick="urchinTracker('/outgoing/www.press.uchicago.edu/ucp/books/book/chicago/C/bo3636131.html?referer=');">books providing useful guidance on how to write with numbers most clearly</a> &#8211; and <a href="http://www.useit.com/alertbox/writing-numbers.html" onclick="urchinTracker('/outgoing/www.useit.com/alertbox/writing-numbers.html?referer=');">some guidance for web writing too</a> (you should use numerals rather than words, as this helps people who are scanning the page).</p>
<p>As with visualisation, less is often more. But also, as in most narrative, you need to think about meaningfulness and your objectives in communicating these numbers.</p>
<p>Abstract amounts can be impressive, but meaningless and useless. What does it mean that £10m has been spent on something? Is that more or less than usual? More or less than something similar?</p>
<p>Try to bring down amounts to manageable quantities &#8211; the amount per person, or per day, for example.</p>
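<p>As a rough sketch of that arithmetic: the population of 1 million and the one-year spending period below are assumptions made up for illustration, not figures from any real dataset, and the function names are hypothetical.</p>

```python
def per_person(total, population):
    """Share of a total amount for each person in the population."""
    return total / population


def per_day(total, days=365):
    """Average amount per day over the period (one year assumed by default)."""
    return total / days


total_spend = 10_000_000      # the £10m figure used in the text
assumed_population = 1_000_000  # hypothetical: an area of 1 million people

print("Per person:", round(per_person(total_spend, assumed_population), 2))
print("Per day:", round(per_day(total_spend), 2))
```

<p>Either figure is easier to compare with everyday experience than the headline total, which is the point of breaking it down.</p>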
<p>Finally, use editing to focus on the essentials, and make sure you link to the full data.</p>
<h2>3. Social communication</h2>
<p>Communication is a social act, and the success of infographics across social media is a testament to that. But it&#8217;s not just infographics that are social &#8211; data is too. The Guardian has demonstrated this particularly successfully with the cultivation of a healthy community around its Data Blog (which enjoys higher stickiness than the average Guardian article), and around its API.</p>
<p>Crowdsourcing initiatives aimed at gathering data can also provide a social dimension to the data. The Guardian is, again, a pioneer here, with <a href="http://mps-expenses.guardian.co.uk/" onclick="urchinTracker('/outgoing/mps-expenses.guardian.co.uk/?referer=');">its MPs&#8217; expenses project</a> and <a href="http://www.guardian.co.uk/technology/blog/poll/2010/jan/26/apple-tablet-crowdsource-specifications" onclick="urchinTracker('/outgoing/www.guardian.co.uk/technology/blog/poll/2010/jan/26/apple-tablet-crowdsource-specifications?referer=');">Charles Arthur&#8217;s attempt to crowdsource predictions about the specifications of the iPad</a>. But there are other examples, too &#8211; especially <a href="http://onlinejournalismblog.com/2010/09/20/when-crowdsourcing-is-your-only-option/">when it is difficult to obtain the data any other way</a>.</p>
<p>The connectivity of the web presents new opportunities to present data journalism in a social way. <a href="http://onlinejournalismblog.com/2011/07/04/can-we-go-beyond-share-on-facebook/">ProPublica&#8217;s app that provides results based on your Facebook profile</a> (schools attended; friends who have used the app) is one example of how data journalism can leverage social data, and, equally, how communicating the results of data journalism can be geared around social dynamics, using elements such as quizzes, sharing, competition, campaigning and collaboration. We are barely at the start of this aspect of online journalism.</p>
<h2>4. Humanise</h2>
<p>Broadcast news reports often use case studies to get around the problem of presenting numbers-based stories on television and radio. If waiting times have increased, speak to someone who had to wait a long time for an operation. In other words, humanise the numbers.</p>
<p>More recently the growth of computer-generated motion graphics has relaxed that pressure somewhat, as presenters can call on powerful animation to illustrate a story.</p>
<p><embed height="350" width="425" wmode="transparent" allowfullscreen="true" type="application/x-shockwave-flash" src="http://www.youtube.com/v/_n4gnl&amp;rel=1&amp;fs=1&amp;showsearch=0"/></p>
<p>But once again, the point of making stories relevant to people comes through. As I wrote in <a title="Permanent Link to One ambassador’s embarrassment is a tragedy, 15,000 civilian deaths is a statistic" rel="bookmark" href="http://onlinejournalismblog.com/2010/12/07/wikileaks-cablegate/">One ambassador’s embarrassment is a tragedy, 15,000 civilian deaths is a statistic</a>: when you move beyond scales we can deal with on a human level, you struggle to engage people in the issue you are covering &#8211; no matter how impressive the motion graphics (that post outlines some other considerations in humanising stories, such as ensuring that case studies are representative).</p>
<p>So after being buried in abstract data we need to remember that going out and recording an interview with a person whose life has been affected by that data can make a big difference to the power of our story.</p>
<h2>5. Personalise</h2>
<p>One of the biggest changes in journalism&#8217;s move online is that it <a href="http://onlinejournalismblog.com/2008/04/15/basic-principles-of-online-journalism-i-is-for-interactivity/">opens up all sorts of possibilities around interactivity</a>. When it comes to data journalism that means that the user can, potentially, control what information is presented to them based on various inputs.</p>
<p>There are some relatively well-established forms of this. For example, when a government presents its latest budget, news websites often <a href="http://www.bbc.co.uk/news/business-12773565" onclick="urchinTracker('/outgoing/www.bbc.co.uk/news/business-12773565?referer=');">invite the user to input their own details</a> (such as their earnings, or their family make-up) to find out how the budget affects them. A recent variant of this is the interactive which invites the user to <a href="http://www.bbc.co.uk/news/10373060" onclick="urchinTracker('/outgoing/www.bbc.co.uk/news/10373060?referer=');">make their own decisions on how they might cut the deficit</a> (<a href="http://www.ft.com/cms/s/0/abe91fdc-4e08-11df-b437-00144feab49a.html#axzz1Ru7KsxRG" onclick="urchinTracker('/outgoing/www.ft.com/cms/s/0/abe91fdc-4e08-11df-b437-00144feab49a.html_axzz1Ru7KsxRG?referer=');">the FT&#8217;s version took this further, adding in party strategies and policies</a>).</p>
<p>Another common form is geographical personalisation: the user is <a href="http://www.guardian.co.uk/society/2006/nov/23/health.newmedia" onclick="urchinTracker('/outgoing/www.guardian.co.uk/society/2006/nov/23/health.newmedia?referer=');">invited to enter their postcode</a>, zip code or other geographical information to find out how a particular issue is playing out in their home town.</p>
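<p>To give a sense of how little is needed for a basic version of this, here is a minimal sketch. The figures, the dictionary and the function names are all hypothetical, and it assumes the data is already grouped by UK postcode area (the leading letters of a postcode); a real project would more likely use a proper postcode lookup service.</p>

```python
# Hypothetical figures: average waiting time in weeks, keyed by postcode area.
WAITING_WEEKS_BY_AREA = {"B": 9.5, "M": 11.0, "LS": 8.0}


def postcode_area(postcode):
    """Return the leading letters of a postcode, e.g. 'B' for ' b1 1aa '."""
    letters = ""
    for character in postcode.strip().upper():
        if not character.isalpha():
            break
        letters += character
    return letters


def local_figure(postcode):
    """Look up the figure for the reader's area, or None if we hold no data."""
    return WAITING_WEEKS_BY_AREA.get(postcode_area(postcode))


print(local_figure("LS1 4AB"))
```

<p>Returning None for an unknown area leaves it to the page to say plainly that there is no local data, rather than showing a misleading default.</p>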
<p>A third is simply &#8216;your interests&#8217;, as demonstrated by <a href="http://www.bivingsreport.com/2011/it-takes-a-website-of-millions-popvox-and-the-modern-congress/?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+TheBivingsReport+%28The+Bivings+Report%29" onclick="urchinTracker('/outgoing/www.bivingsreport.com/2011/it-takes-a-website-of-millions-popvox-and-the-modern-congress/?utm_source=feedburner_amp_utm_medium=feed_amp_utm_campaign=Feed_3A+TheBivingsReport+_28The+Bivings+Report_29&amp;referer=');">Popvox&#8217;s approach to political engagement</a> and the <a href="http://www.latimes.com/news/newsmatch/" onclick="urchinTracker('/outgoing/www.latimes.com/news/newsmatch/?referer=');">LA Times&#8217; Newsmatch</a>.</p>
<p>As more and more personal data is held by third party sites, the possibilities for personalisation expand. The <a href="http://onlinejournalismblog.com/2011/07/04/can-we-go-beyond-share-on-facebook/">ProPublica example</a> given above demonstrates how Facebook profile information can be used to automatically personalise the experience of a story. And there are various apps that offer to <a href="http://m.themediabriefing.com/article/2011-07-11/how-autotrader-proves-the-location-based-mobile-business-model-works?utm_source=newsletter&amp;utm_medium=email&amp;utm_campaign=consumer-mags" onclick="urchinTracker('/outgoing/m.themediabriefing.com/article/2011-07-11/how-autotrader-proves-the-location-based-mobile-business-model-works?utm_source=newsletter_amp_utm_medium=email_amp_utm_campaign=consumer-mags&amp;referer=');">present information based on location data</a> provided via GPS.</p>
<p>This also indicates that there may be various ways in which personalisation and social strategies might be combined. Personalised stories can, in many ways, be used as an expression of our identity: this is where I live; this is how I am affected; this is what I&#8217;m interested in.</p>
<p>And when the COO of Facebook is <a href="http://www.readwriteweb.com/archives/how_media_will_relate_to_facebook_in_the_future.php" onclick="urchinTracker('/outgoing/www.readwriteweb.com/archives/how_media_will_relate_to_facebook_in_the_future.php?referer=');">predicting that all media will be personalised in 3-5 years</a>, it&#8217;s clear that this is something the social networks are going to drive towards too.</p>
<h2>6. Utilise</h2>
<p>The most complex way of communicating the results of data journalism is to create some sort of tool based on the data. Calculators are popular choices, as are GPS-driven tools, but there is a lot of scope for more complex applications as more data becomes available both from the publisher and the user.</p>
<p>Again, there is overlap here with personalisation &#8211; but it is possible to provide utility without personalisation. And quite often, the complexity and consequent barrier to competitors present commercial opportunities too.</p>
<p>At Reed Business Information, for example, their model is geared towards this sort of utility: attracting users at various points of the communication chain &#8211; online updates, printed magazines, mobile news &#8211; and steering them towards the point where they are closest to a purchasing decision. The idea is that the closer your information is to their action, the more valuable it is to the user.</p>
<p>Creating utility from data is currently relatively costly &#8211; but those costs are going down as a result of competition and standardisation. For example, as increasing numbers of news organisations adopt standard ways of storing story data (e.g. XML files), it is easier to create apps that pull data from datasets. Meanwhile, app creation becomes increasingly templated (in many ways you can see the process following a similar path to that of web design) and platform independent.</p>
<h2>A medium up for grabs</h2>
<p>What all of the above makes apparent &#8211; and I may have missed other methods of communicating data journalism (please let me know if you can think of any) &#8211; is that there are whole areas of online journalism that have yet to be properly explored, and certainly most have yet to establish clear conventions or ideas of best practice.</p>
<p>I&#8217;ve tried to scope out an overview of those conventions that are emerging, and the best practice that&#8217;s currently available, but it would be great if you could add more. What makes for good humanisation? Utility? What are great examples of personalisation or data journalism that involves a social dimension? Comments below please.</p>
<p>Meanwhile, here are both parts of the model shown together (click to magnify):</p>
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2011/07/DataJournalism_5Cs6comm.gif"><img class="alignnone size-full wp-image-14903" title="The inverted pyramid of data journalism and data journalism communication pyramid" src="http://onlinejournalismblog.com/wp-content/uploads/2011/07/DataJournalism_5Cs6comm.gif" alt="The inverted pyramid of data journalism and data journalism communication pyramid" width="492" height="383" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/07/13/the-inverted-pyramid-of-data-journalism-part-2-6-ways-of-communicating-data-journalism/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Visualising Twitter Friend Connections Using Gephi: An Example Using the @WiredUK Friends Network</title>
		<link>http://blog.ouseful.info/2011/07/07/visualising-twitter-friend-connections-using-gephi-an-example-using-wireduk-friends-network/</link>
		<comments>http://blog.ouseful.info/2011/07/07/visualising-twitter-friend-connections-using-gephi-an-example-using-wireduk-friends-network/#comments</comments>
		<pubDate>Thu, 07 Jul 2011 09:30:49 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Anything you want]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[Uncourse]]></category>
		<category><![CDATA[visualisation]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=5770</guid>
		<description><![CDATA[To corrupt a well known saying, &#8220;cook a man a meal and he&#8217;ll eat it; teach a man a recipe, and maybe he&#8217;ll cook for you&#8230;&#8221;, I thought it was probably about time I posted the recipe I&#8217;ve been using for laying out Twitter friends networks using Gephi, not least because I&#8217;ve been generating quite [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#38;blog=325417&#38;post=5770&#38;subd=ouseful&#38;ref=&#38;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>To corrupt a well known saying, &#8220;cook a man a meal and he&#8217;ll eat it; teach a man a recipe, and maybe he&#8217;ll cook for you&#8230;&#8221;, I thought it was probably about time I posted the recipe I&#8217;ve been using for laying out Twitter friends networks using Gephi, not least because I&#8217;ve been generating quite a few network files for folk lately, giving them copies, and then not having a tutorial to point them to. So here&#8217;s that tutorial&#8230;</p>
<p>The starting point is actually quite a long way down the &#8220;how did you do that?&#8221; chain, but I have to start somewhere, and the middle&#8217;s easier than the beginning, so that&#8217;s where we&#8217;ll step in (I&#8217;ll give some clues as to how the beginning works at the end&#8230;;-)</p>
<p>Here&#8217;s what we&#8217;ll be working towards: a diagram that shows how the people on Twitter that @wiredUK follows follow each other:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5911292723/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5911292723/?referer=');"><img src="http://farm7.static.flickr.com/6028/5911292723_05974d6136.jpg" width="500" height="407" alt="@wireduk innerfriends" /></a></p>
<p>The tool we&#8217;re going to use to lay out this graph from a data file is a free, extensible, open source, cross platform, Java-based tool called <a href="http://gephi.org" onclick="urchinTracker('/outgoing/gephi.org?referer=');">Gephi</a>. If you want to play along, <a href="http://dl.dropbox.com/u/1156404/wiredUK-friends_innerfriendsNet_2011-07-05-18-56-19.gdf" onclick="urchinTracker('/outgoing/dl.dropbox.com/u/1156404/wiredUK-friends_innerfriendsNet_2011-07-05-18-56-19.gdf?referer=');">download the datafile</a>. (Or try with a network of your own, such as <a href="http://blog.ouseful.info/2010/04/16/getting-started-with-gephi-network-visualisation-app-my-facebook-network-part-i/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2010/04/16/getting-started-with-gephi-network-visualisation-app-my-facebook-network-part-i/?referer=');">your Facebook network</a>.)</p>
<p>From the Gephi file menu, <tt>Open</tt> the appropriate graph file:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5909853786/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5909853786/?referer=');"><img src="http://farm7.static.flickr.com/6025/5909853786_59eaacd3b5.jpg" width="326" height="327" alt="Gephi - file open" /></a></p>
<p>Import the file as a <em>Directed Graph</em>:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5909333629/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5909333629/?referer=');"><img src="http://farm7.static.flickr.com/6018/5909333629_1f66262787.jpg" width="500" height="358" alt="Gephi - import directed graph" /></a></p>
<p>The <em>Graph</em> window displays the graph in a raw form:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5909900136/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5909900136/?referer=');"><img src="http://farm6.static.flickr.com/5313/5909900136_8b3cb08a7a.jpg" width="500" height="370" alt="Gephi -graph view of imported graph" /></a></p>
<p>Sometimes a graph may contain nodes that are not connected to any other nodes. (For example, protected Twitter accounts do not publish &#8211; and are not published in &#8211; friends or followers lists publicly via the Twitter API.) Some layout algorithms may push unconnected nodes far away from the rest of the graph, which can affect generation of presentation views of the network, so we need to filter out these unconnected nodes. The easiest way of doing this is to filter the graph using the <em>Giant Component</em> filter.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5909926042/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5909926042/?referer=');"><img src="http://farm7.static.flickr.com/6015/5909926042_8b6eb7ef76.jpg" width="500" height="284" alt="Gephi - filter on Giant Component" /></a></p>
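<p>For anyone curious about what this kind of filter is doing, here is a minimal Python sketch of the general idea: find the largest set of nodes that are all connected to one another (ignoring edge direction) and keep only those. It is an illustration under that assumption, not Gephi&#8217;s own code, and the node names and the <tt>giant_component</tt> function are made up for the example.</p>

```python
from collections import deque

def giant_component(nodes, edges):
    """Return the largest weakly connected set of nodes (edge direction ignored)."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, best = set(), set()
    for start in nodes:
        if start in seen:
            continue
        # breadth-first search collects everything reachable from this node
        component, queue = {start}, deque([start])
        while queue:
            for nxt in neighbours[queue.popleft()]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# hypothetical data: "protected" has no visible connections, so it is dropped
nodes = ["a", "b", "c", "d", "protected"]
edges = [("a", "b"), ("b", "c"), ("d", "a")]
print(sorted(giant_component(nodes, edges)))
```

<p>Any node left outside the returned set is one of the unconnected (or separately connected) nodes that the filter hides.</p>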
<p>To colour the graph, I often make use of the <em>modularity</em> statistic. This algorithm attempts to find clusters in the graph by identifying components that are highly interconnected.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5909381781/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5909381781/?referer=');"><img src="http://farm6.static.flickr.com/5039/5909381781_a60b46a0a7.jpg" width="500" height="270" alt="Gephi - modularity statistic" /></a></p>
<p>This algorithm is a randomised one, so different runs can produce different groupings; it&#8217;s often worth running it several times to see how many communities typically get identified.</p>
<p>A brief report is displayed after running the statistic:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5909946272/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5909946272/?referer=');"><img src="http://farm7.static.flickr.com/6041/5909946272_b44cfc1c58.jpg" width="500" height="429" alt="Gephi - modularity statistic report" /></a></p>
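<p>As a rough illustration of what a modularity score measures, the Python sketch below computes the standard modularity value Q for a grouping that has already been chosen. It assumes an undirected, unweighted graph (a simplification, since the Twitter network here is directed) and it does not reproduce the cluster-finding procedure Gephi runs; the <tt>modularity</tt> function and the toy data are hypothetical.</p>

```python
def modularity(edges, community):
    """Modularity Q of a given partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = {}  # number of edges with both ends in the same community
    degree = {}    # sum of node degrees per community
    for a, b in edges:
        ca, cb = community[a], community[b]
        degree[ca] = degree.get(ca, 0) + 1
        degree[cb] = degree.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    # Q = sum over communities of (internal share of edges) - (expected share)
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())

# toy graph: two triangles joined by a single edge
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "e"), ("e", "f"), ("f", "d"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {n: 0 for n in two_groups}
print(modularity(edges, two_groups), modularity(edges, one_group))
```

<p>Splitting the toy graph into its two triangles scores higher than lumping everything into one group, which is the sense in which a higher score means &#8220;more tightly knit clusters&#8221;.</p>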
<p>While we have the Statistics panel open, we can take the opportunity to run another measure: <em>the HITS algorithm</em>. This generates the well known Authority and Hub values which we can use to size nodes in the graph.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5909954074/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5909954074/?referer=');"><img src="http://farm6.static.flickr.com/5234/5909954074_981857b669.jpg" width="500" height="253" alt="Gephi - HITS statistic" /></a></p>
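<p>The idea behind HITS can be sketched in a few lines of Python: a node&#8217;s authority score is the sum of the hub scores of the nodes pointing at it, and its hub score is the sum of the authority scores of the nodes it points at, repeated until the values settle. This is a bare-bones version with an assumed fixed number of iterations and a simple normalisation; Gephi&#8217;s implementation may differ in those details, and the edge list is made up.</p>

```python
import math

def hits(edges, iterations=50):
    """Return (authority, hub) score dicts for a list of directed edges."""
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # authority: add up the hub scores of everyone who points at you
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # hub: add up the authority scores of everyone you point at
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        # rescale so the numbers stay in a sensible range
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return auth, hub

# hypothetical "follows" edges: a, b and d all follow c; c follows a
edges = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")]
auth, hub = hits(edges)
print(max(auth, key=auth.get))
```

<p>In a friends network, an edge from P to X means &#8220;P follows X&#8221;, so a high authority score goes to accounts followed by many of the others.</p>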
<p>The next step is to actually colour the graph. In the <em>Partition</em> panel, refresh the partition options list and then select <tt>Modularity Class</tt>.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5909401575/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5909401575/?referer=');"><img src="http://farm7.static.flickr.com/6014/5909401575_3d78c92c94.jpg" width="499" height="222" alt="Gephi - select modularity partition" /></a></p>
<p>Choose appropriate colours (right click on each colour panel to select an appropriate colour for each class &#8211; I often select pastel colours) and apply them to the graph.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5909974606/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5909974606/?referer=');"><img src="http://farm7.static.flickr.com/6045/5909974606_e7ddd042e8.jpg" width="500" height="244" alt="Gephi - colour nodes by modularity class" /></a></p>
<p>The next thing we want to do is lay out the graph. The Layout panel contains several different layout algorithms that can be used to support the visual analysis of the structures inherent in the network. (Try some of them &#8211; each works in a slightly different way, and some are better than others at coping with large networks.) For a network this size and this densely connected, I&#8217;d typically start out with one of the force directed layouts, which position nodes according to how tightly linked they are to each other.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5911195803/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5911195803/?referer=');"><img src="http://farm6.static.flickr.com/5038/5911195803_dc8ba7e22c.jpg" width="500" height="177" alt="Gephi select a layout" /></a></p>
<p>When you select the layout type, you will notice there are several parameters you can play with. The default set is often a good place to start&#8230;</p>
<p>Run the layout tool and you should see the network start to lay itself out. Some algorithms require you to actually Stop the layout algorithm; others terminate themselves according to a stopping criterion, or because they are a &#8220;one-shot&#8221; application (such as the Expansion algorithm, which just scales the x and y values by a given factor).</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5911761740/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5911761740/?referer=');"><img src="http://farm6.static.flickr.com/5238/5911761740_08fc360e01.jpg" width="500" height="231" alt="Gephi - forceAtlas 2" /></a></p>
<p>We can zoom in and out on the layout of the graph using a mouse wheel (on my MacBook trackpad, I use a two finger slide up and down), or use the zoom slider from the &#8220;More options&#8221; tab:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5911774168/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5911774168/?referer=');"><img src="http://farm7.static.flickr.com/6034/5911774168_2d160522ee.jpg" width="500" height="417" alt="Gephi zoom" /></a></p>
<p>To see which Twitter ID each node corresponds to, we can turn on the labels:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5911778962/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5911778962/?referer=');"><img src="http://farm7.static.flickr.com/6039/5911778962_654537196c.jpg" width="500" height="385" alt="Gephi - labels" /></a></p>
<p>This view is very cluttered &#8211; the nodes are too close to each other to see what&#8217;s going on. The labels and the nodes are also all the same size, giving the same visual weight to each node and each label. One thing I like to do is resize the nodes relative to some property, and then scale the label size to be proportional to the node size.</p>
<p>Here&#8217;s how we can scale the node size and then set the text label size to be proportional to node size. In the Ranking panel, select the node size property, and the attribute you want to make the size proportional to. I&#8217;m going to use Authority, which is a network property that we calculated when we ran the HITS algorithm. Essentially, it&#8217;s a measure of how well linked-to a node is: in this network, how many of the other accounts follow it.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5911784464/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5911784464/?referer=');"><img src="http://farm6.static.flickr.com/5274/5911784464_98a7b1fae9.jpg" width="438" height="274" alt="Gephi - node sizing" /></a></p>
<p>The min size/max size slider lets us define the minimum and maximum node sizes. By default, a linear mapping from attribute value to size is used, but the <em>spline</em> option lets us use a non-linear mapping.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5911228445/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5911228445/?referer=');"><img src="http://farm7.static.flickr.com/6045/5911228445_4695cdeb70.jpg" width="500" height="288" alt="Gephi - node sizing spilne" /></a></p>
<p>I&#8217;m going with the default linear mapping&#8230;</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5911800428/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5911800428/?referer=');"><img src="http://farm6.static.flickr.com/5316/5911800428_696351686b.jpg" width="500" height="193" alt="Gephi - size nodes" /></a></p>
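<p>A linear mapping of this kind is easy to write down. The Python sketch below shows the arithmetic, assuming the smallest attribute value maps to the minimum size and the largest to the maximum; the <tt>linear_size</tt> function, the authority values and the 10 to 50 size range are all made up for the example.</p>

```python
def linear_size(value, lo, hi, min_size, max_size):
    """Map a value in the range lo..hi linearly onto min_size..max_size."""
    if hi == lo:
        return min_size  # every node has the same value, so use one size
    return min_size + (value - lo) / (hi - lo) * (max_size - min_size)

# hypothetical authority scores for three nodes
authority = {"a": 0.0, "b": 0.25, "c": 1.0}
lo, hi = min(authority.values()), max(authority.values())
sizes = {n: linear_size(v, lo, hi, 10, 50) for n, v in authority.items()}
print(sizes)
```

<p>A node a quarter of the way up the authority range ends up a quarter of the way between the minimum and maximum sizes.</p>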
<p>We can now scale the labels according to node size:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5911798924/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5911798924/?referer=');"><img src="http://farm6.static.flickr.com/5032/5911798924_6d6a7d2a02.jpg" width="500" height="347" alt="Gephi - scale labels" /></a></p>
<p>Note that you can continue to use the text size slider to scale the size of all the displayed labels together.</p>
<p>This diagram is now looking quite cluttered &#8211; to make it easier to read, it would be good if we could spread it out a bit. The Expansion layout algorithm can help us do this:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5911805150/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5911805150/?referer=');"><img src="http://farm7.static.flickr.com/6054/5911805150_948ef39e27.jpg" width="500" height="228" alt="Gephi - expansion" /></a></p>
<p>A couple of other layout algorithms are often useful: the Transformation layout algorithm lets us scale the x and y axes independently (compared to the Expansion algorithm, which scales both axes by the same amount); and the Clockwise Rotate and Counter-Clockwise Rotate algorithms let us rotate the whole layout (this can be useful if you want to rotate the graph so that it fits neatly into a landscape view).</p>
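<p>These &#8220;one-shot&#8221; layouts are simple coordinate transformations. Here is a small Python sketch of the three operations, assuming each node has an (x, y) position and that scaling and rotation are done about the origin (Gephi may use a different reference point); the function names and the sample positions are hypothetical.</p>

```python
import math

def transform(positions, scale_x=1.0, scale_y=1.0):
    """Scale the x and y axes independently."""
    return {n: (x * scale_x, y * scale_y) for n, (x, y) in positions.items()}

def expand(positions, factor):
    """Scale both axes by the same factor."""
    return transform(positions, factor, factor)

def rotate(positions, degrees):
    """Rotate every node about the origin (counter-clockwise if y points up)."""
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    return {n: (x * cos_t - y * sin_t, x * sin_t + y * cos_t)
            for n, (x, y) in positions.items()}

# hypothetical node positions
positions = {"a": (1.0, 2.0), "b": (-3.0, 0.5)}
print(expand(positions, 2))
print(transform(positions, scale_x=2))
print(rotate(positions, 90))
```

<p>Because every position is multiplied by the same numbers, the relative arrangement of the nodes is preserved; only the spacing or orientation changes.</p>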
<p>The expanded layout is far easier to read, but some of the labels still overlap. The Label Adjust layout tool can jiggle the nodes so that their labels don&#8217;t overlap.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5911815714/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5911815714/?referer=');"><img src="http://farm6.static.flickr.com/5316/5911815714_679fdf3523.jpg" width="500" height="229" alt="gephi - label adjust" /></a></p>
<p>(Note that you can also move individual nodes by clicking on them and dragging them.)</p>
<p>So &#8211; nearly there&#8230; The final push is to generate a good quality output. We can do this from the preview window:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5911258267/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5911258267/?referer=');"><img src="http://farm6.static.flickr.com/5031/5911258267_88d6e229ff.jpg" width="500" height="219" alt="Gephi preview window" /></a></p>
<p>The preview window is where we can generate good quality SVG renderings of the graph. The node size, colour and scaled label sizes are determined in the original Overview area (the one we were working in), although additional customisations are possible in the Preview area.</p>
<p>To render our graph, I just want to make a couple of tweaks to the original Default preview settings: <em>Show Labels</em> and set the base font size.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5911826508/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5911826508/?referer=');"><img src="http://farm7.static.flickr.com/6008/5911826508_c617e24ea4.jpg" width="500" height="391" alt="Gephi - preview settings" /></a></p>
<p>Click on the Refresh button to render the graph:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5911828682/" title="Gephi - preview refresh by psychemedia, on Flickr" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5911828682/?referer=');"><img src="http://farm6.static.flickr.com/5238/5911828682_e3c754eb6f_z.jpg" width="640" height="366" alt="Gephi - preview refresh"></a></p>
<p>Oops &#8211; I overdid the font size&#8230; let&#8217;s try again:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5911831064/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5911831064/?referer=');"><img src="http://farm6.static.flickr.com/5239/5911831064_f6960c9dee.jpg" width="500" height="237" alt="gephi - preview resize" /></a></p>
<p>Okay &#8211; so that&#8217;s a good start. Now I find I often enter into a dance between the Preview and Overview panels, tweaking the layout until I get something I&#8217;m satisfied with, or at least, that&#8217;s half-way readable.</p>
<p><em>How</em> to read the graph is another matter of course, though by using colour, sizing and placement, we can hopefully draw out in a visual way some interesting properties of the network. The recipe described above, for example, results in a view of the network that shows:</p>
<p>- groups of people who are tightly connected to each other, as identified by the modularity statistic and consequently group colour; this often defines different sorts of interest groups. (<a href="http://blog.ouseful.info/2011/06/11/a-map-of-my-twitter-follower-network/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/06/11/a-map-of-my-twitter-follower-network/?referer=');">My follower network</a> shows distinct groups of people from the Open University, and JISC, the HE library and educational technology sectors, UK opendata and data journalist types, for example.)<br />
- people who are well connected in the graph, as displayed by node and label size.</p>
<p>Here&#8217;s my final version of the @wiredUK &#8220;inner friends&#8221; network:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5911292723/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5911292723/?referer=');"><img src="http://farm7.static.flickr.com/6028/5911292723_05974d6136.jpg" width="500" height="407" alt="@wireduk innerfriends" /></a></p>
<p>You can probably do better though&#8230;;-)</p>
<p>To recap, here&#8217;s the recipe again:</p>
<blockquote><p>- filter on connected component (private accounts don&#8217;t disclose friend/follower detail to the API key I use) to give a connected graph;<br />
- run the modularity statistic to identify clusters; sometimes I try several attempts<br />
- colour by modularity class identified in previous step, often tweaking colours to use pastel tones<br />
- I often use a force directed layout, then Expansion to spread the network out a bit if necessary; the Clockwise Rotate or Counter-Clockwise Rotate will rotate the network view; I often try to get a landscape format; the Transformation layout lets you expand or contract the graph along a single axis, or both axes by different amounts.<br />
- run HITS statistic and size nodes by authority<br />
- size labels proportional to node size<br />
- use Label Adjust and Expansion to tweak the layout<br />
- use preview with proportional labels to generate a nice output graph<br />
- iterate previous two steps to get a layout that is hopefully not completely unreadable&#8230;</p></blockquote>
<p><em>Got that?!;-)</em></p>
<p>Finally, to return to the beginning. The recipe I use to generate the data is as follows:</p>
<ol>
<li>grab a list of twitter IDs (call it <em>L</em>); there are several ways of doing this, for example: obtain a list of tweets on a particular topic by searching for a particular hashtag, then grab the set of unique IDs of people using the hashtag; grab the IDs of the members of one or more Twitter lists; grab the IDs of people following or followed by a particular person; grab the IDs of people sending geo-located tweets in a particular area;</li>
<li>for each person <em>P</em> in <em>L</em>, add them as a node to a graph;</li>
<li>for each person <em>P</em> in <em>L</em>, get a list of people followed by the corresponding person, e.g. <em>Fr(P)</em>;</li>
<li>for each <em>X</em> in <em>Fr(P)</em>: if <em>X</em> is also in <em>L</em>, create an edge <em>[P,X]</em> and add it to the graph;</li>
<li>save the graph in a format that can be visualised in Gephi.</li>
</ol>
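<p>The steps above can be sketched in Python roughly as follows. This is not the script used for this post: the <tt>friends_of</tt> dictionary is a hypothetical stand-in for the friends lists fetched from the Twitter API, the function names are made up, and the output follows the GDF text format (a node section, then an edge section) as I understand it, so check it against the Gephi documentation before relying on it.</p>

```python
def friends_network(members, friends_of):
    """Edges (P, X) where P and X are both in members and P follows X."""
    member_set = set(members)
    edges = []
    for p in members:
        for x in friends_of.get(p, []):
            if x in member_set:
                edges.append((p, x))
    return edges

def to_gdf(members, edges):
    """Serialise nodes and edges as GDF text (assumed column layout)."""
    lines = ["nodedef>name VARCHAR,label VARCHAR"]
    lines += ["{0},{0}".format(p) for p in members]
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR")
    lines += ["{},{}".format(p, x) for p, x in edges]
    return "\n".join(lines) + "\n"

# hypothetical list L and friends lists Fr(P)
members = ["alice", "bob", "carol"]
friends_of = {
    "alice": ["bob", "zoe"],   # zoe is not in L, so no edge is created
    "bob": ["alice"],
    "carol": ["alice", "bob"],
}
edges = friends_network(members, friends_of)
print(to_gdf(members, edges))
```

<p>Saving the printed text to a file with a <tt>.gdf</tt> extension should give something that can be opened in Gephi and imported as a directed graph, as described at the start of the tutorial.</p>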
<p>To make this recipe, I use Tweepy and a Python script to call the Twitter API and get the friends lists from there, but you could use the Google Social API to get the same data. There&#8217;s an example of calling that API using JavaScript in my &#8220;live&#8221; Twitter friends visualisation script (<a href="http://blog.ouseful.info/2011/04/12/using-protovis-to-visualise-connections-between-people-tweeting-a-particular-term/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/04/12/using-protovis-to-visualise-connections-between-people-tweeting-a-particular-term/?referer=');">Using Protovis to Visualise Connections Between People Tweeting a Particular Term</a>) as well as in <a href="http://blog.ouseful.info/2011/05/30/a-bit-of-newsjam-mojo-socialgeo-twitter-map/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/05/30/a-bit-of-newsjam-mojo-socialgeo-twitter-map/?referer=');">A Bit of NewsJam MoJo – SocialGeo Twitter Map</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2011/07/07/visualising-twitter-friend-connections-using-gephi-an-example-using-wireduk-friends-network/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://farm7.static.flickr.com/6028/5911292723_05974d6136.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6025/5909853786_59eaacd3b5.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6018/5909333629_1f66262787.jpg" length="" type="" />
<enclosure url="http://farm6.static.flickr.com/5313/5909900136_8b3cb08a7a.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6015/5909926042_8b6eb7ef76.jpg" length="" type="" />
<enclosure url="http://farm6.static.flickr.com/5039/5909381781_a60b46a0a7.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6041/5909946272_b44cfc1c58.jpg" length="" type="" />
<enclosure url="http://farm6.static.flickr.com/5234/5909954074_981857b669.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6014/5909401575_3d78c92c94.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6045/5909974606_e7ddd042e8.jpg" length="" type="" />
<enclosure url="http://farm6.static.flickr.com/5038/5911195803_dc8ba7e22c.jpg" length="" type="" />
<enclosure url="http://farm6.static.flickr.com/5238/5911761740_08fc360e01.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6034/5911774168_2d160522ee.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6039/5911778962_654537196c.jpg" length="" type="" />
<enclosure url="http://farm6.static.flickr.com/5274/5911784464_98a7b1fae9.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6045/5911228445_4695cdeb70.jpg" length="" type="" />
<enclosure url="http://farm6.static.flickr.com/5316/5911800428_696351686b.jpg" length="" type="" />
<enclosure url="http://farm6.static.flickr.com/5032/5911798924_6d6a7d2a02.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6054/5911805150_948ef39e27.jpg" length="" type="" />
<enclosure url="http://farm6.static.flickr.com/5316/5911815714_679fdf3523.jpg" length="" type="" />
<enclosure url="http://farm6.static.flickr.com/5031/5911258267_88d6e229ff.jpg" length="" type="" />
<enclosure url="http://farm7.static.flickr.com/6008/5911826508_c617e24ea4.jpg" length="" type="" />
<enclosure url="http://farm6.static.flickr.com/5238/5911828682_e3c754eb6f_z.jpg" length="" type="" />
<enclosure url="http://farm6.static.flickr.com/5239/5911831064_f6960c9dee.jpg" length="" type="" />
		</item>
		<item>
		<title>First Play With R and R-Studio – F1 Lap Time Box Plots</title>
		<link>http://blog.ouseful.info/2011/05/05/first-play-with-r-and-r-studio-f1-lap-time-box-plots/</link>
		<comments>http://blog.ouseful.info/2011/05/05/first-play-with-r-and-r-studio-f1-lap-time-box-plots/#comments</comments>
		<pubDate>Thu, 05 May 2011 13:56:03 +0000</pubDate>
		<dc:creator>tonyhirst</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[tony hirst]]></category>
		<category><![CDATA[visualisation]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=5416</guid>
		<description><![CDATA[Last summer, at the European Centre for Journalism round table on data driven journalism, I remember saying something along the lines of &#8220;your eyes can often do the stats for you&#8221;, the implication being that our perceptual apparatus is good at pattern detection, and can often see things in the data that most of us [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#38;blog=325417&#38;post=5416&#38;subd=ouseful&#38;ref=&#38;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Last summer, at the European Centre for Journalism round table on data driven journalism, I remember saying something along the lines of &#8220;your eyes can often do the stats for you&#8221;, the implication being that our perceptual apparatus is good at pattern detection, and can often see things in the data that most of us would miss using the very limited range of statistical tools that we are either aware of, or are comfortable using.</p>
<p>I don&#8217;t know how good a statistician you need to be to distinguish between Anscombe&#8217;s quartet, but the differences are obvious to the eye:</p>
<div class="wp-caption alignnone" style="width: 690px"><a href="http://en.wikipedia.org/wiki/Anscombe's_quartet" onclick="urchinTracker('/outgoing/en.wikipedia.org/wiki/Anscombe_s_quartet?referer=');"><img alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Anscombe's_quartet_3.svg/800px-Anscombe's_quartet_3.svg.pngA" width="670" height="500" /></a><p class="wp-caption-text">Anscombe&#039;s quartet /via Wikipedia</p></div>
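<p>As a minimal sketch in R (the language played with later in this post), the summary statistics for the four sets can be compared directly. It assumes the <tt>anscombe</tt> data frame that ships with R in the built-in <tt>datasets</tt> package (columns <tt>x1</tt> to <tt>x4</tt> and <tt>y1</tt> to <tt>y4</tt>); the name <tt>summaryStats</tt> is only illustrative:</p>

```r
# anscombe ships with R in the datasets package
data(anscombe)

# For each of the four sets, compute a few standard summary statistics
summaryStats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y), var_x = var(x), cor_xy = cor(x, y))
})
colnames(summaryStats) <- paste0("set", 1:4)

# One column per set; the columns are close to indistinguishable
print(round(summaryStats, 2))
```

<p>The means, variances and correlations come out almost identical across the four sets, which is why a plot is the quickest way to tell them apart.</p>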
<p>Another shamistician (h/t @daveyp) heuristic (or maybe it&#8217;s a crapistician rule of thumb?!) might go something along the lines of: &#8220;if you use the right visualisations, you don&#8217;t necessarily need to do any statistics yourself&#8221;. In this case, the implication is that if you choose a visualisation technique that embodies or implements a statistical process in some way, the maths is done for you, and you get to see what the statistical tool has uncovered.</p>
<p>Now I know that as someone working in education, I&#8217;m probably supposed to uphold the &#8220;should learn it properly&#8221; principle&#8230; But needing to know statistics in order to benefit from the use of statistical tools seems to me to be a massive barrier to entry in the use of this technology (statistics is a technology&#8230;) You just need to know how to use the technology appropriately, or at least, not use it &#8220;dangerously&#8221;&#8230;</p>
<p>So to this end (&#8220;democratising access to technology&#8221;), I thought it was about time I started to play with R, the statistical programming language (and rival to SPSS?) that appears to have a certain amount of traction at the moment given the number of books about to come out around it&#8230; R is a command line language, but the recently released <a href="http://www.rstudio.org/" onclick="urchinTracker('/outgoing/www.rstudio.org/?referer=');">R-Studio</a> seems to offer an easier way in, so I thought I&#8217;d go with that&#8230;</p>
<p>Flicking through <a href="http://www.amazon.co.uk/First-Course-Statistical-Programming/dp/0521694248?tag=ouseful-21" onclick="urchinTracker('/outgoing/www.amazon.co.uk/First-Course-Statistical-Programming/dp/0521694248?tag=ouseful-21&amp;referer=');">A First Course in Statistical Programming with R</a>, a book I bought a few weeks ago in the hope that the osmotic reading effect would give me some idea as to what it&#8217;s possible to do with R, I found a command line example showing how to create a simple box plot (box and whiskers plot) that I could understand enough to feel confident I could change&#8230;</p>
<p>Having an F1 data set/CSV file to hand (laptimes and <a href="http://www.raeng.org.uk/education/diploma/maths/pdf/exemplars_advanced/14_Car_Racing.pdf" onclick="urchinTracker('/outgoing/www.raeng.org.uk/education/diploma/maths/pdf/exemplars_advanced/14_Car_Racing.pdf?referer=');">fuel adjusted laptimes</a>) from the China 2011 grand prix, I thought I&#8217;d see how easy it was to just dive in&#8230; And it was 2 minutes easy&#8230; (If you want to play along, <a href="https://gist.github.com/957008" onclick="urchinTracker('/outgoing/gist.github.com/957008?referer=');">here&#8217;s the data file</a>).</p>
<p><img src="http://ouseful.files.wordpress.com/2011/05/rstudio-f1-laptimes-demo.jpg?w=700&#038;h=540" alt="" width="700" height="540" class="alignnone size-full wp-image-5425" /></p>
<p>Here&#8217;s the command I used:<br />
<tt>boxplot(Lap.Time ~ Driver, data=lapTimeFuel)</tt></p>
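<p>To see which numbers the box plot is actually drawing, <tt>boxplot()</tt> can be asked to return its statistics rather than plot them. This is only a sketch: the small <tt>lapTimeFuel</tt> data frame below is a hypothetical stand-in with invented lap times, not the real data file.</p>

```r
# Hypothetical stand-in for the lapTimeFuel data frame (lap times invented)
lapTimeFuel <- data.frame(
  Driver = factor(rep(c("A", "B"), each = 6)),
  Lap.Time = c(101.2, 100.8, 101.5, 100.9, 118.4, 101.1,
               102.3, 102.0, 101.8, 125.0, 102.4, 102.1)
)

# plot = FALSE returns the numbers behind each box instead of drawing it
bp <- boxplot(Lap.Time ~ Driver, data = lapTimeFuel, plot = FALSE)

# One column per driver; rows are lower whisker, lower hinge, median,
# upper hinge, upper whisker
print(bp$stats)

# Lap times beyond the whiskers, which the plot would draw as separate points
print(bp$out)
```

<p>With this invented data, each driver&#8217;s single very slow lap ends up in <tt>bp$out</tt> rather than stretching the box.</p>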
<p>Remembering a comment in a Making up the Numbers blogpost (<a href="http://f1numbers.wordpress.com/2010/03/16/driver-consistency-bahrain-2010/" onclick="urchinTracker('/outgoing/f1numbers.wordpress.com/2010/03/16/driver-consistency-bahrain-2010/?referer=');">Driver Consistency – Bahrain 2010</a>) about the effect on laptime distributions from removing opening, in and out lap times, a quick Google turned up a way of quickly stripping out slow times. (This isn&#8217;t as clean as removing the actual opening, in and out lap times &#8211; it also removes mistake laps, for example, but I&#8217;m just exploring, right? Right?!;-)</p>
<p><tt>lapTime2 &lt;- subset(lapTimeFuel, Lap.Time &lt; 110.1)</tt></p>
<p>I could then plot the distribution in the reduced <em>lapTime2</em> dataset by changing the original boxplot command to use (<tt>data=lapTime2</tt>). (Note that as with many interactive editors, using your keyboard&#8217;s up arrow displays previously entered commands in the current command line; so you can re-enter a previously entered command by hitting the up arrow a few times, then entering return. You can also edit the current command line, using the left and right arrow keys to move the cursor, and the delete key to delete text.)</p>
<p>Prior programming experience suggests this should also work&#8230;</p>
<p><tt>boxplot(Lap.Time ~ Driver, data=subset(lapTimeFuel, Lap.Time &lt; 110))</tt></p>
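<p>The threshold filter is a shortcut. A cleaner version would drop the opening, in and out laps by lap number, which might look something like the sketch below. The <tt>Lap</tt> column, the <tt>pitLaps</tt> vector and all the values are assumptions made up for illustration; the post does not say whether the real data file carries this information.</p>

```r
# Hypothetical data: one driver, 8 laps, entering the pits on lap 4 (values invented)
laps <- data.frame(
  Driver = "A",
  Lap = 1:8,
  Lap.Time = c(108.9, 101.2, 101.0, 105.6, 119.8, 100.7, 112.3, 100.9)
)
pitLaps <- 4  # assumed: the lap on which the driver came in

# Drop the opening lap, each in lap, and the out lap that follows it
excluded <- c(1, pitLaps, pitLaps + 1)
cleanLaps <- subset(laps, !(Lap %in% excluded))

# The threshold filter from the post, for comparison
fastLaps <- subset(laps, Lap.Time < 110.1)

print(cleanLaps$Lap)  # 2 3 6 7 8: the slow "mistake" lap 7 is kept
print(fastLaps$Lap)   # 1 2 3 4 6 8: opening and in laps kept, lap 7 dropped
```

<p>The two filters keep different laps, so the resulting box plots answer slightly different questions.</p>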
<p>Something else I tried was to look at the distribution of fuel weight adjusted laptimes (where the time penalty from the weight of the fuel in the car is removed):</p>
<p><tt>boxplot(Fuel.Adjusted.Laptime ~ Driver, data=lapTimeFuel)</tt></p>
<p>Looking at the <a href="http://www.rstudio.org/docs/release_notes_v0.93" onclick="urchinTracker('/outgoing/www.rstudio.org/docs/release_notes_v0.93?referer=');">release notes</a> for the latest version of R-Studio suggests that you can build interactive controls into your plots (a bit like Mathematica supports?). The example provided shows how to change the x-range on a plot:<br />
<tt>manipulate(<br />
  plot(cars, xlim=c(0,x.max)),<br />
  x.max=slider(15,25))</tt></p>
<p>Hmm&#8230; can we set the filter value dynamically I wonder?</p>
<p><tt>manipulate(<br />
boxplot(Lap.Time ~ Driver, data=subset(lapTimeFuel, Lap.Time &lt; maxval)),<br />
 maxval=slider(100,140))</tt></p>
<p><img src="http://ouseful.files.wordpress.com/2011/05/r-studio-manipulate-interactive-component.png?w=700&#038;h=396" alt="" width="700" height="396" class="alignnone size-full wp-image-5426" /></p>
<p>Seems like it&#8230;?:-) We can also combine interactive controls:</p>
<p><tt>manipulate(boxplot(Lap.Time ~ Driver, data=subset(lapTimeFuel, Lap.Time &lt; maxval),outline=outline),maxval=slider(100,140),outline = checkbox(FALSE, &quot;Show outliers&quot;))</tt></p>
<p><img src="http://ouseful.files.wordpress.com/2011/05/r-studio-combining-interactive-controls.png?w=700&#038;h=372" alt="" width="700" height="372" class="alignnone size-full wp-image-5427" /></p>
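<p>Outside R-Studio there is no slider to drag, but the same idea can be sketched as an ordinary function that is called once per cut-off value. The function name <tt>plotBelow</tt> and the tiny <tt>lapTimeFuel</tt> data frame are hypothetical, with invented lap times:</p>

```r
# Hypothetical stand-in for the lapTimeFuel data frame (lap times invented)
lapTimeFuel <- data.frame(
  Driver = factor(rep(c("A", "B"), each = 6)),
  Lap.Time = c(101.2, 100.8, 101.5, 100.9, 118.4, 101.1,
               102.3, 102.0, 101.8, 125.0, 102.4, 102.1)
)

# Same filter and plot as inside manipulate(), wrapped so that the cut-off
# and the outlier switch become plain function arguments
plotBelow <- function(maxval, outline = FALSE) {
  kept <- subset(lapTimeFuel, Lap.Time < maxval)
  boxplot(Lap.Time ~ Driver, data = kept, outline = outline)
  nrow(kept)  # number of laps that survived the filter
}

# Stepping through a few cut-offs by hand, as the slider would
keptCounts <- sapply(c(102, 110, 120, 140), plotBelow)
print(keptCounts)
```

<p>Each call redraws the plot for one cut-off, which is roughly what happens every time the slider moves.</p>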
<p>Okay &#8211; that&#8217;s enough for now&#8230; I reckon that with a handful of commands on a crib sheet, you can probably get quite a lot of chart plot visualisations done, as well as statistical visualisations, in the R-Studio environment; it also seems easy enough to build in interactive controls that let you play with the data in a visually interactive way&#8230;</p>
<p>The trick lies in choosing visual statistics approaches whose assumptions actually hold for your data; break those assumptions and the approach can no longer be applied in any sensible or meaningful way.</p>
<p>[This blog post is written, in part, as a way for me to try to come up with something to say at the OU Statistics Group's one day conference on <a href="http://mcs.open.ac.uk/su379/VIPS/" onclick="urchinTracker('/outgoing/mcs.open.ac.uk/su379/VIPS/?referer=');">Visualisation and Presentation in Statistics</a>. One idea I wanted to explore was: visualisations are powerful; visualisation techniques may incorporate statistical methods or let you "see" statistical patterns; most people know very little statistics; that shouldn&#8217;t stop them being able to use statistics as a technology; so what are we going to do about it? Feedback welcome... Err....?!]</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2011/05/05/first-play-with-r-and-r-studio-f1-lap-time-box-plots/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://ouseful.files.wordpress.com/2011/05/r-studio-combining-interactive-controls.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/rstudio-f1-laptimes-demo.jpg" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/r-studio-manipulate-interactive-component.png" length="" type="" />
<enclosure url="http://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Anscombe's_quartet_3.svg/800px-Anscombe's_quartet_3.svg.pngA" length="" type="" />
		</item>
		<item>
		<title>UK Journalists on Twitter</title>
		<link>http://blog.ouseful.info/2011/04/10/uk-journalists-on-twitter/</link>
		<comments>http://blog.ouseful.info/2011/04/10/uk-journalists-on-twitter/#comments</comments>
		<pubDate>Sun, 10 Apr 2011 11:16:07 +0000</pubDate>
		<dc:creator>tonyhirst</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[newsrw]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[tony hirst]]></category>
		<category><![CDATA[visualisation]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=5221</guid>
		<description><![CDATA[A post on the Guardian Datablog earlier today took a dataset collected by the Tweetminster folk and graphed the sorts of thing that journalists tweet about (Journalists on Twitter: how do Britain&#8217;s news organisations tweet?). Tweetminster maintains separate lists of tweeting journalists for several different media groups, so it was easy to grab the [...]]]></description>
			<content:encoded><![CDATA[<p>A post on the Guardian Datablog earlier today took a dataset collected by the <a href="http://tweetminster.co.uk/" onclick="urchinTracker('/outgoing/tweetminster.co.uk/?referer=');">Tweetminster</a> folk and graphed the sorts of thing that journalists tweet about (<a href="http://www.guardian.co.uk/news/datablog/2011/apr/08/twitter-journalists-tweets?commentpage=last#end-of-comments" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog/2011/apr/08/twitter-journalists-tweets?commentpage=last_end-of-comments&amp;referer=');"> Journalists on Twitter: how do Britain&#8217;s news organisations tweet?</a>).</p>
<p><a href="http://www.guardian.co.uk/news/datablog/2011/apr/08/twitter-journalists-tweets" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog/2011/apr/08/twitter-journalists-tweets?referer=');"><img src="http://ouseful.files.wordpress.com/2011/04/what-do-journalists-tweet-about.png?w=477&#038;h=451" alt="" width="477" height="451" class="alignnone size-full wp-image-5222" /></a></p>
<p>Tweetminster maintains separate lists of tweeting journalists for several different media groups, so it was easy to grab the names on each list, use the Twitter API to pull down the names of people followed by each person on the list, and then graph the friend connections between folk on the lists. The result shows that the hacks follow each other quite closely:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5600218141/" title="UK Media Twitter echochamber (via tweetminster lists) by psychemedia, on Flickr" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5600218141/?referer=');"><img src="http://farm6.static.flickr.com/5230/5600218141_9ac615ce9b_z.jpg" width="640" height="496" alt="UK Media Twitter echochamber (via tweetminster lists)" /></a></p>
<p>Nodes are coloured by media group/Tweetminster list, and sized by PageRank, as calculated over the network using the Gephi PageRank statistic.</p>
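<p>For anyone wondering what a PageRank score is measuring, here is a minimal sketch of the calculation in R on a tiny follow graph. It is not Gephi&#8217;s implementation: the four accounts and their follow links are invented, and the damping factor of 0.85 is simply a commonly used value assumed for illustration.</p>

```r
# Hypothetical follow graph: each row is c(follower, followed) (accounts invented)
edges <- rbind(c("a", "b"), c("a", "c"), c("b", "c"), c("c", "a"), c("d", "c"))
nodes <- sort(unique(as.vector(edges)))
n <- length(nodes)

# Transition matrix: M[j, i] = 1 / (number of accounts i follows) if i follows j
M <- matrix(0, n, n, dimnames = list(nodes, nodes))
for (k in seq_len(nrow(edges))) M[edges[k, 2], edges[k, 1]] <- 1
M <- sweep(M, 2, colSums(M), "/")  # every account here follows someone, so no zero columns

d <- 0.85            # damping factor (assumed value)
pr <- rep(1 / n, n)  # start with equal scores

# Power iteration: repeatedly pass each account's score on to the accounts it follows
for (iter in 1:100) pr <- (1 - d) / n + d * as.vector(M %*% pr)
names(pr) <- nodes
print(round(pr, 3))
```

<p>In this toy graph the account followed by the most others ends up with the highest score, while an account nobody follows keeps only the small baseline share.</p>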
<p>The force directed layout shows how folk within individual media groups tend to follow each other more intensely than they do people from other groups, but that said, inter-group following is still high. The major players across the media tweeps as a whole seem to be @arusbridger, @r4today, @skynews, @paulwaugh and @BBCLauraK.</p>
<p>I can generate an SVG version of the chart, and post a copy of the raw Gephi GDF data file, if anyone&#8217;s interested&#8230;</p>
<p>PS if you&#8217;re interested in trying out Gephi for yourself, you can download it from <a href="http://gephi.org" onclick="urchinTracker('/outgoing/gephi.org?referer=');">gephi.org</a>. One of the easiest ways in is to <a href="http://blog.ouseful.info/?s=facebook+gephi&amp;order=asc" onclick="urchinTracker('/outgoing/blog.ouseful.info/?s=facebook+gephi_amp_order=asc&amp;referer=');">explore your Facebook network</a></p>
<p>PPS for details on how the above was put together, here&#8217;s a related approach: <a href="http://blog.ouseful.info/2010/08/25/doodlings-around-the-data-driven-journalism-round-table-event-hashtag-community/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2010/08/25/doodlings-around-the-data-driven-journalism-round-table-event-hashtag-community/?referer=');">Doodlings Around the Data Driven Journalism Round Table Event Hashtag Community</a>.</p>
<p>For a slightly different view over the UK political Twittersphere, see <a href="http://blog.ouseful.info/2011/01/04/sketching-the-structure-of-the-uk-political-media-twittersphere/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/01/04/sketching-the-structure-of-the-uk-political-media-twittersphere/?referer=');">Sketching the Structure of the UK Political Media Twittersphere</a>. And for the House and Senate in the US: <a href="http://blog.ouseful.info/2011/01/05/sketching-connections-between-us-house-and-senate-tweeps/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/01/05/sketching-connections-between-us-house-and-senate-tweeps/?referer=');"> Sketching Connections Between US House and Senate Tweeps </a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2011/04/10/uk-journalists-on-twitter/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://ouseful.files.wordpress.com/2011/04/what-do-journalists-tweet-about.png" length="" type="" />
<enclosure url="http://farm6.static.flickr.com/5230/5600218141_9ac615ce9b_z.jpg" length="" type="" />
		</item>
		<item>
		<title>A First Quick Viz of UK University Fees</title>
		<link>http://blog.ouseful.info/2011/04/08/a-first-quick-viz-of-uk-university-fees/</link>
		<comments>http://blog.ouseful.info/2011/04/08/a-first-quick-viz-of-uk-university-fees/#comments</comments>
		<pubDate>Fri, 08 Apr 2011 22:21:01 +0000</pubDate>
		<dc:creator>tonyhirst</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[datastore]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[tony hirst]]></category>
		<category><![CDATA[visualisation]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=5226</guid>
		<description><![CDATA[Regular readers will know how I do quite like to dabble with visual analysis, so here are a couple of doodles with some of the university fees data that is starting to appear. The data set I&#8217;m using is a partial one, taken from the Guardian Datastore: Tuition fees 2012: what are the universities charging?. [...]]]></description>
			<content:encoded><![CDATA[<p>Regular readers will know how I do quite like to dabble with visual analysis, so here are a couple of doodles with some of the university fees data that is starting to appear.</p>
<p>The data set I&#8217;m using is a partial one, taken from the Guardian Datastore: <a href="http://www.guardian.co.uk/news/datablog/2011/mar/25/higher-education-universityfunding" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog/2011/mar/25/higher-education-universityfunding?referer=');">Tuition fees 2012: what are the universities charging?</a>. (If you know where there&#8217;s a full list of UK course fees data by HEI and course, please let me know in a comment below, or even better, via an answer to this <a href="http://getthedata.org/questions/542/uk-university-course-fees" onclick="urchinTracker('/outgoing/getthedata.org/questions/542/uk-university-course-fees?referer=');">Where&#8217;s the fees data?</a> question on GetTheData.)</p>
<p>My first thought was to go for a proportional symbol map. (Does anyone know of a javascript library that can generate proportional symbol overlays on a Google Map or similar, even better if it can trivially pull in data from a Google spreadsheet via the Google visualisation? I have an old hack (<a href="http://ouseful.open.ac.uk/blogarchive/gmaptest-c.php" onclick="urchinTracker('/outgoing/ouseful.open.ac.uk/blogarchive/gmaptest-c.php?referer=');">supermarket catchment areas</a>), but there must be something nicer to use by now, surely? [UPDATE: ah - forgot this: <a href="http://polymaps.org/" onclick="urchinTracker('/outgoing/polymaps.org/?referer=');">Polymaps</a>])</p>
<p>In the end, I took the easy way out, and opted for <a href="http://geocommons.com/" onclick="urchinTracker('/outgoing/geocommons.com/?referer=');">Geocommons</a>. I downloaded the data from the Guardian datastore, and tidied it up a little in <a href="http://code.google.com/p/google-refine/" onclick="urchinTracker('/outgoing/code.google.com/p/google-refine/?referer=');">Google Refine</a>, removing non-numerical entries (including ranges, such as 4,500-6,000) in the Fees column and replacing them with <em>minimum</em> fee values. Sorting the fees column as a numerical type with errors at the top made the entries that needed tweaking easy to find:</p>
<p><img src="http://ouseful.files.wordpress.com/2011/04/google-refine-sorting-by-error.png?w=398&#038;h=320" alt="" width="398" height="320" class="alignnone size-full wp-image-5227" /></p>
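<p>The same tidy-up could be scripted rather than done by hand. The sketch below uses R instead of Google Refine, with invented example entries, and assumes that a range is always written low-high so that the first number is the minimum fee:</p>

```r
# Hypothetical raw entries from a Fees column (values invented for illustration)
fees <- c("9,000", "4,500-6,000", "TBC", "8,500")

# Keep the first number in each entry; for a range this is the minimum quoted fee
firstNumber <- sub("^([0-9,]+).*$", "\\1", fees)

# Entries that do not start with a number become missing values
firstNumber[!grepl("^[0-9]", fees)] <- NA

# Strip the thousands separators and convert to numbers
minFee <- as.numeric(gsub(",", "", firstNumber))
print(minFee)
```

<p>Entries with no usable number come out as <tt>NA</tt>, so they still need checking by eye, much like the errors sorted to the top in Refine.</p>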
<p>The Guardian data included an address column, which I thought Geocommons should be able to cope with. It didn&#8217;t seem to work out for me though (I&#8217;m sure I checked the UK territory, but only seemed to get US geocodings?) so in the end I used a trick posted to the OnlineJournalism blog to geocode the addresses (<a href="http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/">Getting full addresses for data from an FOI response (using APIs)</a>; rather than use the <em>value.parseJson().results[0].formatted_address</em> construct, I generated a couple of columns from the JSON results column using <em>value.parseJson().results[0].geometry.location.lng</em> and <em>value.parseJson().results[0].geometry.location.lat</em>).</p>
<p>Uploading the data to Geocommons and clicking where prompted, it was quite easy to generate this <a href="http://geocommons.com/maps/62983" onclick="urchinTracker('/outgoing/geocommons.com/maps/62983?referer=');">map of the fees to date</a>:</p>
<p><a href="http://geocommons.com/maps/62983" onclick="urchinTracker('/outgoing/geocommons.com/maps/62983?referer=');"><img src="http://ouseful.files.wordpress.com/2011/04/geocommons-fees-map.png?w=700&#038;h=384" alt="" width="700" height="384" class="alignnone size-full wp-image-5228" /></a></p>
<p>Anyone know if there&#8217;s a way of choosing the order of fields in the pop-up info box? And maybe even a way of selecting which ones to display? Or do I have to generate a custom dataset and then create a map over that?</p>
<p>What I had hoped to be able to do was use coloured proportional symbols to generate a two dimensional data plot, e.g. comparing fees with drop out rates, but Geocommons doesn&#8217;t seem to support that (yet?). It would also be nice to have an interactive map where the user could select which numerical value(s) are displayed, but again, I missed that option if it&#8217;s there&#8230;</p>
<p>The second thing I thought I&#8217;d try would be an <a href="http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/university-fees-is-there-value-for" onclick="urchinTracker('/outgoing/www-958.ibm.com/software/data/cognos/manyeyes/visualizations/university-fees-is-there-value-for?referer=');">interactive scatterplot on Many Eyes</a>. Here&#8217;s one view that I thought might identify what sort of return on value you might get for your course fee&#8230;;-)</p>
<p><a href="http://blog.ouseful.info/Users/ajh59/Documents/screenshots/Many%20Eyes%20-%20UK%20Uni%20fees.png" onclick="urchinTracker('/outgoing/blog.ouseful.info/Users/ajh59/Documents/screenshots/Many_20Eyes_20-_20UK_20Uni_20fees.png?referer=');"><img src="http://ouseful.files.wordpress.com/2011/04/many-eyes-uk-uni-fees.png?w=700&#038;h=427" alt="" width="700" height="427" class="alignnone size-full wp-image-5229" /></a></p>
<p>Click thru&#8217; to have a play with the chart yourself;-)</p>
<p>PS I can&#8217;t not say this, really &#8211; <em>you&#8217;ve let me down again, @datastore folks&#8230;.</em> where&#8217;s a university ID column using some sort of standard identifier for each university? I know you have them, because they&#8217;re in the <a href="http://www.guardian.co.uk/news/datablog/2009/nov/24/iso-country-codes-reference-guide-rosetta-stone" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog/2009/nov/24/iso-country-codes-reference-guide-rosetta-stone?referer=');">Rosetta sheet</a>&#8230; although that is lacking a <a href="http://www.hesa.ac.uk/index.php?option=com_collns&amp;task=show_manuals&amp;Itemid=233&amp;r=06011&amp;f=002" onclick="urchinTracker('/outgoing/www.hesa.ac.uk/index.php?option=com_collns_amp_task=show_manuals_amp_Itemid=233_amp_r=06011_amp_f=002&amp;referer=');">HESA INST-ID</a> column, which might be handy in certain situations&#8230; ;-) [UPDATE - apparently, HESA codes are in the spreadsheet.... ;-0]</p>
<p>PPS Hmm&#8230; that Rosetta sheet got me thinking &#8211; what identifier scheme does the <a href="http://www.jiscmu.ac.uk/api/" onclick="urchinTracker('/outgoing/www.jiscmu.ac.uk/api/?referer=');">JISC MU API</a> use?</p>
<p>PPPS If you&#8217;re looking for a degree, why not give the <a href="http://blog.ouseful.info/2011/04/08/oer-hack-day-uk-universities-prospectus-search-course-detective/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/04/08/oer-hack-day-uk-universities-prospectus-search-course-detective/?referer=');">Course Detective</a> search engine a go? It searches over as many of the UK university online prospectus web pages as we could find and offer up as a sacrifice to a Google Custom search engine ;-)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2011/04/08/a-first-quick-viz-of-uk-university-fees/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://ouseful.files.wordpress.com/2011/04/geocommons-fees-map.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/04/many-eyes-uk-uni-fees.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/04/google-refine-sorting-by-error.png" length="" type="" />
		</item>
	</channel>
</rss>

