<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Online Journalism Blog &#187; data journalism</title>
	<atom:link href="http://onlinejournalismblog.com/category/databases/feed/" rel="self" type="application/rss+xml" />
	<link>http://onlinejournalismblog.com</link>
	<description>A conversation.</description>
	<lastBuildDate>Sat, 11 Feb 2012 12:06:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<cloud domain='onlinejournalismblog.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
		<item>
		<title>FAQ: The stream as an interface; starting out in data journalism</title>
		<link>http://onlinejournalismblog.com/2012/02/11/faq-the-stream-as-an-interface-starting-out-in-data-journalism/</link>
		<comments>http://onlinejournalismblog.com/2012/02/11/faq-the-stream-as-an-interface-starting-out-in-data-journalism/#comments</comments>
		<pubDate>Sat, 11 Feb 2012 12:06:28 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[faq]]></category>
		<category><![CDATA[navigation]]></category>
		<category><![CDATA[streams]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15847</guid>
		<description><![CDATA[Here are the latest answers to some questions - this time relating to these predictions for 2012: Q: What are the advantages of &#8220;stream&#8221; as an interface for news website homepages? The main advantage is that it&#8217;s very sticky &#8211; users tend to leave streams on in the same way that they leave 24-hour news channels on, or keep checking back to<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2012/02/11/faq-the-stream-as-an-interface-starting-out-in-data-journalism/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p><em>Here are the latest <a href="http://onlinejournalismblog.com/category/faq">answers to some questions</a> - this time relating to <a href="http://www.niemanlab.org/2011/12/paul-bradshaw-collaboration-data-2012-will-see-news-outlets-turning-talk-into-action/" onclick="urchinTracker('/outgoing/www.niemanlab.org/2011/12/paul-bradshaw-collaboration-data-2012-will-see-news-outlets-turning-talk-into-action/?referer=');">these predictions for 2012</a>:</em></p>
<h2><strong>Q: What are the advantages of &#8220;stream&#8221; as an interface for news website homepages?</strong></h2>
<p>The main advantage is that it&#8217;s very sticky &#8211; users tend to leave streams on in the same way that they leave 24-hour news channels on, or keep checking back to Facebook and Twitter (which have helped popularise the &#8216;stream&#8217; interface).</p>
<p>If you compare that to the traditional story layout format, where users scan across the page but then leave the site if there&#8217;s nothing obviously of interest, you can see the difference.</p>
<p>I think there&#8217;s room for both, but if you want to know what&#8217;s new since the last time you looked, the stream works very well. And it&#8217;s not difficult to combine that with subject or region pages that show the most important news of that day, for example.</p>
<p>I think it can work for every kind of news: the stream says &#8216;Here&#8217;s what&#8217;s new&#8217; across all topics; the &#8216;layout&#8217; says &#8216;Here&#8217;s what we think is important&#8217; &#8211; in other words, it performs a more traditional &#8216;snapshot&#8217; function akin to the daily newspaper layout.</p>
<h2>Q: What skills should a reporter have in order to be a first-rate data journalist?</h2>
<p>The basic skills are the same as for any journalist: a nose for a story, and the ability to communicate that story clearly. In data journalism terms that means being able to interrogate data quickly and then focus on the most important facts within it.</p>
<p>That will most likely involve being able to use spreadsheet formulae to work out, for example, the proportion of time or money being spent on something, or to combine different datasets to gain new insights or overcome obstacles put in your way by those publishing the data.</p>
<p>You also need to be able to avoid mistakes by cleaning data (often the same person or organisation will be named differently in different places) and by understanding the context of the data (for example, population size, or the methodology used to gather it).</p>
<p>Finally, as I say, you need to be able to communicate the results clearly, which often means pulling back from the data and not trying to use it all in your telling of the story (just as you wouldn&#8217;t use every quote you got from a source) but keeping it simple.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/02/11/faq-the-stream-as-an-interface-starting-out-in-data-journalism/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Video: Heather Brooke&#8217;s tips on investigating, and using the FOI and Data Protection Acts</title>
		<link>http://onlinejournalismblog.com/2012/02/03/video-heather-brookes-tips-on-investigating-and-using-the-foi-and-data-protection-acts/</link>
		<comments>http://onlinejournalismblog.com/2012/02/03/video-heather-brookes-tips-on-investigating-and-using-the-foi-and-data-protection-acts/#comments</comments>
		<pubDate>Fri, 03 Feb 2012 08:15:15 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[Data Protection Act]]></category>
		<category><![CDATA[foi]]></category>
		<category><![CDATA[Health]]></category>
		<category><![CDATA[heather brooke]]></category>
		<category><![CDATA[help me investigate]]></category>
		<category><![CDATA[investigative journalism]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[welfare]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15784</guid>
		<description><![CDATA[The following 3 videos first appeared on the Help Me Investigate blog, Help Me Investigate: Health and Help Me Investigate: Welfare. I thought I&#8217;d collect them together here too. As always, these are published under a Creative Commons licence, so you are welcome to re-use, edit and combine with other video, with attribution (and a link!). First, Heather Brooke&#8217;s tips<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2012/02/03/video-heather-brookes-tips-on-investigating-and-using-the-foi-and-data-protection-acts/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p>The following 3 videos first appeared <a href="http://helpmeinvestigate.posterous.com/video-heather-brooke-tips-for-starting-to-inv" title="Help Me Investigate blog - Heather Brooke" onclick="urchinTracker('/outgoing/helpmeinvestigate.posterous.com/video-heather-brooke-tips-for-starting-to-inv?referer=');">on the Help Me Investigate blog</a>, <a href="http://helpmeinvestigate.com/health/tag/heather-brooke/" title="Help Me Investigate Health - Heather Brooke" onclick="urchinTracker('/outgoing/helpmeinvestigate.com/health/tag/heather-brooke/?referer=');">Help Me Investigate: Health</a> and <a href="http://helpmeinvestigate.com/welfare/tag/heather-brooke" title="Help Me Investigate Welfare - Heather Brooke" onclick="urchinTracker('/outgoing/helpmeinvestigate.com/welfare/tag/heather-brooke?referer=');">Help Me Investigate: Welfare</a>. I thought I&#8217;d collect them together here too. As always, these are published under a Creative Commons licence, so you are welcome to re-use, edit and combine them with other video, with attribution (and a link!).</p>
<p>First, Heather Brooke&#8217;s tips for starting to investigate public bodies:</p>
<p><iframe width="600" height="338" src="http://www.youtube.com/embed/TS6F9wcakSc?fs=1&#038;feature=oembed" frameborder="0" allowfullscreen></iframe></p>
<p>Her advice on investigating health, welfare and crime:</p>
<p><iframe width="600" height="338" src="http://www.youtube.com/embed/QKSQx3f5xdw?fs=1&#038;feature=oembed" frameborder="0" allowfullscreen></iframe></p>
<p>And on using the Data Protection Act:</p>
<p><iframe width="600" height="338" src="http://www.youtube.com/embed/zuag5-0FS2Q?fs=1&#038;feature=oembed" frameborder="0" allowfullscreen></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/02/03/video-heather-brookes-tips-on-investigating-and-using-the-foi-and-data-protection-acts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Moving away from &#8216;the story&#8217;: 5 roles of an online investigations team</title>
		<link>http://onlinejournalismblog.com/2012/02/02/moving-away-from-the-story-5-roles-of-an-online-investigations-team/</link>
		<comments>http://onlinejournalismblog.com/2012/02/02/moving-away-from-the-story-5-roles-of-an-online-investigations-team/#comments</comments>
		<pubDate>Thu, 02 Feb 2012 13:39:22 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[Blogging]]></category>
		<category><![CDATA[data journalism]]></category>
		<category><![CDATA[birmingham city university]]></category>
		<category><![CDATA[churnalism]]></category>
		<category><![CDATA[investigative journalism]]></category>
		<category><![CDATA[organisation]]></category>
		<category><![CDATA[team roles]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15793</guid>
		<description><![CDATA[In almost a decade of teaching online journalism I repeatedly come up against the same two problems: people who are so wedded to the idea of the self-contained &#8216;story&#8217; that they struggle to create journalism outside of that (e.g. the journalism of linking, liveblogging, updating, explaining, or saying what they don&#8217;t know); and people stuck in the habit of churning<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2012/02/02/moving-away-from-the-story-5-roles-of-an-online-investigations-team/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p>In almost a decade of teaching online journalism I have repeatedly come up against the same two problems:</p>
<ul>
<li>people who are so wedded to the idea of <strong>the self-contained &#8216;story&#8217;</strong> that they struggle to create journalism outside of that (e.g. the journalism of linking, liveblogging, updating, explaining, or saying <em>what they don&#8217;t know</em>);</li>
<li>and people stuck in the habit of <strong>churning out easy-win articles</strong> rather than investing a longer-term effort in something of depth.</li>
</ul>
<p>Until now I&#8217;ve addressed these problems largely through teaching and individual feedback. But for the next 3 months I&#8217;ll be trying a new way of <em>organising</em> students that I hope will address both. As always, I thought I&#8217;d share it here to see what you think.</p>
<h2>Roles in a team: moving from churnalism to depth</h2>
<p>Here’s what I’m trying (for context: this is on an undergraduate module <a href="http://www.bcu.ac.uk/courses/media-and-communication-journalism" onclick="urchinTracker('/outgoing/www.bcu.ac.uk/courses/media-and-communication-journalism?referer=');">at Birmingham City University</a>):</p>
<p>Students are allocated one of 5 roles within a group, investigating a particular public interest question. They investigate that for 6 weeks, at which point they are rotated to a different role and a new investigation (I&#8217;m weighing up whether to have some sort of job interview at that point).</p>
<p>The group format allows &#8211; I hope &#8211; for something interesting to happen: students are not under pressure to deliver &#8216;stories&#8217;, but instead blog about their investigation, as explained below. They are still learning newsgathering techniques, and production techniques, but the team structure makes these explicitly different to those that they would learn elsewhere.</p>
<p>The hope is that it will be much more difficult for them to just transfer print-style stories online, or to reach for he-said/she-said sources to fill the space between ads. With only one story to focus on, students should be forced to engage more, to go deeper and deeper into an issue, and to be more creative in how they communicate what they find out.</p>
<p><em>(It&#8217;s interesting to note that <a href="http://www.knightdigitalmediacenter.org/leadership_blog/comments/20111128_at_the_new_haven_register_reorganization_emphasizes_investigative_/" onclick="urchinTracker('/outgoing/www.knightdigitalmediacenter.org/leadership_blog/comments/20111128_at_the_new_haven_register_reorganization_emphasizes_investigative_/?referer=');">at least one news organisation is attempting something similar with a restructuring late last year</a>)</em></p>
<p>Only one member of the team is primarily concerned with the story, and that is the editor:</p>
<h2>The Editor (ED)</h2>
<p>It is the editor&#8217;s role to identify exactly what story the team is pursuing, and to plan how the team&#8217;s resources should best be employed in pursuing it. It will help if they frame the story as a hypothesis to be tested by the team gathering evidence &#8211; following <a href="http://unesdoc.unesco.org/images/0019/001930/193078e.pdf" onclick="urchinTracker('/outgoing/unesdoc.unesco.org/images/0019/001930/193078e.pdf?referer=');">Mark Lee Hunter’s story-based inquiry method (PDF)</a>.</p>
<p>Qualities needed and developed by the editor include:</p>
<ul>
<li>A nose for a story</li>
<li>Project management skills</li>
<li>Newswriting &#8211; the ability to communicate a story effectively</li>
</ul>
<h2>The Community Manager (CM)</h2>
<p>The community manager’s focus is on the communities affected by the story being pursued. They should be engaging regularly with those communities &#8211; contributing to forums; having conversations with members on Twitter; following updates on Facebook; attending real-world events; commenting on blogs or photo/video sharing sites; and so on.</p>
<p>They are the two-way channel between that community and the news team: feeding leads from the community to the editor, and taking a lead from the editor in finding contacts from the community (experts, case studies, witnesses).</p>
<p>Qualities needed and developed by the community manager include:</p>
<ul>
<li>Interpersonal skills &#8211; the ability to listen to and communicate with different people</li>
<li>A nose for a story</li>
<li>Contacts in the community</li>
<li>Social network research skills &#8211; the ability to find sources and communities online</li>
</ul>
<h2>The Data Journalist (DJ)</h2>
<p>While the community manager is focused on people, the data journalist is focused on documentation: datasets, reports, documents, regulations, and anything that frames the story being pursued.</p>
<p>It is their role to find that documentation &#8211; and to make sense of it. This is a key role because <a href="http://www.niemanlab.org/2011/12/nprs-stateimpact-project-explores-regional-topics-through-focused-data-driven-journalism/" onclick="urchinTracker('/outgoing/www.niemanlab.org/2011/12/nprs-stateimpact-project-explores-regional-topics-through-focused-data-driven-journalism/?referer=');">stories often come from signs being ignored</a> (data) or regulations being ignored (documents).</p>
<p>Qualities needed and developed by the data journalist include:</p>
<ul>
<li>Research skills &#8211; advanced online search and use of libraries</li>
<li>Analysis skills &#8211; such as using spreadsheets</li>
<li>Ability to decipher jargon &#8211; often by accessing experts (the CM can help)</li>
</ul>
<h2>The Multimedia Journalist (MMJ)</h2>
<p>The multimedia journalist is focused on the sights, sounds and people that bring a story to life. In an investigation, these will typically be the &#8216;victims&#8217; and the &#8216;targets&#8217;.</p>
<p>They will film interviews with case studies; organise podcasts where various parties play the story out; collect galleries of images to illustrate the reality behind the words.</p>
<p>They will work closely with the CM as their roles can overlap, especially when accessing sources. The difference is that the CM is concerned with a larger quantity of interactions and information; the MMJ is concerned with quality: far fewer interactions and richer detail.</p>
<p>Qualities needed and developed by the MMJ include:</p>
<ul>
<li>Ability to find sources: experts, witnesses, case studies</li>
<li>Technical skills: composition; filming or recording; editing</li>
<li>Planning: pre-interviewing, research, booking kit</li>
</ul>
<h2>The Network Aggregator (NA)</h2>
<p>The NA is the person who keeps the site ticking over while the rest of the team is working on the bigger story.</p>
<p>They publish regular links to related stories around the country. They are also the person who provides the wider context of that story: what else is happening in that field or around that issue, and whether similar issues are arising in other places around the country. Typical content includes backgrounders, explainers, and updates from around the world.</p>
<p>This is the least demanding of the roles, so they should also be available to support other members of the team when required, following up minor leads on related stories. They should not be ‘just linking’, but getting original stories too, particularly by &#8216;joining the dots&#8217; on information coming in.</p>
<p>Qualities needed and developed by the NA include:</p>
<ul>
<li>Information management &#8211; following as many feeds, newsletters and other relevant sources of information as possible</li>
<li>Wide range of contacts &#8211; speaking to the usual suspects regularly to get a feel for the pulse of the issue/sector</li>
<li>Ability to turn around copy quickly</li>
</ul>
<h2>Publish regular pieces that come together in a larger story</h2>
<p>If this works, I&#8217;m hoping students will produce different types of content on their way to that &#8216;big story&#8217;, as follows:</p>
<ul>
<li>Linkblogging &#8211; simple posts that link to related articles elsewhere with a key quote (rather than wasting resources rewriting them)</li>
<li>Profiles of key community members</li>
<li>Backgrounders and explainers on key issues</li>
<li>Interviews with experts, case studies and witnesses, published individually first, then edited together later</li>
<li>Aggregation and curation &#8211; pulling together a gallery of images, for example; or key tweets on an issue; or key facts on a particular area (who, what, where, when, how); or rounding up an event or discussion</li>
<li>Datablogging &#8211; finding and publishing key datasets and documents and translating them/pulling out key points for a wider audience.</li>
<li>The story so far &#8211; taking users on a journey of what facts have been discovered, and what remains to be done.</li>
</ul>
<p>You can <a href="https://docs.google.com/document/d/1fcjsF7R0efV3ZacIR78nlSaOB1M9RQo0UMghdo2AQUo/edit" onclick="urchinTracker('/outgoing/docs.google.com/document/d/1fcjsF7R0efV3ZacIR78nlSaOB1M9RQo0UMghdo2AQUo/edit?referer=');">read more on the expectations of each role in this document</a>. And there&#8217;s a diagram indicating how group members might interact below:</p>
<div id="attachment_15801" class="wp-caption alignnone" style="width: 589px"><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/02/OJ_investigations_team.jpg"><img class=" wp-image-15801 " src="http://onlinejournalismblog.com/wp-content/uploads/2012/02/OJ_investigations_team.jpg" alt="Investigations team flowchart" width="579" height="454" /></a><p class="wp-caption-text">Investigations team flowchart</p></div>
<p>What will make the difference is how disciplined the editor is in ensuring that their team keeps moving towards the ultimate aim, and that they can combine the different parts into a significant whole.</p>
<p>UPDATE: A commenter has asked about the end result. Here&#8217;s how it&#8217;s explained to students:</p>
<div>
<blockquote><p>&#8220;At an identified point, the Editor will need to organise his or her team to bring those ingredients into that bigger story &#8211; and it may be told in different ways, for example:</p>
<ul>
<li>A longform text narrative with links to the source material and embedded multimedia</li>
<li>An edited multimedia package with links to source material in the accompanying description</li>
<li>A map made with Google Maps, Fusion Tables or another tool, where pins include images or video, and links to each story&#8221;</li>
</ul>
</blockquote>
</div>
<p><strong><em>If you&#8217;ve any suggestions or experiences on how this might work better, I&#8217;d very much welcome them.</em></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/02/02/moving-away-from-the-story-5-roles-of-an-online-investigations-team/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>&#8220;Data laundering&#8221;</title>
		<link>http://onlinejournalismblog.com/2012/02/01/data-laundering/</link>
		<comments>http://onlinejournalismblog.com/2012/02/01/data-laundering/#comments</comments>
		<pubDate>Wed, 01 Feb 2012 20:20:59 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[data laundering]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[tony hirst]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15795</guid>
		<description><![CDATA[Wonderful post by Tony Hirst in which he sort-of-coins* a lovely neologism in explaining how data can be &#8220;laundered&#8221;: &#8220;The Deloitte report was used as evidence by Facebook to demonstrate a particular economic benefit made possible by Facebook’s activities. The consultancy firm&#8217;s caveats were ignored, (including the fact that the data may in part at least have come from Facebook itself), in reporting this<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2012/02/01/data-laundering/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p><a href="http://blog.ouseful.info/2012/02/01/sleight-of-hand-and-data-laundering-in-evidence-based-policy-making/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2012/02/01/sleight-of-hand-and-data-laundering-in-evidence-based-policy-making/?referer=');">Wonderful post by Tony Hirst</a> in which he sort-of-coins* a lovely neologism in explaining how data can be &#8220;laundered&#8221;:</p>
<blockquote><p>&#8220;The Deloitte report was used <em>as evidence</em> by Facebook to <em>demonstrate</em> a particular economic benefit made possible by Facebook’s activities. The consultancy firm&#8217;s caveats were ignored, (including the fact that the data may in part at least have come from Facebook itself), in reporting this claim.</p>
<p>&#8220;So: this is <em>data laundering</em>, right? We have some dodgy evidence, about which we’re biased, so we give it to an “independent” consultant who re-reports it, albeit with caveats, that we can then report, minus the caveats. Lovely, clean evidence. Our lobbyists can then go to a lazy policy researcher and take this scrubbed evidence, referencing it as finding in the Deloitte report, so that it can make its way into a policy briefing.&#8221;</p></blockquote>
<p>So, perhaps we can now say &#8220;Follow the data&#8221; in the same way that we &#8220;Follow the money&#8221;?</p>
<p><em>*Although a <a href="http://www.google.co.uk/webhp?sourceid=chrome-instant&amp;ix=iea&amp;ie=UTF-8&amp;ion=1#sclient=psy-ab&amp;hl=en&amp;site=webhp&amp;source=hp&amp;q=%22data%20laundering%22&amp;pbx=1&amp;oq=&amp;aq=&amp;aqi=&amp;aql=&amp;gs_sm=&amp;gs_upl=&amp;fp=6a7a29b7fc21776a&amp;ix=iea&amp;ion=1&amp;ix=iea&amp;ion=1&amp;bav=on.2,or.r_gc.r_pw.,cf.osb&amp;fp=6a7a29b7fc21776a&amp;biw=1025&amp;bih=482&amp;ix=iea&amp;ion=1" onclick="urchinTracker('/outgoing/www.google.co.uk/webhp?sourceid=chrome-instant_amp_ix=iea_amp_ie=UTF-8_amp_ion=1_sclient=psy-ab_amp_hl=en_amp_site=webhp_amp_source=hp_amp_q=_22data_20laundering_22_amp_pbx=1_amp_oq=_amp_aq=_amp_aqi=_amp_aql=_amp_gs_sm=_amp_gs_upl=_amp_fp=6a7a29b7fc21776a_amp_ix=iea_amp_ion=1_amp_ix=iea_amp_ion=1_amp_bav=on.2_or.r_gc.r_pw._cf.osb_amp_fp=6a7a29b7fc21776a_amp_biw=1025_amp_bih=482_amp_ix=iea_amp_ion=1&amp;referer=');">search for &#8220;data laundering&#8221; generates thousands of results on Google</a>, most of them seemingly <a href="http://sectorprivate.wordpress.com/2009/03/16/my-definition-of-data-laundering-as-inspired-by-william-gibson-from-mona-lisa-overdrive/" onclick="urchinTracker('/outgoing/sectorprivate.wordpress.com/2009/03/16/my-definition-of-data-laundering-as-inspired-by-william-gibson-from-mona-lisa-overdrive/?referer=');">influenced by serial neologist William Gibson</a>&#8217;s use of the term to refer to using illegally acquired data, I can&#8217;t find an example of it being used in the way that Tony means it.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/02/01/data-laundering/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The £10,000 question: who benefits most from a tax threshold change?</title>
		<link>http://onlinejournalismblog.com/2012/01/27/the-10000-question-who-benefits-most-from-a-tax-threshold-change/</link>
		<comments>http://onlinejournalismblog.com/2012/01/27/the-10000-question-who-benefits-most-from-a-tax-threshold-change/#comments</comments>
		<pubDate>Fri, 27 Jan 2012 16:34:05 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[guido fawkes]]></category>
		<category><![CDATA[IFS]]></category>
		<category><![CDATA[james ball]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[tax threshold]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15755</guid>
		<description><![CDATA[Here&#8217;s a great test for eagle-eyed journalists, tweeted by the Guardian&#8217;s James Ball. It&#8217;s a tale of two charts that claim to show the impact of a change in the income tax threshold to £10,000. Here&#8217;s the first: And here&#8217;s the second: So: same change, very different stories. In one story (Institute for Fiscal Studies) it is the wealthiest that<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2012/01/27/the-10000-question-who-benefits-most-from-a-tax-threshold-change/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p>Here&#8217;s a great test for eagle-eyed journalists, <a href="https://twitter.com/#!/jamesrbuk/status/162927850683514880" onclick="urchinTracker('/outgoing/twitter.com/_/jamesrbuk/status/162927850683514880?referer=');">tweeted by the Guardian&#8217;s James Ball</a>. It&#8217;s a tale of two charts that claim to show the impact of a <a href="http://www.egovmonitor.com/node/45843" onclick="urchinTracker('/outgoing/www.egovmonitor.com/node/45843?referer=');">change in the income tax threshold to £10,000</a>. Here&#8217;s the first:</p>
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/01/threshold-change-by-income-decile.jpg"><img class="alignnone size-full wp-image-15756" src="http://onlinejournalismblog.com/wp-content/uploads/2012/01/threshold-change-by-income-decile.jpg" alt="Change in post-tax income as a percentage of gross income" width="448" height="252" /></a></p>
<p>And here&#8217;s the second:</p>
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/01/income_tax_impact_IFS.jpg"><img class="alignnone  wp-image-15757" src="http://onlinejournalismblog.com/wp-content/uploads/2012/01/income_tax_impact_IFS.jpg" alt="Net impact of income tax threshold change on incomes - IFS" width="420" height="305" /></a></p>
<p>So: same change, very different stories. In one story (Institute for Fiscal Studies) it is the wealthiest that appear to benefit the most; but in the other (<a href="http://order-order.com/2012/01/26/cleggs-progressive-10000-threshold-hike-benefits-low-income-earners-most/" onclick="urchinTracker('/outgoing/order-order.com/2012/01/26/cleggs-progressive-10000-threshold-hike-benefits-low-income-earners-most/?referer=');">Taxpayers&#8217; Alliance via Guido Fawkes</a>) it&#8217;s the poorest who are benefiting.</p>
<p>Did you spot the difference? The different y axis is a slight clue &#8211; the first chart covers a wider range of change &#8211; but it&#8217;s the legend that gives the biggest hint: one is measuring change as a percentage of <em>gross</em> income (before, well, taxes); the other as a change in <em>net</em> income (after tax).</p>
<p>James&#8217;s colleague Mary Hamilton <a href="https://twitter.com/#!/newsmary/status/162939489285713920" onclick="urchinTracker('/outgoing/twitter.com/_/newsmary/status/162939489285713920?referer=');">put it</a> like this: &#8220;4.5% of very little is of course much less than 1% of loads.&#8221; Or, more specifically: 4.6% of £10,853 (the second decile mentioned in Fawkes&#8217; post) is £499.24; 1.1% of £47,000 (the 9th decile according to <a href="https://docs.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdGJNb1dXcjdmTTgyQ2h5R1lFSUp3TlE" onclick="urchinTracker('/outgoing/docs.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdGJNb1dXcjdmTTgyQ2h5R1lFSUp3TlE&amp;referer=');">the same ONS figures</a>) is £517. (Without raw data, it&#8217;s hard to judge what figures are being used &#8211; if you include earnings over that £47k marker then it changes things, for example, and there&#8217;s no link to the net earnings).</p>
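<p>If you want to check that arithmetic yourself, here is a minimal Python sketch. It only uses the percentages and incomes quoted above; the function name <em>cash_gain</em> and the dictionary labels are mine, and it says nothing about which underlying data either chart actually used.</p>

```python
# Figures quoted in the post: (percentage gain, gross income in pounds).
# The labels are only dictionary keys for this illustration.
quoted = {
    "2nd decile": (4.6, 10853),
    "9th decile": (1.1, 47000),
}


def cash_gain(percent, income):
    """Return the gain in pounds, rounded to the nearest penny."""
    return round(income * percent / 100, 2)


for decile, (percent, income) in quoted.items():
    print(decile, cash_gain(percent, income))
```

<p>The larger percentage produces the smaller cash figure, which is the whole point of Mary&#8217;s comment.</p>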
<p>In a nutshell, like James, I&#8217;m not entirely sure why they differ so strikingly. So, further statistical analysis welcome.</p>
<p>UPDATE: Seems a bit of a Twitter fight erupted between Guido Fawkes and James Ball over the source of the IFS data. James links to <a href="http://www.ifs.org.uk/election/launch_browne_phillips.pdf" onclick="urchinTracker('/outgoing/www.ifs.org.uk/election/launch_browne_phillips.pdf?referer=');">this pre-election document</a> containing the chart and <a href="http://www.ifs.org.uk/budgets/budget2011/budget2011_jb.pdf" onclick="urchinTracker('/outgoing/www.ifs.org.uk/budgets/budget2011/budget2011_jb.pdf?referer=');">this one on &#8216;Budget 2011&#8217;</a>. Guido <a href="https://twitter.com/#!/GuidoFawkes/status/163003578468925440" onclick="urchinTracker('/outgoing/twitter.com/_/GuidoFawkes/status/163003578468925440?referer=');">says</a> the chart&#8217;s &#8220;projections were based on policy forecasts that didn&#8217;t pan out&#8221;. I&#8217;ve not had the chance to properly scrutinise the claims of either James or Guido. I&#8217;ve also yet to see a direct link to the Taxpayers&#8217; Alliance data, so that is equally in need of unpicking.</p>
<p>In this post, however, my point isn&#8217;t to do with the specific issue (or who is &#8216;right&#8217;) but rather how it can be presented in different ways, and the importance of having access to the raw data to &#8216;unspin&#8217; it.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/01/27/the-10000-question-who-benefits-most-from-a-tax-threshold-change/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A new Scottish datablog (and a treemap in Liverpool)</title>
		<link>http://onlinejournalismblog.com/2012/01/27/a-new-scottish-datablog-and-a-treemap-in-liverpool/</link>
		<comments>http://onlinejournalismblog.com/2012/01/27/a-new-scottish-datablog-and-a-treemap-in-liverpool/#comments</comments>
		<pubDate>Fri, 27 Jan 2012 15:24:03 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[newspapers]]></category>
		<category><![CDATA[datablog]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Jennifer O'Mahony]]></category>
		<category><![CDATA[ofsted]]></category>
		<category><![CDATA[scotsman]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15748</guid>
		<description><![CDATA[The Scotsman has a newish data blog, set up (I&#8217;m rather proud to say) by one of my former PA/Telegraph trainees: Jennifer O&#8217;Mahony. This is particularly important as so much data covered in the &#8216;national&#8217; press tends to be English-only due to devolution. The Department for Education, for example, only publishes English education data. If you want Scottish education data you need<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2012/01/27/a-new-scottish-datablog-and-a-treemap-in-liverpool/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p>The Scotsman <a href="http://thesteamie.scotsman.com/viewtags.aspx?id=25" onclick="urchinTracker('/outgoing/thesteamie.scotsman.com/viewtags.aspx?id=25&amp;referer=');">has a newish data blog</a>, set up (I&#8217;m rather proud to say) by one of my former PA/Telegraph trainees: Jennifer O&#8217;Mahony. This is particularly important as so much data covered in the &#8216;national&#8217; press tends to be English-only due to devolution.</p>
<p>The Department for Education, for example, only <a href="http://www.education.gov.uk/inyourarea/" onclick="urchinTracker('/outgoing/www.education.gov.uk/inyourarea/?referer=');">publishes English education data</a>. If you want Scottish education data you need to go to the <a href="http://www.scotland.gov.uk/Topics/Statistics/Browse/School-Education" onclick="urchinTracker('/outgoing/www.scotland.gov.uk/Topics/Statistics/Browse/School-Education?referer=');">Scottish Government website</a> or <a href="http://www.ltscotland.org.uk/" onclick="urchinTracker('/outgoing/www.ltscotland.org.uk/?referer=');">Education Scotland</a>. <a href="http://www.ofsted.gov.uk/about-us" onclick="urchinTracker('/outgoing/www.ofsted.gov.uk/about-us?referer=');">Ofsted</a> inspects schools in England; for Scottish school reports you need to visit <a href="http://www.hmie.gov.uk/AboutUs/InspectionResources/" onclick="urchinTracker('/outgoing/www.hmie.gov.uk/AboutUs/InspectionResources/?referer=');">HM Inspectorate of Education</a>. (Meanwhile, the <a href="http://www.statistics.gov.uk/hub/children-education-skills/school-and-college-education/school-and-colleges/index.html" onclick="urchinTracker('/outgoing/www.statistics.gov.uk/hub/children-education-skills/school-and-college-education/school-and-colleges/index.html?referer=');">National Statistics site publishes data from England, Scotland, Wales and Northern Ireland</a>.)</p>
<p>So if there&#8217;s any Scottish data &#8211; or that of Wales or Northern Ireland &#8211; that you want me to help with, let me or <a href="https://twitter.com/#!/jaomahony" onclick="urchinTracker('/outgoing/twitter.com/_/jaomahony?referer=');">Jennifer</a> know. By way of illustrating the process, here&#8217;s a post <a title="Scraping, mapping scottish education data" href="http://helpmeinvestigate.com/education/2012/01/free-school-meals-in-scottish-primary-schools-data-visualisation/" onclick="urchinTracker('/outgoing/helpmeinvestigate.com/education/2012/01/free-school-meals-in-scottish-primary-schools-data-visualisation/?referer=');">over on Help Me Investigate: Education on how I helped Jennifer collect data on free school meals in Scotland</a>.</p>
<h2>A treemap in Liverpool</h2>
<p>On the same note of non-national data journalism, here&#8217;s a <a href="http://blogs.liverpooldailypost.co.uk/dalestreetblues/2012/01/infographic-showing-the-huge-s.html" onclick="urchinTracker('/outgoing/blogs.liverpooldailypost.co.uk/dalestreetblues/2012/01/infographic-showing-the-huge-s.html?referer=');">particularly nice bit of data visualisation at the Liverpool Post</a>. It&#8217;s not often you see treemaps on a local newspaper website &#8211; this one was designed by <a href="https://twitter.com/#!/Ilanimator" onclick="urchinTracker('/outgoing/twitter.com/_/Ilanimator?referer=');">Ilan Sheady</a> based on data gathered by City Editor <a href="https://twitter.com/#!/davidbartlett1/status/162449105266814976" onclick="urchinTracker('/outgoing/twitter.com/_/davidbartlett1/status/162449105266814976?referer=');">David Bartlett</a> after a day&#8217;s <a href="http://onlinejournalismblog.com/2012/01/06/a-days-training-in-data-journalism/">data journalism training</a>.</p>
<p><img src="http://blogs.liverpooldailypost.co.uk/dalestreetblues/assets_c/2012/01/Liverpool%20Waters%20graphic-thumb-450x319-173400.jpg" alt="Infographic showing the huge scale of the £5.5bn Liverpool Waters scheme" /></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/01/27/a-new-scottish-datablog-and-a-treemap-in-liverpool/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Word cloud or bar chart?</title>
		<link>http://onlinejournalismblog.com/2012/01/27/word-cloud-or-bar-chart/</link>
		<comments>http://onlinejournalismblog.com/2012/01/27/word-cloud-or-bar-chart/#comments</comments>
		<pubDate>Fri, 27 Jan 2012 07:54:46 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[newspapers]]></category>
		<category><![CDATA[bar charts]]></category>
		<category><![CDATA[New York Times]]></category>
		<category><![CDATA[tagxedo]]></category>
		<category><![CDATA[visualisation]]></category>
		<category><![CDATA[word clouds]]></category>
		<category><![CDATA[wordle]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15743</guid>
		<description><![CDATA[One of the easiest ways to get someone started on data visualisation is to introduce them to word clouds (it also demonstrates neatly how not all data is numerical). Using tools like Wordle and Tagxedo, you can paste in a major speech and see it visualised within a minute or so. But is a word cloud the best way of<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2012/01/27/word-cloud-or-bar-chart/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/01/Choice-words1.png"><img class="alignnone  wp-image-15744" src="http://onlinejournalismblog.com/wp-content/uploads/2012/01/Choice-words1.png" alt="Bar charts preferred over word clouds" width="430" height="328" /></a></p>
<p>One of the easiest ways to get someone started on data visualisation is to introduce them to word clouds (it also demonstrates neatly how not all data is numerical).</p>
<p>Using tools like Wordle and Tagxedo, you can paste in a major speech and see it visualised within a minute or so.</p>
<p>But is a word cloud the best way of visualising speeches? The New York Times appear to think otherwise. Their <a href="http://www.nytimes.com/interactive/2012/01/24/us/politics/0124-words.html" onclick="urchinTracker('/outgoing/www.nytimes.com/interactive/2012/01/24/us/politics/0124-words.html?referer=');">visualisation</a> (above) comparing President Obama&#8217;s State of the Union address and speeches by Republican presidential candidates chooses to use something far less fashionable: the bar chart.</p>
<p>Why did they choose a bar chart? The key is the purpose of the chart: <strong>comparison</strong>. If your objective is to capture the spirit of a speech, or its key themes, then a word cloud can still work well, if you clean the data (see <a href="http://www.nytimes.com/interactive/2009/01/17/washington/20090117_ADDRESSES.html" onclick="urchinTracker('/outgoing/www.nytimes.com/interactive/2009/01/17/washington/20090117_ADDRESSES.html?referer=');">this interactive example that appeared on the New York Times in 2009</a>).</p>
<p>But if you want to compare it to speeches of others &#8211; and particularly if you want to compare on specific issues such as employment or tax &#8211; then bar charts are a better choice. Compare, for example, <a href="http://www.readwriteweb.com/archives/tag_clouds_of_obamas_inaugural_speech_compared_to_bushs.php" onclick="urchinTracker('/outgoing/www.readwriteweb.com/archives/tag_clouds_of_obamas_inaugural_speech_compared_to_bushs.php?referer=');">ReadWriteWeb&#8217;s comparison of inaugural speeches</a>, and how effective that is compared to the bar charts.</p>
<p>In short, don&#8217;t always reach for the obvious chart type &#8211; and be clear what you&#8217;re trying to communicate.</p>
<p>UPDATE: <a href="http://www.niemanlab.org/2011/10/word-clouds-considered-harmful/" onclick="urchinTracker('/outgoing/www.niemanlab.org/2011/10/word-clouds-considered-harmful/?referer=');">More criticism of word clouds by New York Times software architect here</a> (<a href="https://twitter.com/#!/harrietebailey/statuses/162885114030858240" onclick="urchinTracker('/outgoing/twitter.com/_/harrietebailey/statuses/162885114030858240?referer=');">via Harriet Bailey</a>)</p>
<div class="wp-caption alignnone" style="width: 437px"><a href="http://www.readwriteweb.com/archives/tag_clouds_of_obamas_inaugural_speech_compared_to_bushs.php" onclick="urchinTracker('/outgoing/www.readwriteweb.com/archives/tag_clouds_of_obamas_inaugural_speech_compared_to_bushs.php?referer=');"><img src="http://rww.readwriteweb.netdna-cdn.com/images/obamaonblack.jpg" alt="Obama inaugural speech word cloud by ReadWriteWeb" width="427" height="239" /></a><p class="wp-caption-text">Obama inaugural speech word cloud by ReadWriteWeb</p></div>
<p><a href="http://flowingdata.com/2012/01/24/words-used-in-sotu-and-republican-presidential-candidates-in-debates/" onclick="urchinTracker('/outgoing/flowingdata.com/2012/01/24/words-used-in-sotu-and-republican-presidential-candidates-in-debates/?referer=');"><em>via Flowing Data</em></a></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/01/27/word-cloud-or-bar-chart/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data journalism awards</title>
		<link>http://onlinejournalismblog.com/2012/01/20/data-journalism-awards/</link>
		<comments>http://onlinejournalismblog.com/2012/01/20/data-journalism-awards/#comments</comments>
		<pubDate>Fri, 20 Jan 2012 08:22:40 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[EJC]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15719</guid>
		<description><![CDATA[Yesterday saw the launch of the first (surprisingly) international data journalism awards, backed by the European Journalism Centre*, Google, and the Global Editors Network. There are 6 awards &#8211; 3 categories, each split into national/international and local/regional subcategories: investigative journalism; visualisation; and apps. Each comes with prize money of 7,500 euros. The closing date for entries is April 10. It&#8217;s<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2012/01/20/data-journalism-awards/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p>Yesterday saw the launch of the first (surprisingly) international <a href="http://datajournalismawards.org/" onclick="urchinTracker('/outgoing/datajournalismawards.org/?referer=');">data journalism awards</a>, backed by the European Journalism Centre*, Google, and the Global Editors Network.</p>
<p>There are <a href="http://datajournalismawards.org/prizes/" onclick="urchinTracker('/outgoing/datajournalismawards.org/prizes/?referer=');">6 awards</a> &#8211; 3 categories, each split into national/international and local/regional subcategories: investigative journalism; visualisation; and apps.</p>
<p>Each comes with prize money of 7,500 euros.</p>
<p>The <a href="http://datajournalismawards.org/selection-process/" onclick="urchinTracker('/outgoing/datajournalismawards.org/selection-process/?referer=');">closing date for entries is April 10</a>. It&#8217;s particularly good to see a <a href="http://datajournalismawards.org/jury/" onclick="urchinTracker('/outgoing/datajournalismawards.org/jury/?referer=');">jury</a> and <a href="http://datajournalismawards.org/selection-process/" onclick="urchinTracker('/outgoing/datajournalismawards.org/selection-process/?referer=');">pre-jury</a> that isn&#8217;t dominated by Anglo-American traditional media, so if your work is unconventionally innovative it stands a decent chance of making it through. There&#8217;s also no specification on where your work is published, so students and independent journalists can enter.</p>
<p>The one thing I&#8217;d like to see in future years is the &#8216;visualisation and storytelling&#8217; category expanded to include non-visual storytelling &#8211; there&#8217;s a tendency to reach for visualisation as a way to communicate data when <a href="http://www.datajournalismblog.com/2011/08/02/6-ways-of-communicating-data-journalism-the-inverted-pyramid-of-data-journalism-part-2/" onclick="urchinTracker('/outgoing/www.datajournalismblog.com/2011/08/02/6-ways-of-communicating-data-journalism-the-inverted-pyramid-of-data-journalism-part-2/?referer=');">other methods could be just as, or more, engaging</a>.</p>
<p><em>*Declaration of interest: I am <a href="http://datadrivenjournalism.net/about/editorial_board" onclick="urchinTracker('/outgoing/datadrivenjournalism.net/about/editorial_board?referer=');">on the editorial board for the EJC&#8217;s Data Driven Journalism project</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/01/20/data-journalism-awards/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>SFTW: Scraping data with Google Refine</title>
		<link>http://onlinejournalismblog.com/2012/01/13/sftw-scraping-data-with-google-refine/</link>
		<comments>http://onlinejournalismblog.com/2012/01/13/sftw-scraping-data-with-google-refine/#comments</comments>
		<pubDate>Fri, 13 Jan 2012 08:27:12 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[google refine]]></category>
		<category><![CDATA[grel]]></category>
		<category><![CDATA[parsehtml]]></category>
		<category><![CDATA[scraping]]></category>
		<category><![CDATA[Something for the weekend]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15674</guid>
		<description><![CDATA[For the first Something For The Weekend of 2012 I want to tackle a common problem when you&#8217;re trying to scrape a collection of webpages: they have some sort of structure in their URL like this, where part of the URL refers to the name or code of an entity: http://www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID=5237521 http://www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID=5237629 http://www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID=5237823 In this instance, you can see that<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2012/01/13/sftw-scraping-data-with-google-refine/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p>For the first <a href="http://onlinejournalismblog.com/tag/something-for-the-weekend/">Something For The Weekend</a> of 2012 I want to tackle a common problem when you&#8217;re trying to scrape a collection of webpages: they have some sort of structure in their URL like this, where part of the URL refers to the name or code of an entity:</p>
<ol>
<li><a href="http://www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID=5237521" onclick="urchinTracker('/outgoing/www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID=5237521&amp;referer=');">http://www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID=5237521</a></li>
<li><a href="http://www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID=5237629" onclick="urchinTracker('/outgoing/www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID=5237629&amp;referer=');">http://www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID=5237629</a></li>
<li><a href="http://www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID=5237823" onclick="urchinTracker('/outgoing/www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID=5237823&amp;referer=');">http://www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID=5237823</a></li>
</ol>
<p>In this instance, you can see that the URL is identical apart from a 7-digit code at the end: the ID of the school the data refers to.</p>
<p>There are a number of ways you could scrape this data. You could <a title="using Google Docs and the =importXML formula" href="http://onlinejournalismblog.com/2011/10/14/scraping-data-from-a-list-of-webpages-using-google-docs/">use Google Docs and the =importXML formula</a>, but Google Docs will only let you use this 50 times on any one spreadsheet (you could copy the results and select Edit &gt; Paste Special &gt; Values Only and then use the formula a further 50 times if it&#8217;s not too many &#8211; <a href="https://docs.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdEJ2dFF5YVY0Ml9sX3NURUM5YkdKVHc" onclick="urchinTracker('/outgoing/docs.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdEJ2dFF5YVY0Ml9sX3NURUM5YkdKVHc&amp;referer=');">here&#8217;s one I prepared earlier</a>).</p>
<p>And you could use Scraperwiki to write a powerful scraper &#8211; but you need to understand enough coding to do so quickly (<a href="https://scraperwiki.com/scrapers/free_school_meals_scotland/" onclick="urchinTracker('/outgoing/scraperwiki.com/scrapers/free_school_meals_scotland/?referer=');">here&#8217;s a demo I prepared earlier</a>).</p>
<p>A middle option is to use Google Refine, and here&#8217;s how you do it.</p>
<h2>Assembling the ingredients</h2>
<p>With the <strong>basic URL structure</strong> identified, we already have half of our ingredients. What we need next is a list of the ID codes that we&#8217;re going to use to complete each URL.</p>
<p>An <a href="http://www.google.co.uk/webhp?rlz=1C1GPCK_enGB454GB455&amp;sourceid=chrome-instant&amp;ix=heb&amp;ie=UTF-8&amp;ion=1#sclient=psy-ab&amp;hl=en&amp;rlz=1C1GPCK_enGB454GB455&amp;site=webhp&amp;source=hp&amp;q=list+seed+number+scottish+schools+filetype:xls&amp;pbx=1&amp;oq=list+seed+number+scottish+schools+filetype:xls&amp;aq=f&amp;aqi=&amp;aql=&amp;gs_sm=e&amp;gs_upl=74020l77151l0l77535l13l12l0l0l0l0l137l1079l7.5l12l0&amp;bav=on.2,or.r_gc.r_pw.r_cp.,cf.osb&amp;fp=f9ef8465024f9e21&amp;biw=1280&amp;bih=856&amp;ion=1" onclick="urchinTracker('/outgoing/www.google.co.uk/webhp?rlz=1C1GPCK_enGB454GB455_amp_sourceid=chrome-instant_amp_ix=heb_amp_ie=UTF-8_amp_ion=1_sclient=psy-ab_amp_hl=en_amp_rlz=1C1GPCK_enGB454GB455_amp_site=webhp_amp_source=hp_amp_q=list+seed+number+scottish+schools+filetype_xls_amp_pbx=1_amp_oq=list+seed+number+scottish+schools+filetype_xls_amp_aq=f_amp_aqi=_amp_aql=_amp_gs_sm=e_amp_gs_upl=74020l77151l0l77535l13l12l0l0l0l0l137l1079l7.5l12l0_amp_bav=on.2_or.r_gc.r_pw.r_cp._cf.osb_amp_fp=f9ef8465024f9e21_amp_biw=1280_amp_bih=856_amp_ion=1&amp;referer=');">advanced search for &#8220;list seed number scottish schools filetype:xls</a>&#8221; brings up a link to <a href="http://www.scotland.gov.uk/stats/sources/adds.xls" onclick="urchinTracker('/outgoing/www.scotland.gov.uk/stats/sources/adds.xls?referer=');">this spreadsheet (XLS)</a> which gives us just that.</p>
<p>The spreadsheet will need editing: <strong>remove any rows you don&#8217;t need.</strong> This will reduce the time that the scraper will take in going through them. For example, if you&#8217;re only interested in one local authority, or one type of school, sort your spreadsheet so that you can delete those above or below them.</p>
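<p>If you would rather trim the list with a script than by sorting and deleting, here is a minimal Python sketch of the same idea. It assumes you have saved the spreadsheet as CSV. The column names (<em>SeedCode</em>, <em>School</em>, <em>LocalAuthority</em>), the school names and the authority names are all invented for illustration, so check them against the real file before relying on this.</p>

```python
import csv
import io

# Invented sample rows; the real spreadsheet's columns and values will differ.
SAMPLE = """SeedCode,School,LocalAuthority
5237521,Example Primary A,Authority One
5237629,Example Primary B,Authority Two
5237823,Example Primary C,Authority One
"""


def keep_rows(csv_text, column, wanted):
    """Return only the rows whose `column` value equals `wanted`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[column] == wanted]


rows = keep_rows(SAMPLE, "LocalAuthority", "Authority One")
print([row["SeedCode"] for row in rows])
```

<p>Fewer rows means fewer pages to fetch later on, which is the only reason for doing this step.</p>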
<p>Now to combine  the ID codes with the base URL.</p>
<h2>Bringing your data into Google Refine</h2>
<p>Open Google Refine and create a new project with the edited spreadsheet containing the school IDs.</p>
<p>At the top of the school ID column click on the drop-down menu and select <strong>Edit column &gt; Add column based on this column&#8230;</strong></p>
<p>In the <em>New column name</em> box at the top call this &#8216;URL&#8217;.</p>
<p>In the <em>Expression</em> box type the following piece of GREL (Google Refine Expression Language):</p>
<p>"http://www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID="+value</p>
<p>(<em>Type in the quotation marks yourself &#8211; if you&#8217;re copying them from a webpage you may have problems</em>)</p>
<p>The &#8216;value&#8217; bit means the value of each cell in the column you just selected. The plus sign adds it to the end of the URL in quotes.</p>
<p>In the <em>Preview</em> window you should see the results &#8211; you can even copy one of the resulting URLs and paste it into a browser to check it works. (<em>On one occasion Google Refine added .0 to the end of the ID number, ruining the URL. You can solve this by changing &#8216;value&#8217; to </em>value.substring(0,7)<em> &#8211; this extracts the first 7 characters of the ID number, omitting the &#8216;.0&#8217;</em>)</p>
<p>Click <strong>OK</strong> if you&#8217;re happy, and you should have a new column with a URL for each school ID.</p>
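<p>For readers who want to see the same step outside Refine, here is a minimal Python sketch of what the expression does: it joins the base URL to each ID and trims a stray &#8216;.0&#8217;. The function name and the sample IDs are hypothetical, chosen only for illustration.</p>

```python
BASE_URL = "http://www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID="

def build_url(school_id):
    """Join the base URL and one school ID, dropping a stray '.0' if present."""
    text = str(school_id).strip()
    # A spreadsheet import can turn 5234567 into 5234567.0; cut those two characters off.
    if text.endswith(".0"):
        text = text[:-2]
    return BASE_URL + text

# Hypothetical IDs standing in for the school ID column
school_ids = ["5234567", 5234567.0]
urls = [build_url(school_id) for school_id in school_ids]
```

<p>Both sample IDs produce the same URL, which is the behaviour you want when some cells have been read as decimal numbers and others as text.</p>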
<h2>Grabbing the HTML for each page</h2>
<p>Now click on the top of this new URL column and select <strong>Edit column &gt; Add column by fetching URLs&#8230;</strong></p>
<p>In the <em>New column name</em> box at the top call this &#8216;HTML&#8217;.</p>
<p>All you need in the <em>Expression</em> window is &#8216;value&#8217;, so leave that as it is.</p>
<p>Click <strong>OK</strong>.</p>
<p>Google Refine will now go to each of those URLs and fetch the HTML contents. As we have a couple of thousand rows here, this will take a long time &#8211; possibly hours, depending on the speed of your computer and internet connection (it may not work at all if either isn&#8217;t very fast). So leave it running and come back to it later.</p>
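<p>The fetching step can be sketched in Python too. This is not how Refine works internally, just an illustration of the idea: visit each URL in turn, pause between requests, and carry on if one page fails. The function names, the one-second pause and the UTF-8 decoding are all assumptions.</p>

```python
import time
from urllib.request import urlopen

def fetch_page(url):
    """Download one page and return its HTML as text (UTF-8 is assumed)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")

def fetch_all(urls, fetch=fetch_page, delay=1.0):
    """Fetch each URL in turn, pausing between requests; failures are stored as None."""
    pages = {}
    for url in urls:
        try:
            pages[url] = fetch(url)
        except OSError:
            # Network errors and timeouts from urllib are subclasses of OSError
            pages[url] = None
        time.sleep(delay)
    return pages
```

<p>With a couple of thousand URLs and a one-second pause, the pauses alone add more than half an hour, which gives a sense of why this step is slow.</p>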
<h2>Extracting data from the raw HTML with parseHTML</h2>
<p>When it&#8217;s finished you&#8217;ll have another column where each cell is a bunch of HTML. You&#8217;ll need to create a new column to extract what you need from that, and you&#8217;ll also <a href="http://code.google.com/p/google-refine/wiki/StrippingHTML" onclick="urchinTracker('/outgoing/code.google.com/p/google-refine/wiki/StrippingHTML?referer=');">need some GREL expressions explained here</a>.</p>
<p>First you need to identify what data you want, and where it is in the HTML. To find it, right-click on one of the webpages containing the data, choose the option to view the page source, and search that source for a key phrase or figure that you want to extract. Around that data you want to find an HTML tag like &lt;table class="destinations"&gt; or &lt;div id="statistics"&gt;. Keep that open in another window while you tweak the expression we come onto below&#8230;</p>
<p>Back in Google Refine, at the top of the HTML column click on the drop-down menu and select <strong>Edit column &gt; Add column based on this column&#8230;</strong></p>
<p>In the <em>New column name</em> box at the top give it a name describing the data you&#8217;re going to pull out.</p>
<p>In the <em>Expression</em> box type the following piece of GREL (Google Refine Expression Language):</p>
<p><a>value.parseHtml().select("table.destinations")[0].select("tr").toString()</a></p>
<p><em>(Again, type the quotation marks yourself rather than copying them from here or you may have problems</em>)</p>
<p>I&#8217;ll break down what this is doing:</p>
<p><a>value.parseHtml()</a></p>
<p><em>parse the HTML in each cell (value)</em></p>
<p><a>.select("table.destinations")</a></p>
<p><em>find a table with a class (.) of &#8220;destinations&#8221; (in the source HTML this reads &lt;table class="destinations"&gt;). If it was &lt;div id="statistics"&gt; then you would write .select("div#statistics") &#8211; the hash sign representing an &#8216;id&#8217; and the full stop representing a &#8216;class&#8217;.</em></p>
<p>[0]</p>
<p><em>This zero in square brackets tells Refine to only grab the first table &#8211; a number 1 would indicate the second, and so on. This is because numbering (&#8220;indexing&#8221;) generally begins with zero in programming.</em></p>
<p><a>.select("tr")</a></p>
<p><em>Now, within that table, find every table row, that is, each &lt;tr&gt; tag and whatever it contains</em></p>
<p><a>.toString()</a></p>
<p><em>And convert the results into a string of text</em>.</p>
<p>The results of that expression in the <em>Preview</em> window should look something like this:</p>
<p>&lt;tr&gt; &lt;th&gt;&lt;/th&gt; &lt;th&gt;Abbotswell School&lt;/th&gt; &lt;th&gt;Aberdeen City&lt;/th&gt; &lt;th&gt;Scotland&lt;/th&gt; &lt;/tr&gt; &lt;tr&gt; &lt;th&gt;Percentage of pupils&lt;/th&gt; &lt;td&gt;25.5%&lt;/td&gt; &lt;td&gt;16.3%&lt;/td&gt; &lt;td&gt;22.6%&lt;/td&gt; &lt;/tr&gt;</p>
<p>This is still HTML, but a much smaller and more manageable chunk. You could, if you chose, now export it as a spreadsheet file and use various techniques to get rid of the tags (Find and Replace, for example) and split the data into separate columns (the <a href="http://excelnotes.posterous.com/splitting-a-vote-or-other-piece-of-data-into" onclick="urchinTracker('/outgoing/excelnotes.posterous.com/splitting-a-vote-or-other-piece-of-data-into?referer=');">=SPLIT formula, for example</a>).</p>
<p>Or you could further tweak your GREL code in Refine to drill further into your data, like so:</p>
<p>value.parseHtml().select("table.destinations")[0].select("td")[0].toString()</p>
<p>Which would give you this:</p>
<p>&lt;td&gt;25.5%&lt;/td&gt;</p>
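<p>Or you can add the .substring function to strip out the HTML like so (assuming that the data you want is always 5 characters long; positions are counted from zero, so in the cell shown above the opening &lt;td&gt; tag fills positions 0 to 3 and the figure starts at position 4. If the Preview shows extra spaces before the figure, shift both numbers along accordingly):</p>
<p>value.parseHtml().select("table.destinations")[0].select("td")[0].toString().substring(4,9)</p>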
<p>When you&#8217;re happy, click <strong>OK</strong> and you should have a new column for that data. You can repeat this for every piece of data you want to extract into a new column.</p>
<p>Then click <strong>Export</strong> in the upper right corner and save as a CSV or Excel file.</p>
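<p>As a rough equivalent of the extraction step, here is a Python sketch that uses only the standard library. It collects the text of every td cell in the first table with a given class, so no substring arithmetic is needed. The class and function names are hypothetical, and the sketch assumes the table contains no nested tables.</p>

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collect the text of every td cell inside the first table with a given class."""

    def __init__(self, table_class):
        super().__init__()
        self.table_class = table_class
        self.in_table = False   # currently inside the matching table
        self.done = False       # the first matching table has already closed
        self.in_cell = False    # currently inside a td within that table
        self.cells = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and not self.in_table and not self.done and self.table_class in classes:
            self.in_table = True
        elif tag == "td" and self.in_table:
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False
        elif tag == "table" and self.in_table:
            self.in_table = False
            self.done = True

    def handle_data(self, data):
        # Text only counts when it sits inside a td of the matching table
        if self.in_cell:
            self.cells[-1] += data

def extract_cells(html_text, table_class="destinations"):
    """Return the stripped text of each td cell in the first matching table."""
    collector = CellCollector(table_class)
    collector.feed(html_text)
    return [cell.strip() for cell in collector.cells]
```

<p>Taking the first item of the list that extract_cells returns plays the same role as adding [0] after the td selection in the GREL expression: it picks out the first data cell, already stripped of its tags.</p>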
<p><em><a title="Help Me Investigate Education - Scottish schools free school meals data" href="http://helpmeinvestigate.com/education/2012/01/free-school-meals-in-scottish-primary-schools-data-visualisation/" onclick="urchinTracker('/outgoing/helpmeinvestigate.com/education/2012/01/free-school-meals-in-scottish-primary-schools-data-visualisation/?referer=');">More on how this data was used on Help Me Investigate Education</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/01/13/sftw-scraping-data-with-google-refine/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The test of data journalism: checking the claims of lobbyists via government</title>
		<link>http://onlinejournalismblog.com/2012/01/11/the-test-of-data-journalism-checking-the-claims-of-lobbyists-via-government/</link>
		<comments>http://onlinejournalismblog.com/2012/01/11/the-test-of-data-journalism-checking-the-claims-of-lobbyists-via-government/#comments</comments>
		<pubDate>Wed, 11 Jan 2012 09:58:46 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[lobbying]]></category>
		<category><![CDATA[olympics]]></category>
		<category><![CDATA[Simon Jenkins]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15686</guid>
		<description><![CDATA[While the public image of data journalism tends to revolve around big data dumps and headline-grabbing leaks, there is a more important day-to-day application of data skills: scrutinising the claims regularly made in support of spending public money. I&#8217;m blogging about this now because I recently came across a particularly good illustration of politicians being dazzled by numbers from lobbyists<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2012/01/11/the-test-of-data-journalism-checking-the-claims-of-lobbyists-via-government/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<div class="wp-caption alignnone" style="width: 510px"><a href="http://www.flickr.com/photos/bearpark/3085947627/" onclick="urchinTracker('/outgoing/www.flickr.com/photos/bearpark/3085947627/?referer=');"><img src="http://farm4.staticflickr.com/3205/3085947627_47ff325f27.jpg" alt="Day 341 - Pull The Wool Over My Eyes - image by Simon James" width="500" height="405" /></a><p class="wp-caption-text">Day 341 - Pull The Wool Over My Eyes - image by Simon James</p></div>
<p>While the public image of data journalism tends to revolve around big data dumps and headline-grabbing leaks, there is a more important day-to-day application of data skills: scrutinising the claims regularly made in support of spending public money.</p>
<p>I&#8217;m blogging about this now because I recently came across a particularly good illustration of politicians being dazzled by numbers from lobbyists (that journalists should be checking) in <a href="http://www.guardian.co.uk/commentisfree/2011/dec/20/government-draconian-casual-dodgy-private-cash?cat=commentisfree&amp;type=article" onclick="urchinTracker('/outgoing/www.guardian.co.uk/commentisfree/2011/dec/20/government-draconian-casual-dodgy-private-cash?cat=commentisfree_amp_type=article&amp;referer=');">this article by Simon Jenkins</a>, from which I&#8217;ll quote at length:</p>
<blockquote><p>&#8220;This government, so draconian towards spending in public, is proving as casual towards dodgy money in private as were Tony Blair and Gordon Brown. Earlier this month the Olympics boss, Lord Coe, <a title="" href="http://www.guardian.co.uk/sport/2011/dec/08/lord-coe-london-2012-olympics" onclick="urchinTracker('/outgoing/www.guardian.co.uk/sport/2011/dec/08/lord-coe-london-2012-olympics?referer=');">moseyed into Downing Street and said that his opening and closing ceremonies were looking a bit mean at £40m</a>. Could he double it to £81m for more tinsel? Rather than scream and kick him downstairs, David Cameron said: my dear chap, but of course. I wonder what the prime minister would have said if his lordship had been asking for a care home, a library or a clinic.</p>
<p>&#8220;Much of the trouble comes down to the inexperience of ingenue ministers, and their susceptibility to the pestilence of lobbying now infecting Westminster. On this occasion the hapless Olympics minister, Hugh Robertson, claimed that the extra £41m was &#8220;worth £2-5bn in advertising revenue alone&#8221;, a rate of return so fanciful as to suggest a lobbyist&#8217;s lunch beyond all imagining. Robertson also claimed to need another £271m for games security (not to mention 10,000 troops, warships and surface-to-air missiles), despite it being &#8220;not in response to any specific security threat&#8221;. It was just money.</p>
<p>&#8220;This was merely the climax of naivety. In their first month in office, ministers were told – and believed – that it would be &#8220;more expensive&#8221; to cancel two new aircraft carriers than to build them. Ministers were told it would cost £2bn to cancel Labour&#8217;s crazy NHS computer rather than dump it in the nearest skip. Chris Huhne, darling of the renewables industry, wants to give it £8bn a year to rescue the planet, one of the quickest ways of transferring money from poor consumer to rich landowner yet found. The chancellor, George Osborne, was told by lobbyists he could save £3bn a year by giving away commercial planning permissions. All this was statistical rubbish.</p>
<p>&#8220;If local government behaved as credulously as Whitehall it would be summoned before the audit commission and subject to surcharge.&#8221;</p></blockquote>
<p>And if you want to keep an eye on such claims, <a href="http://www.google.co.uk/search?hl=en&amp;gl=uk&amp;tbm=nws&amp;btnmeta_news_search=1&amp;q=%22more+expensive+to+cancel%22&amp;oq=%22more+expensive+to+cancel%22&amp;aq=f&amp;aqi=d1d-o1&amp;aql=&amp;gs_sm=e&amp;gs_upl=734l52838l0l53032l37l31l0l20l0l0l188l1171l6.5l11l0#sclient=psy-ab&amp;hl=en&amp;gl=uk&amp;tbm=nws&amp;source=hp&amp;q=%22save+*+a+year%22+site:co.uk&amp;pbx=1&amp;oq=%22save+*+a+year%22+site:co.uk&amp;aq=f&amp;aqi=&amp;aql=&amp;gs_sm=e&amp;gs_upl=4263l6608l1l6789l11l9l0l0l0l0l150l744l7.2l9l0&amp;bav=on.2,or.r_gc.r_pw.r_cp.,cf.osb&amp;fp=dc3cf9786b613072&amp;biw=1280&amp;bih=899" onclick="urchinTracker('/outgoing/www.google.co.uk/search?hl=en_amp_gl=uk_amp_tbm=nws_amp_btnmeta_news_search=1_amp_q=_22more+expensive+to+cancel_22_amp_oq=_22more+expensive+to+cancel_22_amp_aq=f_amp_aqi=d1d-o1_amp_aql=_amp_gs_sm=e_amp_gs_upl=734l52838l0l53032l37l31l0l20l0l0l188l1171l6.5l11l0_sclient=psy-ab_amp_hl=en_amp_gl=uk_amp_tbm=nws_amp_source=hp_amp_q=_22save+_+a+year_22+site_co.uk_amp_pbx=1_amp_oq=_22save+_+a+year_22+site_co.uk_amp_aq=f_amp_aqi=_amp_aql=_amp_gs_sm=e_amp_gs_upl=4263l6608l1l6789l11l9l0l0l0l0l150l744l7.2l9l0_amp_bav=on.2_or.r_gc.r_pw.r_cp._cf.osb_amp_fp=dc3cf9786b613072_amp_biw=1280_amp_bih=899&amp;referer=');">try a Google News search like this one</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/01/11/the-test-of-data-journalism-checking-the-claims-of-lobbyists-via-government/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

