<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Online Journalism Blog &#187; data journalism</title>
	<atom:link href="http://onlinejournalismblog.com/category/databases/feed/" rel="self" type="application/rss+xml" />
	<link>http://onlinejournalismblog.com</link>
	<description>A conversation.</description>
	<lastBuildDate>Thu, 24 May 2012 08:39:26 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
<cloud domain='onlinejournalismblog.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
		<item>
		<title>Create a council ward map with Scraperwiki</title>
		<link>http://onlinejournalismblog.com/2012/05/02/create-a-council-ward-map-with-scraperwiki/</link>
		<comments>http://onlinejournalismblog.com/2012/05/02/create-a-council-ward-map-with-scraperwiki/#comments</comments>
		<pubDate>Wed, 02 May 2012 10:19:56 +0000</pubDate>
		<dc:creator>danielbentley</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[council wards]]></category>
		<category><![CDATA[elections]]></category>
		<category><![CDATA[fusion tables]]></category>
		<category><![CDATA[guest post]]></category>
		<category><![CDATA[KML]]></category>
		<category><![CDATA[mapit]]></category>
		<category><![CDATA[mapping]]></category>
		<category><![CDATA[scraperwiki]]></category>
		<category><![CDATA[scraping]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=16290</guid>
		<description><![CDATA[With local elections looming this is a great 20-30 minute project for any journalist wanting to create an interactive Google map of council ward boundaries. For this you will need: a Google account with Docs; a Scraperwiki account; and access to webspace to host an html file. Firstly we want to scrape the council ward geometry [...]]]></description>
			<content:encoded><![CDATA[
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-20.16.41.png"><img src="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-20.16.41.png" alt="Mapping council wards" width="398" height="294" /></a></p>
<p>With local elections looming this is a great 20-30 minute project for any journalist wanting to create an interactive Google map of council ward boundaries.</p>
<p>For this you will need:</p>
<ul>
<li><a href="http://docs.google.com" target="_blank" onclick="urchinTracker('/outgoing/docs.google.com?referer=');">A Google account with Docs</a></li>
<li><a href="https://scraperwiki.com/login/#signup" target="_blank" onclick="urchinTracker('/outgoing/scraperwiki.com/login/_signup?referer=');">A Scraperwiki account</a></li>
<li>Access to webspace to host an html file<span id="more-16290"></span></li>
</ul>
<p>Firstly we want to scrape the council ward geometry data held by <a href="http://mapit.mysociety.org" target="_blank" onclick="urchinTracker('/outgoing/mapit.mysociety.org?referer=');">MapIt by mysociety.org</a> and spit it out as a CSV file, a format that is compatible with Google&#8217;s mapping tools.</p>
<h2>Getting the ID for the council ward</h2>
<p>Go to the MapIt homepage and use the postcode search to look up a point in the town or city you want the ward data for. In the example I&#8217;ve searched using a Preston postcode.</p>
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-19.31.07.png"><img class="alignnone size-full wp-image-16295" src="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-19.31.07.png" alt="" width="830" height="467" /></a></p>
<p>Then in the results page find the council you want data for and note down the id number next to it.</p>
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-19.31.43.png"><img class="alignnone size-full wp-image-16294" src="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-19.31.43.png" alt="" width="827" height="573" /></a></p>
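<p>If you would rather find the id in code than read it off the page, the same lookup can be sketched in Python. This is only an illustration: it assumes MapIt&#8217;s postcode lookup returns JSON with an <strong>areas</strong> dictionary in which each area has an id, a name and a type code, and it assumes the type codes listed mark council-level areas. The sample response is hand-written (the ward entry is made up), so nothing is fetched from the web.</p>

```python
# Hand-written sample shaped like the JSON a MapIt postcode lookup is
# assumed to return. The ward entry and its id are made up.
SAMPLE_RESPONSE = {
    "areas": {
        "2366": {"id": 2366, "name": "Preston City Council", "type": "DIS"},
        "99999": {"id": 99999, "name": "Example Ward", "type": "DIW"},
    },
}

# Type codes assumed to mean council-level areas:
# district, unitary, metropolitan district, London borough.
COUNCIL_TYPES = {"DIS", "UTA", "MTD", "LBO"}


def find_councils(response):
    """Return sorted (id, name) pairs for every council-level area."""
    councils = []
    for area in response.get("areas", {}).values():
        if area.get("type") in COUNCIL_TYPES:
            councils.append((area["id"], area["name"]))
    return sorted(councils)


print(find_councils(SAMPLE_RESPONSE))
```

<p>With the sample above this prints the Preston City Council entry and its id, 2366, which is the number used in the next step.</p>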
<h2>Adapting a scraper to scrape that council ward&#8217;s geometry</h2>
<p>Now log in to your Scraperwiki account and visit the page for <a href="https://scraperwiki.com/scrapers/council_ward_geometry/" target="_blank" onclick="urchinTracker('/outgoing/scraperwiki.com/scrapers/council_ward_geometry/?referer=');">reclosedev&#8217;s council ward scraper</a>. Click &#8216;Copy&#8217; and you&#8217;ll be taken to a code editor page.</p>
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-19.39.16.png"><img class="alignnone size-full wp-image-16296" src="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-19.39.16.png" alt="" width="956" height="147" /></a></p>
<p>On line 10 of the code you should see:</p>
<pre style="padding-left: 30px">10: PARENT_ID = 2366  # Preston City Council</pre>
<p>Change the 2366 value to the MapIt id of your council and change Preston City Council to your council&#8217;s name (anything after <strong>#</strong> is a <strong>comment</strong>: it isn&#8217;t important to the code but it is useful for keeping track of what you&#8217;re scraping).</p>
<p>Once you&#8217;ve done this hit &#8216;Save Scraper&#8217; then &#8216;Back to scraper overview&#8217;. This will take you to your own Scraperwiki page where the scraper is saved. It would be useful at this point to click on the pen symbol next to the scraper name and rename it, for example to &#8216;Your Council Ward geometry data&#8217;.</p>
<p>Then click <strong>RUN</strong> (or CTRL+R) to run your scraper, and wait a while for it to complete (usually no more than a couple of minutes).</p>
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-19.46.47.png"><img class="alignnone size-medium wp-image-16297" src="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-19.46.47-300x57.png" alt="" width="300" height="57" /></a></p>
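<p>The scraper&#8217;s own code isn&#8217;t reproduced here, but as a rough sketch of the kind of requests a ward scraper needs to make, the snippet below builds the two MapIt addresses involved. It assumes MapIt lists the areas inside a parent at <strong>/area/ID/children</strong> and serves a boundary at <strong>/area/ID.kml</strong>; the function names are my own and are not taken from reclosedev&#8217;s scraper.</p>

```python
MAPIT_BASE = "http://mapit.mysociety.org"
PARENT_ID = 2366  # Preston City Council


def children_url(parent_id):
    """Address assumed to list the areas (wards) inside a parent area."""
    return "{0}/area/{1}/children".format(MAPIT_BASE, parent_id)


def kml_url(area_id):
    """Address assumed to return one area's boundary as KML."""
    return "{0}/area/{1}.kml".format(MAPIT_BASE, area_id)


print(children_url(PARENT_ID))
print(kml_url(PARENT_ID))
```

<p>A scraper along these lines would fetch the first address once, then fetch the KML address for each ward id it finds and save the result as a row.</p>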
<p>When it has finished running, click <strong>Back to scraper overview</strong> (upper right) and, in the section titled <strong>This scraper&#8217;s datastore</strong>, click the <strong>swdata</strong> tab. You should see something similar to this:</p>
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-19.48.38.png"><img class="alignnone size-full wp-image-16298" src="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-19.48.38.png" alt="" width="944" height="608" /></a></p>
<p>The first column contains the shape/geometry data of the council ward and the third column contains its name. Does it look right? Then hit <strong>download</strong> (in the upper right of this image) and choose <strong>CSV</strong> as the format.</p>
<p>CSV, or comma-separated values, is an open table format readable by Excel, OpenOffice, Google Docs and any text editor.</p>
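<p>Before uploading the file it can be worth checking that every ward actually has geometry. The sketch below uses Python&#8217;s built-in csv module and assumes the columns are called <strong>kml</strong>, <strong>id</strong> and <strong>name</strong>, in that order; the two rows are made-up stand-ins for the real download, where the kml values are much longer.</p>

```python
import csv
import io

# Made-up two-row extract standing in for the downloaded file.
SAMPLE_CSV = """kml,id,name
(boundary data here),1001,Example Ward A
,1002,Example Ward B
"""


def rows_missing_geometry(csv_text):
    """Return the names of rows whose kml column is empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["name"] for row in reader if not row["kml"].strip()]


print(rows_missing_geometry(SAMPLE_CSV))
```

<p>To run it against your own download, read the file&#8217;s contents and pass that text in instead of the sample.</p>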
<p>We&#8217;re going to use Google Fusion Tables to convert the data to a map.</p>
<h2>Mapping the data</h2>
<p>Head over to your Google Docs account (or Drive if you&#8217;ve been switched over) and hit <strong>Create &gt; Table</strong>. When it asks you to import a new table, choose and upload the CSV file you downloaded. On the next page you&#8217;ll be asked to specify the columns to import; leave this page as default and click <strong>Next</strong>.</p>
<p>On the next page name your table and attribute the data to MapIt. Click <strong>Finish</strong>. We&#8217;re nearly there!</p>
<p>Under the <strong>Visualize</strong> menu item click <strong>Map</strong> and you&#8217;ll probably see a red blob similar to this.</p>
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-20.02.08.png"><img class="alignnone size-full wp-image-16300" src="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-20.02.08.png" alt="" width="1025" height="588" /></a></p>
<p>If you want to make it prettier you can style it by clicking &#8216;<strong>Configure styles</strong>&#8217; and changing the settings for &#8216;<strong>Polygons</strong>&#8217;.</p>
<p>When you&#8217;re happy with those settings, click inside one of your council wards. You should get a window like this:</p>
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-20.06.04.png"><img class="alignnone size-full wp-image-16301" src="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-20.06.04.png" alt="" width="569" height="414" /></a></p>
<p>Not all that informative, is it? To make it a bit more useful click &#8216;Configure info window&#8217;, then select the &#8216;Custom&#8217; tab, delete what&#8217;s there and enter this code:</p>
<blockquote>
<pre>&lt;div class='googft-info-window' style='font-family: sans-serif'&gt;
 &lt;h2&gt;&lt;a href="http://mapit.mysociety.org/area/{id}.html" target="_blank"&gt;{name}&lt;/a&gt;&lt;/h2&gt;&lt;br&gt;
 &lt;/div&gt;</pre>
</blockquote>
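<p>The <strong>{id}</strong> and <strong>{name}</strong> placeholders are filled in from the matching columns of whichever row you click. Fusion Tables does this itself, but the idea can be imitated with Python&#8217;s string formatting. This is an analogy rather than how Fusion Tables is implemented, and the row is hypothetical.</p>

```python
# The link from the info window code above, with its placeholders.
LINK_TEMPLATE = "http://mapit.mysociety.org/area/{id}.html"

# Hypothetical row standing in for one ward in the table.
row = {"id": 1001, "name": "Example Ward A"}


def info_window_link(row):
    """Return (address, link text) for a row, mimicking the substitution."""
    return LINK_TEMPLATE.format(id=row["id"]), "{name}".format(name=row["name"])


print(info_window_link(row))
```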
<p>This will give you a link to the ward&#8217;s MapIt page within the information box:</p>
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-20.16.41.png"><img class="alignnone size-full wp-image-16307" src="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-20.16.41.png" alt="" width="569" height="420" /></a></p>
<p>The final step is to embed this map in your webpage.</p>
<p>First click &#8216;<strong>Share</strong>&#8217; in the top right corner of the table page (not the Google+ sharebox) and change the visibility to either Public or Unlisted. Then hit <strong>File &gt; About</strong> and note down the Numeric ID.</p>
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-20.19.52.png"><img class="alignnone size-full wp-image-16308" src="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-20.19.52.png" alt="" width="584" height="116" /></a></p>
<p>Go to the <a href="http://fusion-tables-api-samples.googlecode.com/svn/trunk/FusionTablesLayerWizard/src/index.html" target="_blank" onclick="urchinTracker('/outgoing/fusion-tables-api-samples.googlecode.com/svn/trunk/FusionTablesLayerWizard/src/index.html?referer=');">FusionTables LayerWizard</a> and enter this ID in the first box. For the location column select &#8216;<strong>kml</strong>&#8217;, then hit &#8216;<strong>Put layer on map</strong>&#8217;.</p>
<p><em>Optional</em>: Click &#8216;<strong>Add a search feature</strong>&#8217; and &#8216;<strong>select based search</strong>&#8217;. For &#8216;<em>Select Label</em>&#8217; enter &#8216;Search by ward name&#8217; and for &#8216;<em>Column to query</em>&#8217; select &#8216;name&#8217;. Then click <strong>add</strong>. This lets you easily select the ward you want to view.</p>
<p>Zoom the Preview map until it shows the view you want displayed. Then copy the generated code below it and paste it into an HTML editor or plain text editor of your choice.</p>
<p>Replace lines 4-6</p>
<blockquote>
<pre>&lt;style&gt;
 #map-canvas { width:500px; height:400px; }
 &lt;/style&gt;</pre>
</blockquote>
<p>with this</p>
<blockquote>
<pre>&lt;style type="text/css"&gt;
 html { height: 100% }
 body { height: 100%; margin: 0; padding: 0 }
 #map-canvas { height: 100% }
 &lt;/style&gt;</pre>
</blockquote>
<p>Then cut line 43</p>
<blockquote>
<pre>&lt;div id="map-canvas"&gt;&lt;/div&gt;</pre>
</blockquote>
<p>and paste it at the end of the page between &lt;/div&gt; and &lt;/body&gt; like so&#8230;</p>
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-20.42.43.png"><img class="alignnone size-full wp-image-16311" src="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-20.42.43.png" alt="" width="532" height="226" /></a></p>
<p>Save the file with the .html extension and try opening it in your browser. It should look a little like this.</p>
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-20.45.32.png"><img class="alignnone size-large wp-image-16312" src="http://onlinejournalismblog.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-01-at-20.45.32-1024x605.png" alt="" width="1024" height="605" /></a></p>
<p>Upload this page to your webspace. Now you can either link to it or embed it in an iframe. This sample code should work for most purposes.</p>
<blockquote>
<pre>&lt;iframe id="ifrm" src="http://your.map.page/here.html" width="NUMBEROFPIXELS" height="NUMBEROFPIXELS"&gt;Your browser does not support iframes.&lt;/iframe&gt;</pre>
</blockquote>
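<p>If you need the same embed at several sizes, a small helper can fill in the blanks for you. This is just a convenience sketch: the function name is my own, the address is the placeholder from the sample above, and the sizes are arbitrary.</p>

```python
from html import escape


def iframe_snippet(src, width, height):
    """Build the embed code, escaping the address and forcing whole pixels."""
    return (
        '<iframe id="ifrm" src="{0}" width="{1}" height="{2}">'
        "Your browser does not support iframes.</iframe>"
    ).format(escape(src, quote=True), int(width), int(height))


print(iframe_snippet("http://your.map.page/here.html", 600, 450))
```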
<p><a href="http://openpreston.appspot.com/mapdemo.html" target="_blank" onclick="urchinTracker('/outgoing/openpreston.appspot.com/mapdemo.html?referer=');">Ta da!</a></p>
<p>This is really just scratching the surface of what Scraperwiki and Fusion Tables can do but I hope it served as an easy-ish introduction to them both. If this tutorial did not work for you or if you have any questions then leave a comment and I&#8217;ll help out as much as I can.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/05/02/create-a-council-ward-map-with-scraperwiki/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Free Data Journalism Handbook launched</title>
		<link>http://onlinejournalismblog.com/2012/04/27/free-data-journalism-handbook-launched-tomorrow/</link>
		<comments>http://onlinejournalismblog.com/2012/04/27/free-data-journalism-handbook-launched-tomorrow/#comments</comments>
		<pubDate>Fri, 27 Apr 2012 13:04:40 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[Data Journalism Handbook]]></category>
		<category><![CDATA[ebook]]></category>
		<category><![CDATA[european journalism centre]]></category>
		<category><![CDATA[Liliana Bounegru]]></category>
		<category><![CDATA[open knowledge foundation]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=16267</guid>
		<description><![CDATA[I&#8217;ve contributed to a &#8220;free, open-source book that aims to help journalists to use data to improve the news&#8221; &#8211; and it has now been published online (Saturday 28th April). The Data Journalism Handbook was coordinated by the European Journalism Centre and the Open Knowledge Foundation (in particular Liliana Bounegru), and includes [...]]]></description>
			<content:encoded><![CDATA[
<p><img src="http://farm8.staticflickr.com/7115/7038139465_7f4c52e748_o.jpg" alt="Data Journalism Handbook" width="540" height="450" /></p>
<p>I&#8217;ve contributed to a &#8220;free, open-source book that aims to help journalists to use data to improve the news&#8221; &#8211; and it <del>will be</del> has now been published online <del>tomorrow</del> (Saturday 28th April)</p>

<p>The <a href="http://www.datajournalismhandbook.org/" onclick="urchinTracker('/outgoing/www.datajournalismhandbook.org/?referer=');">Data Journalism Handbook</a> was coordinated by the <a href="http://www.ejc.net/" target="_blank" onclick="urchinTracker('/outgoing/www.ejc.net/?referer=');">European Journalism Centre</a> and the <a href="http://www.okfn.org/" target="_blank" onclick="urchinTracker('/outgoing/www.okfn.org/?referer=');">Open Knowledge Foundation</a> (in particular <strong>Liliana Bounegru</strong>), and includes contributions from:</p>
<blockquote><p>&#8220;Dozens of data journalism&#8217;s leading advocates and best practitioners &#8211; including from Australian Broadcasting Corporation, the BBC, the Chicago Tribune, Deutsche Welle, the Guardian, the Financial Times, Helsingin Sanomat, La Nacion, the New York Times, ProPublica, the Washington Post, the Texas Tribune, Verdens Gang, Wales Online, Zeit Online and many others.&#8221;</p></blockquote>
<p>The book <del>will be</del> is available for download at <a href="http://www.datajournalismhandbook.org/" target="_blank" onclick="urchinTracker('/outgoing/www.datajournalismhandbook.org/?referer=');">datajournalismhandbook.org</a> under a Creative Commons Attribution ShareAlike License. There will also be a printed and e-book version published by O’Reilly Media.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/04/27/free-data-journalism-handbook-launched-tomorrow/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Step by step: how to start in a data journalist role</title>
		<link>http://onlinejournalismblog.com/2012/04/23/step-by-step-how-to-start-in-a-data-journalist-role/</link>
		<comments>http://onlinejournalismblog.com/2012/04/23/step-by-step-how-to-start-in-a-data-journalist-role/#comments</comments>
		<pubDate>Mon, 23 Apr 2012 07:23:58 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[data blogging]]></category>
		<category><![CDATA[finding data]]></category>
		<category><![CDATA[gathering data]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=16094</guid>
		<description><![CDATA[Following my previous posts on the network journalist and community manager roles as part of an investigation team, this post expands on the first steps a student journalist can take in filling the data journalist role. 1: Brainstorm data that might be relevant to your investigation or field Before you begin digging for data, it&#8217;s worth mapping out the territory you&#8217;re working [...]]]></description>
			<content:encoded><![CDATA[
<p><a href="http://onlinejournalismblog.com/2012/02/02/moving-away-from-the-story-5-roles-of-an-online-investigations-team/"><img class="alignnone" src="http://onlinejournalismblog.com/wp-content/uploads/2012/02/OJ_investigations_team.jpg" alt="Investigations team flowchart " width="496" height="389" /></a></p>
<p>Following my previous <a title="How to be a network journalist" href="http://onlinejournalismblog.com/2012/03/13/how-to-be-a-network-journalist/">posts on the network journalist</a> and <a title="How to be a community manager" href="http://onlinejournalismblog.com/2012/03/27/6-ways-to-get-started-in-community-management/">community manager roles</a> as part of an <a title="5 roles in investigations team" href="http://onlinejournalismblog.com/2012/02/02/moving-away-from-the-story-5-roles-of-an-online-investigations-team/">investigation team</a>, this post expands on <strong>the first steps a student journalist can take in filling the data journalist role</strong>.</p>
<h2>1. Brainstorm data that might be relevant to your investigation or field</h2>
<p>Before you begin digging for data, it&#8217;s worth mapping out the territory you&#8217;re working in. Some key questions to ask include:</p>
<ul>
<li>Who <strong>measures or monitors</strong> your field? For example:</li>
<ul>
<li>regulators and inspectors</li>
<li>charities (try searching by keyword on the <a href="http://www.charity-commission.gov.uk/" onclick="urchinTracker('/outgoing/www.charity-commission.gov.uk/?referer=');">Charity Commission</a> or <a href="http://opencharities.org/" onclick="urchinTracker('/outgoing/opencharities.org/?referer=');">OpenCharities</a>)</li>
<li>campaigning groups</li>
<li>central government (the <a href="http://www.direct.gov.uk/en/Dl1/Directories/A-ZOfCentralGovernment/index.htm" onclick="urchinTracker('/outgoing/www.direct.gov.uk/en/Dl1/Directories/A-ZOfCentralGovernment/index.htm?referer=');">department and/or agency responsible</a>, e.g. Ministry of Justice, BIS, etc. &#8211; there may also be <a href="http://www.dh.gov.uk/health/about-us/people/ministers/" onclick="urchinTracker('/outgoing/www.dh.gov.uk/health/about-us/people/ministers/?referer=');">specific ministers</a>)</li>
<li>local government (local authorities or <a href="http://www.nhs.uk/ServiceDirectories/Pages/PrimaryCareTrustListing.aspx" onclick="urchinTracker('/outgoing/www.nhs.uk/ServiceDirectories/Pages/PrimaryCareTrustListing.aspx?referer=');">primary care trusts</a>, <a href="http://www.direct.gov.uk/en/Diol1/DoItOnline/DG_4017475" onclick="urchinTracker('/outgoing/www.direct.gov.uk/en/Diol1/DoItOnline/DG_4017475?referer=');">police force, etc</a>.)</li>
<li>select committees (browse <a href="http://www.parliament.uk/topics/Topical-Issues.htm" onclick="urchinTracker('/outgoing/www.parliament.uk/topics/Topical-Issues.htm?referer=');">parliamentary research indexed here</a>, or try a <a href="https://www.google.co.uk/webhp?rlz=1C1GPCK_enGB454GB455&amp;sourceid=chrome-instant&amp;ix=seb&amp;ie=UTF-8&amp;ion=1#hl=en&amp;rlz=1C1GPCK_enGB454GB455&amp;sclient=psy-ab&amp;q=select+committee+on+schools&amp;oq=select+committee+on+schools&amp;aq=f&amp;aqi=g-v2&amp;aql=&amp;gs_l=serp.3..0i15l2.5267l7529l3l7662l16l16l0l9l9l0l124l598l4j3l7l0.frgbld.&amp;bav=on.2,or.r_gc.r_pw.r_cp.r_qf.,cf.osb&amp;fp=3181f87014c84e6c&amp;biw=1280&amp;bih=899" onclick="urchinTracker('/outgoing/www.google.co.uk/webhp?rlz=1C1GPCK_enGB454GB455_amp_sourceid=chrome-instant_amp_ix=seb_amp_ie=UTF-8_amp_ion=1_hl=en_amp_rlz=1C1GPCK_enGB454GB455_amp_sclient=psy-ab_amp_q=select+committee+on+schools_amp_oq=select+committee+on+schools_amp_aq=f_amp_aqi=g-v2_amp_aql=_amp_gs_l=serp.3..0i15l2.5267l7529l3l7662l16l16l0l9l9l0l124l598l4j3l7l0.frgbld._amp_bav=on.2_or.r_gc.r_pw.r_cp.r_qf._cf.osb_amp_fp=3181f87014c84e6c_amp_biw=1280_amp_bih=899&amp;referer=');">specific search</a>)</li>
<li>general statistical/audit bodies such as <a href="http://www.ons.gov.uk/ons/index.html" onclick="urchinTracker('/outgoing/www.ons.gov.uk/ons/index.html?referer=');">ONS</a> or the <a href="http://www.audit-commission.gov.uk/Pages/default.aspx" onclick="urchinTracker('/outgoing/www.audit-commission.gov.uk/Pages/default.aspx?referer=');">Audit Commission</a>.<span id="more-16094"></span></li>
</ul>
<li>Where is <strong>spending</strong> recorded? This might be at both a <a href="http://openlylocal.com/councils/spending" onclick="urchinTracker('/outgoing/openlylocal.com/councils/spending?referer=');">local</a> and <a href="http://wheredoesmymoneygo.org/spending.html" onclick="urchinTracker('/outgoing/wheredoesmymoneygo.org/spending.html?referer=');">national</a> level.</li>
<li>What are the key <strong>things that might be measured</strong> in your field? For example, in prisons they might be interested in reoffending, or overcrowding, or staffing.</li>
<li>Can you find historical data?</li>
<li>What data do you need to provide <strong>basic context</strong>? e.g.</li>
<ul>
<li>Where &#8211; addresses for all institutions in your field (e.g. schools, prisons, etc.)</li>
<li>Codes &#8211; often these are used instead of institution or area names</li>
<li>Who &#8211; names of those responsible for particular aspects of your field</li>
<li>Demographics &#8211; the distribution of age, gender, ethnicity, industries, wealth, property or other elements may be important to your work</li>
<li>Politics &#8211; who is in charge in each area (local authority and local MP)</li>
</ul>
<li>How could you collate data that doesn&#8217;t exist? E.g. public awareness of something; or how the policies of different bodies compare, etc.</li>
</ul>
<p>Sometimes the simplest and quickest way to find out these things is to pick up the phone and speak to someone in a relevant organisation and ask them: what information is collected about your field, and by whom?</p>
<p>You can also make content from this process of research: <strong>post a guide to how your field is regulated and measured</strong> (and what information isn&#8217;t collected), or a guide to <strong>who&#8217;s who in your field</strong> - the regulators, monitors, politicians and bodies that all have a hand in keeping it on track.</p>
<h2>2. Learn advanced techniques to obtain that data</h2>
<p>Once you&#8217;ve mapped it all out you can start to prioritise the datasets that are most relevant to your particular investigation. You may need to use different techniques to get hold of these, including:</p>
<ul>
<li>Advanced search techniques (<a href="http://www.googleguide.com/advanced_operators.html" onclick="urchinTracker('/outgoing/www.googleguide.com/advanced_operators.html?referer=');">limit by filetype:, site:, etc.</a>)</li>
<li>Simply picking up the phone to call the relevant department (try to get as much detailed data as possible rather than aggregate, i.e. very general, figures)</li>
<li><a href="http://helpmeinvestigate.posterous.com/tag/foi" onclick="urchinTracker('/outgoing/helpmeinvestigate.posterous.com/tag/foi?referer=');">Using FOI requests</a></li>
</ul>
<p><iframe width="500" height="281" src="http://www.youtube.com/embed/Ciz_FUo2JMg?fs=1&#038;feature=oembed" frameborder="0" allowfullscreen></iframe></p>
<ul>
<li><a title="Converting PDFs to spreadsheets" href="http://helpmeinvestigate.posterous.com/7-ways-to-get-data-out-of-pdfs" onclick="urchinTracker('/outgoing/helpmeinvestigate.posterous.com/7-ways-to-get-data-out-of-pdfs?referer=');">Converting PDFs into spreadsheets</a></li>
<li><a href="http://onlinejournalismblog.com/tag/scraping/">Scraping</a></li>
</ul>
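<p>The advanced search operators mentioned above are just extra words in the query, so they are easy to assemble in code if you find yourself repeating the same kind of search. The sketch below is illustrative only: the function names are my own, and the search terms, site and file type are example values rather than recommendations.</p>

```python
from urllib.parse import urlencode


def build_query(terms, site=None, filetype=None):
    """Join search terms with optional site: and filetype: operators."""
    parts = [terms]
    if site:
        parts.append("site:" + site)
    if filetype:
        parts.append("filetype:" + filetype)
    return " ".join(parts)


def search_url(terms, site=None, filetype=None):
    """Return a Google search address for the assembled query."""
    query = build_query(terms, site=site, filetype=filetype)
    return "https://www.google.co.uk/search?" + urlencode({"q": query})


# Example values only: spreadsheets about overcrowding on one site.
print(build_query("prison overcrowding", site="justice.gov.uk", filetype="xls"))
print(search_url("prison overcrowding", site="justice.gov.uk", filetype="xls"))
```

<p>Limiting by file type to spreadsheets is often the quickest way to skip past press releases and land on the underlying figures.</p>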
<p>Again, you can make content from this process, for example: &#8220;<strong>How we found&#8230;</strong>&#8221; or &#8220;<strong>Why we&#8217;re asking the MoJ for&#8230;</strong>&#8221; (with a link to the FOI request) or &#8220;<strong><a title="welfare data" href="http://helpmeinvestigate.com/welfare/data-disability-and-other-hate-crime" onclick="urchinTracker('/outgoing/helpmeinvestigate.com/welfare/data-disability-and-other-hate-crime?referer=');">Get the data</a></strong>&#8221; (here&#8217;s <a title="publishing data online" href="http://helpmeinvestigate.posterous.com/how-do-i-publish-my-data-online" onclick="urchinTracker('/outgoing/helpmeinvestigate.posterous.com/how-do-i-publish-my-data-online?referer=');">how to publish data online</a>)</p>
<p>The flow chart below (<a href="http://onlinejournalismblog.com/2011/09/06/gathering-data-a-flow-chart-for-data-journalists/">from this previous post</a>) helps guide you to the relevant techniques for your data:</p>
<div>
<dl>
<dt><a href="http://onlinejournalismblog.com/2011/09/06/gathering-data-a-flow-chart-for-data-journalists/"><img src="http://farm7.static.flickr.com/6208/6078887277_5722f1493c.jpg" alt="Gathering data: a flow chart for data journalists" width="500" height="500" /></a></dt>
<dd>Gathering data: a flow chart for data journalists</dd>
</dl>
</div>
<h2>3. Pull out the parts of data relevant to your field/investigation</h2>
<p>For example:</p>
<ul>
<li>If the data covers every region, pull out <a href="http://helpmeinvestigate.com/education/2011/11/the-price-of-a-university-drop-out-4-time-for-some-numbers/" onclick="urchinTracker('/outgoing/helpmeinvestigate.com/education/2011/11/the-price-of-a-university-drop-out-4-time-for-some-numbers/?referer=');">the parts that apply to your locality</a>, or <a href="http://helpmeinvestigate.com/welfare/travel-to-interview-jobseekers-allowance" onclick="urchinTracker('/outgoing/helpmeinvestigate.com/welfare/travel-to-interview-jobseekers-allowance?referer=');">how that compares to other areas</a> (space), or to previous data (time)</li>
<li>Look at the particular issue(s) that interests you in the data, e.g. a particular crime out of many, or a particular indicator. How does that <a title="health data compared" href="http://helpmeinvestigate.com/health/2012/02/22/public-health-spending-now-and-to-come-data-and-documents/" onclick="urchinTracker('/outgoing/helpmeinvestigate.com/health/2012/02/22/public-health-spending-now-and-to-come-data-and-documents/?referer=');">compare across space</a> (regions) or time?</li>
</ul>
<h2>4. Add value to the data</h2>
<p>Here are just some suggestions. You can use one or many:</p>
<ul>
<li><a title="scottish primary education data" href="http://helpmeinvestigate.com/education/2012/01/free-school-meals-in-scottish-primary-schools-data-visualisation/" onclick="urchinTracker('/outgoing/helpmeinvestigate.com/education/2012/01/free-school-meals-in-scottish-primary-schools-data-visualisation/?referer=');">Combine datasets</a> - e.g. one may have school ratings; another may have the addresses of all schools, or their local authority</li>
<li>Convert data &#8211; this amounts to much the same thing, but for example: postcodes are more useful <a href="http://onlinejournalismblog.com/2010/12/16/adding-geographical-information-to-a-spreadsheet-based-on-postcodes-google-refine-and-apis/">when converted into lat/long coordinates</a> (likewise <a href="http://onlinejournalismblog.com/2011/08/12/how-to-convert-eastingnorthing-into-latlong-for-an-interactive-map/">easting and northing</a>)</li>
<li>Find out how the data was collected and/or measured (put simply: pick up the phone and ask)</li>
<li>Get an independent expert perspective on the data</li>
<li><a href="http://blogs.channel4.com/factcheck/" onclick="urchinTracker('/outgoing/blogs.channel4.com/factcheck/?referer=');">Compare the data with official claims or spin</a> - does it really back those claims up?</li>
</ul>
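<p>To make the first suggestion concrete, here is a minimal sketch of combining two datasets on a shared column. Everything in it is an assumption for illustration: the school names, ratings and local authorities are invented, and the <code>combine</code> function is a hypothetical helper rather than part of any particular tool.</p>

```python
# Hypothetical data: the school names, ratings and local authorities are made up.
ratings = [
    {"school": "Ashfield Primary", "rating": "Good"},
    {"school": "Brook Lane Primary", "rating": "Inadequate"},
    {"school": "Castle Road Primary", "rating": "Outstanding"},
]
locations = [
    {"school": "Ashfield Primary", "local_authority": "Northtown"},
    {"school": "Brook Lane Primary", "local_authority": "Southvale"},
]


def combine(ratings, locations, key="school"):
    """Join two lists of rows on a shared column, keeping every ratings row."""
    # Index the second dataset by the shared column for quick lookups.
    lookup = {row[key]: row for row in locations}
    combined = []
    for row in ratings:
        match = lookup.get(row[key], {})
        merged = dict(row)
        # Stays None when the second dataset has no matching row.
        merged["local_authority"] = match.get("local_authority")
        combined.append(merged)
    return combined


combined = combine(ratings, locations)
# Rows without a match are worth listing: they often point to spelling
# differences between the two sources, or to records that are missing.
unmatched = [row["school"] for row in combined if row["local_authority"] is None]

for row in combined:
    print(row)
print("No match found for:", unmatched)
```

<p>In practice the shared column rarely matches perfectly, so the list of unmatched rows is usually the first thing to check by hand.</p>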
<p><iframe width="500" height="281" src="http://www.youtube.com/embed/TS6F9wcakSc?fs=1&#038;feature=oembed" frameborder="0" allowfullscreen></iframe></p>
<ul>
<li>Compare the data with reports from elsewhere &#8211; is <a href="http://www.thebureauinvestigates.com/2012/01/31/revealed-the-dead-not-included-in-the-official-figures/" onclick="urchinTracker('/outgoing/www.thebureauinvestigates.com/2012/01/31/revealed-the-dead-not-included-in-the-official-figures/?referer=');">anything missing</a>?</li>
<li>Unpick jargon and definitions (here&#8217;s an example of <a href="http://www.guardian.co.uk/global/reality-check-with-polly-curtis/2012/feb/22/unemployment-work-programme-welfare" onclick="urchinTracker('/outgoing/www.guardian.co.uk/global/reality-check-with-polly-curtis/2012/feb/22/unemployment-work-programme-welfare?referer=');">James Ball unpicking different work experience schemes</a>)</li>
<li>Add a <a href="http://onlinejournalismblog.com/2012/03/28/a-useful-tool-for-creating-a-search-interface-for-your-data-freedive/">search and filter interface</a></li>
</ul>
<p>Any of these provide useful opportunities for posting new content with the new contextual information (e.g. &#8220;<strong>How the data on X was gathered</strong>&#8221;) or new combined data (&#8220;<strong><a href="http://helpmeinvestigate.com/health/2012/02/15/data-gp-patient-lists-now-with-qof-data/" onclick="urchinTracker('/outgoing/helpmeinvestigate.com/health/2012/02/15/data-gp-patient-lists-now-with-qof-data/?referer=');">Now with QOF data</a></strong>&#8221;) or the issues that they raise (&#8220;<strong><a href="http://blogs.channel4.com/factcheck/how-dodgy-stats-could-decide-our-childrens-future" onclick="urchinTracker('/outgoing/blogs.channel4.com/factcheck/how-dodgy-stats-could-decide-our-childrens-future?referer=');">Why schools data may be worthless</a></strong>&#8221;).</p>
<h2>5. Communicate the story in the data</h2>
<p>I&#8217;ve <a href="http://onlinejournalismblog.com/2011/07/13/the-inverted-pyramid-of-data-journalism-part-2-6-ways-of-communicating-data-journalism/">written separately about the different ways of communicating data stories, so you can read that here</a>. In short, human case studies are helpful, and visualisation is often useful.</p>
<p>And it&#8217;s at this point that you can also link to the further detail provided in all the content you&#8217;ve written in the previous 4 steps: How you got the data, the wider context, the specific data that&#8217;s of interest, the more detailed expert analysis or background, and so on.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/04/23/step-by-step-how-to-start-in-a-data-journalist-role/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>When data goes bad</title>
		<link>http://onlinejournalismblog.com/2012/04/19/when-data-goes-bad/</link>
		<comments>http://onlinejournalismblog.com/2012/04/19/when-data-goes-bad/#comments</comments>
		<pubDate>Thu, 19 Apr 2012 13:34:41 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[bad data]]></category>
		<category><![CDATA[benford's law]]></category>
		<category><![CDATA[BIJ]]></category>
		<category><![CDATA[bureau of investigative journalism]]></category>
		<category><![CDATA[Channel 4]]></category>
		<category><![CDATA[Chile]]></category>
		<category><![CDATA[Ciudadano Inteligente Fundacion]]></category>
		<category><![CDATA[Clearspending]]></category>
		<category><![CDATA[data laundering]]></category>
		<category><![CDATA[dating]]></category>
		<category><![CDATA[Deaths in custody]]></category>
		<category><![CDATA[ellen miller]]></category>
		<category><![CDATA[FactCheck]]></category>
		<category><![CDATA[Felipe Heusser]]></category>
		<category><![CDATA[height]]></category>
		<category><![CDATA[IPCC]]></category>
		<category><![CDATA[Lauren York]]></category>
		<category><![CDATA[missing children]]></category>
		<category><![CDATA[OKCupid]]></category>
		<category><![CDATA[Philip Shakesheff]]></category>
		<category><![CDATA[register of interests]]></category>
		<category><![CDATA[S251]]></category>
		<category><![CDATA[sex trafficking]]></category>
		<category><![CDATA[simon rogers]]></category>
		<category><![CDATA[sunday times]]></category>
		<category><![CDATA[sunlight foundation]]></category>
		<category><![CDATA[tony hirst]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15842</guid>
		<description><![CDATA[Data is so central to the decision-making that shapes our countries, jobs and even personal lives that an increasing amount of data journalism involves scrutinising the problems with the very data itself. Here&#8217;s an illustrative list of when bad data becomes the story &#8211; and the lessons they can teach data journalists: Deaths in police [...]]]></description>
			<content:encoded><![CDATA[
<figure id="attachment_16425" class="wp-caption alignnone" style="width: 614px"><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/04/Incorrect-statistics.jpg"><img class=" wp-image-16425 " src="http://onlinejournalismblog.com/wp-content/uploads/2012/04/Incorrect-statistics-682x1024.jpg" alt="Bad data on sex trafficking: flow chart" width="614" height="922" /></a><figcaption class="wp-caption-text">Image by Lauren York on the Data Journalism Blog</figcaption></figure>
<p>Data is so central to the decision-making that shapes our countries, jobs and even personal lives that an increasing amount of data journalism involves scrutinising the problems with the very data itself. Here&#8217;s an illustrative list of when bad data becomes the story &#8211; and the lessons they can teach data journalists:</p>
<h2>Deaths in police custody unrecorded</h2>
<p><a href="http://www.thebureauinvestigates.com/category/projects/deaths-in-police-custody-2/" onclick="urchinTracker('/outgoing/www.thebureauinvestigates.com/category/projects/deaths-in-police-custody-2/?referer=');">This investigation by the Bureau of Investigative Journalism</a> demonstrates an important question to ask about data: who decides what gets recorded?</p>
<p>In this case, the BIJ identified &#8220;a number of cases not included in the official tally of 16 ‘restraint-related’ deaths in the decade to 2009 &#8230; Some cases were not included because the person has not been officially arrested or detained.&#8221;<span id="more-15842"></span></p>
<p>As <a href="http://www.thebureauinvestigates.com/2012/01/31/revealed-the-dead-not-included-in-the-official-figures/" onclick="urchinTracker('/outgoing/www.thebureauinvestigates.com/2012/01/31/revealed-the-dead-not-included-in-the-official-figures/?referer=');">they explain</a>:</p>
<blockquote><p>&#8220;It turns out the IPCC has a very tight definition of ‘in custody’ –  defined only as when someone has been formally arrested or detained under the mental health act. This does not include people who have died after being in contact with the police.</p>
<p>&#8220;There are in fact two lists. The one which includes the widely quoted list of sixteen deaths in custody only records the cases where the person has been arrested or detained under the mental health act. So, an individual who comes into contact with the police – is never arrested or detained – but nonetheless dies after being restrained, is not included in the figures.</p>
<p>&#8220;&#8230; But even using the IPCC’s tightly drawn definition, the Bureau has identified cases that are still missing.&#8221;</p></blockquote>
<p>Cross-checking the official statistics against wider reports was a key technique, as was using the Freedom of Information Act to request the details behind them and the details of those &#8220;who died in circumstances where restraint was used but was not necessarily a direct cause of death&#8221;.</p>
<h2>Cooking the books on drug-related murders</h2>
<p><img src="http://petewarden.typepad.com/.a/6a00d83454428269e20133f4f982a2970b-800wi" alt="Drug related murders in Mexico" width="560" height="349" /><br />
Cross-checking statistics against reports was also used in <a href="http://blog.diegovalle.net/2010/06/statistical-analysis-and-visualization.html" onclick="urchinTracker('/outgoing/blog.diegovalle.net/2010/06/statistical-analysis-and-visualization.html?referer=');">this investigation by Diego Valle-Jones into Mexican drug deaths</a>:</p>
<blockquote><p>&#8220;The Acteal massacre committed by paramilitary units with government backing against 45 Tzotzil Indians is missing from the vital statistics database. According to the INEGI there were only 2 deaths during December 1997 in the municipality of Chenalho, where the massacre occurred. What a silly way to avoid recording homicides! Now it is just a question of which data is less corrupt.&#8221;</p></blockquote>
<p>Diego also <a href="http://onlinejournalismblog.com/2010/10/12/statistical-analysis-as-journalism-benfords-law/">used the Benford&#8217;s Law technique</a> to identify potentially fraudulent data, which was also <a href="http://onlinejournalismblog.com/2011/10/13/statistics-as-journalism-redux-benfords-law-used-to-question-company-accounts/">used to highlight relationships between dodgy company data and real world events such as the dotcom bubble and deregulation</a>.</p>
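<p>As a rough illustration of the Benford&#8217;s Law technique, the sketch below compares the leading digits of a set of numbers with the proportions the law predicts, where digit <em>d</em> is expected to lead with probability log10(1 + 1/d). This is not Diego&#8217;s code: the function names are my own, and the sample data is simply the first 100 powers of 2, chosen because they are known to follow the law closely. A mismatch on real data is a prompt for further questions, not proof of fraud.</p>

```python
import math
from collections import Counter


def first_digit(value):
    """Return the leading non-zero digit of a number, or None if there is none."""
    for ch in str(abs(value)):
        if ch.isdigit() and ch != "0":
            return int(ch)
    return None


def benford_expected(digit):
    """Proportion of values Benford's Law predicts will start with this digit."""
    return math.log10(1 + 1 / digit)


def compare_to_benford(values):
    """Return (digit, observed share, expected share) for digits 1 to 9."""
    counts = Counter(d for d in map(first_digit, values) if d is not None)
    total = sum(counts.values())
    rows = []
    for digit in range(1, 10):
        observed = counts[digit] / total if total else 0.0
        rows.append((digit, observed, benford_expected(digit)))
    return rows


# Stand-in data (an assumption for illustration): the first 100 powers of 2.
values = [2 ** n for n in range(1, 101)]
rows = compare_to_benford(values)

for digit, observed, expected in rows:
    print(f"{digit}: observed {observed:.3f}, expected {expected:.3f}")
```

<p>The law only suits figures that span several orders of magnitude, such as populations or payments, so it is worth checking that your data is of that kind before reading anything into the result.</p>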
<h2>Poor records mean no checks</h2>
<p>Detective Inspector Philip Shakesheff exposed a &#8220;gap between [local authority] records and police data&#8221;, <a href="http://www.thesundaytimes.co.uk/sto/news/uk_news/Society/article1016904.ece" onclick="urchinTracker('/outgoing/www.thesundaytimes.co.uk/sto/news/uk_news/Society/article1016904.ece?referer=');">reported The Sunday Times</a> in a story headlined &#8216;<em>Care home loses child 130 times</em>&#8217;:</p>
<blockquote><p>&#8220;The true scale of the problem was revealed after a check of records on police computers. For every child officially recorded by local authorities as missing in 2010, another seven were unaccounted for without their absence being noted.&#8221;</p></blockquote>
<p>Why is it important?</p>
<blockquote><p>&#8220;The number who go missing is one of the indicators on which Ofsted judges how well children’s homes are performing and the homes have a legal duty to keep accurate records.</p>
<p>&#8220;However, there is evidence some homes are failing to do so. In one case, Ofsted gave a good report to a private children’s home in Worcestershire when police records showed 1,630 missing person reports in five years. Police stationed an officer at the home and pressed Ofsted to look closer. The home was downgraded to inadequate and it later closed.</p>
<p>&#8220;The risks of being missing from care are demonstrated by Zoe Thomsett, 17, who was Westminster council’s responsibility. It sent her to a care home in Herefordshire, where she went missing several times, the final time for three days. She had earlier been found at an address in Hereford, but because no record was kept, nobody checked the address. She died there of a drugs overdose.</p>
<p>&#8220;The troubled life of Dane Edgar, 14, ended with a drugs overdose at a friend’s house after he repeatedly went missing from a children’s home in Northumberland. Another 14-year-old, James Jordan, was killed when he absconded from care and was the passenger in a stolen car.&#8221;</p></blockquote>
<h2>Interests not registered</h2>
<p>When there are no formal checks on declarations of interest, how can we rely on them? In Chile, the <a href="http://www.votainteligente.cl/" onclick="urchinTracker('/outgoing/www.votainteligente.cl/?referer=');">Ciudadano Inteligente Fundacion</a> <a href="http://www.guardian.co.uk/news/datablog/2012/apr/18/chile-open-government-brasilia-2012" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog/2012/apr/18/chile-open-government-brasilia-2012?referer=');">decided to check the Chilean MPs&#8217; register of assets and interests by building a database</a>:</p>
<blockquote><p>&#8220;No-one was analysing this data, so it was incomplete,&#8221; explained Felipe Heusser, executive president of the Fundacion. &#8220;We used technology to build a database, using a wide range of open data and mapped all the MPs&#8217; interests. From that, we found that nearly 40% of MPs were not disclosing their assets fully.&#8221;</p></blockquote>
<p>The organisation has now launched a <a href="http://www.inspectordeintereses.cl/" onclick="urchinTracker('/outgoing/www.inspectordeintereses.cl/?referer=');">database</a> that &#8220;enables members of the public to find potential conflicts of interest by analysing the data disclosed through the members&#8217; register of assets.&#8221;</p>
<h2>Data laundering</h2>
<p>Tony Hirst&#8217;s <a href="http://blog.ouseful.info/2012/02/01/sleight-of-hand-and-data-laundering-in-evidence-based-policy-making/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2012/02/01/sleight-of-hand-and-data-laundering-in-evidence-based-policy-making/?referer=');">post about how dodgy data was &#8220;laundered&#8221; by Facebook</a> in a consultant&#8217;s report is a good illustration of the need to &#8216;follow the data&#8217;.</p>
<blockquote><p>“We have some dodgy evidence, about which we’re biased, so we give it to an “independent” consultant who re-reports it, albeit with caveats, that we can then report, minus the caveats. Lovely, clean evidence. Our lobbyists can then go to a lazy policy researcher and take this scrubbed evidence, referencing it as finding in the Deloitte report, so that it can make its way into a policy briefing.”</p></blockquote>
<h2>&#8220;Things just don&#8217;t add up&#8221;</h2>
<p>In the video below Ellen Miller of the Sunlight Foundation takes the US government to task over the inconsistencies in its transparency agenda, and the flawed data published on its USAspending.gov website &#8211; so flawed that the Foundation launched the <a href="http://sunlightfoundation.com/clearspending/" onclick="urchinTracker('/outgoing/sunlightfoundation.com/clearspending/?referer=');">Clearspending</a> website to automate the checks and highlight the discrepancies between two sources of the same data:</p>
<p><iframe width="500" height="281" src="http://www.youtube.com/embed/UNQteT9Bu2w?fs=1&#038;feature=oembed" frameborder="0" allowfullscreen></iframe></p>
<h2>Key budget decisions made on useless data</h2>
<p>Sometimes data might appear to tell an astonishing story, but this turns out to be a mistake &#8211; and that mistake itself leads you to something much more newsworthy, as <a href="http://blogs.channel4.com/factcheck/how-dodgy-stats-could-decide-our-childrens-future/8400?utm_source=twitterfeed&amp;utm_medium=twitter" onclick="urchinTracker('/outgoing/blogs.channel4.com/factcheck/how-dodgy-stats-could-decide-our-childrens-future/8400?utm_source=twitterfeed_amp_utm_medium=twitter&amp;referer=');">Channel 4&#8217;s FactCheck found</a> when it started trying to find out if councils had been cutting spending on Sure Start children’s centres:</p>
<blockquote><p>&#8220;That ought to be fairly straightforward, as all councils by law have to fill in something called a Section 251 workbook detailing how much they are spending on various services for young people.</p>
<p>&#8220;&#8230; Brent Council in north London appeared to have slashed its funding by nearly 90 per cent, something that seemed strange, as we hadn’t heard an outcry from local parents.</p>
<p>&#8220;The council swiftly admitted making an accounting error – to the tune of a staggering £6m.&#8221;</p></blockquote>
<p>And they weren&#8217;t the only ones. In fact, the Department for Education admitted the numbers were “not very accurate”:</p>
<blockquote><p>&#8220;So to recap, these spending figures don’t actually reflect the real amount of money spent; figures from different councils are not comparable with each other; spending in one year can’t be compared usefully with other years; and the government doesn’t propose to audit the figures or correct them when they’re wrong.&#8221;</p></blockquote>
<p>This was particularly important because the S251 form &#8220;is the document the government uses to reallocate funding from council-run schools to its flagship academies&#8221;:</p>
<blockquote><p>&#8220;The Local Government Association (LGA) says less than £250m should be swiped from council budgets and given to academies, while the government wants to cut more than £1bn, prompting accusations that it is overfunding its favoured schools to the detriment of thousands of other children.</p>
<p>&#8220;Many councils’ complaints, made plain in responses to an ongoing government consultation, hinge on DfE’s use of S251, a document it has variously described as “unaudited”, “flawed” and “not fit for purpose”.&#8221;</p></blockquote>
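<p>The Brent example suggests a simple sanity check you can run on any spending data before reporting a dramatic cut: calculate the year-on-year percentage change for each body and flag anything implausibly large for a phone call. The sketch below is only an illustration. The council names and amounts are invented, the 50 per cent threshold is an arbitrary assumption, and this is not how FactCheck did it.</p>

```python
# Hypothetical figures in pounds: (previous year, current year).
# The council names and amounts are invented for illustration.
spending = {
    "Council A": (6_700_000, 700_000),
    "Council B": (3_200_000, 3_050_000),
    "Council C": (0, 1_500_000),
}


def percent_change(previous, current):
    """Year-on-year change as a percentage, or None if there is no baseline."""
    if previous == 0:
        return None
    return (current - previous) / previous * 100


def flag_large_swings(spending, threshold=50.0):
    """List the bodies whose change is missing or at least `threshold` per cent."""
    flagged = []
    for council, (previous, current) in spending.items():
        change = percent_change(previous, current)
        # A missing baseline is flagged too: it may be a recording gap.
        if change is None or abs(change) >= threshold:
            flagged.append((council, change))
    return flagged


flagged = flag_large_swings(spending)
for council, change in flagged:
    label = "no baseline" if change is None else f"{change:.1f}%"
    print(f"Check with {council}: {label}")
```

<p>A flag here is a question to ask, not a finding: the swing may be a real cut, an accounting error, or a change in what was counted.</p>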
<h2>No data is still a story</h2>
<p>Sticking with education, the TES <a href="http://www.tes.co.uk/article.aspx?storycode=6204396" onclick="urchinTracker('/outgoing/www.tes.co.uk/article.aspx?storycode=6204396&amp;referer=');">reports on the outcome of an FOI request on the experience of Ofsted inspectors</a>:</p>
<blockquote><p>&#8220;[Stephen] Ball submitted a Freedom of Information request, asking how many HMIs had experience of being a secondary head, and how many of those had led an outstanding school. The answer? Ofsted “does not hold the details”.</p>
<p>&#8220;“Secondary heads and academy principals need to be reassured that their work is judged by people who understand its complexity,” Mr Ball said. “Training as a good head of department or a primary school leader on the framework is no longer adequate. Secondary heads don’t fear judgement, but they expect to be judged by people who have experience as well as a theoretical training. After all, a working knowledge of the highway code doesn’t qualify you to become a driving examiner.”</p>
<p>&#8220;&#8230; Sir Michael Wilshaw, Ofsted’s new chief inspector, has already argued publicly that raw data are a key factor in assessing a school’s performance. By not providing the facts to back up its boasts about the expertise of its inspectors, many heads will remain sceptical of the watchdog’s claims.&#8221;</p></blockquote>
<h2>Men aren&#8217;t as tall as they say they are</h2>
<p>To round off, here&#8217;s a <a href="http://blog.okcupid.com/index.php/the-biggest-lies-in-online-dating/" onclick="urchinTracker('/outgoing/blog.okcupid.com/index.php/the-biggest-lies-in-online-dating/?referer=');">quirky piece of data journalism by dating site OkCupid</a>, which looked at the height of its members and found an interesting pattern:</p>
<blockquote><p><img src="http://cdn.okcimg.com/blog/lies/MaleHeightDistributionYoink.png" alt="Male height distribution on OKCupid" /></p></blockquote>
<p>&nbsp;</p>
<blockquote><p>&#8220;The male heights on <strong>OkCupid</strong> very nearly follow the expected normal distribution—except the whole thing is shifted to the right of where it should be.</p>
<p>&#8220;Almost universally guys like to add a couple inches. You can also see a more subtle vanity at work: starting at roughly 5&#8242; 8&#8243;, the top of the dotted curve tilts even further rightward. This means that guys <em>as they get closer to six feet </em>round up a bit more than usual, stretching for that coveted psychological benchmark.&#8221;</p></blockquote>
<p><strong>Do you know of any other examples of bad data forming the basis of a story? Please post a comment &#8211; I&#8217;m collecting examples.</strong></p>
<p>UPDATE (April 20 2012): A useful addition from Simon Rogers: <a href="http://www.guardian.co.uk/news/datablog/2011/oct/27/department-resource-accounts-reports" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog/2011/oct/27/department-resource-accounts-reports?referer=');">Named and shamed: the worst government annual reports</a> explains why government department spending reports fail to support the Government&#8217;s claimed desire for an &#8220;army of armchair auditors&#8221;, with a list of the worst offenders at the end.</p>
<p>Also:</p>
<ul>
<li><a href="http://www.highlylegal.org/2012/04/23/measuring-risk-without-statistics-it-doesnt-add-up/" onclick="urchinTracker('/outgoing/www.highlylegal.org/2012/04/23/measuring-risk-without-statistics-it-doesnt-add-up/?referer=');">This post on the lack of data on deaths from legal highs</a>, by some of my students at City University.</li>
<li><a href="http://www.datajournalismblog.com/2012/05/08/sextraffickingdata/" onclick="urchinTracker('/outgoing/www.datajournalismblog.com/2012/05/08/sextraffickingdata/?referer=');">Sex trafficking: a story of data gone wrong</a>, which is the source of the opening image for this post (by <strong>Lauren York</strong>, another student of mine)</li>
<li><a href="http://www.chicagotribune.com/classified/automotive/traffic/ct-met-getting-around-0423-20120423,0,6210631.column" onclick="urchinTracker('/outgoing/www.chicagotribune.com/classified/automotive/traffic/ct-met-getting-around-0423-20120423_0_6210631.column?referer=');">Chicago police crash reports are full of errors</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/04/19/when-data-goes-bad/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Get started in data scraping &#8211; and earn £75 for the pleasure</title>
		<link>http://onlinejournalismblog.com/2012/03/29/get-started-in-data-scraping-and-earn-75-for-the-pleasure/</link>
		<comments>http://onlinejournalismblog.com/2012/03/29/get-started-in-data-scraping-and-earn-75-for-the-pleasure/#comments</comments>
		<pubDate>Thu, 29 Mar 2012 13:24:29 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[openlylocal]]></category>
		<category><![CDATA[scraperwiki]]></category>
		<category><![CDATA[scraping]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=16104</guid>
		<description><![CDATA[OpenlyLocal are trying to scrape planning application data from across the country. They want volunteers to help write the scrapers using Scraperwiki - and are paying £75 for each one. This is a great opportunity for journalists or journalism students looking for an excuse to write their first scraper: there are 3 sample scrapers to [...]]]></description>
			<content:encoded><![CDATA[
<p>OpenlyLocal are <a href="http://countculture.wordpress.com/2012/03/29/how-to-help-build-the-uks-open-planning-database-writing-scrapers/" onclick="urchinTracker('/outgoing/countculture.wordpress.com/2012/03/29/how-to-help-build-the-uks-open-planning-database-writing-scrapers/?referer=');">trying to scrape planning application data from across the country</a>. They want volunteers to help write the scrapers using <a href="https://scraperwiki.com/" onclick="urchinTracker('/outgoing/scraperwiki.com/?referer=');">Scraperwiki</a> - and are <strong>paying £75 for each</strong> one.</p>
<p>This is a great opportunity for journalists or journalism students looking for an excuse to write their first scraper: there are 3 sample scrapers to help you find your feet, with many more <a href="https://scraperwiki.com/search/planning%20application/" onclick="urchinTracker('/outgoing/scraperwiki.com/search/planning_20application/?referer=');">likely to appear</a> as they are written. Hopefully, some guidance will appear too (if not, I may try to write some myself).</p>
<p><a href="http://countculture.wordpress.com/2012/03/29/how-to-help-build-the-uks-open-planning-database-writing-scrapers/" onclick="urchinTracker('/outgoing/countculture.wordpress.com/2012/03/29/how-to-help-build-the-uks-open-planning-database-writing-scrapers/?referer=');">Add your name in the comments on Andrew&#8217;s blog post</a>, and happy scraping!</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/03/29/get-started-in-data-scraping-and-earn-75-for-the-pleasure/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Comparing apples and oranges in data journalism: a case study</title>
		<link>http://onlinejournalismblog.com/2012/03/29/comparing-apples-and-oranges-in-data-journalism-a-case-study/</link>
		<comments>http://onlinejournalismblog.com/2012/03/29/comparing-apples-and-oranges-in-data-journalism-a-case-study/#comments</comments>
		<pubDate>Thu, 29 Mar 2012 13:06:17 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[newspapers]]></category>
		<category><![CDATA[comparisons]]></category>
		<category><![CDATA[context]]></category>
		<category><![CDATA[Guardian]]></category>
		<category><![CDATA[public sector pay]]></category>
		<category><![CDATA[simon rogers]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=16102</guid>
		<description><![CDATA[A must-read for any data journalist, aspiring or otherwise, is Simon Rogers&#8217; post on The Guardian Datablog where he compares public and private sector pay. This is a classic apples-and-oranges situation where politicians and government bodies are comparing two things that, really, are very different. Is a private school teacher really comparable to someone teaching in [...]]]></description>
			<content:encoded><![CDATA[
<p>A must-read for any data journalist, aspiring or otherwise, is <a href="http://www.guardian.co.uk/news/datablog/2012/mar/27/public-private-sector-pay" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog/2012/mar/27/public-private-sector-pay?referer=');">Simon Rogers&#8217; post on The Guardian Datablog where he compares public and private sector pay</a>.</p>
<p>This is a classic <a href="http://en.wikipedia.org/wiki/Apples_and_oranges" onclick="urchinTracker('/outgoing/en.wikipedia.org/wiki/Apples_and_oranges?referer=');">apples-and-oranges</a> situation where politicians and government bodies <a href="http://www.dailymail.co.uk/news/article-2121380/Average-state-staff-member-paid-15-private-worker-despite-working-fewer-hours.html" onclick="urchinTracker('/outgoing/www.dailymail.co.uk/news/article-2121380/Average-state-staff-member-paid-15-private-worker-despite-working-fewer-hours.html?referer=');">are comparing two things</a> that, really, <a href="http://www.guardian.co.uk/society/blog/2012/mar/27/public-private-sector-pay-comparisons" onclick="urchinTracker('/outgoing/www.guardian.co.uk/society/blog/2012/mar/27/public-private-sector-pay-comparisons?referer=');">are very different</a>. Is a private school teacher really comparable to someone teaching in an unpopular school? What is the private sector equivalent of a director of public health or a social worker?</p>
<p>But if these issues are being discussed, journalists must try to shed some light, and Simon Rogers does a great job in unpicking the comparisons. From pay and hours worked, to qualifications and age (big differences in both), and gender and pay inequality (more women in the public sector, more lower- and higher-paid workers in the private sector), Rogers crunches all the numbers:<span id="more-16102"></span></p>
<blockquote><p>&#8220;[T]he proportion of low skill jobs in the private sector has increased, and the proportion of high skill jobs in the public sector increased to around 31% of all jobs by 2011, compared 26% of all private sector jobs.</p>
<p>&#8220;But, at the same time, people who are most highly qualified actually get paid worse in the public sector.</p>
<p>&#8220;&#8230; Public sector workers tend to be older &#8230; Average mean hourly earnings peak in the early 40s in both sectors. They decline slightly approaching retirement although the decline happens earlier in the private sector than in the public sector, possibly because the higher earners in the private sector are more likely to leave the labour market earlier.</p></blockquote>
<blockquote><p>&#8220;It also shows that if you&#8217;re older in the public sector, you get paid better than in the private sector.</p>
<p>&#8220;&#8230; [T]he bottom 5% of workers in the public sector earn less than £6.91 per hour, whereas in the private sector, 5% of workers earn less than £5.93 per hour.&#8221;</p></blockquote>
<p>When you find yourself in an apples-and-oranges situation you can&#8217;t avoid, this is how to handle it. Any other examples?</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/03/29/comparing-apples-and-oranges-in-data-journalism-a-case-study/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A useful tool for creating a search interface for your data: freeDive</title>
		<link>http://onlinejournalismblog.com/2012/03/28/a-useful-tool-for-creating-a-search-interface-for-your-data-freedive/</link>
		<comments>http://onlinejournalismblog.com/2012/03/28/a-useful-tool-for-creating-a-search-interface-for-your-data-freedive/#comments</comments>
		<pubDate>Wed, 28 Mar 2012 10:35:39 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[freedive]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[tools]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=16065</guid>
		<description><![CDATA[Here&#8217;s a solution to a problem that aspiring data journalists have come up against time and time again: how to quickly create a searchable interface to your dataset. freeDive is quick and &#8211; if you can follow the wizard&#8217;s instructions &#8211; easy too. If you have a dataset that you want people to be able [...]]]></description>
			<content:encoded><![CDATA[
<p>    <iframe src="http://player.vimeo.com/video/35991763" width="500" height="281" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen></iframe></p>
<p><a href="http://multimedia.journalism.berkeley.edu/tools/freedive/" onclick="urchinTracker('/outgoing/multimedia.journalism.berkeley.edu/tools/freedive/?referer=');">Here&#8217;s</a> a solution to a problem that aspiring data journalists have come up against time and time again: how to quickly create a searchable interface to your dataset. </p>
<p><a href="http://multimedia.journalism.berkeley.edu/tools/freedive/" onclick="urchinTracker('/outgoing/multimedia.journalism.berkeley.edu/tools/freedive/?referer=');">freeDive</a> is quick and &#8211; if you can follow <a href="http://multimedia.journalism.berkeley.edu/tools/freedive/wizard" onclick="urchinTracker('/outgoing/multimedia.journalism.berkeley.edu/tools/freedive/wizard?referer=');">the wizard</a>&#8217;s instructions &#8211; easy too. <span id="more-16065"></span></p>
<p>If you have a dataset that you want people to be able to search or filter, publish it on Google Docs, copy part of the URL into freeDive when it asks you for it, then decide which fields you want people to be able to see, and how they can search, or filter their search.</p>
<p>At the end you&#8217;re given a script to embed, which is where some users are most likely to struggle. It won&#8217;t work on a WordPress.com blog &#8211; and even a self-hosted WordPress blog requires you to <a href="http://www.artiss.co.uk/code-embed" onclick="urchinTracker('/outgoing/www.artiss.co.uk/code-embed?referer=');">install and activate a plugin</a> (helpfully linked on the freeDive site) and follow its instructions (if you don&#8217;t know how to use custom fields, <a href="http://codex.wordpress.org/Custom_Fields" onclick="urchinTracker('/outgoing/codex.wordpress.org/Custom_Fields?referer=');">check out the explanations here</a> and <a href="http://codex.wordpress.org/Administration_Panels#Screen_Options" onclick="urchinTracker('/outgoing/codex.wordpress.org/Administration_Panels_Screen_Options?referer=');">here</a>).</p>
<p>If you&#8217;ve tried it on Blogger or another platform, let me know about it.</p>
<p>Meanwhile, here&#8217;s a test of the interface in action &#8211; the technology is still in &#8216;alpha&#8217; so expect some bugginess.</p>
<p>
<!-- Artiss Code Embed v2.0.1 | http://www.artiss.co.uk/code-embed -->
<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.0/jquery.min.js"></script> <script type="text/javascript" src="https://www.google.com/jsapi"></script> <script type="text/javascript"> google.load('visualization', '1.1', {packages:['controls']}); </script> <style type="text/css"> #fd_main p {font: 13px/16px Arial; margin-bottom:10px !important; padding:0;} #fd_main h4 {font-family:Arial; font-weight:bold; text-transform:uppercase;text-align:left; padding:0 0 5px 0; margin:0;} #fd_main label {font: bold 13px/24px Arial; padding: 0 5px 0 0; float:left; vertical-align: middle;} #fd_main input, textarea, select, button {font: 13px/16px Arial; color:#959595; vertical-align: middle;} .fd_note{font-family:Arial;color:#5E5E5E; font-size:.8em; border:none; padding:0;} div.search_widget{background:#E1E1E1;background:-webkit-gradient(linear,left top, left bottom,from(#FFF),to(#E1E1E1));background:-webkit-linear-gradient(top,#FFF,#E1E1E1);background:-moz-linear-gradient(top,#FFF,#E1E1E1);background:-ms-linear-gradient(top,#FFF,#E1E1E1);background:-o-linear-gradient(top,#FFF,#E1E1E1);border:2px solid #E1E1E1;padding:20px;-moz-border-radius:10px;-webkit-border-radius:10px; margin-bottom:20px;} div.control{margin-bottom: 15px; color:black;height:26px;} div.chart1{float: left;} .header {background-color: #5B5B5B; color: #FFFFFF; font-family:Arial, sans-serif; text-align:left; font-size:0.8em; text-transform:uppercase; font-weight:600; letter-spacing:1px; line-height:1.4em; border:none;} .row { background-color: #FFFFFF; color: #3F3F3F; font-family:inherit; font-size:0.9em;} .oddRow { background-color: #F5F5F5; color: #3F3F3F; font-family:inherit; text-align:left; font-size:0.9em;} .hoverRow { background-color: #CDCDCD; font-family:inherit; text-align:left; font-size:0.9em; } .selectedRow { background-color: #DCDCDC; color: #3F3F3F; text-align: center; font-family:inherit; text-align:left; font-size:0.9em; } .cell, table {padding:2px;border:none;} 
.headCell {padding:0;} .goog-menu {font-family:Arial;} .goog-menuitem-content {color: #3F3F3F;} .goog-menuitem-highlight,.goog-menuitem-hover {background-color: #F0F0F0;border-color: #F0F0F0;} .goog-menu-button-focused .goog-menu-button-outer-box,.goog-menu-button-focused .goog-menu-button-inner-box {border-color: #B1B1B1; margin} .goog-menu-button-inner-box {font: 13px/18px Arial, sans-serif; margin:2px 0; padding:0 5px;} .goog-menu-button-hover .goog-menu-button-outer-box,.goog-menu-button-hover .goog-menu-button-inner-box {border-color: #B1B1B1!important;} .goog-menu-button-active,.goog-menu-button-open {background-color: #F0F0F0;border-color: #B1B1B1;background-position: bottom left} .goog-menu-button-focused .goog-menu-button-outer-box,.goog-menu-button-focused .goog-menu-button-inner-box {border-color: #B1B1B1} .google-visualization-controls-slider-horizontal {border: 0px; background-color: #DBDBDB; border-radius: 5px; -moz-border-radius: 5px; outline:none; height:8px;} .google-visualization-controls-slider-thumb {background-color: #616161; border: none;width: 12px; height: 12px; } .google-visualization-controls-slider-horizontal .google-visualization-controls-slider-handle {height: 8px;} .google-visualization-controls-slider-horizontal .google-visualization-controls-slider-thumb {top: -2px; left-margin: 8px; border-radius: 2px; -moz-border-radius: 2px;} .google-visualization-controls-slider-handle {background-color: #616161; opacity: .6; height: 4px} .google-visualization-controls-rangefilter-thumblabel {font: 13px/24px Arial; color: #3F3F3F;padding: 0 0.5em} #page-loader { position: absolute; top: 0; bottom: 0%; left: 0; right: 0%; background-color: white; z-index: 99; display: none; text-align: center; width: 100%; padding-top: 25px; } </style> <script type="text/javascript" charset="utf-8"> $.extend({ getUrlVars: function(){ var vars = [], hash; var hashes = window.location.href.slice(window.location.href.indexOf('?') + 1).split('&'); for(var i = 0; i < 
hashes.length; i++) { hash = hashes[i].split('='); vars.push(hash[0]); vars[hash[0]] = hash[1]; } return vars; }, getUrlVar: function(name){ return $.getUrlVars()[name]; } }); </script> <script type="text/javascript"> anyquery = false; fdnq = decodeURIComponent($.getUrlVar('fdnq')); fdop = decodeURIComponent($.getUrlVar('fdop')); fdtq = decodeURIComponent($.getUrlVar('fdtq')); fdall = decodeURIComponent($.getUrlVar('fdall')); if (fdall != 'undefined') { fdall = true; } else { fdall = false; }; if (fdtq != 'undefined' || fdnq != 'undefined' || fdall == true) { anyquery = true; google.setOnLoadCallback(drawVisualization()); } else { fdtq = 'Enter case-sensitive text'; fdnq = 'Enter a number'; }; function drawVisualization() { var search_url = 'https://spreadsheets.google.com/a/google.com/tq?key=0ApTo6f5Yj1iJdHZTUjVYbEhZUG5nd3YxSXhvY19kR0E'; if (fdtq == 'undefined') { fdtq = ''; }; if (fdnq == 'undefined') { fdnq = ''; }; if (fdall) { var querystring = "select *"; } else { var querystring = "select B,D,G where "; if (fdtq != '' && fdnq == '' ) { querystring = querystring + "B like '%" + fdtq + "%'"; } else if (fdtq == '' && fdnq != '' ) { querystring = querystring + "G" + fdop + fdnq; } else { querystring = querystring + "B like '%" + fdtq + "%'" + " and " + "G" + fdop + fdnq; }; }; querystring = encodeURIComponent(querystring); search_url = search_url + "&tq=" + querystring; search_url = search_url.replace(/#&tq/,'&tq'); var query = new google.visualization.Query(search_url); query.send(handleQueryResponse); } function handleQueryResponse(response) { if (response.isError()) { console.log('Error in query: ' + response.getMessage() + ' ' + response.getDetailedMessage()); return; } var data = response.getDataTable(); var control1_use = new google.visualization.ControlWrapper({'controlType': 'CategoryFilter','containerId': 'control1','options': {'filterColumnLabel': 'PCT', 'matchType':'any', 'ui': {'label':'Filter by choosing PCT', 'cssClass' : 'custom-stringfilter', 
'allowMultiple': false,'allowTyping': false}}}); var control2_use = new google.visualization.ControlWrapper({'controlType': 'NumberRangeFilter','containerId': 'control2','options': {'filterColumnLabel': 'patients per GP 2011', 'matchType':'any', 'ui': {'label':'Or narrow by patients per GP','cssClass' : 'custom-stringfilter', 'allowMultiple': false,'allowTyping': false}}}); var classes = {headerRow: 'header', tableRow: 'row', hoverTableRow: 'hoverRow', oddTableRow: 'oddRow', selectedTableRow:'selectedRow', tableCell:'cell', headerCell:'headCell' }; var table = new google.visualization.ChartWrapper({ 'chartType': 'Table', 'containerId': 'chart1', 'options': {'height': '400px', 'cssClassNames': classes, 'width': '600px','allowHtml':'true' } }); var dashboard = new google.visualization.Dashboard(document.getElementById('dashboard')). bind([control1_use , control2_use ], table). draw(data); $('div#dashboard').show(); }; </script> <div id="container"> <div id="fd_main" style='width:556px'> <div id="full_search" class="search_widget" style='float:left;width:556px'> <h4> SEARCH THE DATABASE </h4> <p class="fd_note"> Text search is case sensitive. 
Leave fields blank to see the entire database (may cause longer load times).</p> <p> <label for="search_text">Practice/Cost Centre Name and Address includes </label> <input id="search_text" value="" onfocus="changeSearchButton1()" > </p> <div id="num_search_bit"> <p><label for='range_num'>patients per GP 2011 </label><select name='sheet_op' id='sheet_op'><option value='>='>Greater than or equal to</option><option value='<='>Less than or equal to</option><option value='>'>Greater than</option><option value='<'>Less than</option><option value='='>Equals</option></select><input id='range_num' value='Enter a number'></p> </div> <div id="name"> <p style='position:absolute;left:10000px;'><input id='range_num' value='' ></p> </div> <p> <span class="fd_note" style="float:left;" > Powered by <a href="http://multimedia.journalism.berkeley.edu/tools/freedive" onclick="urchinTracker('/outgoing/multimedia.journalism.berkeley.edu/tools/freedive?referer=');">freeDive</a> </span> <input id="submit_button" type="submit" value="SEARCH" onclick="fd_refresh()" style="float:right; color:#000;"> </p> </div> <div id="page_loader"> </div> <div id="dashboard" style='width: 600px;font-family:Arial, sans-serif;float:left;'> <h4> Explore your results </h4> <p><b>Instructions:</b> Use the filter(s) below to customize your search results. Use the tool above to perform a new search. </p> <div id="control1" class="control"></div> <div id='control2' class='control'></div> <div class="fd_note" style="width:100%; margin-bottom:.3em">Click on a column label to resort the table.</div> <div id="chart1" class="chart1"> <p style="font-size:1.3em">Fetching data... Thank you for waiting. 
</p> <p>Searches with a large number of results may take longer to load.</div> </div> </div> </div> <script type="text/javascript" charset="utf-8" > if (anyquery == false) { $('div#dashboard').hide(); }; $('input#search_text').val(fdtq); $('input#range_num').val(fdnq); function changeSearchButton1() { document.getElementById('submit_button').value = 'SEARCH'; }; function fd_refresh() { var fdop = $('#sheet_op').val(); var fdnq = $('#range_num').val(); var fdtq = $('#search_text').val(); var sURL = window.location.href; sURL = sURL.replace(/(fdtq=.*&|fdtq=.*$)/gi,''); sURL = sURL.replace(/(fdnq=.*&|fdnq=.*$)/gi,''); sURL = sURL.replace(/(fdop=.*&|fdop=.*$)/gi,''); sURL = sURL.replace(/(fdall=.*&|fdall=.*$)/gi,''); sURL = sURL.replace(/&$/,''); var qs = ''; var found = sURL.search('\\?'); if (found == -1) { qs = "?"; }; if (fdnq == "Enter a number") {fdnq = ''}; if (fdtq == "Enter case-sensitive text") {fdtq = ''}; if (fdnq == '' && fdtq == '') { var allrecs = true; }; if (allrecs) { qs = qs + "fdall=true" ; } else { qs = qs + "&fdop=" + encodeURIComponent(fdop) + "&fdnq=" + encodeURIComponent(fdnq) + "&fdtq=" + encodeURIComponent(fdtq); }; sURL = sURL + qs; window.location.replace(sURL); } </script> <script type="text/javascript" charset="utf-8"> $(document).ready(function(){ $('input#range_num').focus(function() { if($(this).val() == 'Enter a number') $(this).val(''); }).blur(function() { if( $(this).val() == '') $(this).val('Enter a number'); }); $('input#search_text').focus(function() { if($(this).val() == 'Enter case-sensitive text') $(this).val(''); }).blur(function() { if( $(this).val() == '') $(this).val('Enter case-sensitive text'); }); if (anyquery == false) { document.getElementById('submit_button').value='SEE ALL RECORDS';} else{ document.getElementById('submit_button').value='SEARCH'; }; }); </script>
<!-- End of Artiss Code Embed code -->
</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/03/28/a-useful-tool-for-creating-a-search-interface-for-your-data-freedive/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>La Nación: data journalism from Argentina</title>
		<link>http://onlinejournalismblog.com/2012/03/14/la-nacion-data-journalism-from-argentina/</link>
		<comments>http://onlinejournalismblog.com/2012/03/14/la-nacion-data-journalism-from-argentina/#comments</comments>
		<pubDate>Wed, 14 Mar 2012 08:46:08 +0000</pubDate>
		<dc:creator>Duarte Romero Varela</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[newspapers]]></category>
		<category><![CDATA[argentina]]></category>
		<category><![CDATA[Duarte Romero]]></category>
		<category><![CDATA[Knight-Mozilla]]></category>
		<category><![CDATA[La Nacion]]></category>
		<category><![CDATA[Momi Peralta]]></category>
		<category><![CDATA[Nacion Data]]></category>
		<category><![CDATA[NICAR]]></category>
		<category><![CDATA[open data]]></category>
		<category><![CDATA[OpenNews]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15964</guid>
		<description><![CDATA[Guest post by Duarte Romero Since the start of the year the Argentinian newspaper &#8216;La Nación&#8217; has been publishing &#8216;Nación Data&#8217;, a blog dedicated to data visualization, interactive projects and, especially, news related to data journalism. During this time they have been posting interviews with experts from the community, reporting on popular events such as [...]]]></description>
			<content:encoded><![CDATA[
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/03/Imagen-2.png"><img class="aligncenter size-full wp-image-15966" src="http://onlinejournalismblog.com/wp-content/uploads/2012/03/Imagen-2.png" alt="" width="630" height="169" /></a></p>
<p><strong><em>Guest post by <a href="https://twitter.com/#!/xan_guindan" onclick="urchinTracker('/outgoing/twitter.com/_/xan_guindan?referer=');">Duarte Romero</a></em></strong></p>
<p>Since the start of the year the Argentinian newspaper <a href="http://www.lanacion.com.ar/" onclick="urchinTracker('/outgoing/www.lanacion.com.ar/?referer=');">&#8216;La Nación&#8217;</a> has been publishing <a href="http://blogs.lanacion.com.ar/data/" onclick="urchinTracker('/outgoing/blogs.lanacion.com.ar/data/?referer=');">&#8216;Nación Data&#8217;</a>, a blog dedicated to data visualization, interactive projects and, especially, news related to data journalism.</p>
<p>During this time they have been posting <a href="http://blogs.lanacion.com.ar/data/mundo/ny-times-aron-pilhofer-y-el-estado-del-periodismo-de-datos/" onclick="urchinTracker('/outgoing/blogs.lanacion.com.ar/data/mundo/ny-times-aron-pilhofer-y-el-estado-del-periodismo-de-datos/?referer=');">interviews with experts</a> from the community, reporting on popular events such as <a href="http://blogs.lanacion.com.ar/data/mundo/conferencia-ire-nicar-dia-1/" onclick="urchinTracker('/outgoing/blogs.lanacion.com.ar/data/mundo/conferencia-ire-nicar-dia-1/?referer=');">NICAR</a> and sharing the most innovative pieces made by other newspapers.</p>
<p>The multimedia development manager of &#8216;La Nación&#8217;, <a href="http://blogs.lanacion.com.ar/data/aperalta/" onclick="urchinTracker('/outgoing/blogs.lanacion.com.ar/data/aperalta/?referer=');">Momi Peralta</a>, pointed out that their main goal so far is to release as much data as they can.<span id="more-15964"></span></p>
<p>It is important to note that Argentina has no Freedom of Information Act, so most of the statistics and spreadsheets they use are compiled by their own journalists.</p>
<p>Peralta believes that this data journalism will help to open up information in her country:</p>
<blockquote><p>&#8220;One of our aims [is to] produce good information in Argentina, opening new data and creating visualizations. Each [set of] data that is published means that more knowledge is released.&#8221;</p></blockquote>
<p>Last Friday they launched a new platform, titled, simply: <a href="http://data.lanacion.com.ar/dashboards/4610/indicadores-generales/" onclick="urchinTracker('/outgoing/data.lanacion.com.ar/dashboards/4610/indicadores-generales/?referer=');">&#8216;Data&#8217;</a>. The tool publishes the indices collected and refined by the paper&#8217;s journalists, and lets readers download them as a spreadsheet or share them on social networks.</p>
<p>On the dashboard you can find data about topics as different as traffic collisions, carbon dioxide emissions and the activity of the parliament.</p>
<p>Some are used in current stories but others are just uploaded for the interest of the public.</p>
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/03/Imagen-3.png"><img class="aligncenter size-full wp-image-15968" src="http://onlinejournalismblog.com/wp-content/uploads/2012/03/Imagen-3.png" alt="" width="1178" height="534" /></a></p>
<p>Peralta hopes that this project will help empower the open data movement in Argentina, so she wants to use the blog to keep in touch with the community.</p>
<p>She likes to compare data-driven journalism with constructing a building. Data are the materials and technology supplies the tools, but before you can build anything you need to organise them.</p>
<p>Finally, she points out that you may build things on your own but if you have contributions from other people, the result will be better.</p>
<p><strong>Learning by doing</strong></p>
<p>None of this would be possible if &#8216;La Nación&#8217; had not made a great effort to train its staff in the main tools used in data journalism. The team in charge of this area attended several courses on Excel, Tableau, scraping and other techniques.</p>
<p>During the NICAR conference they played a video (embedded below) about this learning process to share their progress and achievements with the community.</p>
<p><iframe width="500" height="281" src="http://www.youtube.com/embed/lMvCOjqG0PQ?fs=1&#038;feature=oembed" frameborder="0" allowfullscreen></iframe></p>
<p>This enthusiasm has been recognized by some of the most prestigious institutions in the online journalism community. In the last two years, &#8216;La Nación&#8217; has received two <a href="http://journalists.org/2011/09/25/2011-online-journalism-award-winners-announced/" onclick="urchinTracker('/outgoing/journalists.org/2011/09/25/2011-online-journalism-award-winners-announced/?referer=');">ONA</a> <a href="http://journalists.org/awards/past-winners-2010/" onclick="urchinTracker('/outgoing/journalists.org/awards/past-winners-2010/?referer=');">awards</a> and one <a href="http://www.eppyawards.com/Content/Past_2011_Winners-28-.aspx" onclick="urchinTracker('/outgoing/www.eppyawards.com/Content/Past_2011_Winners-28-.aspx?referer=');">EPPY</a> for its innovative projects.</p>
<p>Furthermore, last week the Argentinian paper <a href="http://blog.mozilla.com/blog/2012/03/09/new-york-times-joins-mozilla/" onclick="urchinTracker('/outgoing/blog.mozilla.com/blog/2012/03/09/new-york-times-joins-mozilla/?referer=');">was chosen by Knight-Mozilla OpenNews</a> for a partnership aimed at driving open source innovation in news.</p>
<p>The honour is shared with The New York Times, ProPublica and Spiegel Online, and makes &#8216;La Nación&#8217; the first Spanish-language paper to receive it.</p>
<p>Not many Spanish-speaking countries have developed a truly open data policy, but with more examples of media organisations working with data, the hope is that public institutions will be forced to improve their transparency and openness.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/03/14/la-nacion-data-journalism-from-argentina/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The straw man of data journalism&#8217;s &#8220;scientific&#8221; claim</title>
		<link>http://onlinejournalismblog.com/2012/03/12/the-straw-man-of-data-journalisms-scientific-claim/</link>
		<comments>http://onlinejournalismblog.com/2012/03/12/the-straw-man-of-data-journalisms-scientific-claim/#comments</comments>
		<pubDate>Mon, 12 Mar 2012 09:11:00 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[regulation, law and ethics]]></category>
		<category><![CDATA[accuracy]]></category>
		<category><![CDATA[datablog]]></category>
		<category><![CDATA[Fleet Street Blues]]></category>
		<category><![CDATA[Guardian]]></category>
		<category><![CDATA[james ball]]></category>
		<category><![CDATA[unemployment]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15949</guid>
		<description><![CDATA[Over the weekend Fleet Street Blues has had a bee in its bonnet about the &#8220;pretence&#8221; of data journalism and Saturday&#8217;s Guardian front page: &#8220;Half UK&#8217;s young black men out of work&#8221;. This, says FSB, is a lie that demonstrates the &#8220;pretence&#8221; that &#8220;&#8216;crunching the numbers&#8217; is somehow an abstract, scientific, mathematical task&#8221;. There are [...]]]></description>
			<content:encoded><![CDATA[
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/03/Guardian_cover.png"><img class="alignnone size-full wp-image-15954" src="http://onlinejournalismblog.com/wp-content/uploads/2012/03/Guardian_cover.png" alt="Guardian cover March 10 2012: Half UK's young black men out of work" width="561" height="347" /></a></p>
<p>Over the weekend <a href="http://fleetstreetblues.blogspot.com/2012/03/dodgy-data-journalism.html" onclick="urchinTracker('/outgoing/fleetstreetblues.blogspot.com/2012/03/dodgy-data-journalism.html?referer=');">Fleet Street Blues has had a bee in its bonnet</a> about the &#8220;pretence&#8221; of data journalism and Saturday&#8217;s Guardian front page: &#8220;<a href="http://www.guardian.co.uk/society/2012/mar/09/half-uk-young-black-men-unemployed" onclick="urchinTracker('/outgoing/www.guardian.co.uk/society/2012/mar/09/half-uk-young-black-men-unemployed?referer=');">Half UK&#8217;s young black men out of work</a>&#8221;.</p>
<p>This, says FSB, is a lie that demonstrates the &#8220;pretence&#8221; that &#8220;&#8216;crunching the numbers&#8217; is somehow an abstract, scientific, mathematical task&#8221;.<span id="more-15949"></span></p>
<p>There are two problems with this: the first is that I&#8217;ve never heard a data journalist make this claim; and the second is that the &#8216;lie&#8217; does not come from a data journalist (they generally don&#8217;t write headlines). It is, in short, a straw man.</p>
<p>The story itself is, however, perfectly valid. While FSB points to the exclusion of students, for example, The Guardian&#8217;s story mentions that early on.</p>
<p>It&#8217;s fair to say that those who are economically inactive should not be included in unemployment figures. Indeed, <a href="http://www.guardian.co.uk/news/datablog/2012/mar/09/black-unemployed-young-men?INTCMP=ILCNETTXT3487" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog/2012/mar/09/black-unemployed-young-men?INTCMP=ILCNETTXT3487&amp;referer=');">the Datablog post which expands on the data</a> does a very good job of explaining how that activity is classified:</p>
<blockquote><p>&#8220;Youth unemployment figures are always slightly odd, and as with many things in life, it&#8217;s students that get the blame.</p>
<p>&#8220;Students can be counted in three different ways: a full-time student doing an evening job in a bar counts as employed. A student who wants bar work but can&#8217;t get it is unemployed. A full-time student who&#8217;s not topping up his income with a job (and isn&#8217;t trying to) is economically inactive.&#8221;</p></blockquote>
<p>Fleet Street Blues uses the raw data published by the Datablog to highlight a number of other ways of interpreting the data, all of which are interesting &#8211; and in fact, I&#8217;ll probably use them in future as an example of how the same data can tell many different stories.</p>
<h2>More than one story</h2>
<p>But again, this proves nothing about the &#8216;pretence&#8217; of data journalism. All it proves is that there is more than one story to be found in a dataset, and that journalists will pick the one that is most newsworthy for their particular market.</p>
<p>In fact, it is not only journalists who do this, but politicians, PR staff, marketers, scientists, lobbyists and anyone else who wants to tell a story.</p>
<p>It&#8217;s because of this that data journalism is not something which should be snootily written off as a &#8220;fad&#8221;. Data is important. Journalists need to be able to interrogate it and find the stories that are not being told.</p>
<p>That is exactly what The Guardian have done. Yes, the headline could be more accurate* &#8211; but how many times has a headline writer omitted key details due to the limitations of space (on every type of story)? And yes, as one FSB commenter points out, the inclusion of whole numbers would have added further context.</p>
<p>But the irony is that it&#8217;s precisely because The Guardian isn&#8217;t trying to pretend to be &#8216;The Only Truth&#8217; that FSB and its commenter can interrogate the data, and that the reader can understand the subtleties in how data is gathered and classified.</p>
<p>If there is a pretence about data journalism, it is a wider one: a belief in society that somehow numbers equate to truth. It is a belief which is exploited by politicians but which is coming &#8211; and should come &#8211; under increasing scrutiny from journalists. (The story, for example, began as <a href="http://www.guardian.co.uk/commentisfree/2012/mar/05/young-black-unemployed-tragedy" onclick="urchinTracker('/outgoing/www.guardian.co.uk/commentisfree/2012/mar/05/young-black-unemployed-tragedy?referer=');">a column by a Labour politician in The Guardian</a>, was <a href="http://blogs.channel4.com/factcheck/factcheck-are-the-young-black-jobless-worse-off-than-white-youths/9740" onclick="urchinTracker('/outgoing/blogs.channel4.com/factcheck/factcheck-are-the-young-black-jobless-worse-off-than-white-youths/9740?referer=');">fact-checked by Channel 4 News</a>, and was then followed up by The Guardian&#8217;s journalists.)</p>
<p>The more good data journalism we have, the less that anyone &#8211; including journalists &#8211; can pretend to the idea of a &#8220;scientific&#8221; process.</p>
<p><em>*(Notably, the online version includes a second headline which is clearer: &#8220;Unemployment rate for black 16 to 24-year-olds available for work now double that for white counterparts, ONS data shows&#8221;)</em></p>
<div><a href="http://onlinejournalismblog.com/wp-content/uploads/2012/03/GuardianWeb.png"><img class="alignnone size-full wp-image-15955" src="http://onlinejournalismblog.com/wp-content/uploads/2012/03/GuardianWeb.png" alt="Web version of same Guardian article" width="466" height="118" /></a></div>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/03/12/the-straw-man-of-data-journalisms-scientific-claim/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>From CMS to DMS</title>
		<link>http://onlinejournalismblog.com/2012/03/09/from-cms-to-data-management-system/</link>
		<comments>http://onlinejournalismblog.com/2012/03/09/from-cms-to-data-management-system/#comments</comments>
		<pubDate>Fri, 09 Mar 2012 15:49:14 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[CMS]]></category>
		<category><![CDATA[DMS]]></category>
		<category><![CDATA[francis irving]]></category>
		<category><![CDATA[rufus pollock]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15947</guid>
		<description><![CDATA[There&#8217;s a persuasive argument being made by Francis Irving and Rufus Pollock in a joint blog post about the growth of data management systems &#8211; the &#8216;DMS&#8217; to content management systems&#8217; &#8216;CMS&#8217;: &#8220;Just as then we wrote HTML in text files by hand and uploaded it by FTP, now we analyse data on our laptops [...]]]></description>
			<content:encoded><![CDATA[
<p>There&#8217;s a persuasive argument <a href="http://blog.scraperwiki.com/2012/03/09/from-cms-to-dms-c-is-for-content-d-is-for-data/" onclick="urchinTracker('/outgoing/blog.scraperwiki.com/2012/03/09/from-cms-to-dms-c-is-for-content-d-is-for-data/?referer=');">being made</a> by Francis Irving and Rufus Pollock in a joint blog post about the growth of data management systems &#8211; the &#8216;DMS&#8217; to content management systems&#8217; &#8216;CMS&#8217;:</p>
<blockquote><p>&#8220;Just as then we wrote HTML in text files by hand and uploaded it by FTP, now we analyse data on our laptops using Excel, and share it with friends by emailing CSV files.</p>
<p>&#8220;But it reaches the point where using the filesystem and Outlook as your DMS stretches to breaking point. You’ll need a proper one.</p>
<p>&#8220;Nobody really knows what a proper one will look like yet. We’re all working on it.&#8221;</p></blockquote>
<p>Their <a href="http://blog.scraperwiki.com/2012/03/09/from-cms-to-dms-c-is-for-content-d-is-for-data/" onclick="urchinTracker('/outgoing/blog.scraperwiki.com/2012/03/09/from-cms-to-dms-c-is-for-content-d-is-for-data/?referer=');">post</a> lists what a DMS needs to do and the companies already trying to solve the &#8216;DMS problem&#8217; from different directions: a list which includes Google Docs (&#8220;coming from the web spreadsheet direction&#8221;), the data social network BuzzData, visualisation tool Tableau, data marketplaces, operating systems, Scraperwiki, and PANDA (&#8220;making a DMS for newsrooms&#8221;).</p>
<p>It&#8217;s a well-drawn picture from an angle which I haven&#8217;t seen before. Certainly, a number of news organisations are trying to reduce the friction of producing content for different platforms by &#8216;atomising&#8217; it in data-driven production processes (where a piece of content might be assembled and presented differently depending on the platform it is accessed through, for example), and their internal systems can probably be added to the list above.</p>
<p>What do you think? Is this a problem that&#8217;s being addressed in your own organisation?</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/03/09/from-cms-to-data-management-system/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

