<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Online Journalism Blog &#187; online journalism book</title>
	<atom:link href="http://onlinejournalismblog.com/tag/online-journalism-book/feed/" rel="self" type="application/rss+xml" />
	<link>http://onlinejournalismblog.com</link>
	<description>A conversation.</description>
	<lastBuildDate>Thu, 24 May 2012 08:39:26 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
<cloud domain='onlinejournalismblog.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
		<item>
		<title>What you need to know about the laws on harassment, data protection and hate speech {UPDATED: Stalking added}</title>
		<link>http://onlinejournalismblog.com/2012/03/28/what-you-need-to-know-about-the-laws-on-harassment-data-protection-and-hate-speech/</link>
		<comments>http://onlinejournalismblog.com/2012/03/28/what-you-need-to-know-about-the-laws-on-harassment-data-protection-and-hate-speech/#comments</comments>
		<pubDate>Wed, 28 Mar 2012 08:18:12 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[regulation, law and ethics]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[UGC]]></category>
		<category><![CDATA[bloggerheads]]></category>
		<category><![CDATA[Charles Russell]]></category>
		<category><![CDATA[Communications Act 2003]]></category>
		<category><![CDATA[Crime and Disorder Act 1998]]></category>
		<category><![CDATA[Criminal Justice and Immigration Act 2008]]></category>
		<category><![CDATA[data protection]]></category>
		<category><![CDATA[Data Protection Act]]></category>
		<category><![CDATA[Equality Act 2010]]></category>
		<category><![CDATA[Fabrice Muamba]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[harassment]]></category>
		<category><![CDATA[hate speech]]></category>
		<category><![CDATA[incitement]]></category>
		<category><![CDATA[Liam Stacey]]></category>
		<category><![CDATA[nadine dorries]]></category>
		<category><![CDATA[News Ltd]]></category>
		<category><![CDATA[offensive communications]]></category>
		<category><![CDATA[online journalism book]]></category>
		<category><![CDATA[online journalism handbook]]></category>
		<category><![CDATA[Protection From Harrassment Act 1997]]></category>
		<category><![CDATA[racism]]></category>
		<category><![CDATA[Serious Crime Act 2007]]></category>
		<category><![CDATA[The Public Order Act 1986]]></category>
		<category><![CDATA[the Racial and Religious Hatred Act 2006]]></category>
		<category><![CDATA[tim ireland]]></category>
		<category><![CDATA[Twitter Joke Trial]]></category>
		<category><![CDATA[uk riots]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=16058</guid>
		<description><![CDATA[The following is taken from the law chapter of The Online Journalism Handbook. The book blog and Facebook page contain updates and additions &#8211; those specifically on law can be found here. Harassment The Protection From Harassment Act 1997 is occasionally used to prevent journalists from reporting on particular individuals. Specifically, any conduct which amounts to [...]]]></description>
			<content:encoded><![CDATA[
<p><em>The following is taken from the law chapter of <a href="http://www.amazon.co.uk/The-Online-Journalism-Handbook-Practical/dp/140587340X/ref=as_li_ss_mfw?&amp;linkCode=wey&amp;tag=onlijourblog-21" onclick="urchinTracker('/outgoing/www.amazon.co.uk/The-Online-Journalism-Handbook-Practical/dp/140587340X/ref=as_li_ss_mfw?_amp_linkCode=wey_amp_tag=onlijourblog-21&amp;referer=');">The Online Journalism Handbook</a>. The <a href="http://onlinejournalismhandbook.wordpress.com/" onclick="urchinTracker('/outgoing/onlinejournalismhandbook.wordpress.com/?referer=');">book blog</a> and <a href="http://www.facebook.com/onlinejournalismhandbook" onclick="urchinTracker('/outgoing/www.facebook.com/onlinejournalismhandbook?referer=');">Facebook page</a> contain updates and additions &#8211; <a href="http://onlinejournalismhandbook.wordpress.com/category/chapter-11-law/" onclick="urchinTracker('/outgoing/onlinejournalismhandbook.wordpress.com/category/chapter-11-law/?referer=');">those specifically on law can be found here</a>.</em></p>
<h2>Harassment</h2>
<p>The <strong>Protection From Harassment Act 1997</strong> is occasionally used to prevent journalists from reporting on particular individuals. Specifically, any conduct which amounts to harassment of someone can be considered a criminal act, for which the victim can seek an injunction (followed by arrest if broken) or damages.</p>
<p><a href="http://onlinejournalismblog.com/2010/01/25/seismic-shock-blogger-paid-a-visit-by-police-over-libel-issue/">One example of a blogger&#8217;s experience</a> is illustrative of the way the act can be used with regard to online journalism, even if no case reaches court.<span id="more-16058"></span></p>
<p>In January 2010 the Seismic Shock blog published a post linking an Anglican reverend with Holocaust denial and antisemitism. The reverend complained of harassment to his local police force &#8211; Surrey Police &#8211; who passed on the complaint to the police force covering the blogger&#8217;s district: Yorkshire Police. Yorkshire Police visited the blogger and suggested he remove his blog.</p>
<p>The blogger, feeling intimidated, complied.</p>
<p>It was only when the reverend threatened another blogger (who had linked to the same evidence), boasting of his previous success (and falsely claiming that Seismic Shock had received a caution), that the Seismic Shock blogger talked publicly about what had happened and the story received national attention (<a href="http://www.bbc.co.uk/blogs/thereporters/rorycellanjones/2010/01/seismic_shock_when_blogging_me.html" onclick="urchinTracker('/outgoing/www.bbc.co.uk/blogs/thereporters/rorycellanjones/2010/01/seismic_shock_when_blogging_me.html?referer=');">Cellan-Jones, 2010</a>).</p>
<p>Defences to a charge of harassment include if you were undertaking actions for the purpose of preventing or detecting crime, or that your conduct was &#8220;reasonable&#8221; in the particular circumstances.</p>
<p>The fewer the incidents, and the more spaced out they are, the weaker the case.</p>
<p>If you have complied with an internal code of conduct with regard to privacy and fairness this will also help you.</p>
<p>A further consideration with regard to harassment is if someone claims that they are being harassed on your website. While they can report the harasser to the police, they might also expect you to take action under the <strong>Equality Act 2010</strong> if the harassment is sexual in nature or based on gender, sexuality, disability, age, pregnancy, race or religion.</p>
<p>This legislation is useful to refer to if you wish to remove content that might be considered harassment, or bar a contributor for such behaviour. As always, clear terms and conditions outlining unacceptable behaviour that would result in such actions will strengthen your position.</p>
<h2>Data Protection</h2>
<p>If you are gathering user information in any way &#8211; for example, requiring users to register to comment, upload material or to access your site, or &#8216;crowdsourcing&#8217; details which include personal information &#8211; then you will need to be aware of the Data Protection Act.</p>
<p>The <strong>Data Protection Act 1998</strong> stipulates how you should process any personal information you handle, and gives individuals powers to request access to information held about them. It requires that you use information &#8220;fairly and lawfully&#8221; and only for the purposes for which it is gathered, and only for as long as it is needed; that you store it securely and do not transfer it outside the EU (unless you ensure adequate protection); that you keep it accurate and up to date where necessary; and that you provide avenues for users to access their personal data if they require it.</p>
<p>In practical terms this means that when you gather information you should be clear about what it is to be used for and how the user can gain access to information held about them.</p>
<p>You should only provide access to user databases or spreadsheets containing personal details to members of staff who need that access to do what you said would be done with that information.</p>
<p>Importantly, the Act contains an exemption for information held only for &#8216;journalistic, literary or artistic&#8217; purposes, which applies before first publication and if the publisher believes that publication would be in the public interest.</p>
<p>If these conditions are met then the data must only be held securely and you are exempt from the other requirements.</p>
<p>This is clearly important because otherwise the subject of a secret investigation could request any information that is held about them.</p>
<p>More information and advice about data protection <a href="http://www.ico.gov.uk/for_organisations.aspx">can be found on the Information Commissioner&#8217;s Office website</a>.</p>
<h2>Hate speech laws</h2>
<p>A number of laws forbid expression of &#8216;hate speech&#8217; online in the UK. <strong>The Public Order Act 1986, the Racial and Religious Hatred Act 2006</strong> and the <strong>Criminal Justice and Immigration Act 2008</strong> cover, respectively, stirring up racial hatred (which can be based on nationality, colour, and ethnic origins); stirring up religious hatred; and inciting hatred on the basis of sexual orientation. If material published on your site comes under any of these categories, you should inform the contributor of the legal basis on which you are removing it.</p>
<h2>Incitement and offensive communications</h2>
<p>In addition to the hate speech laws covered in the Online Journalism Handbook, there are three other laws that are increasingly coming into play in relation to comments posted by website users.</p>
<p>The law on incitement – now “encouraging or assisting a crime” under the <strong>Serious Crime Act 2007</strong> – covers acts where individuals incite others to commit illegal acts. It was <a href="http://www.bbc.co.uk/news/uk-14488055" onclick="urchinTracker('/outgoing/www.bbc.co.uk/news/uk-14488055?referer=');">used in a number of cases surrounding the UK riots</a> where defendants were accused of encouraging disorder using social networks such as Facebook, with two men in particular <a href="http://www.guardian.co.uk/uk/2011/aug/16/facebook-riot-calls-men-jailed" onclick="urchinTracker('/outgoing/www.guardian.co.uk/uk/2011/aug/16/facebook-riot-calls-men-jailed?referer=');">receiving a sentence of 4 years in prison as a result</a>.</p>
<p>Student Liam Stacey was charged under a second act – the <strong>Crime and Disorder Act 1998</strong> – which covers incitement to ethnic or racial hatred, after making racist remarks on Twitter in the aftermath of the collapse of Bolton Wanderers footballer Fabrice Muamba. He was <a href="http://www.thisissouthwales.co.uk/Tweeter-jailed-disgusting-racist-posts-Fabrice/story-15644497-detail/story.html" onclick="urchinTracker('/outgoing/www.thisissouthwales.co.uk/Tweeter-jailed-disgusting-racist-posts-Fabrice/story-15644497-detail/story.html?referer=');">sentenced to 56 days in prison</a>.</p>
<p>The <strong>Communications Act 2003</strong> – specifically Section 127 – covers “grossly offensive” messages, a term broad enough to include a worrying range of discussion for publishers.</p>
<p>A number of Twitter users <a href="http://www.guardian.co.uk/technology/2012/mar/27/twitter-racism-taking-on-twacists?newsfeed=true" onclick="urchinTracker('/outgoing/www.guardian.co.uk/technology/2012/mar/27/twitter-racism-taking-on-twacists?newsfeed=true&amp;referer=');">have been prosecuted under the act for offensive messages sent to footballers</a>.</p>
<p>It was also <a href="http://www.opendemocracy.net/ourkingdom/fahad-ansari/racially-aggravated-prosecution-case-of-azhar-ahmed" onclick="urchinTracker('/outgoing/www.opendemocracy.net/ourkingdom/fahad-ansari/racially-aggravated-prosecution-case-of-azhar-ahmed?referer=');">used to prosecute Azhar Ahmed</a> for the following statement, also on Facebook:</p>
<blockquote><p><em>“People gassin about the deaths of soldiers! What about the innocent familys who have been brutally killed.. The women who have been raped.. The children who have been sliced up..! Your enemy’s were the Taliban not innocent harmless familys. All soldiers should DIE &amp; go to HELL! THE LOWLIFE F*****N SCUM! gotta problem go cry at your soliders grave &amp; wish him hell because that where he is going..”</em></p></blockquote>
<p>The contentious issue here is who decides what is offensive. As Fahad Ansari explains:</p>
<blockquote><p>“The test for “grossly offensive” is whether or not the message would cause gross offence to those to whom it relates, who need not be the recipients.”</p></blockquote>
<p>Normally these laws are used to charge individuals, but publishers and journalists should also be aware of the potential for them to be used to request users’ details – including sources. If publishers have been warned about such content and have not removed it, there may also be legal consequences. These are as yet largely unexplored, although the case of News Ltd in Australia – <a href="http://m.smh.com.au/business/news-ltd-website-posted-offensive-comments-court-finds-20120328-1vxyy.html" onclick="urchinTracker('/outgoing/m.smh.com.au/business/news-ltd-website-posted-offensive-comments-court-finds-20120328-1vxyy.html?referer=');">found to have breached racial discrimination laws in publishing moderated comments</a> – is illustrative.</p>
<p>The lawyer Charles Russell <a href="http://charlesrussell.wordpress.com/2010/12/11/twitterjoketrial-a-deconstruction-of-a-statutory-provision/" onclick="urchinTracker('/outgoing/charlesrussell.wordpress.com/2010/12/11/twitterjoketrial-a-deconstruction-of-a-statutory-provision/?referer=');">deconstructs a series of cases relating to that act here</a>, including the ‘<a href="http://en.wikipedia.org/wiki/Trial_of_Paul_Chambers" onclick="urchinTracker('/outgoing/en.wikipedia.org/wiki/Trial_of_Paul_Chambers?referer=');">Twitter Joke Trial</a>’.</p>
<h2>Stalking</h2>
<p>Bloggerheads&#8217; Tim Ireland writes about his experiences of accusations of &#8216;stalking&#8217; by one MP after he wrote about evidence surrounding the investigation into her expenses claims. <a href="http://www.bloggerheads.com/archives/2011/11/dorries-wolf-letter-04/" onclick="urchinTracker('/outgoing/www.bloggerheads.com/archives/2011/11/dorries-wolf-letter-04/?referer=');">The series is worth reading</a> as an illustration of how social media is bending the boundaries of the physical and digital worlds:</p>
<blockquote><p>&#8220;Chris Paul blogged about Nadine Dorries. <a href="http://www.bloggerheads.com/archives/2011/11/dorries-wolf-letter-03/" onclick="urchinTracker('/outgoing/www.bloggerheads.com/archives/2011/11/dorries-wolf-letter-03/?referer=');">Dorries reported him to police as a stalker.</a></p>
<p>&#8220;Ms Humphreycushion tweeted about Nadine Dorries. <a href="http://www.bloggerheads.com/archives/2011/11/dorries-wolf-letter-02/" onclick="urchinTracker('/outgoing/www.bloggerheads.com/archives/2011/11/dorries-wolf-letter-02/?referer=');">Dorries reported her to police as a stalker.</a></p>
<p>&#8220;I blogged and tweeted about Nadine Dorries. I also attended a public meeting I was invited to. <a href="http://www.bloggerheads.com/archives/2011/11/dorries-wolf-letter-01/" onclick="urchinTracker('/outgoing/www.bloggerheads.com/archives/2011/11/dorries-wolf-letter-01/?referer=');">Dorries reported me to police as a stalker.</a></p>
<p>&#8220;Linda Jack ran against Nadine Dorries in an election. <a href="http://www.bloggerheads.com/archives/2011/11/dorries-wolf-letter-03/" onclick="urchinTracker('/outgoing/www.bloggerheads.com/archives/2011/11/dorries-wolf-letter-03/?referer=');">Dorries reported her to police as a stalker.</a>&#8221;</p></blockquote>
<p>Tim has used the <strong>Data Protection Act</strong> particularly well to obtain the original complaints made against him, <a href="http://www.bloggerheads.com/archives/2011/11/dorries-wolf-letter-04/" onclick="urchinTracker('/outgoing/www.bloggerheads.com/archives/2011/11/dorries-wolf-letter-04/?referer=');">although</a>:</p>
<blockquote><p>&#8220;Even when I submitted a subject access request to her office legally compelling her to reveal what she claims are my emails, <a href="http://www.bloggerheads.com/archives/2011/03/nadine-dorries-right-to-know/" onclick="urchinTracker('/outgoing/www.bloggerheads.com/archives/2011/03/nadine-dorries-right-to-know/?referer=');">she refused to cooperate</a> (!) in defiance of the Information Commissioner’s Office and the Data Protection Act.&#8221;</p></blockquote>
<p>Do you know of any other examples of stalking laws being used in relation to journalism?</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2012/03/28/what-you-need-to-know-about-the-laws-on-harassment-data-protection-and-hate-speech/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>My online journalism book is now out</title>
		<link>http://onlinejournalismblog.com/2011/06/27/my-online-journalism-book-is-now-out/</link>
		<comments>http://onlinejournalismblog.com/2011/06/27/my-online-journalism-book-is-now-out/#comments</comments>
		<pubDate>Mon, 27 Jun 2011 14:38:57 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[online journalism]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[UGC]]></category>
		<category><![CDATA[online journalism book]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=14771</guid>
		<description><![CDATA[The Online Journalism Handbook, written with Liisa Rohumaa, has now been published. You can get it here. I&#8217;ve been blogging throughout the process of writing the book &#8211; particularly the chapters on data journalism, blogging and UGC &#8211; and you can still find those blog posts under the tag &#8216;Online Journalism Book&#8217;. Other chapters cover [...]]]></description>
			<content:encoded><![CDATA[
<p>The <strong>Online Journalism Handbook</strong>, written with Liisa Rohumaa, has now been published. <a href="http://www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?&#038;camp=2486&#038;linkCode=wey&#038;tag=onlijourblog-21&#038;creative=8882" onclick="urchinTracker('/outgoing/www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?_038_camp=2486_038_linkCode=wey_038_tag=onlijourblog-21_038_creative=8882&amp;referer=');">You can get it here</a>.</p>
<p>I&#8217;ve been blogging throughout the process of writing the book &#8211; particularly the chapters on data journalism, blogging and UGC &#8211; and you can still find those blog posts under the tag &#8216;<a href="http://onlinejournalismblog.com/tag/online-journalism-book/">Online Journalism Book</a>&#8217;.</p>
<p>Other chapters cover interactivity, audio slideshows and podcasting, video, law, some of the history that helps in understanding online journalism, and writing for the web (including SEO and SMO).</p>
<p>Meanwhile, I&#8217;ve created a <a href="http://onlinejournalismhandbook.wordpress.com/" onclick="urchinTracker('/outgoing/onlinejournalismhandbook.wordpress.com/?referer=');">blog</a>, <a href="http://www.facebook.com/pages/Online-Journalism-Handbook/127125834036761" onclick="urchinTracker('/outgoing/www.facebook.com/pages/Online-Journalism-Handbook/127125834036761?referer=');">Facebook page</a> and Twitter account (<a href="http://twitter.com/#!/ojhandbook" onclick="urchinTracker('/outgoing/twitter.com/_/ojhandbook?referer=');">@OJhandbook</a>) to provide updates, corrections and additions to the book.</p>
<p>If you spot anything in the book that needs updating or correcting, let me know. Likewise, let me know what you think of the book and anything you&#8217;d like to see added in future.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/06/27/my-online-journalism-book-is-now-out/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Why did you get into data journalism?</title>
		<link>http://onlinejournalismblog.com/2010/09/22/why-did-you-get-into-data-journalism/</link>
		<comments>http://onlinejournalismblog.com/2010/09/22/why-did-you-get-into-data-journalism/#comments</comments>
		<pubDate>Wed, 22 Sep 2010 10:40:13 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[Charles Arthur]]></category>
		<category><![CDATA[jonathan richards]]></category>
		<category><![CDATA[mary hamilton]]></category>
		<category><![CDATA[online journalism book]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=8423</guid>
		<description><![CDATA[In researching my book chapter (UPDATE: now published) I asked a group of journalists who worked with data what led them to do so. Here are their answers: Jonathan Richards, The Times: The flood of information online presents an amazing opportunity for journalists, but also a challenge: how on earth does one keep up with; make [...]]]></description>
			<content:encoded><![CDATA[
<p>In researching my book chapter (<strong>UPDATE: <a href="http://www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?&amp;camp=2486&amp;linkCode=wey&amp;tag=onlijourblog-21&amp;creative=8882" onclick="urchinTracker('/outgoing/www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?_amp_camp=2486_amp_linkCode=wey_amp_tag=onlijourblog-21_amp_creative=8882&amp;referer=');">now published</a></strong>) I asked a group of journalists who worked with data what led them to do so. Here are their answers:</p>
<h2>Jonathan Richards, The Times:</h2>
<blockquote><p>The flood of information online presents an amazing opportunity for journalists, but also a challenge: how on earth does one keep up with; make sense of it? You could go about it in the traditional way, fossicking in individual sites, but much of the journalistic value in this outpouring, it seems, comes in aggregation: in processing large amounts of data, distilling them, and exploring them for patterns. To do that &#8211; unless you&#8217;re superhuman, or have a small army of volunteers &#8211; you need the help of a computer.</p>
<p>I &#8216;got into&#8217; data journalism because I find this mix exciting. It appeals to the traditional journalistic instinct, but also calls for a new skill which, once harnessed, dramatically expands the realm of &#8216;stories I could possibly investigate&#8230;&#8217;<span id="more-8423"></span></p></blockquote>
<h2>Mary Hamilton, Eastern Daily Press:</h2>
<blockquote><p>I started coding out of necessity, not out of desire. In my day-to-day work for local newspapers I came across stories that couldn&#8217;t be told any other way. Excel spreadsheets full of data that I knew was relevant to readers if I could break it down or aggregate it up. Lists of locations that meant nothing on the page without a map. Timelines of events and stacks of documents. The logical response for me was to try to develop the skills to parse data to get to the stories it can tell, and to present it in interactive, interesting and &#8211; crucially &#8211; relevant ways. I see data journalism as an important skill in my storytelling toolkit &#8211; not the only option, but an increasingly important way to open up information to readers and users.</p></blockquote>
<h2>Charles Arthur, The Guardian:</h2>
<blockquote><p>When I was really young, I read a book about computers which made the point &#8211; rather effectively &#8211; that if you found yourself doing the same process again and again, you should hand it over to a computer. That became a rule for me: never do some task more than once if you can possibly get a computer to do it.</p>
<p>Obviously, to implement that you have to do a bit of programming. It turns out all programming languages are much the same &#8211; they vary in their grammar, but they&#8217;re all about making the computer do stuff. And it&#8217;s often the same stuff (at least in my ambit) &#8211; fetch a web page, mash up two sets of data, filter out some rubbish and find the information you want.</p>
<p>I got into data journalism because I also did statistics &#8211; and that taught me that people are notoriously bad at understanding data. Visualisation and simplification and exposition are key to helping people understand.</p>
<p>So data journalism is a compound of all those things: determination to make the computer do the slog, confidence that I can program it to, and the desire to tell the story that the data is holding and hiding.</p>
<p>I don&#8217;t think there was any particular point where I suddenly said &#8220;ooh, this is data journalism&#8221; &#8211; it&#8217;s more that the process of thinking &#8220;oh, big dataset, stuff it into an ad-hoc MySQL database, left join against that other database I&#8217;ve got, see what comes out&#8221; goes from being a huge experiment to your natural reaction.</p>
<p>It&#8217;s not just data though &#8211; I use programming to slough off the repetitive tasks of the day, such as collecting links, or resizing pictures, or getting the picture URL and photographer and licence from a Flickr page and stuffing it into a blogpost.</p>
<p>Data journalism is actually only half the story. The other half is that journalists should be <strong>actively unwilling</strong> to do repetitive tasks if it&#8217;s machine-like (say, removing line breaks from a piece of copy, or changing a link format).</p>
<p>Time spent doing those sorts of tasks is time lost to journalism and given up to being a machine. Let the damn machines do it. Humans have better things to do.</p></blockquote>
<h2><a href="http://stdout.be/en/" onclick="urchinTracker('/outgoing/stdout.be/en/?referer=');">Stijn Debrouwere</a>, Belgian information designer:</h2>
<blockquote><p>I used to love reading the daily newspaper, but lately I can&#8217;t seem to be bothered anymore. I&#8217;m part of that generation of people news execs fear so much: those that simply don&#8217;t care about what newspapers and news magazines have to offer. I enjoy being an information designer because it gives me a chance to help reinvent the way we engage and inform communities through news and analysis, both offline and online. Technology doesn&#8217;t solve everything, but it sure can help. My professional goal is simply this: make myself love news and newspapers again, and thereby hopefully getting others to love it too.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2010/09/22/why-did-you-get-into-data-journalism/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Podcasting: the experiences of Bagel Tech News</title>
		<link>http://onlinejournalismblog.com/2010/09/13/podcasting-the-experiences-of-bagel-tech-news/</link>
		<comments>http://onlinejournalismblog.com/2010/09/13/podcasting-the-experiences-of-bagel-tech-news/#comments</comments>
		<pubDate>Mon, 13 Sep 2010 07:30:31 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[radio]]></category>
		<category><![CDATA[Andy Ihnatko]]></category>
		<category><![CDATA[bagel tech news]]></category>
		<category><![CDATA[Ewen Rankin]]></category>
		<category><![CDATA[leo laporte]]></category>
		<category><![CDATA[Marc Silk]]></category>
		<category><![CDATA[online journalism book]]></category>
		<category><![CDATA[podcasting]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=9747</guid>
		<description><![CDATA[As part of the research into a forthcoming book on online journalism (UPDATE: now published), I interviewed Ewen Rankin of independent podcast Bagel Tech News. Here are his responses in full: The background My background is as a commercial photographer. I started life in graphic design and quickly moved to shooting photographs for the agency [...]]]></description>
			<content:encoded><![CDATA[
<p><img class="alignleft" src="http://a1.phobos.apple.com/us/r30/Podcasts/75/f4/14/ps.bnvchnpb.170x170-75.jpg" alt="Bagel Tech News podcast" width="170" height="170" /></p>
<p><em>As part of the research into a forthcoming </em><em><a href="http://onlinejournalismblog.com/tag/online-journalism-book/">book on online journalism</a> (<strong>UPDATE:<a href="http://www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?&amp;camp=2486&amp;linkCode=wey&amp;tag=onlijourblog-21&amp;creative=8882" onclick="urchinTracker('/outgoing/www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?_amp_camp=2486_amp_linkCode=wey_amp_tag=onlijourblog-21_amp_creative=8882&amp;referer=');"> now published</a>)</strong></em><em>, I interviewed </em><strong><em>Ewen Rankin</em></strong><em> of independent podcast </em><a href="http://www.bageltech.net/BagelTech/Home.html" onclick="urchinTracker('/outgoing/www.bageltech.net/BagelTech/Home.html?referer=');"><em>Bagel Tech News</em></a><em>. Here are his responses in full:</em></p>
<h2>The background</h2>
<p>My background is as a commercial photographer. I started life in graphic design and quickly moved to shooting photographs for the agency at which I worked. It was kind of a lucky transition as I wasn&#8217;t much cop as a graphic artist. I took fairly low level stuff to start with (picture business cards were all the rage in the 80s) and then moved to more commercial work shooting the advertising shots for Pretty Polly and Golden Lady tights in about 1988.</p>
<p>I started broadcasting in July 2008 and after two weeks Amber MacArthur made us Podcast of the Week on the Net@Night show with Leo Laporte. Listenership rose and we began to grow.</p>
<p>The Daily News show was published&#8230; daily until November 2008, and then I started publishing the BOG Show with Marc Silk, which was opened by Andy Ihnatko on 30th November 2008. I removed Marc from the show at Christmas 2009 and installed a &#8216;Skype Wall&#8217; in January 2010 to run a more panel-based show. More shows have been added in the intervening period and the network now has 7 active shows:<span id="more-9747"></span></p>
<ul>
<li>Bagel Tech News &#8211; 70k downloads per calendar month</li>
<li>Bagel Tech BIG &#8211; 3k downloads per week</li>
<li>Bagel Profits &#8211; No show since May due to Athos Work commitments. Generally around 250-500 downloads per episode</li>
<li>Bagel Tech Foto &#8211; New podcast on photography &#8211; 5 episodes produced, 250 downloads per episode</li>
<li>Bagel Tech Media &#8211; Formerly Sonic Beyond Podcast &#8211; 500 downloads per episode</li>
<li>Bagel Tech Rage &#8211; Formerly Tech Rage News &#8211; 250 downloads per episode</li>
<li>Bagel Tech Mac will begin airing in September</li>
<li>Bagel Tech Law will begin in 2011</li>
</ul>
<p>Apart from the Daily Show, all podcasts are produced weekly.</p>
<p>Bagel Tech Media Group will also add non tech related shows in 2011.</p>
<h2>Preparing the show</h2>
<p>The Daily Show is prepared each morning at 5.30am with a trawl through around 300 stories gathered using the Firefox plugin &#8216;Brief&#8217; and then saved and synced as bookmarks using &#8216;X Marks&#8217;. After that the chosen stories are ordered and then the podcast is recorded. This is generally about 10 minutes of audio including fluffs and rereads, which is edited down to between 5 and 6.5 minutes.</p>
<p>Then the pictures are added to the M4a version and the website is updated.</p>
<p>Stories are selected based on whether I believe the story is something that the listenership would want to know, but I also include stories which I think they SHOULD know or could know. And every podcast has an &#8216;And Finally&#8217; to sign off with a snigger.</p>
<p>The Weekly shows are more relaxed and there is minimal prep for these.</p>
<p>Tricks of the Trade&#8230; hmm. I guess I have just got more efficient at reviewing stories and creating the podcast and website. I have learnt more tools which can save me time and I am already set up to work from locations across the country. I am truly a mobile office and studio and it is rare for me to miss an episode of the Daily Show. The process is time consuming in prep more than delivery. Some mornings are hard to get motivated, others come easier.</p>
<h2>Advice</h2>
<p>Broadcast with enthusiasm and passion for the subject. Make sure that podcasting is your hobby first and try to make money second. If you show your financial hand too early then you will alienate listeners.</p>
<p>Concentrate on community. Let people feel part of the &#8216;X Show&#8217; community rather than isolated listeners. Open a chatroom and live feed while you record for the interaction which ensures this develops.</p>
<p>Lastly, broadcast to more than 1000 people every day. It doesn&#8217;t matter if there aren&#8217;t 1000 people on the other side of the microphone&#8230; always broadcast like there are or it will show through.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2010/09/13/podcasting-the-experiences-of-bagel-tech-news/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>5 data visualisation tips from David McCandless</title>
		<link>http://onlinejournalismblog.com/2010/05/14/5-data-visualisation-tips-from-david-mccandless/</link>
		<comments>http://onlinejournalismblog.com/2010/05/14/5-data-visualisation-tips-from-david-mccandless/#comments</comments>
		<pubDate>Fri, 14 May 2010 09:33:38 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[Blogging]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data journalism]]></category>
		<category><![CDATA[david mccandless]]></category>
		<category><![CDATA[online journalism book]]></category>
		<category><![CDATA[visualisation]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=8514</guid>
		<description><![CDATA[Here&#8217;s another snippet from my data journalism book chapter (now published). As part of my research David McCandless, author of the very lovely book and website Information is Beautiful gave  his 5 tips for visualising data: Double source data wherever possible &#8211; even the UN and WorldBank can make mistakes Take information out &#8211; there&#8217;s [...]]]></description>
			<content:encoded><![CDATA[
<p>Here&#8217;s another snippet from my <a href="http://onlinejournalismblog.com/tag/data-journalism/">data journalism book chapter</a> (<strong><a href="http://www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?&amp;camp=2486&amp;linkCode=wey&amp;tag=onlijourblog-21&amp;creative=8882" onclick="urchinTracker('/outgoing/www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?_amp_camp=2486_amp_linkCode=wey_amp_tag=onlijourblog-21_amp_creative=8882&amp;referer=');">now published</a></strong>). As part of my research, <strong>David McCandless</strong>, author of the very lovely <a href="http://bit.ly/d6VIFh" onclick="urchinTracker('/outgoing/bit.ly/d6VIFh?referer=');">book</a> and website <a href="http://www.informationisbeautiful.net/" onclick="urchinTracker('/outgoing/www.informationisbeautiful.net/?referer=');">Information is Beautiful</a>, gave his 5 tips for visualising data:</p>
<ol>
<li><strong>Double source data</strong> wherever possible &#8211; even the UN and World Bank can make mistakes</li>
<li><strong>Take information out</strong> &#8211; there&#8217;s a long tradition among statistical journalists of showing everything. All data points. The whole range. Every column and row. But stories are about clear threads with extraneous information fuzzed out. And journalism is about telling stories. You can only truly do that when you mask out the irrelevant or the minor data. The same applies to design which is about reducing something to its functional essence.</li>
<li><strong>Avoid standard abstract units</strong> &#8211; tons of carbon, billions of dollars &#8211; these kinds of units are over-used and impossible to imagine or relate to. Try to rework or process units down to &#8216;everyday&#8217; measures. Try to give meaningful context for huge figures whenever possible.</li>
<li>Self-sufficiency &#8211; all <strong>graphs, charts and infographics should be self-sufficient</strong>. That is, you shouldn&#8217;t require any other information to understand them. They&#8217;re like interfaces. So each should have a clear title, legend, source, labels etc. And credit yourself. I&#8217;ve seen too many great visuals with no credit or name at the bottom.</li>
<li><strong>Show your workings</strong> &#8211; transparency seems like a new front for journalists. Google Docs makes it incredibly easy to share your data and thought processes with readers, who can then participate.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2010/05/14/5-data-visualisation-tips-from-david-mccandless/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Data journalism pt5: Mashing data (comments wanted)</title>
		<link>http://onlinejournalismblog.com/2010/05/04/data-journalism-pt5-mashing-data-comments-wanted/</link>
		<comments>http://onlinejournalismblog.com/2010/05/04/data-journalism-pt5-mashing-data-comments-wanted/#comments</comments>
		<pubDate>Tue, 04 May 2010 08:36:59 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[Blogging]]></category>
		<category><![CDATA[apis]]></category>
		<category><![CDATA[data journalism]]></category>
		<category><![CDATA[datamasher]]></category>
		<category><![CDATA[friendfeed]]></category>
		<category><![CDATA[google public data explorer]]></category>
		<category><![CDATA[jumbra]]></category>
		<category><![CDATA[MapTube]]></category>
		<category><![CDATA[mapumental]]></category>
		<category><![CDATA[mashups]]></category>
		<category><![CDATA[online journalism book]]></category>
		<category><![CDATA[twazzup]]></category>
		<category><![CDATA[xfruits]]></category>
		<category><![CDATA[Yahoo! Pipes]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=8429</guid>
		<description><![CDATA[This is a draft from a book chapter on data journalism (part 1 looks at finding data; part 2 at interrogating data; part 3 at visualisation, and 4 at visualisation tools). I’d really appreciate any additions or comments you can make &#8211; particularly around tips and tools. UPDATE: It has now been published in The Online Journalism Handbook. [...]]]></description>
			<content:encoded><![CDATA[
<p><em>This is a draft from a book chapter on data journalism (</em><em><a href="http://onlinejournalismblog.com/2010/04/21/data-journalism-pt1-finding-data-draft-comments-invited/">part 1</a> looks at finding data</em><em>; </em><em><a href="http://onlinejournalismblog.com/2010/04/26/data-journalism-pt2-interrogating-data/">part 2 at interrogating data</a>; <a href="http://onlinejournalismblog.com/2010/04/28/data-journalism-pt3-visualising-data-comments-wanted/">part 3 at visualisation</a>; and <a href="http://onlinejournalismblog.com/2010/04/28/data-journalism-pt4-visualising-data-tools-and-publishing-comments-wanted/">part 4 at visualisation tools</a></em><em>). I’d really appreciate any additions or comments you can make &#8211; particularly around tips and tools.</em></p>
<p><strong>UPDATE: <a href="http://www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?&amp;camp=2486&amp;linkCode=wey&amp;tag=onlijourblog-21&amp;creative=8882" onclick="urchinTracker('/outgoing/www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?_amp_camp=2486_amp_linkCode=wey_amp_tag=onlijourblog-21_amp_creative=8882&amp;referer=');">It has now been published in The Online Journalism Handbook</a>.</strong></p>
<h2>Mashing data</h2>
<p>Wikipedia defines a mashup particularly succinctly, as &#8220;a web page or application that uses or combines data or functionality from two or many more external sources to create a new service.&#8221; Those sources may be online spreadsheets or tables; maps; RSS feeds (which could be anything from Twitter tweets, blog posts or news articles to images, video, audio or search results); or anything else which is structured enough to &#8216;match&#8217; against another source.</p>
<p>This &#8216;match&#8217; is typically what makes a mashup. It might be matching a city mentioned in a news article against the same city in a map; or it may be matching the name of an author with that same name in the tags of a photo; or matching the search results for &#8216;earthquake&#8217; from a number of different sources. The results can be useful to you as a journalist, to the user, or both.</p>
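<p>The idea of a &#8216;match&#8217; can be sketched in a few lines of Python. This is only an illustration: the gazetteer of place names, the headlines and the function name <code>match_places</code> are all made up for the example, and a real mashup would draw on a mapping service rather than a hand-typed list.</p>

```python
# Hypothetical gazetteer for illustration: place name mapped to (latitude, longitude).
GAZETTEER = {
    "Birmingham": (52.48, -1.90),
    "Leeds": (53.80, -1.55),
    "Cardiff": (51.48, -3.18),
}


def match_places(headline, gazetteer=GAZETTEER):
    """Return (place, coordinates) for every gazetteer place named in the headline."""
    # Split the headline into words and trim surrounding punctuation.
    words = {word.strip(".,;:!?'\"") for word in headline.split()}
    return [(place, coords) for place, coords in gazetteer.items() if place in words]


# Made-up headlines standing in for a news feed.
articles = [
    "Protest march planned in Birmingham this weekend",
    "Leeds council publishes spending data",
    "New podcast network launches",
]

for headline in articles:
    print(headline, match_places(headline))
```

<p>Each headline that names a known place comes back with coordinates that could be plotted on a map; headlines with no match are simply left off it.</p>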
<h2>Why make a mashup?</h2>
<p>Mashups can be particularly useful in providing live coverage of a particular event or ongoing issue &#8211; mashing images from a protest march, for example, against a map. Creating a mashup online is not too dissimilar from how, in broadcast journalism, you might set up cameras at key points around a physical location in anticipation of an event from which you will later &#8216;pull&#8217; live feeds: in a mashup you are effectively doing exactly the same thing &#8211; only in a virtual space rather than a physical one. So, instead of setting up a feed at the corner of an important junction, you might decide to pull a feed from Flickr of any images that are tagged with the words &#8216;protest&#8217; and &#8216;anti-fascist&#8217;.<span id="more-8429"></span></p>
<p>Some web developers have built entire sites that are mashups. <strong>Twazzup</strong> (<a href="http://twazzup.com" onclick="urchinTracker('/outgoing/twazzup.com?referer=');">twazzup.com</a>) for example, will show you a mix of Twitter tweets, images from Flickr, news updates and websites &#8211; all based on the search term you enter. And <strong>Friendfeed</strong> (<a href="http://friendfeed.com" onclick="urchinTracker('/outgoing/friendfeed.com?referer=');">friendfeed.com</a>) pulls in data that you and your social circle post to a range of social networking sites, and displays them in one place.</p>
<p>Mashups also provide a different way for users to interact with content &#8211; either by choosing how to navigate (for instance by using a map), or by inviting them to input something (for instance, a search term, or selecting a point on a slider). The <a href="http://googlemapsmania.blogspot.com/2008/02/google-super-tuesday-map-mashup.html" onclick="urchinTracker('/outgoing/googlemapsmania.blogspot.com/2008/02/google-super-tuesday-map-mashup.html?referer=');">Super Tuesday YouTube/Google Maps mashup</a>, for instance, provided an at-a-glance overview of what election-related videos were being uploaded where across the US.</p>
<p>Finally, mashups offer an opportunity for juxtaposing different datasets to provide fresh, sometimes ongoing, insights. The MySociety/Channel 4 project <a href="http://mapumental.channel4.com/signup" onclick="urchinTracker('/outgoing/mapumental.channel4.com/signup?referer=');">Mapumental</a>, for example, combines house price data with travel information and data on the &#8216;scenicness&#8217; of different locations to provide an interactive map of a location which the user can interrogate based on their individual preferences.</p>
<h2>Mashup tools</h2>
<p>Like so many aspects of online journalism, the ease with which you can create a mashup has increased significantly in recent years. An increase in the number and power of online tools, combined with the increasing &#8216;mashability&#8217; of websites and data, mean that journalists can now create a basic mashup through the simple procedures of drag-and-drop or copy-and-paste.</p>
<p>A simple RSS mashup, which combines the feeds from a number of different sources into one, for example, can now be created using an RSS aggregator such as <strong>xFruits</strong> (<a href="http://xfruits.com" onclick="urchinTracker('/outgoing/xfruits.com?referer=');">xfruits.com</a>) or <strong>Jumbra</strong> (<a href="http://jumbra.com" onclick="urchinTracker('/outgoing/jumbra.com?referer=');">jumbra.com</a>).</p>
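<p>Under the bonnet, an aggregator of this kind is doing something like the sketch below. It assumes the feeds have already been fetched and parsed into lists of items; the feed contents, the example.com links and the function name <code>merge_feeds</code> are invented for illustration, and this is not how any of the services named above is actually built.</p>

```python
from datetime import datetime


def merge_feeds(*feeds):
    """Combine several lists of feed items into one, newest first, dropping duplicate links."""
    seen = set()
    merged = []
    for feed in feeds:
        for item in feed:
            # The link is treated as the identity of an item (an assumption of this sketch).
            if item["link"] not in seen:
                seen.add(item["link"])
                merged.append(item)
    return sorted(merged, key=lambda item: item["published"], reverse=True)


# Two made-up feeds that share one item.
blog_feed = [
    {"title": "Mashing data", "link": "http://example.com/a", "published": datetime(2010, 5, 4)},
    {"title": "Visualising data", "link": "http://example.com/b", "published": datetime(2010, 4, 28)},
]
news_feed = [
    {"title": "Election results", "link": "http://example.com/c", "published": datetime(2010, 5, 7)},
    {"title": "Mashing data", "link": "http://example.com/a", "published": datetime(2010, 5, 4)},
]

combined = merge_feeds(blog_feed, news_feed)
for item in combined:
    print(item["published"].date(), item["title"])
```

<p>The item that appears in both feeds is kept only once, and the combined list is ordered by publication date, which is roughly what you would expect to see in the single merged feed.</p>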
<p>Likewise, you can mix two maps together using the website <strong>MapTube</strong> (<a href="http://maptube.org" onclick="urchinTracker('/outgoing/maptube.org?referer=');">maptube.org</a>) which also contains a number of maps for you to play with.</p>
<p>And if you want to mix two sources of data into one visualisation the site <strong>DataMasher</strong> (<a href="http://datamasher.org" onclick="urchinTracker('/outgoing/datamasher.org?referer=');">datamasher.org</a>) will let you do that &#8211; although you&#8217;ll have to make do with the US data that the site provides. <strong>Google Public Data Explorer</strong> (<a href="http://google.com/publicdata" onclick="urchinTracker('/outgoing/google.com/publicdata?referer=');">google.com/publicdata</a>) is a similar tool which allows you to play with global data.</p>
<p>But perhaps the most useful tool for news mashups is <strong>Yahoo! Pipes</strong> (<a href="http://pipes.yahoo.com" onclick="urchinTracker('/outgoing/pipes.yahoo.com?referer=');">pipes.yahoo.com</a>).</p>
<p>Yahoo! Pipes allows you to choose a source of data &#8211; it might be an RSS feed, an online spreadsheet or something that the user will input &#8211; and do a variety of things with it. Here are just some of the basic things you might do:</p>
<ul>
<li>Add it to other sources</li>
<li>Combine it with other sources &#8211; for instance, matching images to text</li>
<li>Filter it</li>
<li>Count it</li>
<li>Annotate it</li>
<li>Translate it</li>
<li>Create a gallery from the results</li>
<li>Place results on a map</li>
</ul>
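<p>To give a feel for what two of those operations involve, here is a rough Python sketch of filtering and counting. It is not Yahoo! Pipes code (Pipes is a visual, drag-and-drop tool); the items, field names and function names are made up for the example.</p>

```python
from collections import Counter

# Made-up feed items; in practice these would come from one or more feeds.
items = [
    {"title": "Earthquake hits coast", "source": "news"},
    {"title": "Council budget published", "source": "blog"},
    {"title": "Earthquake relief effort begins", "source": "blog"},
]


def filter_items(items, keyword):
    """Keep only items whose title mentions the keyword, ignoring case."""
    return [item for item in items if keyword.lower() in item["title"].lower()]


def count_by_source(items):
    """Count how many items came from each source."""
    return Counter(item["source"] for item in items)


quake_items = filter_items(items, "earthquake")
print(len(quake_items), dict(count_by_source(quake_items)))
```

<p>Chaining small steps like these, one feeding into the next, is the same basic pattern that a pipe follows.</p>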
<p>You could write a whole book on how to use Yahoo! Pipes &#8211; indeed, people have &#8211; so we will not cover the practicalities of using all of those features here. There are also dozens of websites and help files devoted to the site (which you should explore). Below, however, is <a href="http://onlinejournalismblog.com/2008/07/16/how-to-create-basic-mashups-with-yahoo-pipes/">a short tutorial to introduce you to the website and how it works</a> &#8211; this is a good way to understand how basic mashups work, and how easily they can be created.</p>
<h2>Mashups and APIs</h2>
<p>Although there are a number of easy-to-use mashup creators listed above, really impressive mashups tend to be written by people with knowledge of programming languages, and use APIs. APIs (Application Programming Interfaces) allow websites to interact with other websites. The launch of the Google Maps API in 2005, for example, has been described as a &#8216;huge tipping point&#8217; in mashup history (<a href="http://www.webmonkey.com/2008/11/mashups_are_dead__but_the_web_is_alive/" onclick="urchinTracker('/outgoing/www.webmonkey.com/2008/11/mashups_are_dead_but_the_web_is_alive/?referer=');">Duvander, 2008</a>) as it allowed web developers to &#8216;mash&#8217; countless other sources of data with maps. Since then it has become commonplace for new websites, particularly in the social media arena, to launch their own APIs in order to allow web developers to do interesting things with their feeds and data &#8211; not just mashups, but applications and services too.</p>
<p>If you want to develop a particularly ambitious mashup it is likely that you will need to teach yourself some programming skills, and familiarise yourself with some APIs (the APIs of Twitter, Google Maps and Flickr are good places to start).</p>
<h2>Box-out: Anatomy of a feed</h2>
<p>The <a href="http://www.readwriteweb.com/archives/this_is_what_a_tweet_looks_like.php" onclick="urchinTracker('/outgoing/www.readwriteweb.com/archives/this_is_what_a_tweet_looks_like.php?referer=');">image below from ReadWriteWeb</a> shows the code behind a simple Twitter update. It includes information about the author, their location, whether the update was a reply to someone else, what time and where it was created, and lots more besides. Each of these values can be used by a mashup in various ways &#8211; for example, you might match the author of this tweet with the author of a blog or image; you might match its time against other things being published at that moment; or you might use their location to plot this update on a map.</p>
<p>While the code can be intimidating, you do not need to understand programming in order to be able to do things with it. Of course, it <em>will</em> help if you do&#8230;</p>
<p><img src="http://www.readwriteweb.com/images/map_of_a_tweet.png" alt="Anatomy of a Twitter feed" width="384" height="492" /></p>
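<p>The values described above can be picked out with very little code. The sketch below uses a simplified, made-up update rather than the real structure shown in the image: the field names, the account name and the function <code>describe</code> are assumptions for illustration only.</p>

```python
# A simplified, made-up update; the field names here are illustrative only.
tweet = {
    "author": "examplejournalist",
    "created_at": "2010-05-04 08:36",
    "in_reply_to": None,
    "location": {"lat": 52.48, "lon": -1.90},
    "text": "Crowds gathering in the city centre",
}

# Hypothetical lookup of known authors and their blogs.
blog_authors = {"examplejournalist": "http://example.com/blog"}


def describe(update, authors):
    """Pull out the values a mashup might match on: author, time and map position."""
    place = update.get("location")
    return {
        "blog": authors.get(update["author"]),
        "time": update["created_at"],
        "marker": (place["lat"], place["lon"]) if place else None,
    }


print(describe(tweet, blog_authors))
```

<p>The author is matched against a list of blogs, the time is kept for comparison with other material, and the location becomes a marker for a map. An update with no location just gets no marker.</p>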
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2010/05/04/data-journalism-pt5-mashing-data-comments-wanted/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Data journalism pt4: visualising data &#8211; tools and publishing (comments wanted)</title>
		<link>http://onlinejournalismblog.com/2010/04/28/data-journalism-pt4-visualising-data-tools-and-publishing-comments-wanted/</link>
		<comments>http://onlinejournalismblog.com/2010/04/28/data-journalism-pt4-visualising-data-tools-and-publishing-comments-wanted/#comments</comments>
		<pubDate>Wed, 28 Apr 2010 11:47:15 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[online journalism]]></category>
		<category><![CDATA[charttool]]></category>
		<category><![CDATA[data journalism]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[factual]]></category>
		<category><![CDATA[fusioncharts]]></category>
		<category><![CDATA[google chart tools]]></category>
		<category><![CDATA[google docs]]></category>
		<category><![CDATA[google fusion tables]]></category>
		<category><![CDATA[icharts]]></category>
		<category><![CDATA[jing]]></category>
		<category><![CDATA[kwout]]></category>
		<category><![CDATA[manyeyes]]></category>
		<category><![CDATA[online journalism book]]></category>
		<category><![CDATA[skitch]]></category>
		<category><![CDATA[socrata]]></category>
		<category><![CDATA[swivel]]></category>
		<category><![CDATA[tableau]]></category>
		<category><![CDATA[tag clouds]]></category>
		<category><![CDATA[tagxedo]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[verifiable]]></category>
		<category><![CDATA[visualisation]]></category>
		<category><![CDATA[widgenie]]></category>
		<category><![CDATA[word clouds]]></category>
		<category><![CDATA[wordle]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=8413</guid>
		<description><![CDATA[This is a draft from a book chapter on data journalism (here are parts 1; two; and three, which looks the charts side of visualisation). I’d really appreciate any additions or comments you can make &#8211; particularly around tips and tools. UPDATE: It has now been published in The Online Journalism Handbook. Visualisation tools So if [...]]]></description>
			<content:encoded><![CDATA[
<p><em>This is a draft from a book chapter on data journalism (here are </em><em><a href="http://onlinejournalismblog.com/2010/04/21/data-journalism-pt1-finding-data-draft-comments-invited/">parts one</a></em><em>; </em><em><a href="http://onlinejournalismblog.com/2010/04/26/data-journalism-pt2-interrogating-data/">two</a>; and <a href="http://onlinejournalismblog.com/2010/04/28/data-journalism-pt3-visualising-data-comments-wanted/">three</a>, which looks at the charts side of visualisation</em><em>). I’d really appreciate any additions or comments you can make &#8211; particularly around tips and tools.</em></p>
<p><strong>UPDATE: <a href="http://www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?&amp;camp=2486&amp;linkCode=wey&amp;tag=onlijourblog-21&amp;creative=8882" onclick="urchinTracker('/outgoing/www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?_amp_camp=2486_amp_linkCode=wey_amp_tag=onlijourblog-21_amp_creative=8882&amp;referer=');">It has now been published in The Online Journalism Handbook</a>.</strong></p>
<h2>Visualisation tools</h2>
<p>So if you want to visualise some data or text, how do you do it? Thankfully there are now dozens of free and cheap pieces of software that you can use to quickly turn your tables into charts, graphs and clouds.</p>
<p>The best-known tool for creating word clouds is <strong>Wordle </strong>(<a href="http://wordle.net" onclick="urchinTracker('/outgoing/wordle.net?referer=');">wordle.net</a>). Simply paste a block of text into the site, or the address of an RSS feed, and the site will generate a word cloud whose fonts and colours you can change to your preferences. Similar tools include <strong>Tagxedo </strong>(<a href="http://tagxedo.com/" onclick="urchinTracker('/outgoing/tagxedo.com/?referer=');">tagxedo.com</a>) and <strong>Wordlings </strong>(<a href="http://wordlin.gs/" onclick="urchinTracker('/outgoing/wordlin.gs/?referer=');">wordlin.gs</a>), both of which allow you to put your word cloud into a particular shape.</p>
<p><strong>ManyEyes </strong>(<a href="http://manyeyes.alphaworks.ibm.com/manyeyes/" onclick="urchinTracker('/outgoing/manyeyes.alphaworks.ibm.com/manyeyes/?referer=');">manyeyes.alphaworks.ibm.com/manyeyes/</a>) also allows you to create word clouds and tag clouds &#8211; as well as word trees and phrase nets that allow you to see common phrases. But it is perhaps most useful in allowing you to easily create scattergrams, bar charts, bubble charts and other forms. The site also contains a raft of existing data that you can play with to get a feel for the site. Similar tools that allow access to other data include <strong>Factual </strong>(<a href="http://factual.com" onclick="urchinTracker('/outgoing/factual.com?referer=');">factual.com</a>), <strong>Swivel </strong>(<a href="http://swivel.com" onclick="urchinTracker('/outgoing/swivel.com?referer=');">swivel.com</a>)[see comments], <strong>Socrata </strong>(<a href="http://socrata.com" onclick="urchinTracker('/outgoing/socrata.com?referer=');">socrata.com</a>) and <strong>Verifiable.com</strong> (<a href="http://verifiable.com" onclick="urchinTracker('/outgoing/verifiable.com?referer=');">verifiable.com</a>). And <strong>Google Fusion Tables</strong> (<a href="http://tables.googlelabs.com" onclick="urchinTracker('/outgoing/tables.googlelabs.com?referer=');">tables.googlelabs.com</a>) is particularly useful if you want to collaborate on tables of data, as well as offering visualisation options.</p>
<p>More general visualisation tools include <strong>widgenie </strong>(<a href="http://widgenie.com" onclick="urchinTracker('/outgoing/widgenie.com?referer=');">widgenie.com</a>), <strong>iCharts </strong>(<a href="http://icharts.net" onclick="urchinTracker('/outgoing/icharts.net?referer=');">icharts.net</a>), <strong>ChartTool </strong>(<a href="http://onlinecharttool.com" onclick="urchinTracker('/outgoing/onlinecharttool.com?referer=');">onlinecharttool.com</a>) and <strong>ChartGo </strong>(<a href="http://www.chartgo.com" onclick="urchinTracker('/outgoing/www.chartgo.com?referer=');">www.chartgo.com</a>). <strong>FusionCharts </strong>is a piece of visualisation software with a Google Gadget service that publishers may find useful. You can find instructions on how to use it at <a href="http://www.fusioncharts.com/GG/Docs/Index.html" onclick="urchinTracker('/outgoing/www.fusioncharts.com/GG/Docs/Index.html?referer=');">www.fusioncharts.com/GG/Docs</a>.</p>
<p>If you want more control over your visualisation &#8211; or want it to update dynamically when the source information is updated &#8211; <strong>Google Chart Tools</strong> (<a href="http://code.google.com/apis/charttools" onclick="urchinTracker('/outgoing/code.google.com/apis/charttools?referer=');">code.google.com/apis/charttools</a>) is worth exploring. This requires some technical knowledge, but there is a lot of guidance and help on the site to get you started quickly.</p>
<p><strong>Tableau Public </strong>is a piece of free software you can download (<a href="http://tableausoftware.com/public" onclick="urchinTracker('/outgoing/tableausoftware.com/public?referer=');">tableausoftware.com/public</a>) with some powerful visualisation options. You will also find visualisation options on spreadsheet applications such as <strong>Excel </strong>or the free <strong>Google Docs spreadsheet</strong> service. These are worth exploring as a way to quickly generate charts from your data on the fly.</p>
<h2>Publishing your visualisation</h2>
<p>There will come a point when you&#8217;ve visualised your data and need to publish it somehow. The simplest way to do this is to take an image (screengrab) of the chart or graph. This can be done with a web-based screencapture tool like <strong>Kwout </strong>(<a href="http://kwout.com" onclick="urchinTracker('/outgoing/kwout.com?referer=');">kwout.com</a>), a free desktop application like <strong>Skitch </strong>(<a href="http://skitch.com" onclick="urchinTracker('/outgoing/skitch.com?referer=');">skitch.com</a>) or <strong>Jing </strong>(<a href="http://jingproject.com" onclick="urchinTracker('/outgoing/jingproject.com?referer=');">jingproject.com</a>), or by simply using the &#8216;Print Screen&#8217; button on a PC keyboard (cmd+shift+3 on a Mac) and pasting the screengrab into a graphics package such as <strong>Photoshop</strong>.</p>
<p>The advantage of using a screengrab is that the image can be easily distributed on social networks, image sharing websites (such as Flickr), and blogs &#8211; driving traffic to the page on your site where it is explained.</p>
<p>If you are more technically minded, you can instead choose to embed your chart or graph. Many visualisation tools will give you a piece of code which you can copy and paste into the HTML of an article or blog post in the place you wish to display it (this will not work on most third party blog hosting services, such as WordPress.com). One particular advantage of this approach is that the visualisation can update itself if the source data is updated.</p>
<p>Alternatively, an understanding of JavaScript can allow you to build &#8216;progressively enhanced&#8217; charts which allow users to access the original data or see what happens when it is changed.</p>
<h2>Showing your raw data</h2>
<p>It is generally a good idea to give users access to your raw data alongside its visualisation. This not only allows them to check it against your visualisation but also lets them add insights you might not otherwise gain. It is relatively straightforward to publish a spreadsheet online using Google Docs (see the sidebar on publishing a spreadsheet).</p>
<h2>SIDEBAR: How to: publish a spreadsheet online</h2>
<p><strong>Google Docs</strong> (<a href="http://docs.google.com" onclick="urchinTracker('/outgoing/docs.google.com?referer=');">docs.google.com</a>) is a free website which allows you to create and share documents. You can share them via email, by publishing them as a webpage, or by embedding your document in another webpage, such as a blog post. This is how you share a spreadsheet:</p>
<ol>
<li>Open your spreadsheet in Google Docs. You can upload a spreadsheet into Google Docs if you&#8217;ve created it elsewhere &#8211; there is a size limit, however, so if you are told the file is too big try removing unnecessary sheets or columns.</li>
<li>Look for the &#8216;Share&#8217; button (currently in the top right corner) and click on it.</li>
<li>A drop-down menu should appear. Click on &#8216;Publish as a web page&#8217;.</li>
<li>A new window should appear asking which sheets you want to publish. Select the sheet you want to publish and click &#8216;Start publishing&#8217; (you should also make sure &#8216;Automatically republish when changes are made&#8217; is ticked if you want the public version of the spreadsheet to update with any data you add.)</li>
<li>Now the bottom half of that window &#8211; &#8216;Get a link to the published data&#8217; &#8211; should become active. In the bottom box should be a web address where you can now see the public version of your spreadsheet. If you want to share that, copy the address and test that it works in a web browser. You can now link to it from any webpage.</li>
<li>Alternatively, you can embed your spreadsheet &#8211; or part of it &#8211; in another webpage. To do this click on the first drop-down menu in this area &#8211; it will currently say &#8216;Web page&#8217; &#8211; and change it to &#8216;HTML to embed in a page&#8217;. Now the bottom box on this window should show some HTML that begins with</li>
<li>If you want to embed just part of a spreadsheet, in the box that currently says &#8216;All cells&#8217; type the range of cells you wish to show. For example, typing A1:G10 will select all the cells in your spreadsheet from A1 (the first row of column A) to G10 (the 10th row of column G). Once again, the HTML below will change so that it only displays that section of your spreadsheet.</li>
</ol>
<p><em>Once again, I&#8217;d welcome any comments on things I may have missed or tips you can add. <a href="http://onlinejournalismblog.com/2010/05/04/data-journalism-pt5-mashing-data-comments-wanted/">Part 5, on mashups, is now available here</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2010/04/28/data-journalism-pt4-visualising-data-tools-and-publishing-comments-wanted/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
		<item>
		<title>Data journalism pt3: visualising data &#8211; charts and graphs (comments wanted)</title>
		<link>http://onlinejournalismblog.com/2010/04/28/data-journalism-pt3-visualising-data-comments-wanted/</link>
		<comments>http://onlinejournalismblog.com/2010/04/28/data-journalism-pt3-visualising-data-comments-wanted/#comments</comments>
		<pubDate>Wed, 28 Apr 2010 09:49:59 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[online journalism]]></category>
		<category><![CDATA[bar charts]]></category>
		<category><![CDATA[bubble charts]]></category>
		<category><![CDATA[charlie beckett]]></category>
		<category><![CDATA[chartgo]]></category>
		<category><![CDATA[charttool]]></category>
		<category><![CDATA[data journalism]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[factual]]></category>
		<category><![CDATA[fusioncharts]]></category>
		<category><![CDATA[google chart tools]]></category>
		<category><![CDATA[google docs]]></category>
		<category><![CDATA[google fusion tables]]></category>
		<category><![CDATA[histograms]]></category>
		<category><![CDATA[humanisation]]></category>
		<category><![CDATA[icharts]]></category>
		<category><![CDATA[jing]]></category>
		<category><![CDATA[kwout]]></category>
		<category><![CDATA[line graphs]]></category>
		<category><![CDATA[manyeyes]]></category>
		<category><![CDATA[marcos weskamp]]></category>
		<category><![CDATA[newsmap]]></category>
		<category><![CDATA[online journalism book]]></category>
		<category><![CDATA[personalisation]]></category>
		<category><![CDATA[pictograms]]></category>
		<category><![CDATA[pie charts]]></category>
		<category><![CDATA[scattergrams]]></category>
		<category><![CDATA[skitch]]></category>
		<category><![CDATA[small multiples]]></category>
		<category><![CDATA[socrata]]></category>
		<category><![CDATA[swivel]]></category>
		<category><![CDATA[tableau]]></category>
		<category><![CDATA[tableau public]]></category>
		<category><![CDATA[tag clouds]]></category>
		<category><![CDATA[tagxedo]]></category>
		<category><![CDATA[treemaps]]></category>
		<category><![CDATA[verifiable]]></category>
		<category><![CDATA[visualisation]]></category>
		<category><![CDATA[widgenie]]></category>
		<category><![CDATA[word clouds]]></category>
		<category><![CDATA[wordle]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=8407</guid>
		<description><![CDATA[This is a draft from a book chapter on data journalism (the first, on gathering data, is here; the section on interrogating data is here). I’d really appreciate any additions or comments you can make &#8211; particularly around considerations in visualisation. A further section on visualisation tools, can be found here. UPDATE: It has now been [...]]]></description>
			<content:encoded><![CDATA[
<p><em>This is a draft from a book chapter on data journalism (<a href="../2010/04/21/data-journalism-pt1-finding-data-draft-comments-invited/">the first, on gathering data, is here</a>; the <a href="http://onlinejournalismblog.com/2010/04/26/data-journalism-pt2-interrogating-data/">section on interrogating data is here</a>). I’d really appreciate any additions or comments you can make &#8211; particularly around considerations in visualisation. A further section <a href="http://onlinejournalismblog.com/2010/04/28/data-journalism-pt4-visualising-data-tools-and-publishing-comments-wanted/">on visualisation tools can be found here</a></em><em>.</em></p>
<p><strong>UPDATE: <a href="http://www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?&amp;camp=2486&amp;linkCode=wey&amp;tag=onlijourblog-21&amp;creative=8882" onclick="urchinTracker('/outgoing/www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?_amp_camp=2486_amp_linkCode=wey_amp_tag=onlijourblog-21_amp_creative=8882&amp;referer=');">It has now been published in The Online Journalism Handbook</a>.</strong></p>
<blockquote><p>&#8220;At their best, graphics are instruments for reasoning about quantitative information. Often the most effective way to describe, explore, and summarize a set of numbers &#8211; even a very large set &#8211; is to look at pictures of those numbers.&#8221; (Edward Tufte, <em><a href="http://bit.ly/9rMX9D" onclick="urchinTracker('/outgoing/bit.ly/9rMX9D?referer=');">The Visual Display of Quantitative Information</a>, 2001)</em></p></blockquote>
<p>Visualisation is the process of giving a graphic form to information which is often otherwise dry or impenetrable. Classic examples of visualisation include turning a table into a bar chart, or a series of percentage values into a pie chart &#8211; but the increasing power of both computer analysis and graphic design software has seen the craft of visualisation develop with increasing sophistication. In larger organisations the data journalist may work with a graphic artist to produce an infographic that visualises their story &#8211; but in smaller teams, in the initial stages of a story, or when speed is of the essence, they are likely to need to use visualisation tools to give form to their data.</p>
<p>Broadly speaking there are two typical reasons for visualising data: to find a story; or to tell one. Quite often, it is both.<span id="more-8407"></span></p>
<p>In the parking tickets story <a href="http://onlinejournalismblog.com/2010/04/26/data-journalism-pt2-interrogating-data/">above</a>, for example, it was the process of visualisation that tipped off Adrian Short and Guardian journalist Charles Arthur to the story &#8211; and led to further enquiries.</p>
<p>In most cases, however, the story will not be as immediately visible. Sometimes the data will need to be visualised in different ways before a story becomes clear. And an understanding of the strengths of different types of visualisation can be particularly useful here.</p>
<p>UPDATE (Dec 7, 2010): Visualisation probably needs to be extended to include humanisation and personalisation. <a href="http://onlinejournalismblog.com/2010/12/07/wikileaks-cablegate/">More detail here</a>, and to come.</p>
<h2>Types of visualisation</h2>
<p>Visualisation can take on a range of forms. The most familiar are those we know from maths and statistics: <strong>pie charts</strong>, for example, allow you to show how one thing is divided &#8211; how a budget is spent, say, or how a population is distributed. They are thought to be particularly useful when the proportions represented are large (for example, above 25%), but less useful when lower percentages are involved, because small slices are harder to perceive and to compare with one another.</p>
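<p>As a rough illustration of the 25% rule of thumb, the short Python sketch below turns raw amounts into percentage shares and lists the slices that fall under the threshold. The budget figures and the function name <code>shares</code> are invented for this example; they are assumptions, not real data.</p>

```python
def shares(parts):
    """Return each category's share of the total as a percentage (1 decimal place)."""
    total = sum(parts.values())
    return {name: round(100 * value / total, 1) for name, value in parts.items()}


# Hypothetical budget figures, invented for illustration only.
budget = {"Housing": 400, "Transport": 300, "Parks": 200, "Libraries": 100}

budget_shares = shares(budget)

# Slices at or above 25% are the ones a pie chart tends to show clearly.
large_slices = [name for name, pct in budget_shares.items() if pct >= 25]
small_slices = [name for name in budget_shares if name not in large_slices]

print(budget_shares)
print("Hard to compare in a pie chart:", small_slices)
```

<p>If most of your categories end up in the second list, a bar chart is probably the safer choice.</p>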
<p>More useful in those circumstances are <strong>bar charts</strong> or <strong>histograms</strong>. Although these look similar there are subtle differences between them: the bars in bar charts represent categories (such as different cities), whereas the bars in histograms represent ranges of values on a continuum (for instance: ages, weights or amounts), which is why histograms do not have gaps between their bars. You should avoid using 3D or shadow effects in bar charts as these do not add to the information or clarity. The advantage of both types of chart over pie charts is that users can more easily see the difference between one quantity and another. Bar charts also allow you to show change over time.</p>
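<p>The difference between the two is easiest to see in how the data is counted before it is drawn. The Python sketch below is only an illustration: the records and the 20-year band width are made up, and a real chart tool would do this grouping for you.</p>

```python
from collections import Counter

# Hypothetical records: (city, age) pairs invented for illustration.
records = [("Leeds", 23), ("Leeds", 37), ("Bath", 41), ("Leeds", 58), ("Bath", 19)]

# Bar chart: one bar per category (here, each city).
bar_counts = Counter(city for city, _ in records)

# Histogram: one bar per range on a continuum (here, 20-year age bands).
BAND = 20  # assumed band width; choose one that suits your data
hist_counts = Counter((age // BAND) * BAND for _, age in records)
hist_labels = {
    f"{start}-{start + BAND - 1}": count
    for start, count in sorted(hist_counts.items())
}

print(dict(bar_counts))
print(hist_labels)
```

<p>The city bars could be drawn in any order, but the age bands only make sense in sequence, which is what makes the second chart a histogram.</p>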
<p><strong>Pictograms</strong> are like bar charts but use an icon to represent quantity &#8211; so a population of 50,000 might be represented by 5 &#8216;person&#8217; icons. It is not advisable to use pictograms if quantities are close together as the user will find it harder to discern the differences.</p>
<p>Also useful for showing change over time are <strong>line graphs</strong>. Lines are &#8220;suited for showing trend, acceleration or deceleration, and volatility, including sudden peaks or troughs&#8221; (<a href="http://astore.amazon.co.uk/onlijourblog-21/detail/0393072959" onclick="urchinTracker('/outgoing/astore.amazon.co.uk/onlijourblog-21/detail/0393072959?referer=');">Wong, 2010</a>, p51). In addition, a series of lines overlaid upon each other can quickly show whether variables change at the same points or at different ones, suggesting either relationships or shared causes. This by no means proves them, however: such patterns should be taken as starting points for further investigation. For the sake of clarity, you should also avoid plotting more than four lines in one chart.</p>
<p>Line graphs should not be used to show unrelated events. As Seth Godin (<a href="http://sethgodin.typepad.com/seths_blog/2009/07/how-to-make-graphs-that-work.html" onclick="urchinTracker('/outgoing/sethgodin.typepad.com/seths_blog/2009/07/how-to-make-graphs-that-work.html?referer=');">2009</a>) puts it: &#8220;A graph of IQs of everyone in your kindergarten class should be a series of unrelated points, not a line graph. On the other hand, your weight loss is in fact a continuous function, so each piece of data should be attached.&#8221;</p>
<p><strong>Scattergrams </strong>are similar to line graphs, showing the distribution of individual elements against two axes, but can be particularly useful in showing up &#8216;outliers&#8217;. Outliers are pieces of data which differ noticeably from the rest. These may be of particular interest journalistically when they show, for example, an MP claiming substantially more (or less) expenses than their peers.</p>
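<p>Outliers can also be flagged numerically before you plot anything. The Python sketch below uses one common rule of thumb (values more than two standard deviations from the mean); both the threshold and the expenses figures are assumptions made for illustration, and anything it flags is a lead to check, not a finding.</p>

```python
from statistics import mean, stdev


def find_outliers(values, threshold=2.0):
    """Return the entries lying more than `threshold` standard deviations from the mean."""
    amounts = list(values.values())
    centre = mean(amounts)
    spread = stdev(amounts)
    return {
        name: amount
        for name, amount in values.items()
        if abs(amount - centre) > threshold * spread
    }


# Hypothetical expenses claims, invented for illustration only.
claims = {
    "MP A": 12000, "MP B": 13500, "MP C": 12800, "MP D": 11900,
    "MP E": 13100, "MP F": 12400, "MP G": 31000,
}

outliers = find_outliers(claims)
print(outliers)
```

<p>With a small dataset a single extreme value inflates the standard deviation itself, so it is still worth looking at the scattergram rather than relying on the calculation alone.</p>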
<p>A number of charts can be visualised together in what is sometimes called <strong>&#8216;small multiples&#8217;</strong>, allowing the journalist or users to display a number of pie charts, line graphs or other charts alongside each other &#8211; allowing comparison, for example, between different populations.</p>
<p>Two increasingly popular forms of visualisation online are treemaps and bubble charts. Unlike other charts which allow you to visualise two aspects of the data (i.e. their place on each axis) <strong>bubble charts</strong> allow you to visualise three aspects of the data &#8211; the third being represented by the size of the bubble itself. A particularly good example of bubble charts in action can be seen in <a href="http://www.youtube.com/watch?v=RUwS1uAdUcI" onclick="urchinTracker('/outgoing/www.youtube.com/watch?v=RUwS1uAdUcI&amp;referer=');">Hans Rosling&#8217;s TED talk on debunking third-world myths</a> &#8211; a presentation which also demonstrates the potential of other forms of visualisation, and animation, in presenting complex information in an easy-to-understand way.</p>
<p>Finally, <strong>treemaps </strong>visualise hierarchical data in a way that could be described as rectangular pie charts-within-pie charts. This is particularly useful for representing different parts of a whole and their relationship to each other, for instance, different budgets within a government.</p>
<p>Perhaps the best-known example of a treemap is <a href="http://newsmap.jp/" onclick="urchinTracker('/outgoing/newsmap.jp/?referer=');">Newsmap</a>, created in 2004 by Marcos Weskamp. This visualises the amount of coverage given to stories by news organisations based on a feed from Google News. Weskamp explains it as follows:</p>
<blockquote><p>&#8220;Google News automatically groups news stories with similar content and places them based on algorithmic results into clusters. In Newsmap, the size of each cell is determined by the amount of related articles that exist inside each news cluster that the Google News Aggregator presents. In that way users can quickly identify which news stories have been given the most coverage, viewing the map by region, topic or time. Through that process it still accentuates the importance of a given article.&#8221; (<a href="http://marumushi.com/projects/newsmap" onclick="urchinTracker('/outgoing/marumushi.com/projects/newsmap?referer=');">Weskamp, 2005</a>)</p></blockquote>
<p>These are just the most common forms of visualisation, but there are dozens more to explore. <a href="http://www.visual-literacy.org/periodic_table/periodic_table.html" onclick="urchinTracker('/outgoing/www.visual-literacy.org/periodic_table/periodic_table.html?referer=');">The Periodic Table of Visualisation</a> is a particularly useful webpage giving an overview of the various forms.</p>
<h2>Considerations in visualisation</h2>
<p>Charlie Beckett <a href="http://www.charliebeckett.org/?p=3930" onclick="urchinTracker('/outgoing/www.charliebeckett.org/?p=3930&amp;referer=');">makes a useful distinction</a> between using visualisation for &#8220;rational understanding (I now get the figures) and emotional understanding (I  now care about the figures and want to do something).&#8221; It is worth deciding which of the two you are aiming for.</p>
<p>When visualising data it is also important to ensure that any comparisons are meaningful, or like-for-like. In one visualisation of how many sales a musician needs to make to earn the minimum wage, for example, a comparison is made between sites selling albums, sites selling individual tracks, and those providing music streams. Clearly this is misleading &#8211; and was criticised for being so (<a href="http://techdirt.com/articles/20100413/1647599007.shtml" onclick="urchinTracker('/outgoing/techdirt.com/articles/20100413/1647599007.shtml?referer=');">Techdirt, 2010</a>).</p>
<p>The <a href="http://astore.amazon.co.uk/onlijourblog-21/detail/0393072959" onclick="urchinTracker('/outgoing/astore.amazon.co.uk/onlijourblog-21/detail/0393072959?referer=');">Wall Street Journal Guide to Information Graphics (2010)</a> offers a wealth of tips on elements to consider and mistakes to avoid in both visualisation and data research and is well worth reading for more on this area. Here is just a selection:</p>
<ul>
<li>&#8220;Choose the best data series to illustrate your point, e.g. market share vs. total revenue</li>
<li>&#8220;Filter and simplify the data to deliver the essence of the data to your intended audience</li>
<li>&#8220;Make numerical adjustments to the raw data to enhance your point, e.g. absolute values vs. percentage change</li>
<li>&#8220;Choose the appropriate chart settings, e.g. scale, y-axis increments and baseline</li>
<li>&#8220;If the raw data is insufficient to tell the story, do not add decorative elements. Instead, research additional sources and adjust data to stay on point</li>
<li>&#8220;Data is only as good as its source. Getting data from reputable and impartial sources is critical. For example, data should be benchmarked against a third party to avoid bias and add credibility</li>
<li>&#8220;In the research stage, a bigger data set allows more in-depth analysis. In the edit phase, it is important to assess whether all your extra information buries the main point of the story or enhances [it].&#8221;</li>
</ul>
<h2>Visualising large amounts of text</h2>
<p>If you are working with text rather than numbers there are ways to visualise that as well. <strong>Word clouds</strong>, for instance, show which words are used most often in a particular document (such as a speech, bill, or manifesto) or data stream (such as an RSS feed of what people are saying on Twitter or blogs). This can be particularly useful in drawing out the themes of a politician&#8217;s speech, for example, or the reaction from people online to a particular event. They can also be used to draw comparisons &#8211; word clouds have been used in the past to compare the inaugural speeches of Barack Obama with those of Bush and Clinton; and to compare the 2010 UK election manifestos of the Labour and Conservative parties. The <strong>tag cloud</strong> is similar to the word cloud, but typically allows you to click on an individual tag (word or phrase) to see where it has been used.</p>
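<p>Underneath, a word cloud is simply a count of how often each word appears, with the count deciding the size of the word. A minimal Python sketch of that counting step follows; the sample text and the short stopword list are invented for illustration, and real tools use much longer lists of common words to ignore.</p>

```python
import re
from collections import Counter

# A deliberately tiny, assumed list of common words to ignore.
STOPWORDS = {"the", "and", "of", "to", "a", "in", "we", "our", "is", "are", "will"}


def word_counts(text, top=3):
    """Return the `top` most frequent words in `text`, ignoring case and stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(top)


# Hypothetical speech extract, invented for illustration only.
speech = (
    "We will invest in schools. Schools need teachers, "
    "and teachers need support. Our schools are the future."
)

print(word_counts(speech))
```

<p>Running the same function over two speeches and comparing the results is, in essence, what the side-by-side word clouds described above are doing.</p>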
<p>There are other forms for word visualisation too, particularly around showing relationships between words &#8211; when they occur together, or how often. The terminology varies: visualisation tool <strong>ManyEyes</strong>, for example, calls these <strong>word trees</strong> and <strong>phrase nets</strong> but other tools will have different names.</p>
<p><em>Once again, I&#8217;d welcome any comments on areas I may have missed or things journalists should consider. I&#8217;ve had to split this section into two, so <a href="http://onlinejournalismblog.com/2010/04/28/data-journalism-pt4-visualising-data-tools-and-publishing-comments-wanted/">Part 4 continues to look at visualisation, and focuses on tools and publishing</a>. </em></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2010/04/28/data-journalism-pt3-visualising-data-comments-wanted/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Data journalism pt2: Interrogating data</title>
		<link>http://onlinejournalismblog.com/2010/04/26/data-journalism-pt2-interrogating-data/</link>
		<comments>http://onlinejournalismblog.com/2010/04/26/data-journalism-pt2-interrogating-data/#comments</comments>
		<pubDate>Mon, 26 Apr 2010 09:04:32 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[adrian short]]></category>
		<category><![CDATA[Charles Arthur]]></category>
		<category><![CDATA[cleaning data]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[google refine]]></category>
		<category><![CDATA[kaiser fung]]></category>
		<category><![CDATA[online journalism book]]></category>
		<category><![CDATA[scraping]]></category>
		<category><![CDATA[tony hirst]]></category>
		<category><![CDATA[trim]]></category>
		<category><![CDATA[Yahoo! Pipes]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=8398</guid>
		<description><![CDATA[This is a draft from a book chapter on data journalism (the first, on gathering data, is here). I’d really appreciate any additions or comments you can make &#8211; particularly around ways of spotting stories in data, and mistakes to avoid. UPDATE: It has now been published in The Online Journalism Handbook. &#8220;One of the [...]]]></description>
			<content:encoded><![CDATA[
<p><em>This is a draft from a book chapter on data journalism (<a href="http://onlinejournalismblog.com/2010/04/21/data-journalism-pt1-finding-data-draft-comments-invited/">the first, on gathering data, is here</a>). I’d really appreciate any additions or comments you can make &#8211; particularly around ways of spotting stories in data, and mistakes to avoid.</em></p>
<p><strong>UPDATE: <a href="http://www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?&amp;camp=2486&amp;linkCode=wey&amp;tag=onlijourblog-21&amp;creative=8882" onclick="urchinTracker('/outgoing/www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?_amp_camp=2486_amp_linkCode=wey_amp_tag=onlijourblog-21_amp_creative=8882&amp;referer=');">It has now been published in The Online Journalism Handbook</a>.</strong></p>
<blockquote><p>&#8220;One of the most important (and least technical) skills in understanding data is asking good questions. An appropriate question shares an interest you have in the data, tries to convey it to others, and is curiosity-oriented rather than math-oriented. Visualizing data is just like any other type of communication: success is defined by your audience&#8217;s ability to pick up on, and be excited about, your insight.&#8221; (<a href="http://astore.amazon.co.uk/onlijourblog-21/detail/0596514557" onclick="urchinTracker('/outgoing/astore.amazon.co.uk/onlijourblog-21/detail/0596514557?referer=');">Fry, 2008, p4</a>)</p></blockquote>
<p>Once you have the data you need to see if there is a story buried within it. The great advantage of computer processing is that it makes it easier to sort, filter, compare and search information in different ways to get to the heart of what &#8211; if anything &#8211; it reveals.<span id="more-8398"></span></p>
<p>The first stage in this process, then, is making sure the data is in the right format to be interrogated. Quite often this will be a spreadsheet or CSV (comma-separated values) file. If your information is in a PDF you will not be able to do a great deal with it other than re-type the values into a new spreadsheet (making sure to check you have not made any errors). A Word or Powerpoint document is likely to require the same work.</p>
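<p>To show why a CSV file is so convenient, here is a short Python sketch that reads one and then sorts and filters it using only the standard <code>csv</code> module. The council names and ticket counts are made up for illustration; with a real file you would open it from disk instead of using the text below.</p>

```python
import csv
import io

# Hypothetical CSV content, invented for illustration only.
csv_text = "council,tickets\nAshby,120\nBrent,450\nCorby,90\n"

rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in rows:
    row["tickets"] = int(row["tickets"])  # CSV values arrive as text

# Sort: which council issued the most tickets?
by_tickets = sorted(rows, key=lambda r: r["tickets"], reverse=True)
busiest = by_tickets[0]["council"]

# Filter: which councils issued more than 100?
over_100 = [r["council"] for r in rows if r["tickets"] > 100]

print(busiest, over_100)
```

<p>A spreadsheet's sort and filter buttons do the same job; the point is that neither is possible until the data is in rows and columns.</p>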
<p>If the information is already online you can sometimes &#8216;scrape&#8217; it &#8211; that is, automatically copy the relevant information into a separate document. How easy this is to do depends on how structured the information is. A table in a Wikipedia entry, for example, can be &#8216;scraped&#8217; into a Google spreadsheet relatively easily (Tony Hirst gives instructions on how to do this at <a href="http://ouseful.wordpress.com/2008/10/14/data-scraping-wikipedia-with-google-spreadsheets/" onclick="urchinTracker('/outgoing/ouseful.wordpress.com/2008/10/14/data-scraping-wikipedia-with-google-spreadsheets/?referer=');">http://ouseful.wordpress.com/2008/10/14/data-scraping-wikipedia-with-google-spreadsheets/</a>). An online CSV file and certain other structured data <a href="http://www.daybarr.com/blog/2007/12/11/yahoo-pipes-tutorial-an-example-using-the-fetch-page-module-to-make-a-web-scraper" onclick="urchinTracker('/outgoing/www.daybarr.com/blog/2007/12/11/yahoo-pipes-tutorial-an-example-using-the-fetch-page-module-to-make-a-web-scraper?referer=');">can be scraped with Yahoo! Pipes</a> (see below for more on using Yahoo! Pipes). Most scraping, however, will involve programming (the online tool <a href="http://scraperwiki.com/" onclick="urchinTracker('/outgoing/scraperwiki.com/?referer=');">ScraperWiki</a> provides one environment to help you do this).</p>
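<p>To give a flavour of what the programming route can look like, here is a minimal sketch in Python of copying the relevant columns out of a CSV file into a new one. The sample data, the column names and the two helper functions are all invented for illustration; a real script would first download the file from wherever it is published.</p>

```python
import csv
import io

# Hypothetical CSV text standing in for a file published online. A real script
# would first download it, for example with urllib.request.urlopen(url).
SAMPLE_CSV = """council,year,bonuses_paid,notes
Newtown,2009,12,provisional
Oldtown,2009,7,
"""


def extract_columns(csv_text, wanted):
    """Return a list of rows (dicts) keeping only the wanted columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{name: row[name] for name in wanted} for row in reader]


def to_csv(rows, fieldnames):
    """Write rows back out as CSV text, ready to open in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_columns(SAMPLE_CSV, ["council", "bonuses_paid"])
print(to_csv(rows, ["council", "bonuses_paid"]))
```

<p>The same pattern (read structured rows, keep what you need, write a clean copy) applies whatever tool you use to do the scraping.</p>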
<p><strong>Insert: Cleaning up data</strong></p>
<p>Whether you have been given data, had to scrape it, or copied it manually, you will probably need to clean it up. All sorts of things can &#8216;dirty&#8217; your data, from misspellings and variations in spelling, to odd punctuation, mixtures of numbers and letters, unnecessary columns or rows, and more. Computers, for example, will see &#8216;New Town&#8217;, &#8216;Newtown&#8217; and &#8216;newtown&#8217; as three separate towns when they may be one.</p>
<p>This can cause problems later on when analysing your data &#8211; for example, calculations not working or results not being accurate.</p>
<p>Some tips for cleaning your data include:</p>
<ul>
<li>Use a spellchecker to check for misspellings. You will probably have to add some words to the computer&#8217;s dictionary.</li>
<li>Use &#8216;find and replace&#8217; (normally in the Edit menu) to remove double-spaces and other common punctuation errors. Alternatively, if you are in Excel you can create a new column and use the =TRIM() function, which copies the contents of the cell named in the brackets, removing spaces from the start and end and reducing any run of spaces between words to a single space.</li>
<li>Remove duplicate entries &#8211; if you are using Excel there are a few ways to do this under the Data tab &#8211; search for duplicates in Help.</li>
</ul>
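<p>If you are comfortable with a little programming, the tips above can also be sketched in code. The following Python example is only an illustration, using made-up town names and helper functions of my own naming: it trims stray spaces and then treats spellings that differ only in case or spacing as the same entry.</p>

```python
import re


def clean_name(value):
    """Trim the ends and collapse runs of whitespace, similar to Excel's TRIM."""
    return re.sub(r"\s+", " ", value).strip()


def match_key(value):
    """Build a comparison key that ignores case and spacing."""
    return re.sub(r"\s+", "", value).lower()


def remove_duplicates(values):
    """Keep the first spelling seen for each key and drop later variants."""
    seen = {}
    for value in values:
        cleaned = clean_name(value)
        seen.setdefault(match_key(cleaned), cleaned)
    return list(seen.values())


# Hypothetical entries, including the three variants of one town.
towns = ["New Town", "Newtown", "newtown ", "  Old   Town"]
print(remove_duplicates(towns))
```

<p>Be careful with automatic matching like this: two places with similar names may really be different, so check any merged entries by hand.</p>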
<p>For more tips on Excel specifically see this guide: <a href="http://office.microsoft.com/en-us/excel/HA102218401033.aspx" onclick="urchinTracker('/outgoing/office.microsoft.com/en-us/excel/HA102218401033.aspx?referer=');">http://office.microsoft.com/en-us/excel/HA102218401033.aspx</a></p>
<p>For cleaning up very large sets of data you might want to <a href="http://onlinejournalismblog.com/2010/11/11/data-cleaning-tool-relaunches-freebase-gridworks-becomes-google-refine/">use a data cleaning tool like Google Refine</a>.</p>
<h2>Spotting the story</h2>
<p>Once your data is cleaned you can start to look for anything newsworthy in it. There are some obvious places to start: if you are dealing with numbers, for example, you can work out what the &#8216;average&#8217; is (the average bonus paid to council employees, for example). Similarly, you might look for the term which appears most often (e.g. the most common reason given for arresting terrorist suspects).</p>
<p>However, Kaiser Fung, a statistician whose blog <a href="http://junkcharts.typepad.com/" onclick="urchinTracker('/outgoing/junkcharts.typepad.com/?referer=');">Junk Charts</a> is essential reading on the field, notes the dangers in lazily reaching for the average when you want to make an editorial point:</p>
<blockquote><p>&#8220;Averaging stamps out diversity, reducing anything to its simplest terms. In so doing, we run the risk of oversimplifying, of forgetting the variations around the average. Hitching one&#8217;s attention to these variations rather than the average is a sure sign of maturity in statistical thinking. One can, in fact, define statistics as the study of the nature of variability. How much do things change? How large are these variations? What causes them?&#8221; (<a href="http://astore.amazon.co.uk/onlijourblog-21/detail/0071626530" onclick="urchinTracker('/outgoing/astore.amazon.co.uk/onlijourblog-21/detail/0071626530?referer=');">Fung, 2010, p4</a>)</p></blockquote>
<p>So while averages and modes (a mode is the number or term which appears most often) can be interesting discoveries, they should most often be used as a starting point for more illuminating investigation &#8211; normally involving leaving your computer to make phone calls and speak to sources.</p>
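<p>A small worked example shows why the average alone can mislead. The figures below are invented, and the sketch uses Python&#8217;s built-in statistics module to compare the mean, the median (the middle value), the mode and the range of a set of bonus payments.</p>

```python
import statistics

# Hypothetical bonus payments in pounds; one large payment skews the mean.
bonuses = [500, 500, 500, 750, 1000, 1250, 20000]

mean = statistics.mean(bonuses)      # the usual 'average'
median = statistics.median(bonuses)  # the middle value when sorted
mode = statistics.mode(bonuses)      # the value that appears most often
spread = max(bonuses) - min(bonuses) # the range between lowest and highest

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("range:", spread)
```

<p>Here the mean is 3,500 but the median is 750 and the mode is 500: a single payment of 20,000 drags the average far above what most people received. That variation is where the questions, and often the story, lie.</p>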
<p>If you are looking at data over time, you can look to see what has increased over that period, or decreased &#8211; or disappeared.</p>
<p>But you will need to gather further data to provide context to your figures. If, for example, more council staff are receiving bonuses, is that simply because more staff have been employed? How much is spent on wages, and how do your figures compare? If you are comparing one city with another, understand how their populations differ &#8211; not just in aggregate, but in relevant details such as age, ethnicity, life expectancy, etc. You will need to know where to access basic statistics like these &#8211; the National Statistics website is often a good place to start.</p>
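<p>The point about context can be made concrete with some simple arithmetic. In this sketch, which uses invented figures and function names, the number of staff receiving bonuses rises by a quarter, but so does the total number of staff, so the proportion receiving a bonus has not changed at all.</p>

```python
def percent_change(old, new):
    """Percentage change from the old figure to the new one."""
    return (new - old) * 100 / old


def rate_per_hundred(count, population):
    """Express a count as a rate per 100 people."""
    return count * 100 / population


# Hypothetical figures: staff receiving bonuses and total staff employed.
bonus_staff = {2008: 120, 2009: 150}
total_staff = {2008: 1000, 2009: 1250}

raw_change = percent_change(bonus_staff[2008], bonus_staff[2009])
rate_2008 = rate_per_hundred(bonus_staff[2008], total_staff[2008])
rate_2009 = rate_per_hundred(bonus_staff[2009], total_staff[2009])

print("change in staff receiving bonuses (%):", raw_change)
print("bonus recipients per 100 staff, 2008:", rate_2008)
print("bonus recipients per 100 staff, 2009:", rate_2009)
```

<p>The same approach works when comparing cities: divide by population (or by the relevant part of the population) before drawing conclusions from the raw counts.</p>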
<p>Sometimes a change in the way data is gathered or categorised can produce a dramatic change in the data itself. In one example, designer Adrian Short obtained information (via an FOI request) on parking tickets from Transport for London that showed the number of tickets issued for a particular offence plummeted from around 8,000 to 8 in the space of one month (<a href="http://www.guardian.co.uk/news/datablog/2009/apr/29/transport-london-parking-tickets" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog/2009/apr/29/transport-london-parking-tickets?referer=');">Arthur, 2009</a>). Had people suddenly stopped committing that parking offence, or was there another explanation? A quick phone call to Transport for London revealed that traffic wardens were issued with new handsets around the same time. Guardian journalist Charles Arthur hypothesised:</p>
<blockquote><p>&#8220;Could it be that s46 [another offence which had a steep rise at the same time] is the default on the screen to issue a new ticket, and that wardens don&#8217;t bother to change it? Whatever it is, there&#8217;s a serious problem for TfL if those aren&#8217;t all s46 offences which have been ticketed since August 2006. Because if the ticket isn&#8217;t written out to the correct offence, then the fine isn&#8217;t payable. Theoretically, TfL might have to pay back millions in traffic fines for people who have been ticketed for s46 offences when they were actually committing s25 or s30 offences.&#8221;</p></blockquote>
<p>This particular story came about at least in part because that information was easy to visualise.</p>
<p><em>The <a href="http://onlinejournalismblog.com/2010/04/28/data-journalism-pt3-visualising-data-comments-wanted/">next section covers visualisation</a>. In the meantime, once again I’d really appreciate any additions or comments – particularly around ways of spotting stories in data, and mistakes to avoid.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2010/04/26/data-journalism-pt2-interrogating-data/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>Data journalism pt1: Finding data (draft &#8211; comments invited)</title>
		<link>http://onlinejournalismblog.com/2010/04/21/data-journalism-pt1-finding-data-draft-comments-invited/</link>
		<comments>http://onlinejournalismblog.com/2010/04/21/data-journalism-pt1-finding-data-draft-comments-invited/#comments</comments>
		<pubDate>Wed, 21 Apr 2010 15:22:13 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[online journalism]]></category>
		<category><![CDATA[foi]]></category>
		<category><![CDATA[isitopendata.org]]></category>
		<category><![CDATA[online journalism book]]></category>
		<category><![CDATA[Public Sector Information Unlocking Service]]></category>
		<category><![CDATA[tim davies]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=8340</guid>
		<description><![CDATA[The following is a draft from a book about online journalism that I&#8217;ve been working on. I&#8217;d really appreciate any additions or comments you can make &#8211; particularly around sources of data and legal considerations The first stage in data journalism is sourcing the data itself. Often you will be seeking out data based on [...]]]></description>
			<content:encoded><![CDATA[
<p><em>The following is a draft from <a href="http://www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?&amp;camp=2486&amp;linkCode=wey&amp;tag=onlijourblog-21&amp;creative=8882" onclick="urchinTracker('/outgoing/www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?_amp_camp=2486_amp_linkCode=wey_amp_tag=onlijourblog-21_amp_creative=8882&amp;referer=');">a book about online journalism</a> that I&#8217;ve been working on. I&#8217;d really appreciate any additions or comments you can make &#8211; particularly around sources of data and legal considerations.</em></p>
<p>The first stage in data journalism is sourcing the data itself. Often you will be seeking out data based on a particular question or hypothesis (for a good guide to forming a journalistic hypothesis see Mark Hunter&#8217;s free ebook Story-Based Inquiry (2010)). On other occasions, it may be that the release or discovery of data itself kicks off your investigation.</p>
<p>There is a range of sources available to the data journalist, both online and offline, public and hidden. Typical sources include:</p>

<ul>
<li>national and local government;</li>
<li>bodies that monitor organisations (such as regulators or consumer bodies);</li>
<li>scientific and academic institutions;</li>
<li>health organisations;</li>
<li>charities and pressure groups;</li>
<li>business;</li>
<li>and the media itself.</li>
</ul>
<p>One of the best places to find UK government data online, for example, is Data.gov.uk, an initiative influenced by its US predecessor Data.gov. Data.gov.uk &#8211; launched in January 2010 with the backing of the inventor of the World Wide Web, Sir Tim Berners-Lee &#8211; effectively acts as a search engine and index for thousands of sets of data held by a range of government departments, from statistics on the re-offending of juveniles to the Agricultural Price Index. The site also hosts forums for users to discuss their use of the data, examples of applications using data, further information on how to use the data, and technical resources.</p>
<p>At a regional level, local authorities are also releasing information that can be used as part of data journalism projects. The quality, quantity and accessibility of this information varies enormously by council, but there is continuing pressure for improvement in this area.</p>
<p>There are also a number of volunteer projects, such as OpenlyLocal and Mash The State, that make local government data available in as accessible a format as possible, while the organisation MySociety operates a group of websites providing easy access to information ranging from particular politicians&#8217; voting records (TheyWorkForYou) to local problems (FixMyStreet) and information about a particular area&#8217;s transport links and beauty (Mapumental). MySociety also runs a petitions website for Downing Street, and websites that allow people to pledge to do something if other people sign up too (PledgeBank), to find groups near you (GroupsNearYou), contact your MP (WriteToThem) or be contacted by them (HearFromYourMP).</p>
<h2>Private companies and charities</h2>
<p>In the private sector, a number of organisations regularly release data online, from tables and research reports published on company websites to the annual reports that are filed with bodies such as Companies House. Also worth looking at is the web project Companies Open House, which seeks to make company information more easily accessible.</p>
<p>The Charity Commission is an excellent source of information on registered charities, who must file accounts and annual reports with the organisation. The commission also conducts occasional research into the sector.</p>
<h2>Regulators, researchers and the media</h2>
<p>NHS foundation trusts likewise must file reports to their regulator, Monitor. And you will find similar regulators in other areas such as the Financial Services Authority, Ofcom, Ofwat, Ofqual, the General Medical Council, the General Social Care Council and the Pensions Regulator to name just a few.</p>
<p>For academic and scientific research there are hundreds of specialist journals. Most have online search facilities which will provide access to summaries. To get access to the full paper you will probably need to use the library of a university which has a subscription. For access to a journal on midwifery, for example, your best bet is to give a quick call to the nearest university which teaches courses in that field. Although university libraries increasingly limit access to students, you can request a special pass. For access to the data on which research is based it is likely you will need to contact the author.</p>
<p>Media organisations such as The Guardian and the New York Times publish &#8216;datablogs&#8217; that regularly release sets of data produced or acquired by investigations, ranging from scientific information about global warming to lists of Oscar winners. These can be a rich source of material for the data journalist, and a great starting point for the beginner as they are often &#8216;cleaner&#8217; than data from elsewhere.</p>
<p>The Guardian and the New York Times are also among a growing number of websites that make their own data available via APIs (Application Programming Interfaces). Many of the sites offering this kind of access are social networking sites such as Flickr and Twitter.</p>
<p>Accessing this data typically requires a level of technical ability, but can be particularly useful in measuring activity across social networks (for example sharing and publishing). Even if you don&#8217;t have that technical ability, understanding the possibilities can be extremely useful when working with web developers on a data journalism project (see the part of this chapter on mashups for more information on APIs).</p>
<h2>Using search engines to find data</h2>
<p>If you are using a search engine to find the data you are looking for, you should familiarise yourself with the advanced search facility, where you can often specify the format of the file you are looking for. Searching specifically for spreadsheets (files ending in .xls), for example, is likely to get you to data more quickly. Similarly, official reports can often be found more effectively by searching for PDF format, while PowerPoint presentations (.ppt) will sometimes contain useful tables of data. You can also include &#8216;XML&#8217; or &#8216;RDF&#8217; in your search terms if you think your data may be in those or other formats.</p>
<p>Advanced search also allows you to specify the type of website you are searching &#8211; those ending in .gov.uk (government), .org and .org.uk (charities), .ac.uk (educational establishments), .nhs.uk (the National Health Service), .police.uk and .mod.uk (Ministry of Defence) are just some that will be particularly relevant. You can also specify an individual site &#8211; for instance, that of a local council. A basic familiarity with these search techniques &#8211; for example limiting your search to spreadsheets on .gov.uk websites &#8211; can improve your results.</p>
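<p>Many search engines, Google among them, also let you type these restrictions straight into the search box using the filetype: and site: operators. As a rough sketch, the Python below builds such a query and turns it into a search address; the search terms, the function names and the base address are assumptions chosen for illustration.</p>

```python
from urllib.parse import urlencode


def build_query(terms, filetype=None, site=None):
    """Combine search terms with optional filetype: and site: operators."""
    parts = [terms]
    if filetype:
        parts.append("filetype:" + filetype)
    if site:
        parts.append("site:" + site)
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """Turn a query into a search address (the base address is an assumption)."""
    return base + "?" + urlencode({"q": query})


# Hypothetical search: spreadsheets about council bonuses on government sites.
query = build_query("council bonuses", filetype="xls", site="gov.uk")
print(query)
print(search_url(query))
```

<p>Typing the resulting query (council bonuses filetype:xls site:gov.uk) into the search box by hand gives the same result as filling in the advanced search form.</p>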
<h2>Live data</h2>
<p>Another type of data to think about is live data that is not stored anywhere yet but, rather, will be produced at a particular time. A good example of this would be how newspapers are increasingly using Twitter commentary to provide context to a particular debate. Part of the Guardian&#8217;s coverage of Tony Blair&#8217;s appearance at the Chilcot Inquiry into the Iraq War, for example, used the data of thousands of Twitter updates (&#8216;tweets&#8217;) to provide <a href="http://www.guardian.co.uk/politics/blairometer/history" onclick="urchinTracker('/outgoing/www.guardian.co.uk/politics/blairometer/history?referer=');">a &#8216;sentiment analysis&#8217; timeline of how people reacted to particular parts of his evidence as it went on</a>. Similar timelines have been produced for political debates and speeches to measure public reaction.</p>
<p>Preparation is key to live data projects &#8211; where will you get the data from, and how will you filter it? How can you visualise it most clearly? And how do you prevent it being &#8216;gamed&#8217; (users intentionally skewing the results for fun or commercial or political reasons)?</p>
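<p>To make those questions more concrete, here is a deliberately crude sketch in Python of a sentiment timeline. It is not how the Guardian built its timeline: the word lists, the sample updates and the rule of counting each user only once per minute (a very basic guard against gaming) are all assumptions made for illustration.</p>

```python
from collections import defaultdict

# Hypothetical word lists; real sentiment analysis needs far more care.
POSITIVE = {"honest", "convincing", "good"}
NEGATIVE = {"evasive", "lies", "bad"}


def score(text):
    """Crude score: number of positive words minus number of negative words."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def timeline(updates):
    """Total score per minute, counting each user once per minute."""
    totals = defaultdict(int)
    seen = set()
    for minute, user, text in updates:
        if (minute, user) in seen:
            continue  # ignore repeat posts, a basic guard against gaming
        seen.add((minute, user))
        totals[minute] += score(text)
    return dict(sorted(totals.items()))


# Hypothetical updates as (minute, user, text) tuples.
updates = [
    ("10:01", "anna", "convincing and honest answer"),
    ("10:01", "bob", "evasive"),
    ("10:01", "bob", "evasive evasive lies"),
    ("10:02", "cara", "more lies"),
]
print(timeline(updates))
```

<p>Even a toy version like this forces the decisions mentioned above: where the updates come from, how they are filtered, and what counts as one person&#8217;s reaction.</p>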
<h2>Legal considerations</h2>
<p>Whatever data you are acquiring, you will need to consider whether you have permission to republish that data. Data may be covered by copyright, or may raise issues of data protection or privacy. Even apparently anonymous information can sometimes be traced back to individual users (<a href="http://query.nytimes.com/gst/fullpage.html?res=9E0CE3DD1F3FF93AA3575BC0A9609C8B63" onclick="urchinTracker('/outgoing/query.nytimes.com/gst/fullpage.html?res=9E0CE3DD1F3FF93AA3575BC0A9609C8B63&amp;referer=');">Barbaro &amp; Zeller, 2006</a>). And while government information is paid for by public money, it is, strictly speaking, often covered by Crown Copyright, and organisations like Ordnance Survey and Royal Mail have been notoriously protective of geographical information and postcodes (see <a href="http://astore.amazon.co.uk/onlijourblog-21/detail/0434020265" onclick="urchinTracker('/outgoing/astore.amazon.co.uk/onlijourblog-21/detail/0434020265?referer=');">Brooke, 2010</a>).</p>
<h2>Books and FOI</h2>
<p>Of course, there is also a rich range of data available in books that the data journalist should familiarise themselves with &#8211; from books of facts and statistics to almanacs, from the Civil Service Year Book (<a href="http://www.civil-service.co.uk/" onclick="urchinTracker('/outgoing/www.civil-service.co.uk/?referer=');">also online</a>) to volumes like Who&#8217;s Who (online at <a href="http://ukwhoswho.com" onclick="urchinTracker('/outgoing/ukwhoswho.com?referer=');">ukwhoswho.com</a> &#8211; your library may have a subscription).</p>
<p>Particularly useful is the data held by public bodies which can be accessed through a well-worded Freedom of Information (FOI) request. Heather Brooke&#8217;s book <a href="http://astore.amazon.co.uk/onlijourblog-21/detail/0745325823" onclick="urchinTracker('/outgoing/astore.amazon.co.uk/onlijourblog-21/detail/0745325823?referer=');">Your Right To Know (2007)</a> is a key reference work in this area, and the online tool WhatDoTheyKnow is particularly useful in allowing you to submit FOI requests easily, as well as allowing you to find similar FOI requests and the responses to them.</p>
<p>When requesting data through an FOI request, it is always useful to specify the format that you wish the information to be supplied in &#8211; typically a spreadsheet in electronic format. A PDF or Word document, for example, will mean extra work at the next stage: interrogation.</p>
<p>UPDATE: Tim Davies <a href="http://www.timdavies.org.uk/2011/01/29/sourcing-raw-data/" onclick="urchinTracker('/outgoing/www.timdavies.org.uk/2011/01/29/sourcing-raw-data/?referer=');">lists a couple of further avenues</a> along these lines:</p>
<blockquote><p>&#8220;<a href="http://unlockingservice.data.gov.uk/" onclick="urchinTracker('/outgoing/unlockingservice.data.gov.uk/?referer=');">http://unlockingservice.data.gov.uk/</a> provides a root for requesting data is opened up by the Data.gov.uk team. It’s not backed by the legal framework of FOI, but may play a role in data requests under the currently debated ‘Right to Data’ legislation.</p>
<p><strong><a href="http://isitopendata.org/" onclick="urchinTracker('/outgoing/isitopendata.org/?referer=');">IsItOpenData.org</a> </strong>provides a useful tool for asking non-public bodies to share their data as open data, or to clarify the licensing.&#8221;</p></blockquote>
<p><em>Once again &#8211; this is a draft: I&#8217;d really appreciate any additions or comments you can make &#8211; particularly around sources of data and legal considerations. <a href="http://onlinejournalismblog.com/2010/04/26/data-journalism-pt2-interrogating-data/">Part 2 &#8211; on interrogating data &#8211; can be found here</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2010/04/21/data-journalism-pt1-finding-data-draft-comments-invited/feed/</wfw:commentRss>
		<slash:comments>26</slash:comments>
		</item>
	</channel>
</rss>

