<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Online Journalism Blog &#187; api</title>
	<atom:link href="http://onlinejournalismblog.com/tag/api/feed/" rel="self" type="application/rss+xml" />
	<link>http://onlinejournalismblog.com</link>
	<description>A conversation.</description>
	<lastBuildDate>Sat, 11 Feb 2012 12:06:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<cloud domain='onlinejournalismblog.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
		<item>
		<title>Maps &#8220;in the public interest&#8221; now exempt from Google Maps API charge</title>
		<link>http://onlinejournalismblog.com/2011/11/28/maps-in-the-public-interest-now-exempt-from-google-maps-api-charge/</link>
		<comments>http://onlinejournalismblog.com/2011/11/28/maps-in-the-public-interest-now-exempt-from-google-maps-api-charge/#comments</comments>
		<pubDate>Mon, 28 Nov 2011 12:55:34 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[Google Maps]]></category>
		<category><![CDATA[mapping]]></category>
		<category><![CDATA[nieman journalism lab]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15452</guid>
		<description><![CDATA[If you thought you couldn&#8217;t use the Google Maps API any more as a journalist, this update to the Google Geo Developers Blog should make you reconsider. From Nieman Journalism Lab: &#8220;Certain web apps will be given blanket exemptions from charging. Here’s Google: “Maps API applications developed by non-profit organisations, applications deemed by Google to be in the public interest, and<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/11/28/maps-in-the-public-interest-now-exempt-from-google-maps-api-charge/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p>If you thought you couldn&#8217;t use the Google Maps API any more as a journalist, <a href="http://googlegeodevelopers.blogspot.com/2011/11/understanding-how-maps-api-usage-limits.html" onclick="urchinTracker('/outgoing/googlegeodevelopers.blogspot.com/2011/11/understanding-how-maps-api-usage-limits.html?referer=');">this update to the Google Geo Developers Blog</a> should make you reconsider. <a href="http://www.niemanlab.org/2011/11/google-backtracks-a-bit-on-charging-for-its-maps-api/?utm_source=Weekly+Lab+email+list&amp;utm_medium=email&amp;utm_campaign=d6cc0c58d4-WEEKLY_EMAIL" onclick="urchinTracker('/outgoing/www.niemanlab.org/2011/11/google-backtracks-a-bit-on-charging-for-its-maps-api/?utm_source=Weekly+Lab+email+list_amp_utm_medium=email_amp_utm_campaign=d6cc0c58d4-WEEKLY_EMAIL&amp;referer=');">From Nieman Journalism Lab</a>:</p>
<blockquote><p><strong>&#8220;Certain web apps will be given blanket exemptions from charging.</strong> Here’s Google: “Maps API applications developed by non-profit organisations, applications deemed by Google to be in the public interest, and applications based in countries where we do not support Google Checkout transactions or offer Maps API Premier are exempt from these usage limits.” So nonprofit news orgs look to be in the clear, and Google could declare other news org maps apps to be “in the public interest” and free to run. (It also notes that nonprofits could be eligible for a free <a href="http://www.google.com/enterprise/earthmaps/maps-compare.html" onclick="urchinTracker('/outgoing/www.google.com/enterprise/earthmaps/maps-compare.html?referer=');">Maps API Premier license</a>, which comes with extra goodies around advertising and more.)&#8221;</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/11/28/maps-in-the-public-interest-now-exempt-from-google-maps-api-charge/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Scraperwiki now makes it easier to ask questions of data</title>
		<link>http://onlinejournalismblog.com/2011/09/22/scraperwiki-now-makes-it-easier-to-ask-questions-of-data/</link>
		<comments>http://onlinejournalismblog.com/2011/09/22/scraperwiki-now-makes-it-easier-to-ask-questions-of-data/#comments</comments>
		<pubDate>Thu, 22 Sep 2011 20:09:34 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[scraperwiki]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15193</guid>
		<description><![CDATA[I was very excited recently to read on the Scraperwiki mailing list that the website was working on making it possible to create an RSS feed from a SQL query. Yes, that&#8217;s the sort of thing that gets me excited these days. But before you reach for a blunt object to knock some sense into me, allow me to explain&#8230;<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/09/22/scraperwiki-now-makes-it-easier-to-ask-questions-of-data/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<div class="wp-caption alignnone" style="width: 410px"><img src="https://si0.twimg.com/profile_images/1540150290/b91e_nom_nom_nom_lunchbag_closeup.jpg" alt="EatSafeWalsall" width="400" height="264" /><p class="wp-caption-text">Image from @EatSafeWalsall</p></div>
<p>I was very excited recently to read on the Scraperwiki mailing list that the website was working on making it possible to create an RSS feed from a SQL query.</p>
<p>Yes, that&#8217;s the sort of thing that gets me excited these days.</p>
<p>But before you reach for a blunt object to knock some sense into me, allow me to explain&#8230;</p>
<p>Scraperwiki has, until now, done very well at making it easier to get hold of hard-to-reach data. It has done this in two ways: firstly by creating an environment which lowers the technical barrier to creating scrapers (the programs that get hold of the data); and secondly by lowering the <em>social</em> barrier to creating scrapers (by hosting <a href="https://scraperwiki.com/request_data/" onclick="urchinTracker('/outgoing/scraperwiki.com/request_data/?referer=');">a space where journalists can ask developers for help in writing scrapers</a>).</p>
<p>This move, however, does something different.<span id="more-15193"></span></p>
<p>It allows you to ask questions &#8211; of any dataset on the site. Not only that, but it allows you to receive updates as those answers change. And those updates come in an RSS feed, which opens up all sorts of possibilities around automatically publishing those answers.</p>
<p><a href="http://blog.scraperwiki.com/2011/09/21/make-rss-with-an-sql-query/" onclick="urchinTracker('/outgoing/blog.scraperwiki.com/2011/09/21/make-rss-with-an-sql-query/?referer=');">The blog post explaining the development</a> already has a couple of examples of this in practice:</p>
<p>Anna, for example, has scraped data on alcohol licence applications. The new feature lets her receive constant updates on new applications in her RSS reader &#8211; and you could also <a href="https://scraperwiki.com/docs/api?name=islington_business_licences#sqlite" onclick="urchinTracker('/outgoing/scraperwiki.com/docs/api?name=islington_business_licences_sqlite&amp;referer=');">customise that feed</a> to tell you about licence applications on a particular street, from a particular applicant, and so on.</p>
<p>You will need to know some SQL, which is widely used in data journalism &#8211; particularly in the US &#8211; but it&#8217;s pretty simple to learn, because as a query language, it is designed to ask questions like &#8216;Select all the applications from that dataset where the application is of this status and the applicant has this name&#8217;.</p>
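<p>To make that concrete, here is a minimal Python sketch of the general idea: ask a SQL question of a table and turn each matching row into an RSS item. It is not Scraperwiki's own code. The table <code>licences</code>, its columns, the sample rows and the feed link are all hypothetical stand-ins for a scraped dataset like Anna's.</p>

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a scraped table of licence applications.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE licences (applicant TEXT, street TEXT, status TEXT, received TEXT)"
)
conn.executemany(
    "INSERT INTO licences VALUES (?, ?, ?, ?)",
    [
        ("Applicant A", "High Street", "pending", "2011-09-20"),
        ("Applicant B", "Station Road", "granted", "2011-09-18"),
        ("Applicant C", "High Street", "pending", "2011-09-21"),
    ],
)

# The question: pending applications on one street, newest first.
QUERY = """
    SELECT applicant, street, received
    FROM licences
    WHERE status = ? AND street = ?
    ORDER BY received DESC
"""


def rows_to_rss(rows, title, link):
    """Build a minimal RSS 2.0 document with one item element per result row."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Answers to a saved SQL query"
    for applicant, street, received in rows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"{applicant} ({street})"
        ET.SubElement(item, "description").text = f"Application received {received}"
    return ET.tostring(rss, encoding="unicode")


rows = conn.execute(QUERY, ("pending", "High Street")).fetchall()
feed = rows_to_rss(rows, "Pending licences: High Street", "http://example.com/licences")
print(feed)
```

<p>Changing the values passed to the query (a different street, a different status) is all it takes to get a different feed, which is the kind of flexibility described above.</p>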
<p>And because RSS is so flexible, Stuart can use the same technology to publish live updates on restaurant inspections to <a href="https://twitter.com/#!/eatsafewalsall" onclick="urchinTracker('/outgoing/twitter.com/_/eatsafewalsall?referer=');">@EatSafeWalsall</a> (it could also feed a widget on a blog or website, or a map, a Facebook page, or an email newsletter).</p>
<p>So you can put that blunt object away. This makes Scraperwiki useful in wholly new ways: asking questions, and publishing and distributing the results, automatically.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/09/22/scraperwiki-now-makes-it-easier-to-ask-questions-of-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to use the CableSearch API to quickly reference names against Wikileaks cables (SFTW)</title>
		<link>http://onlinejournalismblog.com/2011/09/09/how-to-use-the-cablesearch-api-to-quickly-reference-names-against-wikileaks-cables/</link>
		<comments>http://onlinejournalismblog.com/2011/09/09/how-to-use-the-cablesearch-api-to-quickly-reference-names-against-wikileaks-cables/#comments</comments>
		<pubDate>Fri, 09 Sep 2011 12:33:25 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[cables]]></category>
		<category><![CDATA[cablesearch]]></category>
		<category><![CDATA[google refine]]></category>
		<category><![CDATA[grel]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[Something for the weekend]]></category>
		<category><![CDATA[Wikileaks]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15134</guid>
		<description><![CDATA[CableSearch is a neat project by the European Centre for Computer Assisted Research and VVOJ (the Dutch-Flemish association for investigative journalists) which aims to make it easier for journalists to interrogate the Wikileaks cables. Although it&#8217;s been around for some time, I&#8217;ve only just noticed the site&#8217;s API, so I thought I&#8217;d show how such an API can be useful as a<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/09/09/how-to-use-the-cablesearch-api-to-quickly-reference-names-against-wikileaks-cables/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p><img src="http://searchengineland.com/figz/wp-content/seloads/2010/12/logo-cablesearch.png-PNG-Image-473x172-pixels-300x145.jpg" alt="Cablesearch logo" /></p>
<p><a href="http://cablesearch.org/" onclick="urchinTracker('/outgoing/cablesearch.org/?referer=');">CableSearch</a> is a neat project by the European Centre for Computer Assisted Research and <a href="http://www.vvoj.nl/cms/vvoj-english/contact-us" onclick="urchinTracker('/outgoing/www.vvoj.nl/cms/vvoj-english/contact-us?referer=');">VVOJ</a> (the Dutch-Flemish association for investigative journalists) which aims to make it easier for journalists to interrogate the Wikileaks cables. Although it&#8217;s been around for some time, I&#8217;ve only just noticed the site&#8217;s API, so I thought I&#8217;d show how such an API can be useful as a way to draw on such data sources to complement data of your own.<span id="more-15134"></span></p>
<h2>Example question: &#8220;How many Swedish party leaders are mentioned in the cables?&#8221;</h2>
<p>There&#8217;s no particular reason why I picked Sweden, but this is an exercise you could do with any list &#8211; MPs, cabinet members, organisational heads, etc.</p>
<p>First, you need to grab the list. I did so by <a href="http://excelnotes.posterous.com/scraping-a-table-from-a-webpage-using-importh" onclick="urchinTracker('/outgoing/excelnotes.posterous.com/scraping-a-table-from-a-webpage-using-importh?referer=');">using the =importHTML formula</a> on <a href="http://en.wikipedia.org/wiki/List_of_members_of_the_parliament_of_Sweden,_2010%E2%80%932014" onclick="urchinTracker('/outgoing/en.wikipedia.org/wiki/List_of_members_of_the_parliament_of_Sweden_2010_E2_80_932014?referer=');">this Wikipedia page</a>. You would, of course, need to check that list for accuracy. Alternatively, you could <a href="http://onlinejournalismblog.com/2011/07/29/sftw-how-to-scrape-webpages-and-ask-questions-with-google-docs-and-importxml/">use =importXML</a> on <a href="http://www.sweden.gov.se/sb/d/10893/a/109925" onclick="urchinTracker('/outgoing/www.sweden.gov.se/sb/d/10893/a/109925?referer=');">this official Swedish government page</a> for a list of ministers.</p>
<p>(I won&#8217;t repeat these processes here, as the links above explain how to do both.)</p>
<p><a href="https://docs.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdEh3d21WYjF2S1gxNW1ZRGo5eC1qeGc&amp;hl=en_GB" onclick="urchinTracker('/outgoing/docs.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdEh3d21WYjF2S1gxNW1ZRGo5eC1qeGc_amp_hl=en_GB&amp;referer=');">Here are the results</a>. As often happens with Wikipedia tables, the first row is shifted so the headings don&#8217;t quite match the columns below. As we only need a list of names, we don&#8217;t have to correct that. (With the =importXML scrape you&#8217;ll also run into a problem with accented characters, but this will still be quicker to fix than copying the list across manually.)</p>
<p>Now download that spreadsheet as a CSV file, and open up Google Refine.</p>
<h2>Testing with the API</h2>
<p>I&#8217;ve previously explained <a href="http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/">how to use Google Refine with the APIs of Google Maps</a>, <a href="http://onlinejournalismblog.com/2010/12/16/adding-geographical-information-to-a-spreadsheet-based-on-postcodes-google-refine-and-apis/">UK-Postcodes</a>, and <a href="http://onlinejournalismblog.com/2011/07/22/how-to-grab-useful-political-data-with-the-they-work-for-you-api/">They Work For You (UK politics)</a>.</p>
<p>The <a href="http://cablesearch.org/?page_id=242" onclick="urchinTracker('/outgoing/cablesearch.org/?page_id=242&amp;referer=');">CableSearch API page</a> is pretty straightforward if you&#8217;ve followed any of those &#8211; but it is essential to check the results Google Refine returns against what you get from a manual search (and make sure at least one of your tests produces unusual results &#8211; in this case, fewer than 10).</p>
<p>In particular, testing reveals that your search term must first be formatted in a particular way, or you will get the wrong results.</p>
<h2>Formatting your data</h2>
<p>So in our data we have a list of names &#8211; but if we just run them through CableSearch we will get results where those names do not appear together. In other words, a search for John Jones will bring back results where <em>anyone</em> called John and <em>anyone</em> called Jones is mentioned.</p>
<p>The normal solution is to <strong>put quotation marks around the search term</strong>, to ensure that only results containing that exact phrase are returned, i.e. &#8220;John Jones&#8221;.</p>
<p>With an API where we are constructing a URL, however, that space causes problems, because a URL cannot contain a space. <strong>We need to replace it with the code for a space: %20</strong> (if you search for anything containing a space, you will notice that %20 sometimes appears in its place in the URL of the results page; at other times a + sign replaces the space).</p>
<p>So, here&#8217;s how to reformat the text accordingly:</p>
<ol>
<li>Click on the arrow at the top of your column of names, and select <strong>Edit Column &gt; Add column based on this column&#8230;</strong></li>
<li>In the window that appears type the following code: <strong>'"'+value.split(" ").join("%20")+'"'</strong></li>
<li>Give the column a name and click OK.</li>
</ol>
<p>The start and end may be difficult to see, so here it is with spaces in between:</p>
<p><strong>' " '</strong></p>
<p>You&#8217;ll see that it&#8217;s a single inverted comma followed by double inverted commas and a further single inverted comma. That adds double inverted commas at the start and end of our new data.</p>
<p>The rest of the code splits the original data wherever there is a space (" ") and joins the resulting fragments back together with "%20".</p>
<p>And so John Jones becomes "John%20Jones", which will work in the API (one cell contains two names, however, which you will need to clean up).</p>
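<p>The same transformation can be sketched in Python for anyone working outside Google Refine. The function names are hypothetical: <code>grel_style</code> mirrors the expression above, while <code>phrase_for_url</code> swaps in the standard library's <code>urllib.parse.quote</code>, which also encodes the double quotes as %22 and accented characters (common in Swedish names) as UTF-8 escapes. Which form the API accepts is something to confirm with the kind of manual test described above.</p>

```python
from urllib.parse import quote


def grel_style(name):
    """Reproduce the GREL above: literal double quotes, spaces replaced by %20."""
    return '"' + "%20".join(name.split(" ")) + '"'


def phrase_for_url(name):
    """Quote a name as an exact phrase and percent-encode every unsafe character."""
    # safe="" leaves nothing unencoded: the quote mark becomes %22, a space %20,
    # and accented letters become their UTF-8 percent escapes.
    return quote('"' + name.strip() + '"', safe="")


print(grel_style("John Jones"))      # "John%20Jones"
print(phrase_for_url("John Jones"))  # %22John%20Jones%22
```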
<h2>Grabbing from the API</h2>
<p>Now that we have properly formatted text we can ask the CableSearch API for the information it has on each name. Here&#8217;s how:</p>
<ol>
<li>Click on the arrow at the top of the newly created column of formatted names, and select <strong>Edit Column &gt; Add column by fetching URLs</strong></li>
<li>In the window that appears type the following code: <strong>"http://cablesearch.org/cable/api/search?q="+value</strong></li>
<li>Give the column a name and click OK.</li>
</ol>
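<p>For comparison, here is a rough Python equivalent of the fetching step, assuming the endpoint given above still responds (it may not). The one-second pause between requests is an assumption meant to be polite to the server, similar in spirit to the throttle delay Refine applies when fetching URLs; the <code>get</code> parameter is a hypothetical hook so the loop can be tried without touching the network.</p>

```python
import time
import urllib.request

# Endpoint as given in the post (2011); it may no longer respond.
SEARCH_URL = "http://cablesearch.org/cable/api/search?q="


def search_url(formatted_name):
    """Append an already formatted name (see the GREL above) to the endpoint."""
    return SEARCH_URL + formatted_name


def fetch_all(formatted_names, get=None, delay=1.0):
    """Fetch one URL per name and return the raw responses (JSON text) in order."""
    if get is None:
        def get(url):
            with urllib.request.urlopen(url, timeout=30) as response:
                return response.read().decode("utf-8")
    results = []
    for name in formatted_names:
        results.append(get(search_url(name)))
        time.sleep(delay)  # pause between requests so the server isn't hammered
    return results
```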
<p>It will now go and fetch data for each name, which may take a few minutes (or more, depending on how many names you have).</p>
<p>When it&#8217;s finished you should have a column of cells containing JSON data. It will be very hard to look at (<a href="http://onlinejournalismblog.com/2011/04/14/data-for-journalists-json-for-beginners/">more on how to read JSON here</a>) but that&#8217;s OK because we&#8217;re going to create a final column to extract the piece of data we want.</p>
<h2>Extracting from the JSON</h2>
<p>The process should be familiar by now:</p>
<ol>
<li>Click on the arrow at the top of the newly created column of JSON results, and select <strong>Edit Column &gt; Add column based on this column&#8230;</strong></li>
<li>In the window that appears type the following code: <strong>value.parseJson().info.items</strong></li>
<li>Give the column a name and click OK.</li>
</ol>
<p>This will create a new column which simply tells you how many results were returned for each name. Where it says &#8216;10&#8217; there are probably more (that&#8217;s the maximum the API returns per request &#8211; sadly it doesn&#8217;t return any information on total records, although <a href="http://cablesearch.org/?page_id=242" onclick="urchinTracker('/outgoing/cablesearch.org/?page_id=242&amp;referer=');">the API page</a> details one way you can cycle through pages of results beyond the first 10).</p>
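<p>The extraction step can also be sketched in Python. The only structure assumed is what the GREL above relies on (an <code>info</code> object containing <code>items</code>); the sample string is made up, not real API output. <code>flag_counts</code> is a hypothetical helper that marks the counts sitting at the 10-result ceiling.</p>

```python
import json


def result_count(raw_json):
    """Return info.items from a CableSearch-style response, or None if unreadable."""
    # Same as the GREL value.parseJson().info.items
    try:
        data = json.loads(raw_json)
    except (TypeError, ValueError):
        return None  # an empty cell or an error page rather than JSON
    info = data.get("info") if isinstance(data, dict) else None
    return info.get("items") if isinstance(info, dict) else None


def flag_counts(counts, cap=10):
    """Label each count; a count at the cap may hide further pages of results."""
    labels = []
    for count in counts:
        if count is None:
            labels.append("no data")
        elif int(count) >= cap:
            labels.append(f"{cap}+ (check further pages)")
        else:
            labels.append(str(count))
    return labels


sample = '{"info": {"items": 3}}'  # made-up response, for illustration only
print(result_count(sample))  # 3
print(flag_counts([3, 10, None]))  # ['3', '10+ (check further pages)', 'no data']
```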
<p>This enables you to take a list of names and quickly find out which ones are mentioned in the cables at all, and which ones have been mentioned just a few times &#8211; saving you lots of searches, and time, and allowing you to narrow the focus of your work.</p>
<p>A more powerful API would allow you to narrow your focus further: by date range, for example, or source, urgency or classification. The broader point is: this is why APIs are useful. Knowing how to use them (and <a href="http://www.programmableweb.com/apis/directory/1?sort=date&amp;pagesize=25" onclick="urchinTracker('/outgoing/www.programmableweb.com/apis/directory/1?sort=date_amp_pagesize=25&amp;referer=');">which ones there are</a>) simply gives you another way to do a job better.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/09/09/how-to-use-the-cablesearch-api-to-quickly-reference-names-against-wikileaks-cables/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>How to: convert easting/northing into lat/long for an interactive map</title>
		<link>http://onlinejournalismblog.com/2011/08/12/how-to-convert-eastingnorthing-into-latlong-for-an-interactive-map/</link>
		<comments>http://onlinejournalismblog.com/2011/08/12/how-to-convert-eastingnorthing-into-latlong-for-an-interactive-map/#comments</comments>
		<pubDate>Fri, 12 Aug 2011 08:27:07 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[conversion]]></category>
		<category><![CDATA[easting]]></category>
		<category><![CDATA[fusion tables]]></category>
		<category><![CDATA[google refine]]></category>
		<category><![CDATA[latitude]]></category>
		<category><![CDATA[longitude]]></category>
		<category><![CDATA[northing]]></category>
		<category><![CDATA[speed cameras]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=14794</guid>
		<description><![CDATA[Google Fusion Tables is great for creating interactive maps from a spreadsheet &#8211; but it isn&#8217;t too keen on easting and northing. That can be a problem as many government and local authority datasets use easting and northing to describe the geographical position of things &#8211; for example, speed cameras. So you&#8217;ll need a way to convert easting and northing<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/08/12/how-to-convert-eastingnorthing-into-latlong-for-an-interactive-map/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<div id="attachment_14984" class="wp-caption alignnone" style="width: 422px"><a href="http://onlinejournalismblog.com/wp-content/uploads/2011/07/Picture-42.png"><img class="size-full wp-image-14984 " title="A map generated in Google Fusion Tables from a geocoded dataset" src="http://onlinejournalismblog.com/wp-content/uploads/2011/07/Picture-42.png" alt="A map generated in Google Fusion Tables from a geocoded dataset" width="412" height="254" /></a><p class="wp-caption-text">A map generated in Google Fusion Tables from a dataset cleaned using these methods</p></div>
<p>Google Fusion Tables is great for creating interactive maps from a spreadsheet &#8211; but it isn&#8217;t too keen on <a href="http://en.wikipedia.org/wiki/Easting_and_northing" onclick="urchinTracker('/outgoing/en.wikipedia.org/wiki/Easting_and_northing?referer=');">easting and northing</a>. That can be a problem as many government and local authority datasets use easting and northing to describe the geographical position of things &#8211; for example, <a href="http://www.google.co.uk/search?sourceid=chrome&amp;ie=UTF-8&amp;q=easting+northing+speed+cameras" onclick="urchinTracker('/outgoing/www.google.co.uk/search?sourceid=chrome_amp_ie=UTF-8_amp_q=easting+northing+speed+cameras&amp;referer=');">speed cameras</a>.</p>
<p>So you&#8217;ll need a way to convert easting and northing into something that Fusion Tables does like &#8211; such as latitude and longitude.</p>
<p>Here&#8217;s how I did it &#8211; quickly.<span id="more-14794"></span></p>
<h2>Find an API to do the work for you</h2>
<p>The first thing I needed was an online tool that would do the conversions. <a href="http://www.nearby.org.uk/" onclick="urchinTracker('/outgoing/www.nearby.org.uk/?referer=');">Nearby.org.uk</a> is pretty useful for doing so manually &#8211; and <a href="http://www.nearby.org.uk/api/convert-help.php" onclick="urchinTracker('/outgoing/www.nearby.org.uk/api/convert-help.php?referer=');">there&#8217;s an API as well</a> &#8211; but I wanted something that would give me a nice JSON feed for Google Refine.</p>
<p>So I asked Twitter.</p>
<p>This is where being a part of <a href="http://onlinejournalismblog.com/2011/04/01/communities-of-practice-teaching-students-to-learn-in-networks/">communities of practice</a> is important for journalists. (Samuel Johnson once said that there are <a href="http://www.samueljohnson.com/twokinds.html" onclick="urchinTracker('/outgoing/www.samueljohnson.com/twokinds.html?referer=');">two types of knowledge</a>: &#8220;We know a subject ourselves, or we know where we can find information upon it.&#8221; Those communities are an example of the latter).</p>
<p>Stuart Harrison very helpfully said he would adapt <a href="http://www.uk-postcodes.com/api.php" onclick="urchinTracker('/outgoing/www.uk-postcodes.com/api.php?referer=');">his postcodes API</a> to convert easting and northing &#8211; and within an hour <a href="http://www.uk-postcodes.com/eastingnorthing.php" onclick="urchinTracker('/outgoing/www.uk-postcodes.com/eastingnorthing.php?referer=');">it was ready</a>.</p>
<h2>Using Google Refine to work with the API</h2>
<p>The API works by generating information in JSON format based on a URL (<a href="http://onlinejournalismblog.com/2011/04/14/data-for-journalists-json-for-beginners/">I explain JSON in this post</a>).</p>
<p>For example, the following URL generates a page of JSON with the latitude and longitude for easting 492412, northing 329757:</p>
<p><a href="http://www.uk-postcodes.com/eastingnorthing.php?easting=492412&amp;northing=329757" onclick="urchinTracker('/outgoing/www.uk-postcodes.com/eastingnorthing.php?easting=492412_amp_northing=329757&amp;referer=');">http://www.uk-postcodes.com/eastingnorthing.php?easting=492412&amp;northing=329757</a></p>
<p>I know that <a href="http://code.google.com/p/google-refine/wiki/Downloads?tm=2" onclick="urchinTracker('/outgoing/code.google.com/p/google-refine/wiki/Downloads?tm=2&amp;referer=');">Google Refine</a> will be able to use that JSON to extract the latitude and longitude for dozens of rows with different values and add them to the spreadsheet (<a href="http://onlinejournalismblog.com/2010/12/16/adding-geographical-information-to-a-spreadsheet-based-on-postcodes-google-refine-and-apis/">here&#8217;s a post explaining that</a> in more detail).</p>
<p>So here&#8217;s what I do:</p>
<h2>Generating the end bit of the URLs</h2>
<p>I need a new column in my spreadsheet that fetches information from those URLs &#8211; there are a couple of ways of doing this but I&#8217;m going to show the simplest way for a beginner (rather than the simplest method programmatically*)</p>
<p>This involves creating a new column which conveniently puts together the end part of the URL that I&#8217;ll be calling: in this case easting=492412&amp;northing=329757 (where the numbers change in each cell).</p>
<ol>
<li>Click on the drop-down arrow at the top of the Easting column and select <strong>Edit column &gt; Add column based on this column&#8230;</strong></li>
<li>In the window that appears type the following GREL (Google Refine Expression Language): <strong>"easting="+cells["Easting"].value+"&amp;northing="+cells["Northing"].value</strong></li>
<li>This assumes that the column with the easting values is called &#8216;Easting&#8217; (note the capital E) and the northing column is called &#8216;Northing&#8217;. Change these to the names of your columns if they&#8217;re different.</li>
<li>Give the new column a name in the box at the top and save it. You should see a new column appear, populated with values like easting=492412&amp;northing=329757 &#8211; in each cell the process is simply writing a string of characters that begins with easting=, then adds the value in the cell within the &#8216;Easting&#8217; column, adds &amp;northing=, then adds the value in the cell within the &#8216;Northing&#8217; column.</li>
</ol>
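<p>If you were doing this step in Python rather than Google Refine, a minimal sketch might look like the following. The column names match the ones assumed above; the standard library's <code>urlencode</code> stands in for the GREL concatenation and gives the same string for plain numeric values.</p>

```python
from urllib.parse import urlencode


def en_query(row):
    """Build the end of the URL from a row's Easting and Northing values."""
    # Same output as the GREL:
    # "easting="+cells["Easting"].value+"&northing="+cells["Northing"].value
    return urlencode({"easting": row["Easting"], "northing": row["Northing"]})


print(en_query({"Easting": 492412, "Northing": 329757}))
# easting=492412&northing=329757
```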
<p>These are the second parts of the URLs we&#8217;re going to fetch lat-long values from.</p>
<h2>Fetching data from those URLs</h2>
<p>At the top of this new column, then:</p>
<ol>
<li>Click on the drop-down arrow of your newest column and select <strong>Edit column &gt; Add column by fetching URLs&#8230;</strong></li>
<li>In the window that appears type the following GREL (Google Refine Expression Language): <strong>"http://www.uk-postcodes.com/eastingnorthing.php?"+value</strong></li>
<li>As you can see, this simply looks at a URL that begins http://www.uk-postcodes.com/eastingnorthing.php? and ends with the value in each cell of the column selected. It will then populate a new column of cells with the JSON returned by each different URL.</li>
<li>Give the new column a name in the box at the top and save it. You should again see a new column appear &#8211; but this will take longer, because it is going to that website and gathering information. Make a cup of tea.</li>
</ol>
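<p>Behind the scenes, fetching URLs amounts to a loop like the one below. This Python sketch assumes the base URL from the GREL above still answers (the service dates from 2011 and may no longer respond), and the names fetch_all and download are hypothetical. The pause between requests is a courtesy to the server, and a failed request is recorded as None so that one bad row does not stop the rest.</p>

```python
import time
import urllib.request

# Base URL taken from the GREL above; the service may no longer respond.
BASE_URL = "http://www.uk-postcodes.com/eastingnorthing.php?"


def download(url):
    """Fetch a URL and return its body as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def fetch_all(queries, fetch=download, delay_seconds=1.0):
    """Fetch BASE_URL + query for each query, pausing between requests.

    Network errors (URLError, HTTPError and timeouts are all OSError
    subclasses) are stored as None instead of stopping the loop.
    """
    results = []
    for i, query in enumerate(queries):
        if i:
            time.sleep(delay_seconds)  # be polite to the server
        try:
            results.append(fetch(BASE_URL + query))
        except OSError:
            results.append(None)
    return results
```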
<h2>Extracting the latitude and longitude into separate cells</h2>
<p>Great &#8211; now we have the lat-long values for each row, wrapped up in JSON. But to visualise this data we need separate columns for latitude and longitude, so this is how we get them out of the JSON.</p>
<ol>
<li>Click on the drop-down arrow at the top of the column of JSON results and select <strong>Edit column &gt; Add column based on this column&#8230;</strong></li>
<li>In the window that appears type the following GREL (Google Refine Expression Language): <strong>value.parseJson().lng</strong></li>
<li>This parses the JSON in each cell, pulls out the value stored under &#8220;lng&#8221; (the longitude) and populates a new column of cells with it.</li>
<li>Give the new column a name in the box at the top (e.g. longitude) and save it.</li>
<li>Repeat the process for latitude &#8211; the GREL you need is <strong>value.parseJson().lat</strong></li>
</ol>
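<p>The same extraction can be sketched in Python. It assumes, as the GREL implies, that each response is a JSON object with top-level &#8216;lat&#8217; and &#8216;lng&#8217; keys; lat_lng is a hypothetical name. Cells that are empty or do not parse come back as None rather than raising an error.</p>

```python
import json


def lat_lng(json_text):
    """Return (latitude, longitude) from one cell of JSON, or None.

    Assumes top-level "lat" and "lng" keys, as the GREL above implies.
    Values may arrive as strings, so they are converted to floats.
    """
    if not json_text:
        return None
    try:
        data = json.loads(json_text)
        return float(data["lat"]), float(data["lng"])
    except (ValueError, KeyError, TypeError):
        return None
```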
<p>You should now have a spreadsheet of data that includes latitude and longitude for each row. Click on <strong>Export</strong> in the upper right corner and select <strong>Comma-separated value</strong>.</p>
<h2>Visualise it in Fusion Tables</h2>
<p>Go to <a href="http://www.google.com/fusiontables/Home" onclick="urchinTracker('/outgoing/www.google.com/fusiontables/Home?referer=');">Google Fusion Tables</a> and upload that file. Then open it. Click on <strong>Visualize</strong> and you should have a map option. Once visualised you can embed it elsewhere by clicking on <strong>Get embeddable link</strong>.</p>
<p>For <a href="http://helpmeinvestigate.com/transport/speedcameras/" onclick="urchinTracker('/outgoing/helpmeinvestigate.com/transport/speedcameras/?referer=');">an example of how that embed code looks on a page, here is one I prepared earlier</a>. (And <a href="https://spreadsheets.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdHF4T2FOX00zaUJ2TEhTNEE0QXNDcXc&amp;hl=en_GB" onclick="urchinTracker('/outgoing/spreadsheets.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdHF4T2FOX00zaUJ2TEhTNEE0QXNDcXc_amp_hl=en_GB&amp;referer=');">here is the data it is pulling from</a>).</p>
<p><em>*The simpler way programmatically is to go straight to &#8216;Fetching data from URLs&#8217; and use the following GREL code:</em></p>
<p><strong>"http://www.uk-postcodes.com/eastingnorthing.php?easting="+cells["Easting"].value+"&amp;northing="+cells["Northing"].value</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/08/12/how-to-convert-eastingnorthing-into-latlong-for-an-interactive-map/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SFTW: How to grab useful political data with the They Work For You API</title>
		<link>http://onlinejournalismblog.com/2011/07/22/how-to-grab-useful-political-data-with-the-they-work-for-you-api/</link>
		<comments>http://onlinejournalismblog.com/2011/07/22/how-to-grab-useful-political-data-with-the-they-work-for-you-api/#comments</comments>
		<pubDate>Fri, 22 Jul 2011 08:35:47 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[constituencies]]></category>
		<category><![CDATA[google refine]]></category>
		<category><![CDATA[grel]]></category>
		<category><![CDATA[guardian api]]></category>
		<category><![CDATA[Politics]]></category>
		<category><![CDATA[Something for the weekend]]></category>
		<category><![CDATA[they work for you]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=14930</guid>
		<description><![CDATA[It&#8217;s been over 2 years since I stopped doing the &#8216;Something for the Weekend&#8217; series. I thought I would revive it with a tutorial on They Work For You and Google Refine&#8230; If you want to add political context to a spreadsheet – say you need to know what political parties a list of constituencies voted for, or the MPs<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/07/22/how-to-grab-useful-political-data-with-the-they-work-for-you-api/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p><img src="http://www.theyworkforyou.com/images/logo.png" alt="They Work For You" /></p>
<p><em>It&#8217;s been over 2 years since I stopped doing the &#8216;<a href="http://onlinejournalismblog.com/tag/something-for-the-weekend/">Something for the Weekend&#8217; series</a>. I thought I would revive it with a tutorial on They Work For You and Google Refine&#8230;</em></p>
<p>If you want to add political context to a spreadsheet – say you need to know what political parties a list of constituencies voted for, or the MPs for those constituencies – the <a href="http://www.theyworkforyou.com/api/" onclick="urchinTracker('/outgoing/www.theyworkforyou.com/api/?referer=');">They Work For You API</a> can save you hours of fiddling &#8211; if you know how to use it.</p>
<p>An API is – for the purposes of journalists – a way of asking questions of reams of data. For example, you can use an API to ask “What constituency is each of these postcodes in?” or “When did these politicians enter office?” or even “Can you show me an image of these people?”</p>
<p>The They Work For You API will give answers to a range of UK political questions on subjects including Lords, MLAs (Members of the Legislative Assembly in Northern Ireland), MPs, MSPs (Members of the Scottish Parliament), select committees, debates, written answers, statements and constituencies.</p>
<p>When you combine that API with <strong>Google Refine</strong> you can fill a whole spreadsheet with additional political data, allowing you to answer questions you might otherwise not be able to.</p>
<p>I’ve written before on <a href="http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/">how to use Google Refine to pull data into a spreadsheet from the Google Maps API</a> and <a href="http://onlinejournalismblog.com/2010/12/16/adding-geographical-information-to-a-spreadsheet-based-on-postcodes-google-refine-and-apis/">the UK Postcodes API</a>, but this post takes things a bit further because the They Work For You API requires something called a ‘key’. This is quite common with APIs so knowing how to use them is &#8211; well &#8211; <em>key</em>. If you need extra help, try those tutorials first.<span id="more-14930"></span></p>
<h2>The They Work For You API key</h2>
<p>Unlike the previous APIs I’ve written about, the They Work For You API requires you to register for a ‘key’ to use it. If you don’t understand how this works, the <a href="http://www.theyworkforyou.com/api/" onclick="urchinTracker('/outgoing/www.theyworkforyou.com/api/?referer=');">instructions on the TWFY website</a> can be a little confusing. So here’s how it works:</p>
<p>The key is a password of sorts, used when you ask the API a question.</p>
<p>Because your ‘question’ takes the form of a web address (URL), that key needs to be included in a particular part of that URL.</p>
<p>You’ll see how that works when we get to asking the API questions. But first, go to <a href="http://www.theyworkforyou.com/api/key" onclick="urchinTracker('/outgoing/www.theyworkforyou.com/api/key?referer=');">http://www.theyworkforyou.com/api/key</a> to get a key.</p>
<p>Got it? OK, now copy it into a text document – or just keep this window open. You’ll need to paste it later.</p>
<h2>Using the TWFY key</h2>
<p>The API has a number of pre-set questions, called ‘functions’. These are listed in the right-hand column of the API page, and include <a href="http://www.theyworkforyou.com/api/docs/getMPs" onclick="urchinTracker('/outgoing/www.theyworkforyou.com/api/docs/getMPs?referer=');">getMPs</a>, <a href="http://www.theyworkforyou.com/api/docs/getLord" onclick="urchinTracker('/outgoing/www.theyworkforyou.com/api/docs/getLord?referer=');">getLord</a>, <a href="http://www.theyworkforyou.com/api/docs/getDebates" onclick="urchinTracker('/outgoing/www.theyworkforyou.com/api/docs/getDebates?referer=');">getDebates</a> and so on. If you click on any of these you will be given information on how they work, and you can also test the function with the ‘Explorer’.</p>
<p>To demonstrate how to use these functions, <a href="http://www.theyworkforyou.com/api/docs/getConstituency" onclick="urchinTracker('/outgoing/www.theyworkforyou.com/api/docs/getConstituency?referer=');">click on getConstituency</a>.</p>
<p>If you use the ‘Explorer’ to test it (in this case with &#8216;Edinburgh South&#8217;) you will be shown a bunch of results at a URL like this:</p>
<p><strong>http://www.theyworkforyou.com/api/docs/getConstituency?name=edinburgh+south&amp;postcode=&amp;output=js#output</strong></p>
<p>Now you could manually use the Explorer to get information for each of the cells in a spreadsheet, but it&#8217;s much, much quicker to use the API to automate the process instead.</p>
<p>On that front the Explorer can be a little misleading: although it shows you the information you might get from the API, the Explorer URL above is not the URL that you will need.</p>
<p>The URL you really need is shown above the results, and below the word ‘<strong><em>Output</em></strong>’ like so:</p>
<p><strong>http://www.theyworkforyou.com/api/getConstituency?name=edinburgh+south&amp;output=js</strong></p>
<p>If you copy and paste that URL into your browser you will get the following warning:</p>
<pre>{
 error: "No API key provided. Please see http://www.theyworkforyou.com/api/key for more information."
}</pre>
<p>So now we need that key.</p>
<h2>Using your key</h2>
<p>Assuming you still have your API key copied somewhere, or still open in another window, you can find instructions on how to use it at <a href="http://www.theyworkforyou.com/api/" onclick="urchinTracker('/outgoing/www.theyworkforyou.com/api/?referer=');">http://www.theyworkforyou.com/api/</a>.</p>
<p>Here you are told to use the key as part of the following structure:</p>
<p><strong>http://www.theyworkforyou.com/api/function?key=key&amp;output=output&amp;other_variables</strong></p>
<p>The important bit is where it says <strong>key=key&amp;</strong></p>
<p>That is where you need to add your own key, so that that part of the URL looks <em>something </em>like</p>
<p><strong>key=aTh0jklerJaHui7&amp;</strong></p>
<p>(where that random assortment of characters is your key, copied earlier, followed by the <strong>&amp;</strong> sign)</p>
<p>Going back for a moment to the URL that wasn’t working without a key, we can see that it can be split into two parts:</p>
<p><strong>http://www.theyworkforyou.com/api/getConstituency?</strong></p>
<p><em>and</em></p>
<p><strong>name=edinburgh+south&amp;output=js</strong></p>
<p>Adding in the <em>key</em> in the middle makes up a <em>third</em> part, like so:</p>
<p><strong>http://www.theyworkforyou.com/api/getConstituency?</strong></p>
<p><em>and</em></p>
<p><strong>key=key&amp;</strong></p>
<p><em>and</em></p>
<p><strong>name=edinburgh+south&amp;output=js</strong></p>
<p>So, you now need to <em>edit the output URL to include your API key</em>. It should then look something like this:</p>
<p>http://www.theyworkforyou.com/api/getConstituency?<strong>key=AHdajHUShajshaJ&#038;</strong>name=edinburgh+south&#038;output=js</p>
<p><em>UPDATE: Matthew Somerville points out that the key can be used anywhere after the ? so you can tag it on the end if that&#8217;s easier.</em></p>
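<p>If you later want to build these URLs in a script rather than by hand, here is a minimal Python sketch. twfy_url is a hypothetical helper, and the dummy key from above stands in for yours. Because the key is just another name=value pair after the ?, its position does not matter; urlencode also turns the space in &#8216;edinburgh south&#8217; into a plus sign, matching the Explorer&#8217;s URL.</p>

```python
from urllib.parse import urlencode

API_ROOT = "http://www.theyworkforyou.com/api/"


def twfy_url(function, key, **params):
    """Build a They Work For You API URL (hypothetical helper).

    The key goes first here, but any position after the '?' works.
    output=js asks for the answer as JSON.
    """
    query = {"key": key, **params, "output": "js"}
    return API_ROOT + function + "?" + urlencode(query)


# Placeholder key from the post; substitute your own.
print(twfy_url("getConstituency", "AHdajHUShajshaJ", name="edinburgh south"))
```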
<h2>The URL broken down further</h2>
<p>Just to clarify, these are the parts:</p>
<p><strong>http://www.theyworkforyou.com/</strong></p>
<p>(The website hosting the API)</p>
<p><strong>api/</strong></p>
<p>(The API)</p>
<p><strong>getConstituency?</strong></p>
<p>(The function – or question being asked)</p>
<p><strong>key=AHdajHUShajshaJ</strong></p>
<p>(Our API key – or password)</p>
<p><strong>&amp;name=edinburgh+south</strong></p>
<p>(and the constituency name that we are asking the API for information on)</p>
<p><strong>&amp;output=js</strong></p>
<p>(and the format we want the answer in &#8211; JSON, in this case)</p>
<p>You should now get a page of JSON code giving data for the question. If your browser doesn&#8217;t display it particularly well, try Chrome or Firefox.</p>
<h2>Using with Google Refine to get a bunch of results</h2>
<p>Great. But we could get one result by using the ‘Explorer’, so why did we need to do all that? Because we can now use Google Refine to automate the process of asking the same question hundreds of times.</p>
<p>To demonstrate this, <a href="https://spreadsheets.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdDA3V010RUlqTjhYalN6ejh0T2ZGN0E&amp;hl=en_GB" onclick="urchinTracker('/outgoing/spreadsheets.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdDA3V010RUlqTjhYalN6ejh0T2ZGN0E_amp_hl=en_GB&amp;referer=');">here&#8217;s a spreadsheet with 4 constituencies</a>. Open it, and select <strong>File &gt; Download as&#8230; &gt; CSV </strong></p>
<p>Open Google Refine (<a href="http://code.google.com/p/google-refine/wiki/Downloads?tm=2" onclick="urchinTracker('/outgoing/code.google.com/p/google-refine/wiki/Downloads?tm=2&amp;referer=');">download here</a>) and create a new project with that spreadsheet. Create a new column from the one you have by clicking on the arrow at the top of the column and selecting <strong>Edit Column &gt; Add Column by fetching URLs</strong></p>
<p>In the window that appears adapt the following piece of Google Refine Expression Language (GREL) with your own API key (shown in bold):</p>
<div id="left-panel">
<div>
<div id="refine-tabs-history">
<div>
<div><a>"http://www.theyworkforyou.com/api/getConstituency?<strong>key=Gr7jUUlKdhB3fsihFnHzab&amp;</strong>name="+value+"&amp;output=js"</a></div>
</div>
</div>
</div>
</div>
<p>(NOTE: Avoid copying and pasting, as the quotation marks may cause you problems. Instead try typing it in yourself &#8211; this also helps you remember things.) This generates a URL in each cell based on the value of the original column: the start and end of the URL are in quotation marks; the value is inserted in the middle where it says <strong>+value+</strong>.</p>
<p>Give the column a name and click <strong>OK</strong>. It will now run &#8211; this test example only has 4 rows so you can see the results quickly.</p>
<p>You&#8217;ll see that only one row has actually worked &#8211; Tatton. The others have failed. Why? Because they have more than one word.</p>
<p>Take another look at that URL that the API returned earlier with the test of Edinburgh South:</p>
<p>http://www.theyworkforyou.com/api/getConstituency?key=AHdajHUShajshaJ&#038;<strong>name=edinburgh+south</strong>&#038;output=js</p>
<p>When a constituency has two words the space between them is represented by a plus sign &#8211; so we need to format our data in the same way for it to work.</p>
<h2>Formatting data for the API</h2>
<p>You could use Find and Replace in Excel to replace all spaces in that column with a plus sign but you will still hit problems with unusual constituency names. But this is how to do it in Google Refine:</p>
<p><del>Click on the arrow at the top of the constituency column and selecting <strong>Edit Column &gt; Add column based on this column&#8230;</strong></del></p>
<p><del>In the window that appears type the following GREL:</del></p>
<p>value.split(" ").join("+")</p>
<p>To explain:</p>
<p><em>&#8216;Value&#8217; is the value in each cell.</em></p>
<p><em>&#8216;.split(" ")&#8217; splits each value where there is a space (" ").</em></p>
<p><del><em>&#8216;.join(&#8220;+&#8221;) then joins the resulting items together, with a plus sign.</em></del></p>
<p><del>Give it a name and click <strong>OK</strong>. You&#8217;ll see a new column with plus signs replacing the spaces. </del><em>[see comment from Matthew Somerville for explanation]</em></p>
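<p>Translated into Python, the split-and-join expression above shows its limitation: only spaces are replaced, so a comma (or any other character a URL cannot carry as-is) passes straight through. A minimal sketch; plus_for_spaces is a hypothetical name.</p>

```python
def plus_for_spaces(name):
    """Python equivalent of the GREL value.split(" ").join("+")."""
    return "+".join(name.split(" "))


print(plus_for_spaces("Edinburgh South"))
# The comma is left unescaped, which is the weakness of this approach.
print(plus_for_spaces("Normanton, Pontefract and Castleford"))
```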
<p>Create a new column from the one you have by clicking on the arrow at the top of the column and selecting <strong>Edit Column &gt; Add Column by fetching URLs</strong></p>
<p>In the window that appears adapt the following piece of Google Refine Expression Language (GREL) with your own API key (shown in bold):</p>
<p>"http://www.theyworkforyou.com/api/getConstituency?name=" + escape(value, "url") + "<strong>&amp;key=Gr7jUUlKdhB3fsihFnHzab&amp;</strong>output=js"</p>
<p>The key part here is between the + signs. Whereas before we simply inserted the value of each cell, here we <em>escape</em> that value at the same time so that it will work in a URL.</p>
<p>This will change Edinburgh South to &#8220;Edinburgh+South&#8221;, but also Normanton, Pontefract and Castleford to &#8220;Normanton%2C+Pontefract+and+Castleford&#8221;, and deal with any other unforeseen characters in similar ways.</p>
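<p>Python&#8217;s nearest equivalent to GREL&#8217;s escape(value, "url") is urllib.parse.quote_plus, which likewise turns spaces into + and commas into %2C (the two may differ on a few rare characters). Here is a sketch of the full URL from the GREL above, with constituency_url as a hypothetical helper and the key passed in rather than hard-coded.</p>

```python
from urllib.parse import quote_plus

API_URL = "http://www.theyworkforyou.com/api/getConstituency?"


def constituency_url(name, key):
    """Mirror the escaped GREL: escaped name, then key, then output=js."""
    return (API_URL
            + "name=" + quote_plus(name)
            + "&key=" + quote_plus(key)
            + "&output=js")


print(constituency_url("Normanton, Pontefract and Castleford", "YOUR_KEY"))
```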
<p>Give this new column a name, click <strong>OK</strong> and watch your new column populate itself with the JSON from each URL.</p>
<h2>Creating new columns from the JSON</h2>
<p>Now we can populate new columns with data taken from that JSON as follows:</p>
<p>Click on the arrow at the top of the <em>new </em>JSON column and select Edit Column &gt; Add column based on this column&#8230;</p>
<p>Type this GREL:</p>
<p>value.parseJson().bbc_constituency_id</p>
<p><em>(This looks in the JSON in each cell and pulls out the value stored under bbc_constituency_id.)</em> Then click <strong>OK</strong>.</p>
<p>Repeat the process for further columns as follows:</p>
<p><a>value.parseJson().guardian_election_results</a></p>
<p><a>value.parseJson().pa_id</a></p>
<p><a>value.parseJson().guardian_id</a></p>
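<p>The same extraction can be sketched in Python, pulling all four fields at once. This assumes, as the GREL implies, that they sit at the top level of the JSON object returned by getConstituency; constituency_fields is a hypothetical helper, and missing fields come back as None rather than stopping the run.</p>

```python
import json

# Field names taken from the GREL expressions above.
FIELDS = ["bbc_constituency_id", "guardian_election_results", "pa_id", "guardian_id"]


def constituency_fields(json_text):
    """Pull the listed fields out of one cell of getConstituency JSON.

    Unparseable cells, or fields that are absent, give None for
    that field instead of raising an error.
    """
    try:
        data = json.loads(json_text)
    except (TypeError, ValueError):
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {field: data.get(field) for field in FIELDS}
```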
<h2>Going further</h2>
<p>That&#8217;s just a demonstration of how to use a small part of the They Work For You API &#8211; there are lots of other functions that you can use to get other information. Have a play with those.</p>
<p>Meanwhile, what about those IDs? Well, the Guardian ID <a href="http://www.guardian.co.uk/open-platform/politics-api/getting-started" onclick="urchinTracker('/outgoing/www.guardian.co.uk/open-platform/politics-api/getting-started?referer=');">will allow you to play with The Guardian&#8217;s API</a> &#8211; which gives lots more information on each constituency. For an example see http://www.guardian.co.uk/politics/api/constituency/664/json</p>
<p>Based on that URL you can repeat the process above to grab more data.</p>
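<p>The Guardian URL above follows a simple pattern, with the constituency&#8217;s Guardian ID in the middle. Assuming every ID slots in the same way (only one example is given here, and that 2011 API may since have been retired), building the URLs for a column of IDs is straightforward; guardian_url is a hypothetical name.</p>

```python
# Pattern inferred from the single example URL above.
GUARDIAN_TEMPLATE = "http://www.guardian.co.uk/politics/api/constituency/{}/json"


def guardian_url(guardian_id):
    """Build the Guardian politics API URL for one constituency ID."""
    return GUARDIAN_TEMPLATE.format(guardian_id)


print(guardian_url(664))
```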
<p><em>Is this useful? Anything you can add? Or other data problems?</em></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/07/22/how-to-grab-useful-political-data-with-the-they-work-for-you-api/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Getting full addresses for data from an FOI response (using APIs)</title>
		<link>http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/</link>
		<comments>http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/#comments</comments>
		<pubDate>Fri, 18 Mar 2011 13:00:29 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[online journalism]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[concatenate]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[foi]]></category>
		<category><![CDATA[google refine]]></category>
		<category><![CDATA[grel]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[sean mcgrath]]></category>
		<category><![CDATA[spreadsheets]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=13647</guid>
		<description><![CDATA[Here&#8217;s an example of how APIs can be useful to journalists when they need to combine two sets of data. I recently spoke to Lincoln investigative journalism student Sean McGrath who had obtained some information via FOI that he needed to combine with other data to answer a question (sorry to be so cryptic). He had spent 3 days cleaning<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p><a rel="attachment wp-att-13687" href="http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/heatfullcolour1-2/"><img class="aligncenter size-thumbnail wp-image-13687" src="http://ojb.journallocal.co.uk/files/2011/03/heatfullcolour11-400x426.jpg" alt="Heat Map" width="400" height="426" /></a></p>
<p>Here&#8217;s an example of how APIs can be useful to journalists when they need to combine two sets of data.</p>
<p>I recently spoke to Lincoln investigative journalism student <a href="http://www.seanmcgrath.co.uk/" onclick="urchinTracker('/outgoing/www.seanmcgrath.co.uk/?referer=');">Sean McGrath</a> who had obtained some information via FOI that he needed to combine with other data to answer a question (sorry to be so cryptic).</p>
<p>He had spent 3 days cleaning up the data and manually adding postcodes to it. This seemed a good example of where using an API might cut down your work considerably, and so in this post I explain how to make a start on the same problem in less than an hour using Excel, Google Refine and the Google Maps API.</p>
<h2>Step 1: Get the data in the right format to work with an API</h2>
<p>APIs can do all sorts of things, but one of the things they do which is particularly useful for journalists is <em>answer questions</em>.<span id="more-13647"></span></p>
<p>If we give the <strong>Google Maps API</strong> an address, for example, it will give us all sorts of information in return, such as latitude and longitude, postcode, and so on (I&#8217;ll explain how to do this later in the post). That&#8217;s what we&#8217;re going to use here &#8211; but we might use other APIs or datasets instead.</p>
<p>Sean&#8217;s <a href="https://spreadsheets.google.com/pub?hl=en_GB&amp;hl=en_GB&amp;key=0ApTo6f5Yj1iJdDkzS3NHRl9rQ0Y5ODNhQzZSS1R5Rmc&amp;output=html" onclick="urchinTracker('/outgoing/spreadsheets.google.com/pub?hl=en_GB_amp_hl=en_GB_amp_key=0ApTo6f5Yj1iJdDkzS3NHRl9rQ0Y5ODNhQzZSS1R5Rmc_amp_output=html&amp;referer=');">spreadsheet</a> had one column for school names and another for each school&#8217;s town or city &#8211; but we needed those details to be together so we had a complete &#8216;address&#8217;. To do that we needed to open the spreadsheet in Excel and create a new column that combined the two.</p>
<p>I created the new column on the left (column A) and typed the following into cell A3:</p>
<pre>=CONCATENATE(B3, ", ", I3)</pre>
<p>This copies the value in cell B3, puts a comma and space after it (&#8220;, &#8220;), and then copies whatever is in I3. In other words, it combines the two cells to create a full address.</p>
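<p>The CONCATENATE step can also be done on a CSV file directly. This is a minimal Python sketch using the standard csv module; the function name and the &#8216;School&#8217; and &#8216;Town&#8217; headers in the example are hypothetical, so substitute the headers in your own file. Like the spreadsheet version, it adds the combined address as the first column.</p>

```python
import csv
import io


def add_address_column(csv_text, name_col, town_col):
    """Prepend an 'address' column: name, comma and space, town.

    name_col and town_col are header names in your own file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    fields = ["address"] + reader.fieldnames  # reading fieldnames consumes the header row
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["address"] = row[name_col] + ", " + row[town_col]
        writer.writerow(row)
    return out.getvalue()


sample = "School,Town\nExample School,Lincoln\n"
print(add_address_column(sample, "School", "Town"))
```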
<p>To copy the formula down the whole spreadsheet for all the other rows I used my favourite ever shortcut: I held down CTRL and clicked on the + in the bottom right corner of that cell.</p>
<p>Now it&#8217;s ready to use in Google Refine.</p>
<h2>Step 2: Using Google Refine to ask Google Maps API a question</h2>
<p>Now open the spreadsheet in Google Refine.</p>
<p>In Refine click on the arrow at the top of column A (the one you created using =CONCATENATE) and select <strong>Edit Column &gt; Add Column by fetching URLs</strong></p>
<p>A new window will appear with a code box. Type this:</p>
<pre>"http://maps.googleapis.com/maps/api/geocode/json?sensor=false&amp;address=" + escape(value, "url")</pre>
<p>That creates a URL by adding the address in column A (&#8216;value&#8217;) to the Google Maps API URL; escape(value, "url") makes the address safe to use in a URL. The URL is, in effect, the spreadsheet &#8216;asking&#8217; the Google Maps API to give it all the information it has about the address, and to provide that information in a format called JSON (note &#8216;json&#8217; in the URL).</p>
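<p>Here is the same URL built in Python, as a sketch. urlencode escapes the address much as escape(value, "url") does in the GREL; geocode_url is a hypothetical name and the address is made up. Note that this reproduces the 2011 URL: current versions of Google&#8217;s Geocoding API require an API key and HTTPS, so treat it as an illustration of the structure rather than a working call.</p>

```python
from urllib.parse import urlencode

GEOCODE_ROOT = "http://maps.googleapis.com/maps/api/geocode/json?"


def geocode_url(address):
    """Same URL as the GREL: sensor=false, then the URL-escaped address."""
    return GEOCODE_ROOT + urlencode({"sensor": "false", "address": address})


# Hypothetical address for illustration.
print(geocode_url("Example School, Lincoln"))
```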
<p>You can see all this being done in <a href="http://www.youtube.com/watch?v=m5ER2qRH1OQ" onclick="urchinTracker('/outgoing/www.youtube.com/watch?v=m5ER2qRH1OQ&amp;referer=');">Google Refine&#8217;s own video</a>:</p>
<p><iframe width="600" height="338" src="http://www.youtube.com/embed/5tsyz3ibYzk?fs=1&#038;feature=oembed" frameborder="0" allowfullscreen></iframe></p>
<p>Give your new column a name, and click OK. You&#8217;ll see that the new column contains a raft of code &#8211; JSON &#8211; about each school. This contains all that geographical information &#8211; but we still need to extract an address from that.*</p>
<h2>Step 3: Using Google Refine to extract the address from the Google Maps data</h2>
<p>Create a further column based on the one you&#8217;ve just created by clicking the arrow at the top and selecting <strong>Edit Column &gt; Add Column based on this column</strong></p>
<p>We need to write some more code. This took a bit of trial and error but here&#8217;s what I ended up with:</p>
<pre>value.parseJson().results[0].formatted_address</pre>
<p>&#8216;value&#8217; is the value in the column we&#8217;re basing this new column on. parseJson() reads it as JSON. If you look in the JSON you&#8217;ll see there&#8217;s a section called &#8216;results&#8217;: it is a list of matches, and [0] picks the first one. Within that is a field called &#8216;formatted_address&#8217;, which has what we need.</p>
<p>Now we have a new column with the full address &#8211; including postcode.</p>
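<p>In Python the same extraction looks like this (a sketch; formatted_address is a hypothetical name). One difference is deliberate: when Google finds no match the results list is empty, and rather than failing on results[0] the function returns None, which helps with the unmatched addresses described in the footnote.</p>

```python
import json


def formatted_address(json_text):
    """Python version of value.parseJson().results[0].formatted_address.

    Returns None when the results list is empty (no match found)
    instead of failing the way results[0] would.
    """
    data = json.loads(json_text)
    results = data.get("results", [])
    if not results:
        return None
    return results[0].get("formatted_address")
```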
<h2>Step 4: Using Excel to split the address up again</h2>
<p>We can now export this (<strong>Export</strong> is in the upper right corner of Google Refine) as a spreadsheet and open it up in Excel again.</p>
<p>To split that address column into its parts, select it and then select <strong>Data &gt; Text to columns</strong>; this splits the address into separate items, with postcodes in their own column. (There are other ways you could do this, for example extracting the last 5 characters of each cell instead.)</p>
<p>Alternatively, you could get the postcode from the JSON directly, with a different line of code at Step 3 (if you work this out let me know) &#8211; or you could extract the lat/long as detailed in the video and use the Postcodes API at <a href="http://www.uk-postcodes.com/api.php" onclick="urchinTracker('/outgoing/www.uk-postcodes.com/api.php?referer=');">http://www.uk-postcodes.com/api.php</a> to get the postcode from that. As always there are various ways to crack the nut.</p>
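<p>On the question of getting the postcode straight from the JSON: Google&#8217;s geocoding results list the parts of each address under address_components, and each part carries a list of types; the postcode is the part tagged postal_code. A Python sketch under that assumption (postcode is a hypothetical helper name, not tested against live API output):</p>

```python
import json


def postcode(json_text):
    """Return the postcode from the first geocoding result, or None.

    Each result has an address_components list; the component whose
    "types" list includes "postal_code" holds the postcode.
    """
    data = json.loads(json_text)
    results = data.get("results", [])
    if not results:
        return None
    for component in results[0].get("address_components", []):
        if "postal_code" in component.get("types", []):
            return component.get("long_name")
    return None
```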
<p>Note: I haven&#8217;t &#8216;learned&#8217; JSON or GREL or any other language in this &#8211; just done a bit of searching (it took around an hour) to find the code that I needed and adapted it with educated guesswork.</p>
<p>*Problems 1 and 2: not all addresses return results from Google Maps because we haven&#8217;t given it enough detail. Also, there&#8217;s a 2500 limit on &#8216;free&#8217; calls to their API &#8211; and we have 5000+ records, so almost 2000 are returned &#8216;LIMIT EXCEEDED&#8217;. A possible solution to the latter would be to split this into 2 spreadsheets and then merge the results later. A possible solution to the former may be to find &#8211; or create by scraping &#8211; another dataset that has more address information (<a href="http://www.programmableweb.com/mashup/uk-schools" onclick="urchinTracker('/outgoing/www.programmableweb.com/mashup/uk-schools?referer=');">for example this one</a>).</p>
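<p>The split-into-two-spreadsheets idea generalises to fixed-size batches. A minimal sketch: batches is a hypothetical helper, and the 2,500 default simply mirrors the limit quoted above, which may have changed since. Each batch would be run separately once the quota allows, and the results merged afterwards.</p>

```python
def batches(rows, size=2500):
    """Split rows into consecutive chunks of at most `size` rows each."""
    return [rows[start:start + size] for start in range(0, len(rows), size)]


chunks = batches(list(range(5001)))
print([len(chunk) for chunk in chunks])
```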
<p>FROM THE COMMENTS: <a href="http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/#comment-297261">Chip Oglesby suggests some other workarounds</a>, including doing it all in Refine and using the Yahoo Maps API for half of the calls.</p>
<p>UPDATE: Tony Hirst follows on from this and <a href="http://blog.ouseful.info/2011/04/08/a-first-quick-viz-of-uk-university-fees/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/04/08/a-first-quick-viz-of-uk-university-fees/?referer=');">finds other solutions to some of the problems outlined</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Signals of churnalism?</title>
		<link>http://onlinejournalismblog.com/2011/03/02/signals-of-churnalism/</link>
		<comments>http://onlinejournalismblog.com/2011/03/02/signals-of-churnalism/#comments</comments>
		<pubDate>Wed, 02 Mar 2011 13:24:47 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[online journalism]]></category>
		<category><![CDATA[andy williams]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[churnalism]]></category>
		<category><![CDATA[churnalism.com]]></category>
		<category><![CDATA[Jon Bounds]]></category>
		<category><![CDATA[media standards trust]]></category>
		<category><![CDATA[pr]]></category>
		<category><![CDATA[surveys]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=13263</guid>
		<description><![CDATA[On Friday I had quite a bit of fun with Churnalism.com, a new site from the Media Standards Trust which allows you to test how much of a particular press release has been reproduced verbatim by media outlets. The site has an API, which got me thinking whether you might be able to &#8216;mash&#8217; it with an RSS feed from<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/03/02/signals-of-churnalism/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<div class="wp-caption alignnone" style="width: 450px"><a href="http://www.tomscott.com/warnings/" onclick="urchinTracker('/outgoing/www.tomscott.com/warnings/?referer=');"><img src="http://www.tomscott.com/warnings/warning-2.jpg" alt="Journalism warning labels" width="440" height="240" /></a><p class="wp-caption-text">Journalism warning labels by Tom Scott</p></div>
<p>On Friday I had quite a bit of fun with <a href="http://Churnalism.com" onclick="urchinTracker('/outgoing/Churnalism.com?referer=');">Churnalism.com</a>, a new site from the Media Standards Trust which allows you to test how much of a particular press release has been reproduced verbatim by media outlets.</p>
<p>The site has an API, which got me thinking whether you might be able to &#8216;mash&#8217; it with an RSS feed from Google News to check particular types of articles &#8211; and what &#8216;signals&#8217; you might use to choose those articles.</p>
<p>I started with that classic PR trick: the survey. A <a href="http://www.google.co.uk/search?sourceid=chrome&amp;ie=UTF-8&amp;q=%22a+survey+*+found%22#q=%22a+survey+*+found%22&amp;um=1&amp;ie=UTF-8&amp;tbo=u&amp;tbs=nws:1&amp;source=og&amp;sa=N&amp;hl=en&amp;tab=wn&amp;fp=4c118964355ea416" onclick="urchinTracker('/outgoing/www.google.co.uk/search?sourceid=chrome_amp_ie=UTF-8_amp_q=_22a+survey+_+found_22_q=_22a+survey+_+found_22_amp_um=1_amp_ie=UTF-8_amp_tbo=u_amp_tbs=nws_1_amp_source=og_amp_sa=N_amp_hl=en_amp_tab=wn_amp_fp=4c118964355ea416&amp;referer=');">search on Google News for &#8220;a survey * found&#8221;</a> (the * is a wildcard, meaning it can be anything) brings some interesting results to start investigating.</p>
<p>Jon Bounds added a favourite of his: <a href="http://thebounder.co.uk/blog/699/did-sucess-peak-in-2004/" onclick="urchinTracker('/outgoing/thebounder.co.uk/blog/699/did-sucess-peak-in-2004/?referer=');">&#8220;hailed a success&#8221;</a>.</p>
<p>And then it continued:<span id="more-13263"></span></p>
<ul>
<li>&#8220;Research commissioned by&#8221;</li>
<li>&#8220;A spokesperson said&#8221;</li>
<li>&#8220;Can increase your risk of&#8221; and &#8220;Can reduce your risk of&#8221;</li>
</ul>
<p>On Twitter, Andy Williams <a href="http://twitter.com/llantwit/status/41117464855052288" onclick="urchinTracker('/outgoing/twitter.com/llantwit/status/41117464855052288?referer=');">added the use of taxonomies of consumers</a> &#8211; although it was difficult to pin that down to a phrase. He also <a href="http://twitter.com/llantwit/status/41119995610148864" onclick="urchinTracker('/outgoing/twitter.com/llantwit/status/41119995610148864?referer=');">added &#8220;independent researchers&#8221;</a>.</p>
<p>Contributors to the MySociety mailing list added:</p>
<ul>
<li>&#8220;Proud to announce&#8221;</li>
<li>&#8220;Today launches&#8221;</li>
<li>&#8220;Revolutionary new&#8221;</li>
<li>&#8220;It was revealed today&#8221; (Andy Mabbett)</li>
<li>&#8220;According to research&#8221;, &#8220;research published today&#8221; and &#8220;according to a new report&#8221;</li>
</ul>
<p>And of course there is &#8220;A press release said&#8221;.</p>
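<p>As a rough illustration of using these phrases as a filter, here is a Python sketch that keeps only articles containing at least one signal. It assumes the articles have already been pulled from a feed into dictionaries with <code>title</code> and <code>text</code> keys. The <code>*</code> handling only approximates Google&#8217;s wildcard, and the sample articles are invented.</p>

```python
import re

# Signal phrases gathered in this post. "*" stands for one or more words,
# a loose approximation of the Google News wildcard (an assumption).
SIGNALS = [
    "a survey * found",
    "hailed a success",
    "research commissioned by",
    "a spokesperson said",
    "can increase your risk of",
    "can reduce your risk of",
    "independent researchers",
    "proud to announce",
    "today launches",
    "revolutionary new",
    "it was revealed today",
    "according to research",
    "research published today",
    "according to a new report",
    "a press release said",
]


def signal_pattern(phrase):
    """Compile a case-insensitive regex for a phrase, expanding '*' to one or more words."""
    parts = [r"\S+(?:\s+\S+)*?" if word == "*" else re.escape(word) for word in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


PATTERNS = {phrase: signal_pattern(phrase) for phrase in SIGNALS}


def matching_signals(text):
    """Return the signal phrases found in an article's text, in list order."""
    return [phrase for phrase, pattern in PATTERNS.items() if pattern.search(text)]


def sample_by_signal(articles):
    """Keep only the articles whose text contains at least one signal phrase."""
    return [article for article in articles if matching_signals(article["text"])]


# Invented articles for illustration
articles = [
    {"title": "Tea survey", "text": "A survey of 2,000 adults found that most prefer tea."},
    {"title": "Council meeting", "text": "Councillors voted against the plan on Tuesday."},
]
print([article["title"] for article in sample_by_signal(articles)])  # ['Tea survey']
```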
<h2>Signal &#8211; or sign?</h2>
<p>The idea kicked off a discussion on Twitter on whether certain phrases were signals of churnalism, or just journalistic clich&#233;s. The answer, of course, is both.</p>
<p>By brainstorming for &#8216;signals&#8217; I wasn&#8217;t arguing that <em>any</em> material using these phrases would be guilty of churnalism &#8211; or even the majority &#8211; just that they might represent one way of narrowing your sample. Once you have a feed of <a href="http://www.google.co.uk/search?sourceid=chrome&amp;ie=UTF-8&amp;q=%22revolutionary+new%22#q=%22revolutionary+new%22&amp;um=1&amp;ie=UTF-8&amp;tbo=u&amp;tbs=nws:1&amp;source=og&amp;sa=N&amp;hl=en&amp;tab=wn&amp;fp=4c118964355ea416" onclick="urchinTracker('/outgoing/www.google.co.uk/search?sourceid=chrome_amp_ie=UTF-8_amp_q=_22revolutionary+new_22_q=_22revolutionary+new_22_amp_um=1_amp_ie=UTF-8_amp_tbo=u_amp_tbs=nws_1_amp_source=og_amp_sa=N_amp_hl=en_amp_tab=wn_amp_fp=4c118964355ea416&amp;referer=');">stories containing &#8220;Revolutionary new&#8221;</a> you can then use the API to test what proportion of those articles reproduce text verbatim from a press release &#8211; or from another news outlet.</p>
<p>The signal determines the sample, the API calculates the results.</p>
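<p>Churnalism.com&#8217;s own matching method is not described here, so the Python sketch below swaps in a common stand-in for measuring verbatim reuse: comparing overlapping five-word sequences (&#8216;shingles&#8217;) in an article and a press release. The texts are invented and the five-word window is an arbitrary assumption.</p>

```python
import re


def words(text):
    """Lower-case the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(text, n=5):
    """Return the set of overlapping n-word sequences ('shingles') in the text."""
    tokens = words(text)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def reused_fraction(article, press_release, n=5):
    """Share of the article's shingles that also appear verbatim in the press release."""
    article_shingles = shingles(article, n)
    if not article_shingles:
        return 0.0
    shared = article_shingles.intersection(shingles(press_release, n))
    return len(shared) / len(article_shingles)


# Invented texts for illustration
release = "The revolutionary new gadget will change the way you make tea forever, said the company."
article = "A revolutionary new gadget will change the way you make tea forever, according to its makers."
print(round(reused_fraction(article, release), 2))  # 0.58
```

<p>A higher fraction means more of the article&#8217;s wording also appears, word for word, in the release. Averaging it across a signal-based sample and a general sample is one way to frame the research question about signal phrases.</p>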
<p>Indeed, there&#8217;s an interesting research project to be done &#8211; perhaps using the Churnalism API &#8211; on whether articles containing the phrases above are more likely to include passages copied wholesale from press releases than a general feed of stories from Google News.</p>
<p>(Another research project might involve analysing press releases to identify phrases commonly used by press officers, which could then be used as signals when sampling articles to test with the API.)</p>
<p>You may have another opinion of course &#8211; or other phrases you might suggest?</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/03/02/signals-of-churnalism/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Investigations tool DocumentCloud goes public (PS: documents drive traffic)</title>
		<link>http://onlinejournalismblog.com/2011/01/26/investigations-tool-documentcloud-goes-public-ps-documents-drive-traffic/</link>
		<comments>http://onlinejournalismblog.com/2011/01/26/investigations-tool-documentcloud-goes-public-ps-documents-drive-traffic/#comments</comments>
		<pubDate>Wed, 26 Jan 2011 20:37:07 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[amanda hickman]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[DocumentCloud]]></category>
		<category><![CDATA[documents]]></category>
		<category><![CDATA[OCR]]></category>
		<category><![CDATA[semantic web]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=12683</guid>
		<description><![CDATA[The rather lovely DocumentCloud &#8211; a tool that allows journalists to share, annotate, connect and organise documents &#8211; has finally emerged from its closet and made itself available to public searches. This means that anyone can now search the powerful database (some tips here) of newsworthy documents. If you want to add your own, however, you still need approval. If<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/01/26/investigations-tool-documentcloud-goes-public-ps-documents-drive-traffic/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p>The rather lovely DocumentCloud &#8211; a tool that allows journalists to share, annotate, connect and organise documents &#8211; has finally <a href="http://blog.documentcloud.org/blog/2011/01/going-public/" onclick="urchinTracker('/outgoing/blog.documentcloud.org/blog/2011/01/going-public/?referer=');">emerged from its closet</a> and made itself available to public searches.</p>
<p>This means that anyone can now search the powerful database (<a href="http://www.documentcloud.org/help/searching" onclick="urchinTracker('/outgoing/www.documentcloud.org/help/searching?referer=');">some tips here</a>) of newsworthy documents. If you want to add your own, however, you still <a href="http://www.documentcloud.org/contact" onclick="urchinTracker('/outgoing/www.documentcloud.org/contact?referer=');">need approval</a>.</p>
<p>If you do end up on <a href="http://www.documentcloud.org/contributors" onclick="urchinTracker('/outgoing/www.documentcloud.org/contributors?referer=');">this list</a> you&#8217;ll find it&#8217;s quite a powerful tool, with quick conversion of PDFs into text files, analytic tools and semantic tagging (so you can connect all documents with a <a href="http://www.documentcloud.org/public/#search/organization%3A%20%22Federal%20Bureau%20of%20Investigation%22%20person%3A%20%22Barack%20Obama%22" onclick="urchinTracker('/outgoing/www.documentcloud.org/public/_search/organization_3A_20_22Federal_20Bureau_20of_20Investigation_22_20person_3A_20_22Barack_20Obama_22?referer=');">particular person</a>, or <a href="http://www.documentcloud.org/public/#search/organization%3A%20%22Federal%20Bureau%20of%20Investigation%22" onclick="urchinTracker('/outgoing/www.documentcloud.org/public/_search/organization_3A_20_22Federal_20Bureau_20of_20Investigation_22?referer=');">organisation</a>) among its best features. The site is <a href="http://www.documentcloud.org/opensource" onclick="urchinTracker('/outgoing/www.documentcloud.org/opensource?referer=');">open source</a> and has an <a href="http://www.documentcloud.org/help/api" onclick="urchinTracker('/outgoing/www.documentcloud.org/help/api?referer=');">API</a> too.</p>
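<p>The entity searches linked above follow a visible pattern: a query such as <code>organization: "..." person: "..."</code>, percent-encoded after <code>#search/</code>. The Python sketch below rebuilds those links from entity names. It is reconstructed from the 2011 URLs in this post rather than from DocumentCloud&#8217;s API documentation, and the site&#8217;s URL scheme may since have changed. The function names are hypothetical.</p>

```python
from urllib.parse import quote


def entity_query(**entities):
    """Build a search string such as: organization: "X" person: "Y" (keyword order is kept)."""
    return " ".join(f'{kind}: "{name}"' for kind, name in entities.items())


def public_search_url(query):
    """Percent-encode the query onto the public search URL pattern seen in this post's links."""
    return "http://www.documentcloud.org/public/#search/" + quote(query)


query = entity_query(organization="Federal Bureau of Investigation", person="Barack Obama")
print(public_search_url(query))
```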
<p>I asked Program Director <strong>Amanda B Hickman</strong> what she&#8217;s learned on the project so far. Her response suggests that documents have a particular appeal for online readers:</p>
<blockquote><p>&#8220;If we&#8217;ve learned anything, it is that people really love documents. It is pretty clear that when there&#8217;s something interesting going on in the news, plenty of people want to dig a little deeper. When Arizona Republic posted an annotated version of that state&#8217;s new immigration law, it got more traffic than their weekly entertainment round up. WNYC told us that <a href="http://www.wnyc.org/articles/wnyc-news/2011/jan/20/indictments-organized-crime-sweep" onclick="urchinTracker('/outgoing/www.wnyc.org/articles/wnyc-news/2011/jan/20/indictments-organized-crime-sweep?referer=');">the page listing the indictments in last week&#8217;s mob roundup</a> was still getting more traffic than any other single news story even a week later.</p>
<p>&#8220;These were big news documents, to be sure, but it still seems pretty clear that people do want to dig deeper and explore the documents behind the news, which is great for us and great for news.&#8221;</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/01/26/investigations-tool-documentcloud-goes-public-ps-documents-drive-traffic/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Adding geographical information to a spreadsheet based on postcodes – Google Refine and APIs</title>
		<link>http://onlinejournalismblog.com/2010/12/16/adding-geographical-information-to-a-spreadsheet-based-on-postcodes-google-refine-and-apis/</link>
		<comments>http://onlinejournalismblog.com/2010/12/16/adding-geographical-information-to-a-spreadsheet-based-on-postcodes-google-refine-and-apis/#comments</comments>
		<pubDate>Thu, 16 Dec 2010 08:59:00 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[google refine]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[postcodes]]></category>
		<category><![CDATA[spreadsheets]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=12093</guid>
		<description><![CDATA[If you have a spreadsheet containing geographical data such as postcodes you may want to know what constituency they are in, or convert them to local authority. That was a question that Bill Thompson asked on Twitter this week &#8211; and this is how I used Google Refine to do that: adding extra columns to a spreadsheet with geographic information.<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2010/12/16/adding-geographical-information-to-a-spreadsheet-based-on-postcodes-google-refine-and-apis/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p>If you have a spreadsheet containing geographical data such as postcodes you may want to know what constituency they are in, or convert them to local authority. That was a question that Bill Thompson asked on Twitter this week &#8211; and this is how I used Google Refine to do that: adding extra columns to a spreadsheet with geographic information.</p>
<p>You can <a href="http://www.screencast.com/users/paulbradshaw/folders/Jing/media/b2c5c0d1-21ce-40a0-ad7a-1f67bba7d2e1" onclick="urchinTracker('/outgoing/www.screencast.com/users/paulbradshaw/folders/Jing/media/b2c5c0d1-21ce-40a0-ad7a-1f67bba7d2e1?referer=');">watch a video tutorial of this here</a>.</p>
<h3>1. Find a website that gives information based on a postcode</h3>
<p>First, I needed to find an API which would return a page of information on any postcode in JSON&#8230;</p>
<p>If that sounds like double-dutch, don&#8217;t worry, try this instead.</p>
<p><em>Translation</em>: First, I needed either of these websites: <a href="http://www.uk-postcodes.com/" onclick="urchinTracker('/outgoing/www.uk-postcodes.com/?referer=');">http://www.uk-postcodes.com/</a> or <a href="http://mapit.mysociety.org/" onclick="urchinTracker('/outgoing/mapit.mysociety.org/?referer=');">http://mapit.mysociety.org/</a></p>
<p>Both of these will generate a page giving you details about any given postcode. The addresses of these pages follow a consistent pattern, e.g.:</p>
<ul>
<li><a href="http://www.uk-postcodes.com/postcode/B422SU.json" onclick="urchinTracker('/outgoing/www.uk-postcodes.com/postcode/B422SU.json?referer=');">http://www.uk-postcodes.com/postcode/B422SU.json</a></li>
<li><a href="http://mapit.mysociety.org/postcode/B42%202SU" onclick="urchinTracker('/outgoing/mapit.mysociety.org/postcode/B42_202SU?referer=');">http://mapit.mysociety.org/postcode/B42%202SU</a></li>
</ul>
<p>(The first removes the space between the two parts of the postcode, and adds .json; the second replaces the space with %20 &#8211; although I&#8217;m <a href="http://twitter.com/dracos/statuses/14442954189967360" onclick="urchinTracker('/outgoing/twitter.com/dracos/statuses/14442954189967360?referer=');">told by Matthew Somerville that it will work with spaces and postcodes without spaces</a>)</p>
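<p>The two URL patterns can be captured in a couple of small Python functions. This is a sketch based only on the example addresses above. The function names are hypothetical, and trimming and upper-casing the postcode are added assumptions.</p>

```python
def uk_postcodes_url(postcode):
    """uk-postcodes.com pattern: drop the space and append .json."""
    compact = postcode.strip().upper().replace(" ", "")
    return "http://www.uk-postcodes.com/postcode/" + compact + ".json"


def mapit_url(postcode):
    """MapIt pattern: keep the space but encode it as %20."""
    return "http://mapit.mysociety.org/postcode/" + postcode.strip().upper().replace(" ", "%20")


print(uk_postcodes_url("B42 2SU"))  # http://www.uk-postcodes.com/postcode/B422SU.json
print(mapit_url("b42 2su"))         # http://mapit.mysociety.org/postcode/B42%202SU
```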
<p>This information will be important when we start to use Google Refine&#8230;</p>
<h3>2. Create a new column that has text in the same format as the webpages you want to fetch</h3>
<p>In Google Refine click on the arrow at the top of your postcode column and <a href="http://excelnotes.posterous.com/grel-remove-spaces-from-a-column" onclick="urchinTracker('/outgoing/excelnotes.posterous.com/grel-remove-spaces-from-a-column?referer=');">follow the instructions here</a> to create a new column which has the same postcode information, but with no spaces. To replace the space with %20 instead, you would replace the expression with:</p>
<blockquote>
<pre>value.split(" ").join("%20")</pre>
</blockquote>
<p>Let&#8217;s name this column &#8216;SpacesRemoved&#8217; and click OK.</p>
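<p>Outside Refine, this step amounts to deriving one column from another. The minimal Python sketch below treats the spreadsheet as a list of dictionaries with a hypothetical <code>Postcode</code> column and applies both transformations mentioned above. The helper name is also hypothetical.</p>

```python
def add_column_based_on(rows, source, new_name, transform):
    """Rough analogue of Refine's 'Add column based on this column' for a list of row dicts."""
    for row in rows:
        row[new_name] = transform(row[source])
    return rows


# Hypothetical rows with a 'Postcode' column
rows = [{"Postcode": "B42 2SU"}, {"Postcode": "B1 1AA"}]
add_column_based_on(rows, "Postcode", "SpacesRemoved", lambda v: v.replace(" ", ""))
add_column_based_on(rows, "Postcode", "SpacesEncoded", lambda v: "%20".join(v.split(" ")))
print([row["SpacesRemoved"] for row in rows])  # ['B422SU', 'B11AA']
print([row["SpacesEncoded"] for row in rows])  # ['B42%202SU', 'B1%201AA']
```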
<p>Now that we&#8217;ve got postcodes in the same format as the URLs above, we can start to fetch the data that gives us extra information on those postcodes.</p>
<h3>3. Write some code that goes to a webpage and fetches information about each postcode</h3>
<p>In Google Refine click on the arrow at the top of your &#8216;SpacesRemoved&#8217; column and create a new column by selecting <em>&#8216;Edit column&#8217; &gt; &#8216;Add column by fetching URLs&#8230;&#8217;</em></p>
<p>You can <a href="http://code.google.com/p/google-refine/wiki/FetchingURLsFromWebServices" onclick="urchinTracker('/outgoing/code.google.com/p/google-refine/wiki/FetchingURLsFromWebServices?referer=');">read more about this functionality here</a>.</p>
<p>This time you will type the expression:</p>
<blockquote>
<pre>"http://www.uk-postcodes.com/postcode/"+value+".json"</pre>
</blockquote>
<p>This builds a URL for each row by inserting &#8216;value&#8217; (the contents of that row&#8217;s &#8216;SpacesRemoved&#8217; cell) between the start of the address and the .json extension.</p>
<p>Call this column &#8216;JSON for postcode&#8217; and click OK.</p>
<p>Each cell will now be filled with the JSON returned by that URL. This might take a while.</p>
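<p>For readers working outside Refine, a rough Python equivalent of &#8216;Add column by fetching URLs&#8217; is sketched below. It is not how Refine is implemented. It fetches each URL in turn with <code>urllib</code>, pauses between requests to go easy on the service, and records <code>None</code> where a request fails. The offline demo uses a canned, invented response so nothing is fetched. All function names are hypothetical.</p>

```python
import time
from urllib.request import urlopen


def fetch_text(url, timeout=30):
    """Download a URL and return the response body as UTF-8 text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def add_column_by_fetching(values, url_for, fetch=fetch_text, delay_seconds=1.0):
    """Fetch one URL per value, pausing between requests; failed requests become None."""
    results = []
    for value in values:
        try:
            results.append(fetch(url_for(value)))
        except (OSError, ValueError):  # network/HTTP errors (OSError), bad URLs or decoding (ValueError)
            results.append(None)
        time.sleep(delay_seconds)
    return results


def postcode_json_url(compact_postcode):
    """Same URL pattern as the GREL expression above."""
    return "http://www.uk-postcodes.com/postcode/" + compact_postcode + ".json"


# Offline demo: a stand-in fetcher with one canned (invented) response, so nothing hits the network
canned = {postcode_json_url("B422SU"): '{"postcode": "B42 2SU"}'}


def offline_fetch(url):
    if url in canned:
        return canned[url]
    raise OSError("no canned response for " + url)


column = add_column_by_fetching(["B422SU", "XX11XX"], postcode_json_url, fetch=offline_fetch, delay_seconds=0)
print(column)  # ['{"postcode": "B42 2SU"}', None]
```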
<h3>4. Write some code that pulls out a specific piece of information from that</h3>
<p>In Google Refine click on the arrow at the top of your &#8216;JSON for postcode&#8217; column and create a new column by selecting <em>&#8216;Edit column&#8217; &gt; &#8216;Add column based on this column&#8230;&#8217;</em></p>
<p>Write the following expression:</p>
<blockquote>
<pre>value.parseJson()["administrative"]["district"]["title"]</pre>
</blockquote>
<p>Look at the preview as you type this and you&#8217;ll see information become more specific as you add each term in square brackets.</p>
<p>Call this &#8216;Council&#8217; and click OK.</p>
<p>This column will now be populated with the council names for each postcode. You can repeat this process for other fields such as constituency, easting and northing by adapting the expression.</p>
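<p>The chained square brackets in the GREL expression walk down into the parsed JSON one key at a time. The small Python sketch below applies the same idea, returning <code>None</code> when a key is missing rather than raising an error. The helper name is hypothetical, and the sample response is invented and shaped only to match the path used above.</p>

```python
import json


def json_path(raw_json, *keys):
    """Parse raw JSON and follow keys in order, like chaining ["a"]["b"] in GREL.

    Returns None if the cell is empty or any key along the path is missing.
    """
    if raw_json is None:
        return None
    node = json.loads(raw_json)
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Invented response, shaped only to match the path used in the expression above
raw = json.dumps({"administrative": {"district": {"title": "Birmingham"}}})
print(json_path(raw, "administrative", "district", "title"))  # Birmingham
print(json_path(raw, "administrative", "ward", "title"))      # None
```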
<h3>5. Export as a standard spreadsheet</h3>
<p>Click <em>Export</em> in the top right corner and save your spreadsheet in the format you prefer. You can then upload this to Google Docs and share it publicly.</p>
<h3>Other possibilities</h3>
<p>Although this post is about postcode data you can use the same principles to add information based on any data that you can find an API for. For example if you had a column of charities you could use the Open Charities API to pull further details (<a href="http://opencharities.org/info/about" onclick="urchinTracker('/outgoing/opencharities.org/info/about?referer=');">http://opencharities.org/info/about</a>). For local authority data you could pull from the OpenlyLocal API (<a href="http://openlylocal.com/info/api" onclick="urchinTracker('/outgoing/openlylocal.com/info/api?referer=');">http://openlylocal.com/info/api</a>).</p>
<p>If you know of other similarly useful APIs let me know. </p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2010/12/16/adding-geographical-information-to-a-spreadsheet-based-on-postcodes-google-refine-and-apis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>&#8220;The mass market was a hack&#8221;: Data and the future of journalism</title>
		<link>http://onlinejournalismblog.com/2010/09/23/the-mass-market-was-a-hack-data-and-the-future-of-journalism/</link>
		<comments>http://onlinejournalismblog.com/2010/09/23/the-mass-market-was-a-hack-data-and-the-future-of-journalism/#comments</comments>
		<pubDate>Thu, 23 Sep 2010 07:20:11 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[online journalism]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[bloggers cut]]></category>
		<category><![CDATA[brave news world]]></category>
		<category><![CDATA[brave news worlds]]></category>
		<category><![CDATA[Guardian]]></category>
		<category><![CDATA[ipi]]></category>
		<category><![CDATA[New York Times]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=9979</guid>
		<description><![CDATA[The following is an unedited version of an article written for the International Press Institute report &#8216;Brave News Worlds (PDF)&#8216; For the past two centuries journalists have dealt in the currency of information: we transmuted base metals into narrative gold. But information is changing. At first, the base metals were eye witness accounts, and interviews. Later we learned to melt<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2010/09/23/the-mass-market-was-a-hack-data-and-the-future-of-journalism/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p><em>The following is an unedited version of an article written for the International Press Institute report &#8216;</em><a href="http://www.poynter.org/resource/190466/IPI_Poynter_report.pdf" onclick="urchinTracker('/outgoing/www.poynter.org/resource/190466/IPI_Poynter_report.pdf?referer=');"><em>Brave News Worlds (PDF)</em></a><em>&#8216;</em></p>
<p>For the past two centuries journalists have dealt in the currency of information: we transmuted base metals into narrative gold. But information is changing.</p>
<p>At first, the base metals were eyewitness accounts and interviews. Later we learned to melt down official reports, research papers, and balance sheets. And most recently our alloys have been diluted by statements and press releases.</p>
<p>But now journalists are having to get to grips with a new type of information: data. And this is a very rich seam indeed.</p>
<h2>Data: what, how and why</h2>
<p>Data is a broad term so I should define it here: I am not talking about statistics or numbers in general, because those are nothing new to journalists. When I talk about data I mean information that can be processed by computers.</p>
<p>This is a crucial distinction: it is one thing for a journalist to look at a balance sheet on paper; it is quite another to be able to dig through those figures on a spreadsheet, or to write a programming script to analyse that data, and match it to other sources of information. We can also more easily analyse new types of data, such as live data, large amounts of text, user behaviour patterns, and network connections.</p>
<p>And that, for me, is hugely important. Indeed, it is potentially transformational. Adding computer processing power to our journalistic arsenal allows us to do more, faster, more accurately, and with others. All of which opens up new opportunities &#8211; and new dangers. Things are going to change.<span id="more-9979"></span></p>
<p>We&#8217;ve had over 40 years to see this coming. The growth of the spreadsheet and the database from the 1960s onwards kicked things off by making it much easier for organisations &#8211; including governments &#8211; to digitise information from what they spent our money on to how many people were being treated for which diseases, and where.</p>
<p>In the 1990s the invention of the world wide web expanded the data at journalists&#8217; disposal by providing a platform for those spreadsheets and databases to be published and accessed by both humans and computer programs &#8211; and a network to distribute it.</p>
<p>And now two cultural movements have combined to add a political dimension to the spread of data: the open data movement, and the linked data movement. Journalists should be familiar with these movements: the arguments that they have developed in holding power to account are a lesson in dealing with entrenched interests, while their experiments with the possibilities of data journalism show the way forward.</p>
<p>While the open data movement campaigns for important information &#8211; such as government spending, scientific information and maps &#8211; to be made publicly available for the benefit of society both democratically and economically, the linked data movement (championed by the inventor of the web, Sir Tim Berners-Lee) campaigns for that data to be made available in such a way that it can be linked to other sets of data so that, for instance, a computer can see that the director of a company named in a particular government contract is the same person who was paid as a consultant on a related government policy document. Advocates argue that this will also result in economic and social benefits.</p>
<p>Concrete results of both movements can be seen in the US and UK &#8211; most visibly with the launch of government data repositories Data.gov and Data.gov.uk in 2009 and 2010 respectively &#8211; but also less publicised experiments such as Where Does My Money Go? &#8211; which uses data to show how public expenditure is distributed &#8211; and Mapumental &#8211; which combines travel data, property prices and public ratings of &#8216;scenicness&#8217; to help you see at a glance which areas of a city might be the best place to live based on your requirements.</p>
<p>But there are dozens if not hundreds of similar examples in industries from health and science to culture and sport. We are experiencing an unprecedented release of data &#8211; some have named it &#8216;Big Data&#8217; &#8211; and yet for the most part, media organisations have been slow to react.</p>
<p>That is about to change.</p>
<h2>The data journalist</h2>
<p>Over the last year an increasing number of news organisations have started to wake from their story-centric production lines and see the value of data. In the UK the MPs&#8217; expenses story was seminal: when a newspaper dictates the news agenda for six weeks, the rest of Fleet Street pays attention &#8211; and at the core of this story was a million pieces of data on a disc. Since then every serious news organisation has expanded its data operations.</p>
<p>In the US the journalist-programmer Adrian Holovaty has pioneered the form with the data mashup ChicagoCrime.org and its open source offspring Everyblock, while Aron Pilhofer has innovated at the interactive unit at The New York Times, and new entrants from Talking Points Memo to ProPublica have used data as a launchpad for interrogating the workings of government.</p>
<p>To those involved, it feels like heady days. In reality, it&#8217;s very early days indeed. Data journalism takes in a huge range of disciplines, from Computer Assisted Reporting (CAR) and programming, to visualisation and statistics. If you are a journalist with a strength in one of those areas, you are currently exceptional. This cannot last for long: the industry will have to skill up, or it will have nothing left to sell.</p>
<p>Because while news organisations for years made a business out of being a middleman processing content between commerce and consumers, and government and citizens, the internet has made that business model obsolete. It is not enough any more for a journalist to simply be good at writing &#8211; or rewriting. There are a million others out there who can write better &#8211; large numbers of them working in PR, marketing, or government. While we will always need professional storytellers, many journalists are simply factory line workers.</p>
<p>So on a commercial level if nothing else, publishing will need to establish where the value lies in this new environment &#8211; and the new efficiencies to make journalism viable.</p>
<p>Data journalism is one of those areas. With a surfeit of public data being made available, there is a rich supply of raw material. The scarcity lies in the skills to locate and make sense of that &#8211; whether the programming skills to scrape it and compare it with other sources in the first place, the design flair to visualise it, or the statistical understanding to unpick it.</p>
<h2>&#8220;The mass market was a hack&#8221;: opportunities for the new economy</h2>
<p>The technological opportunity is massive. As processing power continues to grow, so does our ability to interrogate, combine and present data. The development of augmented reality offers a particularly attractive publishing opportunity: imagine being able to see local data-based stories through your mobile phone, or indeed to add data to the picture through your own activity. The experiments of the past five years will come to seem crude in comparison.</p>
<p>And then there is the commercial opportunity. For most publishers, after all, publishing is not about selling content but about selling advertising. Here too, data has taken on increasing importance. The mass market was a hack. As the saying goes: &#8220;Half the money I spend on advertising is wasted; the trouble is I don&#8217;t know which half.&#8221;</p>
<p>But Google, Facebook and others have used the measurability of the web to reduce that margin of error, and publishers will have to follow suit. It makes sense to put data at the centre of that effort: if you allow users to drill into the data you have gathered on automotive safety, for example, your offer to advertisers is likely to be &#8220;We can display different adverts based on what information the user is interested in&#8221;, or &#8220;We can point the user to their local dealership based on their location&#8221;.</p>
<h2>A collaborative future</h2>
<p>I&#8217;m sceptical of the ability of established publishers to adapt to such a future but, whether they do or not, others will. And the backgrounds of journalists will have to change. The profession has a history of recruiting arts graduates who are highly literate but not typically numerate. That has already been a source of ongoing embarrassment, as expert bloggers have highlighted basic errors in the way journalists cover science, health and finance &#8211; and it cannot continue.</p>
<p>We will need more journalists who can write a killer Freedom of Information request; more researchers with a knowledge of the hidden corners of the web where databases &#8211; the &#8216;invisible web&#8217; &#8211; reside. We will need programmer-journalists who can write a screen scraper to acquire, sort, filter and store that information, and combine or compare it with other sources. We will need designers who can visualise that data in the clearest way possible &#8211; not just for editorial reasons but distribution too: infographics are an increasingly significant source of news site traffic.</p>
<p>There is a danger of &#8216;data churnalism&#8217; &#8211; taking public statistics and visualising them in a spectacular way that lacks insight or context. Editors will need the statistical literacy to guard against this, or they will be found out.</p>
<p>And it is not just in editorial that innovation will be needed. Advertising sales teams will need to go through the same revolution that journalists have experienced: learning the language of web metrics and behavioural advertising, and how to sell their benefits to advertisers.</p>
<p>And as publishers of data too, executives will need to adopt the philosophies of the open data and linked data movements to take advantage of the efficiencies that they provide. The New York Times and The Guardian have both published APIs that allow others to build web services with their content. In return they get access to otherwise unaffordable technical, mathematical and design expertise, and benefit from new products and new audiences, as (in the Guardian&#8217;s case) advertising is bundled in with the service. As these benefits become more widely recognised, other publishers will follow.</p>
<p>I hope that this will lead to a more collaborative form of journalism. The biggest resource a publisher has is its audience. Until now publishers have simply packaged up that resource for advertisers. But now that audience members are able to access the same information and tools as journalists, and to interact with publishers and with each other, they are valuable in different ways.</p>
<p>At the same time the value of the newsroom has diminished: its size has shrunk and its competitive advantage has been reduced; and no single journalist has the depth and breadth of skills across statistics, CAR, programming and design that data journalism requires. A new medium &#8211; and a new market &#8211; demands new rules. The more networked and iterative form of journalism that we&#8217;ve already seen emerge online is likely to become even more widespread as publishers move from a model that sees the story as the unit of production to one that starts with data.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2010/09/23/the-mass-market-was-a-hack-data-and-the-future-of-journalism/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>

