<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Online Journalism Blog &#187; google docs</title>
	<atom:link href="http://onlinejournalismblog.com/tag/google-docs/feed/" rel="self" type="application/rss+xml" />
	<link>http://onlinejournalismblog.com</link>
	<description>A conversation.</description>
	<lastBuildDate>Sat, 11 Feb 2012 12:06:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<cloud domain='onlinejournalismblog.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
		<item>
		<title>4 ways to publish your data online</title>
		<link>http://onlinejournalismblog.com/2011/12/08/4-ways-to-publish-your-data-online/</link>
		<comments>http://onlinejournalismblog.com/2011/12/08/4-ways-to-publish-your-data-online/#comments</comments>
		<pubDate>Thu, 08 Dec 2011 09:22:47 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[buzzdata]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[google docs]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15550</guid>
		<description><![CDATA[I&#8217;ve written a post on the Help Me Investigate blog on a number of different ways to publish data online, from converting Excel spreadsheets into HTML tables, to using Google Docs, or using data-sharing platforms like BuzzData. You may find it useful.]]></description>
			<content:encoded><![CDATA[
<p>I&#8217;ve written<a title="How do I publish my data online" href="http://helpmeinvestigate.posterous.com/how-do-i-publish-my-data-online" onclick="urchinTracker('/outgoing/helpmeinvestigate.posterous.com/how-do-i-publish-my-data-online?referer=');"> a post on the Help Me Investigate blog on a number of different ways to publish data online</a>, from converting Excel spreadsheets into HTML tables, to using Google Docs, or using data-sharing platforms like BuzzData. You may find it useful.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/12/08/4-ways-to-publish-your-data-online/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Scraping data from a list of webpages using Google Docs</title>
		<link>http://onlinejournalismblog.com/2011/10/14/scraping-data-from-a-list-of-webpages-using-google-docs/</link>
		<comments>http://onlinejournalismblog.com/2011/10/14/scraping-data-from-a-list-of-webpages-using-google-docs/#comments</comments>
		<pubDate>Fri, 14 Oct 2011 09:48:14 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[google docs]]></category>
		<category><![CDATA[importxml]]></category>
		<category><![CDATA[scraping]]></category>
		<category><![CDATA[Something for the weekend]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15172</guid>
		<description><![CDATA[Quite often when you&#8217;re looking for data as part of a story, that data will not be on a single page, but on a series of pages. To manually copy the data from each one &#8211; or even scrape the data individually &#8211; would take time. Here I explain a way to use Google Docs to grab the data for<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/10/14/scraping-data-from-a-list-of-webpages-using-google-docs/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p>Quite often when you&#8217;re looking for data as part of a story, that data will not be on a single page, but on a series of pages. To manually copy the data from each one &#8211; or even scrape the data individually &#8211; would take time. Here I explain a way to use Google Docs to grab the data for you.</p>
<h2>Some basic principles</h2>
<p>Although Google Docs is a pretty clumsy tool to use to scrape webpages, the method used is much the same as if you were writing a scraper in a programming language like Python or Ruby. For that reason, I think this is a good quick way to introduce the basics of certain types of scrapers.</p>
<p>Here&#8217;s how it works:</p>
<p>Firstly, you need a list of links to the pages containing data.</p>
<p>Quite often that list might be on a webpage which links to them all, but if not you should look at whether the links have any common structure, for example &#8220;http://www.country.com/data/australia&#8221; or &#8220;http://www.country.com/data/country2&#8221;. If they do, then you can generate a list by filling in the part of the URL that changes each time (in this case, the country name or number), assuming you have a list to fill it from (i.e. a list of countries or codes, or numbers generated by simple addition).</p>
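<p>If you were doing this step in a programming language rather than a spreadsheet, it might look something like the Python sketch below. The base URL is the made-up one from the example above, the list of countries is a placeholder, and the function names are hypothetical rather than part of any library.</p>

```python
# Hypothetical URL pattern and country list, for illustration only.
BASE_URL = "http://www.country.com/data/"
countries = ["australia", "austria", "belgium"]


def build_urls(base, names):
    """Return one URL per name by appending it to the shared base."""
    return [base + name for name in names]


def build_numbered_urls(base, prefix, count):
    """Return URLs such as .../country1, .../country2 by simple addition."""
    return [f"{base}{prefix}{n}" for n in range(1, count + 1)]


urls = build_urls(BASE_URL, countries)
numbered = build_numbered_urls(BASE_URL, "country", 3)
for url in urls + numbered:
    print(url)
```

<p>Either list can then be pasted into a spreadsheet column, one URL per row.</p>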
<p>Second, you need the destination pages to have some consistent structure to them. In other words, they should look the same (although looking the same doesn&#8217;t mean they have the same structure &#8211; more on this below).</p>
<p>The scraper then cycles through each link in your list, grabs particular bits of data from each linked page (because it is always in the same place), and saves them all in one place.</p>
<h2>Scraping with Google Docs using =importXML &#8211; a case study</h2>
<p>If you&#8217;ve not used =importXML before it&#8217;s worth catching up on my previous 2 posts <a rel="bookmark" href="http://onlinejournalismblog.com/2011/07/29/sftw-how-to-scrape-webpages-and-ask-questions-with-google-docs-and-importxml/">How to scrape webpages and ask questions with Google Docs and =importXML</a> and <a rel="bookmark" href="http://onlinejournalismblog.com/2011/08/05/sftw-asking-questions-of-a-webpage-and-finding-out-when-those-answers-change/">Asking questions of a webpage – and finding out when those answers change</a>.</p>
<p>This takes things a little bit further.</p>
<p>In this case I&#8217;m going to scrape some data for a story about local history &#8211; the data for which is helpfully <a href="http://www.dmm.org.uk/mindex.htm" onclick="urchinTracker('/outgoing/www.dmm.org.uk/mindex.htm?referer=');">published by the Durham Mining Museum</a>. Their homepage has a list of local mining disasters, with the date and cause of the disaster, the name and county of the colliery, the number of deaths, and links to the names and to a page about each colliery.</p>
<p>However, there is not enough geographical information here to map the data. That, instead, is <a href="http://www.dmm.org.uk/colliery/h029.htm" onclick="urchinTracker('/outgoing/www.dmm.org.uk/colliery/h029.htm?referer=');">provided on each colliery&#8217;s individual page</a>.</p>
<p>So we need to go through this list of webpages, grab the location information, and pull it all together into a single list.</p>
<h2>Finding the structure in the HTML</h2>
<p>To do this we need to isolate which part of the homepage contains the list. If you right-click on the page to &#8216;view source&#8217; and search for &#8216;Haig&#8217; (the first colliery listed) you can see it&#8217;s in a table that has a beginning tag like so: &lt;table border=0 align=center style=&quot;font-size:10pt&quot;&gt;</p>
<p>We can use =importXML to grab the contents of the table like so:</p>
<p>=Importxml(&quot;http://www.dmm.org.uk/mindex.htm&quot;, &quot;//table[starts-with(@style, 'font-size:10pt')]&quot;)</p>
<p>But we only want the links, so how do we grab just those instead of the whole table contents?</p>
<p>The answer is to add more detail to our request. If we look at the HTML that contains the link, it looks like this:</p>
<p>&lt;td valign=top&gt;&lt;a href=&quot;http://www.dmm.org.uk/colliery/h029.htm&quot;&gt;Haig&amp;nbsp;Pit&lt;/a&gt;&lt;/td&gt;</p>
<p>So it&#8217;s within a &lt;td&gt; tag &#8211; but <em>all</em> the data in this table is, not surprisingly, contained within &lt;td&gt; tags. The key is to identify which &lt;td&gt; tag we want &#8211; and in this case, it&#8217;s always the fourth one in each row.</p>
<p>So we can add &#8220;//td[4]&#8221; (&#8216;<em>look for the fourth &lt;td&gt; tag&#8217;</em>) to our function like so:</p>
<p>=Importxml(&quot;http://www.dmm.org.uk/mindex.htm&quot;, &quot;//table[starts-with(@style, 'font-size:10pt')]//td[4]&quot;)</p>
<p>Now we should have a list of the collieries &#8211; but we want the actual URL of the page that is linked to with that text. That is contained within the value of the href attribute &#8211; or, put in plain language: it comes after the bit that says href=&#8221;.</p>
<p>So we just need to add one more bit to our function: &#8220;//@href&#8221;:</p>
<p>=Importxml(&quot;http://www.dmm.org.uk/mindex.htm&quot;, &quot;//table[starts-with(@style, 'font-size:10pt')]//td[4]//@href&quot;)</p>
<p>So, reading from the far right inwards, this is what it says: &#8220;<em>Grab the value of href, within the fourth &lt;td&gt; tag on every row, of the table that has a style value of font-size:10pt</em>&#8221;</p>
<p>Note: if there was only one link in every row, we wouldn&#8217;t need to include //td[4] to specify the link we needed.</p>
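<p>For comparison, here is a rough Python sketch of the same idea using the standard library&#8217;s ElementTree module. The table is a made-up, simplified stand-in for the real page (the URLs and cell contents are placeholders), and because ElementTree does not support starts-with() the query matches the style value exactly instead.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the table on the homepage.
SAMPLE = """
<html><body>
<table border="0" style="font-size:10pt">
  <tr><td>date 1</td><td>cause 1</td><td>deaths 1</td>
      <td><a href="http://example.org/colliery/a.htm">Pit A</a></td></tr>
  <tr><td>date 2</td><td>cause 2</td><td>deaths 2</td>
      <td><a href="http://example.org/colliery/b.htm">Pit B</a></td></tr>
</table>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# The table with that style, then the fourth cell of each row, then its link.
# ElementTree has no starts-with(), so the style is matched exactly here.
QUERY = ".//table[@style='font-size:10pt']/tr/td[4]/a"

# Reading the href attribute is the equivalent of //@href in the formula.
links = [a.get("href") for a in root.findall(QUERY)]
print(links)
```

<p>Real webpages are rarely tidy enough to be parsed as XML like this, so treat it as an illustration of the query rather than a working scraper.</p>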
<h2>Scraping data from each link in a list</h2>
<p>Now we have a list &#8211; but we still need to scrape some information from each link in that list.</p>
<p>Firstly, we need to identify the location of information that we need on the linked pages. Taking <a href="http://www.dmm.org.uk/colliery/h029.htm" onclick="urchinTracker('/outgoing/www.dmm.org.uk/colliery/h029.htm?referer=');">the first page</a>, view source and search for &#8216;Sheet 89&#8217;, which are the first two words of the &#8216;Map Ref&#8217; line.</p>
<p>The HTML code around that information looks like this:</p>
<p>&lt;td valign=top&gt;(Sheet 89) NX965176, 54&amp;#176; 32&amp;#39; 35&amp;#34; N, 3&amp;#176; 36&amp;#39; 0&amp;#34; W&lt;/td&gt;</p>
<p>Looking a little further up, the table that contains this cell uses HTML like this:</p>
<p>&lt;table border=0 width=&quot;95%&quot;&gt;</p>
<p>So if we needed to scrape this information, we would write a function like this:</p>
<p>=importXML(&quot;http://www.dmm.org.uk/colliery/h029.htm&quot;, &quot;//table[starts-with(@width, '95%')]//tr[2]//td[2]&quot;)</p>
<p>&#8230;And we&#8217;d have to write it for every URL.</p>
<p>But because we have a list of URLs, we can do this much quicker by using cell references instead of the full URL.</p>
<p>So. Let&#8217;s assume that your formula was in cell C2 (<a href="https://docs.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdDZ5RTFXcThPeExLcWt6dVJLZERhLWc&amp;hl=en_GB" onclick="urchinTracker('/outgoing/docs.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdDZ5RTFXcThPeExLcWt6dVJLZERhLWc_amp_hl=en_GB&amp;referer=');">as it is in this example</a>), and the results have formed a column of links going from C2 down to C11. Now we can write a formula that looks at each URL in turn and performs a scrape on it.</p>
<p>In D2 then, we type the following:</p>
<p>=importXML(C2, &quot;//table[starts-with(@width, '95%')]//tr[2]//td[2]&quot;)</p>
<p>If you copy the cell all the way down the column, it will change the function so that it is performed on each neighbouring cell.</p>
<p>In fact, we could simplify things even further by putting the second part of the function in cell D1 &#8211; without the quotation marks &#8211; like so:</p>
<p>//table[starts-with(@width, '95%')]//tr[2]//td[2]</p>
<p>And then in D2 change the formula to this:</p>
<p>=ImportXML(C2,$D$1)</p>
<p>(The dollar signs keep the D1 reference the same even when the formula is copied down, while C2 will change in each cell)</p>
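<p>Copying the formula down the column is the spreadsheet version of a loop. A minimal Python sketch of that loop might look like the following, assuming a made-up set of pages held in a dictionary in place of real web requests; the URLs, page contents and the scrape function are all hypothetical, and the query uses an exact match because ElementTree has no starts-with().</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical pages keyed by URL, standing in for real web requests.
PAGES = {
    "http://example.org/colliery/a.htm": (
        "<html><body><table width='95%'>"
        "<tr><td>Name</td><td>Pit A</td></tr>"
        "<tr><td>Map Ref</td><td>(Sheet 1) AA000000</td></tr>"
        "</table></body></html>"
    ),
    "http://example.org/colliery/b.htm": (
        "<html><body><table width='95%'>"
        "<tr><td>Name</td><td>Pit B</td></tr>"
        "<tr><td>Map Ref</td><td>(Sheet 2) BB000000</td></tr>"
        "</table></body></html>"
    ),
}

# Plays the role of cell $D$1: one query shared by every row.
QUERY = ".//table[@width='95%']/tr[2]/td[2]"


def scrape(url, query):
    """Look up one page and return the text of the first matching cell."""
    root = ET.fromstring(PAGES[url])
    cell = root.find(query)
    return cell.text if cell is not None else ""


# The list of URLs plays the role of column C; results is column D.
urls = list(PAGES)
results = [scrape(url, QUERY) for url in urls]
for url, value in zip(urls, results):
    print(url, value)
```

<p>The query is written once and reused for every URL, which is what the dollar signs achieve in the spreadsheet.</p>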
<p>Now it works &#8211; we have the data from each of 8 different pages. Almost.</p>
<h2>Troubleshooting with =IF</h2>
<p>The problem is that the structure of those pages is not as consistent as we thought: the scraper is producing extra cells of data for some, which knocks out the data that should be appearing there from other cells.</p>
<p>So I&#8217;ve used an IF formula to clean that up as follows:</p>
<p>In cell E2 I type the following:</p>
<p>=if(D2=&quot;&quot;, ImportXML(C2,$D$1), D2)</p>
<p>Which says &#8216;<em>If D2 is empty, then run the importXML formula again and put the results here, but if it&#8217;s not empty then copy the values across</em>&#8217;</p>
<p>That formula is copied down the column.</p>
<p>But there&#8217;s still one empty column even now, so the same formula is used again in column F:</p>
<p>=if(E2=&quot;&quot;, ImportXML(C2,$D$1), E2)</p>
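<p>The logic of that IF formula can be sketched in a few lines of Python. This is only an illustration: the values are placeholders, and fill_if_empty is a hypothetical helper rather than anything built in.</p>

```python
def fill_if_empty(current, rescrape):
    """Return current unless it is empty, in which case call rescrape()."""
    return rescrape() if current == "" else current


# Placeholder values standing in for column D, with one gap.
column_d = ["(Sheet 1) AA000000", "", "(Sheet 3) CC000000"]

# Placeholder results of running the scrape a second time for each row.
retries = ["(Sheet 1) AA000000", "(Sheet 2) BB000000", "(Sheet 3) CC000000"]

# The r=r default binds each retry value to its own row.
column_e = [
    fill_if_empty(d, lambda r=r: r)
    for d, r in zip(column_d, retries)
]
print(column_e)
```

<p>Passing the second scrape in as a function means it only runs for the rows that actually need it.</p>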
<h2>A hack, but an instructive one</h2>
<p>As I said earlier, this isn&#8217;t the best way to write a scraper, but it is a useful way to start to understand how they work, and a quick method if you don&#8217;t have huge numbers of pages to scrape. With hundreds of pages, it&#8217;s more likely you will miss problems &#8211; so watch out for inconsistent structure and data that doesn&#8217;t line up.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/10/14/scraping-data-from-a-list-of-webpages-using-google-docs/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>SFTW: Asking questions of a webpage &#8211; and finding out when those answers change</title>
		<link>http://onlinejournalismblog.com/2011/08/05/sftw-asking-questions-of-a-webpage-and-finding-out-when-those-answers-change/</link>
		<comments>http://onlinejournalismblog.com/2011/08/05/sftw-asking-questions-of-a-webpage-and-finding-out-when-those-answers-change/#comments</comments>
		<pubDate>Fri, 05 Aug 2011 07:50:58 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[google docs]]></category>
		<category><![CDATA[importxml]]></category>
		<category><![CDATA[scraping]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=14970</guid>
		<description><![CDATA[Previously I wrote on how to use the =importXML formula in Google Docs to pull information from an XML page into a conventional spreadsheet. In this Something For The Weekend post I&#8217;ll show how to take that formula further to grab information from webpages &#8211; and get updates when that information changes. Asking questions of a webpage &#8211; or find<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/08/05/sftw-asking-questions-of-a-webpage-and-finding-out-when-those-answers-change/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p><a href="http://onlinejournalismblog.com/2011/07/29/sftw-how-to-scrape-webpages-and-ask-questions-with-google-docs-and-importxml/">Previously I wrote on how to use the =importXML formula in Google Docs to pull information from an XML page into a conventional spreadsheet</a>. In this <a href="http://onlinejournalismblog.com/tag/something-for-the-weekend/">Something For The Weekend</a> post I&#8217;ll show how to take that formula further to grab information from webpages &#8211; and get updates when that information changes.</p>
<div class="wp-caption alignnone" style="width: 518px"><a href="http://www.labnol.org/internet/monitor-web-pages-changes-with-google-docs/4536/" onclick="urchinTracker('/outgoing/www.labnol.org/internet/monitor-web-pages-changes-with-google-docs/4536/?referer=');"><img src="http://img.labnol.org/di/googledocsmovie.gif" alt="Animation from Digital Inspiration" width="508" height="200" /></a><p class="wp-caption-text">Animation from Digital Inspiration</p></div>
<h2>Asking questions of a webpage &#8211; and finding out when the answer changes</h2>
<p>Despite its name, the =importXML formula can be used to grab information from HTML pages as well. <a href="http://seogadget.co.uk/playing-around-with-importxml-in-google-spreadsheets/" onclick="urchinTracker('/outgoing/seogadget.co.uk/playing-around-with-importxml-in-google-spreadsheets/?referer=');">This post on SEO Gadget</a>, for example, gives a series of examples ranging from grabbing information on Twitter users to price information and web analytics (it also has some further guidance on using these techniques, and is well worth a read for that).</p>
<p>Asking questions of webpages typically requires more advanced use of XPath than I outlined previously &#8211; and more trial and error.</p>
<p>This is because, while XML is a language designed to provide structure around data, HTML &#8211; used as it is for a much wider range of purposes &#8211; isn&#8217;t quite so tidy.</p>
<h2>Finding the structure</h2>
<p>To illustrate how you can use =importXML to grab data from a webpage, I&#8217;m going to grab data from Gorkana, a job ads site.</p>
<p><span id="more-14970"></span></p>
<p>If you look at <a href="http://www.gorkanajobs.co.uk/jobs/journalist/" onclick="urchinTracker('/outgoing/www.gorkanajobs.co.uk/jobs/journalist/?referer=');">their journalists jobs page</a>, you&#8217;ll see all sorts of information, from navigation and ads to feeds and policies. This is how you could grab a specific piece of data from a page, and put it into a table structure, to answer any questions you might have:</p>
<p>Make a note of the first word or phrase in the section you want (e.g. &#8220;Senior account executive&#8221;) then right-click on the page and select <strong>View Source</strong> or whatever option allows you to see the HTML code behind the page.</p>
<p>You could scroll through this to try to find the bit you want, but it&#8217;s easier to use your search facility to find that key phrase you noted earlier (e.g. &#8220;Senior account executive&#8221;)</p>
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2011/07/Picture-32.png"><img class="alignnone size-full wp-image-14979" title="Searching within HTML" src="http://onlinejournalismblog.com/wp-content/uploads/2011/07/Picture-32.png" alt="Searching within HTML" width="437" height="134" /></a></p>
<p>What you&#8217;re hoping to find is some sort of div class tag just above that key phrase &#8211; and in this case there&#8217;s one called div class=&quot;jobWrap&quot;.</p>
<p>This means that the creator of the webpage has added some structure to it, wrapping all their job ads in that div.</p>
<p>We just need to write a formula that is equally specific.</p>
<h2>Writing the formula</h2>
<p>Open up a spreadsheet in Google Docs and write the following formula in cell B1:</p>
<p><strong>=importXML(&quot;http://www.gorkanajobs.co.uk/jobs/journalist/&quot;, &quot;//div[starts-with(@class, 'jobWrap')]&quot;)</strong></p>
<p>When you press Enter you should see 3 columns filled with values from that particular part of the webpage: the job title; the package; and a brief description. Now that you have this data in a structured format you could, for example, work out average wages of job ads, or the most common job titles.</p>
<p>But how did that formula work? As I&#8217;ve explained most of =importXML in the previous post, I&#8217;ll just explain the query part here. So:</p>
<p><strong>//div</strong></p>
<p>is looking for a tag that begins &lt;div. However, there are a few divs, so we need to be more specific...</p>
<p><strong>[starts-with</strong></p>
<p>is specifying that this div must begin in a particular way, and it does so by grabbing one thing, and looking for another thing at the start of it:</p>
<p><strong>(@class, 'jobWrap')</strong></p>
<p>is saying that the div&#8217;s class should begin with 'jobWrap' <em>(bonus points: the @ sign indicates an attribute; class is an attribute of the div; 'jobWrap' is the value of the attribute)</em></p>
<p><strong>]</strong></p>
<p>&#8230;finishes off that test.</p>
<p>Even if you don&#8217;t understand the code itself, you can adapt it to your own purposes as long as you can find the right div class tag and replace &#8216;jobWrap&#8217; with whatever the value is in your case.</p>
<p>It doesn&#8217;t even have to be a div class &#8211; you could replace //div with other tags such as //p for each paragraph fitting particular criteria.</p>
<p>You can also replace @class with another attribute, such as @id or @title. It depends on the HTML of the page you&#8217;re trying to grab information from.</p>
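<p>To make the three parts of the query concrete, here is a small Python sketch of the same test: pick a tag, pick an attribute, and keep only the elements whose attribute value begins with a given string. The HTML is a made-up, well-formed stand-in for a jobs page, and select is a hypothetical helper name.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a jobs page.
SAMPLE = """
<html><body>
<div class="nav">Home</div>
<div class="jobWrap first"><h4>Job title A</h4></div>
<div class="jobWrap"><h4>Job title B</h4></div>
<div class="adBody">Recruiter</div>
</body></html>
"""


def select(root, tag, attribute, prefix):
    """Return elements named tag whose attribute value begins with prefix."""
    return [
        element for element in root.iter(tag)
        if element.get(attribute, "").startswith(prefix)
    ]


root = ET.fromstring(SAMPLE)
matches = select(root, "div", "class", "jobWrap")
titles = [div.findtext("h4") for div in matches]
print(titles)
```

<p>Swapping "div" for "p", or "class" for "id", changes the question in the same way as editing the formula.</p>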
<h2>Where&#8217;s the structure come from?</h2>
<p>Why has this data been put into 3 columns? The answer is in the HTML again. You&#8217;ll see that the job title is between h4 tags. The location and package are within ul tags and the description is within p tags, before each div is closed with the tag /div.</p>
<p>But if you keep reading that HTML you&#8217;ll also see that after that /div there is some more information within a different div tag: div class=&#8221;adBody&#8221;. This contains the name of the recruiter and a link to a page where you can apply. There&#8217;s also a third div with a link to an image of the recruiter.</p>
<p>You could adapt your importXML formula to grab these instead &#8211; or add a new formula in D2 to add this extra information alongside the others (watch out for mismatches where one div may be missing for some reason).</p>
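<p>As a rough illustration of where the columns come from, the Python sketch below turns each matching div into a row with one cell per child tag. It only mimics the behaviour described above on a made-up, well-formed snippet; the contents and the to_row helper are hypothetical.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for two job ads.
SAMPLE = """
<html><body>
<div class="jobWrap">
  <h4>Job title A</h4>
  <ul><li>Location A</li><li>Package A</li></ul>
  <p>Description A</p>
</div>
<div class="jobWrap">
  <h4>Job title B</h4>
  <ul><li>Location B</li><li>Package B</li></ul>
  <p>Description B</p>
</div>
</body></html>
"""


def to_row(div):
    """One cell per child tag, with nested text joined and whitespace tidied."""
    return [" ".join(" ".join(child.itertext()).split()) for child in div]


root = ET.fromstring(SAMPLE)
rows = [
    to_row(div) for div in root.iter("div")
    if div.get("class", "").startswith("jobWrap")
]
for row in rows:
    print(row)
```

<p>A div with a tag missing would produce a shorter row, which is the kind of mismatch to watch out for.</p>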
<h2>Finding and cleaning links</h2>
<p>You&#8217;ll notice that the formula above grabs text, but not the links within each job ad. To do that we need to adapt the formula as follows. Try typing this in cell D2:</p>
<p><strong>=ImportXML(&quot;http://www.gorkanajobs.co.uk/jobs/journalist/&quot;, &quot;//div[starts-with(@class, 'jobWrap')]//@href&quot;)</strong></p>
<p>This is identical to the previous formula, with one addition at the end:</p>
<p><strong>//@href</strong></p>
<p>What this does is grab the value of any link in the HTML. In other words, the bit after a href=&#8221;</p>
<p>You&#8217;ll notice that the results are partial URLs, such as /job/3807/senior-account-executive-account-manager/</p>
<p>These are known as <em>relative</em> URLs, because they are relative to the site they are on, but will not work when placed on another site.</p>
<p>This is easily cleaned up. In cell E2 type the following:</p>
<p><strong>=CONCATENATE(&quot;http://www.gorkanajobs.co.uk&quot;,D2)</strong></p>
<p>This creates a new URL beginning with http://www.gorkanajobs.co.uk and ending with the contents of cell D2 &#8211; the relative URL. Copy the formula down the column so it works for all cells in column D.</p>
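<p>The same clean-up in Python is a one-liner. The sketch below shows plain concatenation, which is what the formula does, alongside the standard library&#8217;s urljoin, which is a different technique that also copes with other kinds of relative link. The variable names are my own.</p>

```python
from urllib.parse import urljoin

BASE = "http://www.gorkanajobs.co.uk"

# The relative URL quoted above, standing in for the contents of column D.
relative = ["/job/3807/senior-account-executive-account-manager/"]

# Plain concatenation, as in the spreadsheet formula.
concatenated = [BASE + path for path in relative]

# urljoin resolves the path against the base in the way a browser would.
joined = [urljoin(BASE, path) for path in relative]

print(concatenated[0])
print(joined[0])
```

<p>For relative URLs that start with a slash, as these do, the two approaches give the same result.</p>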
<h2>Not always so tidy</h2>
<p>Not all websites will be so structured. The more structured the webpage &#8211; or the data within it &#8211; the better. But you may have to dig into the HTML and/or tweak your formula to find that structure. Or you may have to settle for some rough and ready data that you clean later.</p>
<p>The key advantage of =importXML, however, is how it is able to <strong>pull information from HTML into a table that you can then interrogate</strong>, with different columns for different parts of that data.</p>
<p>For more help with these processes you can <a href="http://www.w3schools.com/xpath/xpath_syntax.asp" onclick="urchinTracker('/outgoing/www.w3schools.com/xpath/xpath_syntax.asp?referer=');">find explanations of how to write expressions in XPath here</a> but be prepared to use trial and error to get the right expression for the question you&#8217;re asking. <a href="http://vancouverdata.blogspot.com/2011/02/how-to-web-scraping-xpath-html-google.html" onclick="urchinTracker('/outgoing/vancouverdata.blogspot.com/2011/02/how-to-web-scraping-xpath-html-google.html?referer=');">The Vancouver Data Blog offers some specific examples</a> that can be easily adapted.</p>
<h2>Getting updates from your spreadsheet</h2>
<p>Finally, this can be useful because Google Docs allows you to receive notifications whenever any changes are made, and to publish your spreadsheet as an RSS feed. This is <a href="http://www.labnol.org/internet/monitor-web-pages-changes-with-google-docs/4536/" onclick="urchinTracker('/outgoing/www.labnol.org/internet/monitor-web-pages-changes-with-google-docs/4536/?referer=');">explained in this blog post</a>, which is also the source of the movie above.</p>
<p>And if you want to see all of this in action <em><a href="https://spreadsheets.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdGFuMVlsZzZySHNqZGhkdjNPMENtRHc&#038;hl=en_GB" onclick="urchinTracker('/outgoing/spreadsheets.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdGFuMVlsZzZySHNqZGhkdjNPMENtRHc_038_hl=en_GB&amp;referer=');">I&#8217;ve published an example spreadsheet demonstrating all the above techniques here</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/08/05/sftw-asking-questions-of-a-webpage-and-finding-out-when-those-answers-change/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>SFTW: How to scrape webpages and ask questions with Google Docs and =importXML</title>
		<link>http://onlinejournalismblog.com/2011/07/29/sftw-how-to-scrape-webpages-and-ask-questions-with-google-docs-and-importxml/</link>
		<comments>http://onlinejournalismblog.com/2011/07/29/sftw-how-to-scrape-webpages-and-ask-questions-with-google-docs-and-importxml/#comments</comments>
		<pubDate>Fri, 29 Jul 2011 08:24:51 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[google docs]]></category>
		<category><![CDATA[importxml]]></category>
		<category><![CDATA[openlylocal]]></category>
		<category><![CDATA[Something for the weekend]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[xpath]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=14943</guid>
		<description><![CDATA[Here&#8217;s another Something for the Weekend post. Last week I wrote a post on how to use the =importFeed formula in Google Docs spreadsheets to pull an RSS feed (or part of one) into a spreadsheet, and split it into columns. Another formula which performs a similar function more powerfully is =importXML. There are at least 2 distinct journalistic uses<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/07/29/sftw-how-to-scrape-webpages-and-ask-questions-with-google-docs-and-importxml/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<div class="wp-caption alignnone" style="width: 301px"><a href="http://www.flickr.com/photos/dullhunk/3448804778/" onclick="urchinTracker('/outgoing/www.flickr.com/photos/dullhunk/3448804778/?referer=');"><img src="http://farm4.static.flickr.com/3663/3448804778_6fc1876655_o.png" alt="XML puzzle cube" width="291" height="300" /></a><p class="wp-caption-text">Image by dullhunk on Flickr</p></div>
<p>Here&#8217;s another <a href="http://onlinejournalismblog.com/tag/something-for-the-weekend/">Something for the Weekend</a> post. Last week I wrote a post on <a href="http://onlinejournalismblog.com/2011/07/20/how-to-collaborate-or-crowdsource-by-combining-delicious-and-google-docs/">how to use the =importFeed formula in Google Docs spreadsheets</a> to pull an RSS feed (or part of one) into a spreadsheet, and split it into columns.  Another formula which performs a similar function more powerfully is <strong>=importXML</strong>.</p>
<p>There are at least 2 distinct journalistic uses for =importXML:</p>
<ol>
<li>You have found information that is only available in XML format and need to put it into a standard spreadsheet to interrogate it or combine it with other data.</li>
<li>You want to extract some information from a webpage &#8211; perhaps on a regular basis &#8211; and put that in a structured format (a spreadsheet) so you can more easily ask questions of it.</li>
</ol>
<p>The first task is the easier of the two, so I&#8217;ll explain how to do that in this post. I&#8217;ll use a separate post to explain the second.<span id="more-14943"></span></p>
<h2>Converting an XML feed into a table</h2>
<p>If you have some information in XML format, it helps to have some understanding of how XML is structured. For a backgrounder, see this post explaining <a href="http://onlinejournalismblog.com/2011/04/11/data-for-journalists-understanding-xml-and-rss/">XML for journalists</a>.</p>
<p>It also helps if you are using a browser which is good at displaying XML pages: Chrome, for example, not only staggers and indents different pieces of information, but also allows you to expand or collapse parts of that, and colours elements, values and attributes (which we&#8217;ll come on to below) differently.</p>
<p>Say, for example, you wanted a spreadsheet of UK council data, including latitude, longitude, CIPFA code, and so on &#8211; and you found the data, but it was in XML format at a page like this:  <a href="http://openlylocal.com/councils/all.xml" onclick="urchinTracker('/outgoing/openlylocal.com/councils/all.xml?referer=');">http://openlylocal.com/councils/all.xml</a></p>
<p>To pull that into a neatly structured spreadsheet in Google Docs, type the following into the cell where you want the import to begin (try typing in cell A2, leaving the first row free for you to add column headers):</p>
<p><strong>=ImportXML("http://openlylocal.com/councils/all.xml", "//council")</strong></p>
<p>The formula (or, more accurately, function) needs two pieces of information, which are contained in the parentheses and separated by a comma: a web address (URL), and a query. Or, put another way:</p>
<p>=importXML("theURLinQuotationMarks", "theBitWithinTheURLthatYouWant")</p>
<p>The URL is relatively easy &#8211; it is the address of the XML file you are reading (it will often end in .xml). The query needs some further explanation.</p>
<p>The query tells Google Docs which bit of the XML you want to pull out. It uses a language called <strong>XPath</strong> &#8211; but don&#8217;t worry, you will only need to note down a few queries for most purposes.</p>
<p>Here&#8217;s an example of part of that XML file shown in the Chrome browser:</p>
<p><a href="http://onlinejournalismblog.com/wp-content/uploads/2011/07/Picture-12.png"><img class="alignnone size-full wp-image-14968" title="XML from OpenlyLocal" src="http://onlinejournalismblog.com/wp-content/uploads/2011/07/Picture-12.png" alt="XML from OpenlyLocal" width="471" height="190" /></a></p>
<p>The indentation and triangles indicate the way the data is structured. So, the &lt;councils&gt; tag contains at least one item called &lt;council&gt; (if you scrolled down, or clicked on the triangle to collapse &lt;council&gt; you would see there are a few hundred).</p>
<p>And each &lt;council&gt; contains an &lt;address&gt;, &lt;authority-type&gt;, and many other pieces of information.</p>
<p>If you wanted to grab every &lt;council&gt; from this XML file, then, you use the query &#8220;//council&#8221; as shown above. Think of the // as a replacement for the &lt; in a tag &#8211; you are saying: &#8216;grab the contents of every item that begins &lt;council&gt;&#8217;. (Strictly speaking, in XPath the // means &#8216;find this element anywhere in the document, however deeply it is nested&#8217;.)</p>
<p>You&#8217;ll notice that in your spreadsheet where you have typed the formula above, it gathers the contents (called a value) of each tag within &lt;council&gt;, each tag&#8217;s value going into its own column &#8211; giving you dozens of columns.</p>
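<p>If it helps to see that logic outside a spreadsheet, here is a minimal sketch in Python of roughly what the &#8220;//council&#8221; query does. It is not how Google Docs works internally: the sample XML is invented and only shaped like the OpenlyLocal file, and the function name rows_for is my own.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the OpenlyLocal file; the values are invented.
SAMPLE = """
<councils>
  <council>
    <name>Example Borough Council</name>
    <authority-type>District</authority-type>
  </council>
  <council>
    <name>Sample City Council</name>
    <authority-type>Unitary</authority-type>
  </council>
</councils>
""".strip()


def rows_for(xml_text, tag):
    """Return one row per matching element, with one cell per child value."""
    root = ET.fromstring(xml_text)
    # ".//council" is how ElementTree spells the "//council" query
    matches = root.findall(".//" + tag)
    return [[child.text for child in element] for element in matches]


rows = rows_for(SAMPLE, "council")
for row in rows:
    print(row)
```

<p>Each printed list corresponds to one spreadsheet row, and each item in it to one column.</p>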
<p>You can continue this logic to look for tags within tags. For example, if you wanted to grab the &lt;name&gt; value from within each &lt;council&gt; tag, you could use:</p>
<p><strong>=ImportXML("http://openlylocal.com/councils/all.xml", "//council//name")</strong></p>
<p>You would then only have one column, containing the names of all the councils &#8211; if that&#8217;s all you wanted. You could of course adapt the formula again in cell B2 to pull another piece of information. However, you may <a href="https://spreadsheets0.google.com/spreadsheet/pub?hl=en_GB&amp;hl=en_GB&amp;key=0ApTo6f5Yj1iJdGFuMVlsZzZySHNqZGhkdjNPMENtRHc&amp;single=true&amp;gid=10&amp;output=html" onclick="urchinTracker('/outgoing/spreadsheets0.google.com/spreadsheet/pub?hl=en_GB_amp_hl=en_GB_amp_key=0ApTo6f5Yj1iJdGFuMVlsZzZySHNqZGhkdjNPMENtRHc_amp_single=true_amp_gid=10_amp_output=html&amp;referer=');">end up with a mismatch of data</a> where that information is missing &#8211; so it&#8217;s always better to grab all the XML once, then clean it up on a copy.</p>
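<p>To see why pulling one column at a time can go wrong, here is a rough Python sketch of the mismatch. The three councils and the cipfa-code tag name are invented for illustration; the point is only that a missing value shifts everything below it when columns are fetched separately.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the second council has no code at all.
SAMPLE = """<councils>
  <council><name>Council A</name><cipfa-code>E0001</cipfa-code></council>
  <council><name>Council B</name></council>
  <council><name>Council C</name><cipfa-code>E0003</cipfa-code></council>
</councils>"""

root = ET.fromstring(SAMPLE)

# Column by column, like one formula in A2 and another in B2:
names = [e.text for e in root.findall(".//council/name")]
codes = [e.text for e in root.findall(".//council/cipfa-code")]
misaligned = list(zip(names, codes))

# Row by row, like grabbing every council once and tidying up afterwards:
aligned = [
    (council.findtext("name"), council.findtext("cipfa-code"))
    for council in root.findall(".//council")
]

print(misaligned)  # Council B is paired with Council C's code
print(aligned)     # Council B keeps an empty (None) cell
```

<p>In the first result the codes column is one entry short, so the last code slides up a row. In the second, each row stays intact and the gap is visible.</p>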
<p>If the XML is more complex then you can ask more complex questions &#8211; which I&#8217;ll cover in the second part of this post. You can also put the URL and/or query in other cells to simplify matters, e.g.</p>
<p><strong>=ImportXML(A1, B1)</strong></p>
<p>Where cell A1 contains <strong>http://openlylocal.com/councils/all.xml</strong> and B1 contains <strong>//council</strong> (note the lack of quotation marks). You then only need to change the contents of A1 or B1 to change the results, rather than having to edit the formula directly.</p>
<p>If you&#8217;ve any other examples, ideas or corrections, let me know. Meanwhile, <a href="https://spreadsheets.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdGFuMVlsZzZySHNqZGhkdjNPMENtRHc&amp;hl=en_GB" onclick="urchinTracker('/outgoing/spreadsheets.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdGFuMVlsZzZySHNqZGhkdjNPMENtRHc_amp_hl=en_GB&amp;referer=');">I&#8217;ve published an example spreadsheet demonstrating all the above techniques here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/07/29/sftw-how-to-scrape-webpages-and-ask-questions-with-google-docs-and-importxml/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>How to collaborate (or crowdsource) by combining Delicious and Google Docs</title>
		<link>http://onlinejournalismblog.com/2011/07/20/how-to-collaborate-or-crowdsource-by-combining-delicious-and-google-docs/</link>
		<comments>http://onlinejournalismblog.com/2011/07/20/how-to-collaborate-or-crowdsource-by-combining-delicious-and-google-docs/#comments</comments>
		<pubDate>Wed, 20 Jul 2011 14:42:53 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[collaboration]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[delicious]]></category>
		<category><![CDATA[google docs]]></category>
		<category><![CDATA[importfeed]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=14851</guid>
		<description><![CDATA[During some training in open data I was doing recently, I ended up explaining (it&#8217;s a long story) how to pull a feed from Delicious into a Google Docs spreadsheet. I promised I would put it down online, so: here it is. In a Google Docs spreadsheet the formula =importfeed will pull information from an RSS feed and put it<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/07/20/how-to-collaborate-or-crowdsource-by-combining-delicious-and-google-docs/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;">
		</div>
<div class="wp-caption alignnone" style="width: 250px"><a href="http://www.flickr.com/photos/heatherweaver/2809992904/" onclick="urchinTracker('/outgoing/www.flickr.com/photos/heatherweaver/2809992904/?referer=');"><img title="RSS girl by Heather Weaver" src="http://farm4.static.flickr.com/3211/2809992904_23bbfbccd5.jpg" alt="RSS girl by Heather Weaver" width="240" height="400" /></a><p class="wp-caption-text">RSS girl by HeatherWeaver on Flickr</p></div>
<p>During some training in open data I was doing recently, I ended up explaining (it&#8217;s a long story) how to pull a feed from Delicious into a Google Docs spreadsheet. I promised I would put it down online, so: here it is.</p>
<p>In a Google Docs spreadsheet the formula <strong>=importfeed</strong> will pull information from an RSS feed and put it into that spreadsheet. Titles, links, datestamps and other parts of the feed will each be separated into their own columns.</p>
<p>When combined with Delicious, this can be a useful way to collect together pages that have been bookmarked by a group of people, or any other feed that you want to analyse.</p>
<p>Here&#8217;s how you do it:<span id="more-14851"></span></p>
<h2>1. Decide on your tag, network or user</h2>
<p>The spreadsheet will pull data from an RSS feed. Delicious provides so many of these that you are spoilt for choice. Here are the main three:</p>
<h3><strong>A tag</strong></h3>
<p>Used by various people.</p>
<p><em>Advantages</em>: quick startup &#8211; all you need to do is tell people the tag (make sure this is unique, such as &#8216;unguessable2012&#8217;).</p>
<p><em>Disadvantages</em>: others can hijack the tag &#8211; although this can be cleaned from the resulting data.</p>
<h3><strong>A network</strong></h3>
<p>Consisting of the group of people who are bookmarking:</p>
<p><em>Advantages</em>: group cannot be infiltrated.</p>
<p><em>Disadvantages</em>: setup time &#8211; may need to create a new account to build the network around.</p>
<h3><strong>A user</strong></h3>
<p>Created for this purpose:</p>
<p><em>Advantages</em>: if users are not confident in using Delicious, this can be a useful workaround.</p>
<p><em>Disadvantages</em>: longer set up time &#8211; you&#8217;ll need to create a new account, and work out an easy way for it to automatically capture bookmarks from the group. One way is to pull an RSS feed of any mentions on Twitter and use <a href="http://Twitterfeed.com" onclick="urchinTracker('/outgoing/Twitterfeed.com?referer=');">Twitterfeed</a> to auto-tweet them with a hashtag, and then <a href="http://Packrati.us" onclick="urchinTracker('/outgoing/Packrati.us?referer=');">Packrati.us</a> to auto-bookmark all tweeted links (<a href="http://onlinejournalismblog.com/2011/07/11/an-experiment-in-creating-an-auto-debunker-twitter-account/">a similar process is detailed here</a>).</p>
<p>The RSS feed for each will be found at the bottom of pages, and is consistently formatted like so:</p>
<p>Delicious.com/<strong>tag</strong>/unguessable2012</p>
<p>Delicious.com/<strong>network</strong>/unguessable2012</p>
<p><strong>Delicious.com/</strong>unguessable2012</p>
<h2>2. Create your spreadsheet</h2>
<p>In Google Docs, create a new spreadsheet and in the first cell type the following formula:</p>
<p><strong>=importfeed("</strong></p>
<p>&#8230;adding your RSS feed after the quotation mark, and then this at the end:</p>
<p><strong>")</strong></p>
<p>So it looks something like this:</p>
<p><strong>=importfeed("http://feeds.delicious.com/v2/rss/tag/unguessable2012?count=15")</strong></p>
<p>Now press enter and after a moment the spreadsheet should populate with data from that feed.</p>
<p>You&#8217;ll note, however, that at most you will have only 15 rows of data here. That&#8217;s because the RSS feed you&#8217;ve copied includes that limitation.</p>
<p>If you look at the RSS feed you&#8217;ll see an easy clue on how to change this&#8230;</p>
<p>So, try editing it so that the <strong>count=15</strong> part of that URL reads <strong>count=20</strong> instead. You can put a higher number &#8211; but Google Docs will limit results to 20 at a time.</p>
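<p>If you would rather not edit the address by hand, the same change can be sketched in a few lines of Python. This is only an illustration: the function name with_count is my own, and it assumes the feed address carries a count parameter like the Delicious one above.</p>

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_count(feed_url, count):
    """Return feed_url with its count parameter set to the given number."""
    parts = urlsplit(feed_url)
    query = parse_qs(parts.query)
    query["count"] = [str(count)]  # replace the existing value, or add one
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


url = with_count(
    "http://feeds.delicious.com/v2/rss/tag/unguessable2012?count=15", 20
)
print(url)
```

<p>The rest of the address is left untouched; only the number after count= changes.</p>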
<h2>3. Collecting contributions</h2>
<p>Technically, you&#8217;re now all set up. The bigger challenge is, of course, in getting people to contribute. It helps if they can see the results &#8211; so think about publishing your spreadsheet.</p>
<p>You&#8217;ll also need to make sure that you check it regularly and copy into a backup spreadsheet so you don&#8217;t miss results after that top 20.</p>
<p>But if you find it doesn&#8217;t work it may be worth thinking of other ways of doing this &#8211; for example, with a <a href="https://docs.google.com/support/bin/answer.py?answer=87809" onclick="urchinTracker('/outgoing/docs.google.com/support/bin/answer.py?answer=87809&amp;referer=');">Google Form</a>, or using =importfeed with the RSS feed of a Twitter search for a hashtag, limited to tweets containing links (<a href="http://search.twitter.com/advanced" onclick="urchinTracker('/outgoing/search.twitter.com/advanced?referer=');">Twitter&#8217;s advanced search</a> allows you to limit results accordingly &#8211; and all search results come with an RSS feed link <a href="http://search.twitter.com/search.atom?q=+%23murdoch+filter%3Alinks" onclick="urchinTracker('/outgoing/search.twitter.com/search.atom?q=+_23murdoch+filter_3Alinks&amp;referer=');">like this one</a>).</p>
<p>Of course there are far more powerful ways of doing this which are worth exploring once you&#8217;ve understood the basic possibilities.</p>
<h2>Doing more with =importfeed</h2>
<p>The =importfeed formula has some other elements that we haven&#8217;t used.</p>
<p>Another way to do this, for example, is to paste your RSS feed URL into cell A1 and type the following anywhere else:</p>
<p><strong>=importfeed(A1, "Items Title", FALSE, 20)</strong></p>
<p>This has 4 parts in the parentheses:</p>
<ol>
<li>A1 &#8211; this points at the URL you just pasted in cell A1, and means that you only have to change what&#8217;s in A1 to change the feed being grabbed, rather than having to edit the formula itself</li>
<li>&#8220;Items Title&#8221; &#8211; this is the part of the feed that is being grabbed. If you look in the feed you will see a part that says &lt;item&gt; and within that, an element called &lt;title&gt; &#8211; that&#8217;s it. You could change this to &#8220;Items URL&#8221; to get the &lt;link&gt; part of each &lt;item&gt; instead, for example. Or you could just put &#8220;Items&#8221; and get all 5 parts of each item (title, author, URL, date created, and summary). You can also use &#8220;feed&#8221; to get information about the feed itself, or &#8220;feed URL&#8221; or &#8220;feed title&#8221; or &#8220;feed description&#8221; to get that single piece of information.</li>
<li>FALSE &#8211; this just says whether you want a header row or not. Setting to TRUE will add an extra row saying &#8216;Title&#8217;, for example.</li>
<li>20 &#8211; the number of results you want.</li>
</ol>
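<p>The four parts above can be imitated in a short Python sketch. This is a loose imitation for illustration, not Google&#8217;s implementation: the feed is invented, the function name import_feed and its arguments are my own, and it only handles single fields such as title or link.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical feed with three bookmarked links.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Hypothetical bookmarks feed</title>
  <item><title>First link</title><link>http://example.com/1</link></item>
  <item><title>Second link</title><link>http://example.com/2</link></item>
  <item><title>Third link</title><link>http://example.com/3</link></item>
</channel></rss>"""


def import_feed(feed_text, field="title", headers=False, num_items=20):
    """Return one field from each item, with an optional header row on top."""
    channel = ET.fromstring(feed_text).find("channel")
    values = [item.findtext(field) for item in channel.findall("item")]
    values = values[:num_items]  # keep only the number of results asked for
    header = [field.capitalize()] if headers else []
    return header + values


print(import_feed(SAMPLE_FEED, "title", False, 2))
print(import_feed(SAMPLE_FEED, "link", True, 1))
```

<p>The first call mirrors &#8220;Items Title&#8221; with no header and two results; the second mirrors &#8220;Items URL&#8221; with a header row and a single result.</p>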
<p>You can <a href="https://spreadsheets.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdEo0WDFvZTBQWTdXUTJMRWJ3dTBEVUE&amp;hl=en_GB" onclick="urchinTracker('/outgoing/spreadsheets.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdEo0WDFvZTBQWTdXUTJMRWJ3dTBEVUE_amp_hl=en_GB&amp;referer=');">see an example spreadsheet with 3 sheets demonstrating different uses of this formula here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/07/20/how-to-collaborate-or-crowdsource-by-combining-delicious-and-google-docs/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Data journalism pt4: visualising data &#8211; tools and publishing (comments wanted)</title>
		<link>http://onlinejournalismblog.com/2010/04/28/data-journalism-pt4-visualising-data-tools-and-publishing-comments-wanted/</link>
		<comments>http://onlinejournalismblog.com/2010/04/28/data-journalism-pt4-visualising-data-tools-and-publishing-comments-wanted/#comments</comments>
		<pubDate>Wed, 28 Apr 2010 11:47:15 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[online journalism]]></category>
		<category><![CDATA[charttool]]></category>
		<category><![CDATA[data journalism]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[factual]]></category>
		<category><![CDATA[fusioncharts]]></category>
		<category><![CDATA[google chart tools]]></category>
		<category><![CDATA[google docs]]></category>
		<category><![CDATA[google fusion tables]]></category>
		<category><![CDATA[icharts]]></category>
		<category><![CDATA[jing]]></category>
		<category><![CDATA[kwout]]></category>
		<category><![CDATA[manyeyes]]></category>
		<category><![CDATA[online journalism book]]></category>
		<category><![CDATA[skitch]]></category>
		<category><![CDATA[socrata]]></category>
		<category><![CDATA[swivel]]></category>
		<category><![CDATA[tableau]]></category>
		<category><![CDATA[tag clouds]]></category>
		<category><![CDATA[tagxedo]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[verifiable]]></category>
		<category><![CDATA[visualisation]]></category>
		<category><![CDATA[widgenie]]></category>
		<category><![CDATA[word clouds]]></category>
		<category><![CDATA[wordle]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=8413</guid>
		<description><![CDATA[This is a draft from a book chapter on data journalism (here are parts 1; two; and three, which looks at the charts side of visualisation). I’d really appreciate any additions or comments you can make &#8211; particularly around tips and tools. UPDATE: It has now been published in The Online Journalism Handbook. Visualisation tools So if you want to visualise some<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2010/04/28/data-journalism-pt4-visualising-data-tools-and-publishing-comments-wanted/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;">
		</div>
<p><em>This is a draft from a book chapter on data journalism (here are </em><em><a href="../2010/04/21/data-journalism-pt1-finding-data-draft-comments-invited/">parts 1</a></em><em>; </em><em><a href="http://onlinejournalismblog.com/2010/04/26/data-journalism-pt2-interrogating-data/">two</a>; and <a href="http://onlinejournalismblog.com/2010/04/28/data-journalism-pt3-visualising-data-comments-wanted/">three</a>, which looks at the charts side of visualisation</em><em>). I’d really appreciate any additions or comments you can make &#8211; particularly around tips and tools.</em></p>
<p><strong>UPDATE: <a href="http://www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?&amp;camp=2486&amp;linkCode=wey&amp;tag=onlijourblog-21&amp;creative=8882" onclick="urchinTracker('/outgoing/www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?_amp_camp=2486_amp_linkCode=wey_amp_tag=onlijourblog-21_amp_creative=8882&amp;referer=');">It has now been published in The Online Journalism Handbook</a>.</strong></p>
<h2>Visualisation tools</h2>
<p>So if you want to visualise some data or text, how do you do it? Thankfully there are now dozens of free and cheap pieces of software that you can use to quickly turn your tables into charts, graphs and clouds.</p>
<p>The best-known tool for creating word clouds is <strong>Wordle </strong>(<a href="http://wordle.net" onclick="urchinTracker('/outgoing/wordle.net?referer=');">wordle.net</a>). Simply paste a block of text into the site, or the address of an RSS feed, and the site will generate a word cloud whose fonts and colours you can change to your preferences. Similar tools include <strong>Tagxedo </strong>(<a href="http://tagxedo.com/" onclick="urchinTracker('/outgoing/tagxedo.com/?referer=');">tagxedo.com</a>) and Wordlings (<a href="http://wordlin.gs/" onclick="urchinTracker('/outgoing/wordlin.gs/?referer=');">http://wordlin.gs</a>), both of which allow you to put your word cloud into a particular shape.</p>
<p><strong>ManyEyes </strong>(<a href="http://manyeyes.alphaworks.ibm.com/manyeyes/" onclick="urchinTracker('/outgoing/manyeyes.alphaworks.ibm.com/manyeyes/?referer=');">manyeyes.alphaworks.ibm.com/manyeyes/</a>) also allows you to create word clouds and tag clouds &#8211; as well as word trees and phrase nets that allow you to see common phrases. But it is perhaps most useful in allowing you to easily create scattergrams, bar charts, bubble charts and other forms. The site also contains a raft of existing data that you can play with to get a feel for the site. Similar tools that allow access to other data include <strong>Factual </strong>(<a href="http://factual.com" onclick="urchinTracker('/outgoing/factual.com?referer=');">factual.com</a>), <strong>Swivel </strong>(<a href="http://swivel.com" onclick="urchinTracker('/outgoing/swivel.com?referer=');">swivel.com</a>)[see comments], <strong>Socrata </strong>(<a href="http://socrata.com" onclick="urchinTracker('/outgoing/socrata.com?referer=');">socrata.com</a>) and <strong>Verifiable.com</strong> (<a href="http://verifiable.com" onclick="urchinTracker('/outgoing/verifiable.com?referer=');">verifiable.com</a>). And <strong>Google Fusion Tables</strong> (<a href="http://tables.googlelabs.com" onclick="urchinTracker('/outgoing/tables.googlelabs.com?referer=');">tables.googlelabs.com</a>) is particularly useful if you want to collaborate on tables of data, as well as offering visualisation options.</p>
<p>More general visualisation tools include <strong>widgenie </strong>(<a href="http://widgenie.com" onclick="urchinTracker('/outgoing/widgenie.com?referer=');">widgenie.com</a>), <strong>iCharts </strong>(<a href="http://icharts.net" onclick="urchinTracker('/outgoing/icharts.net?referer=');">icharts.net</a>), <strong>ChartTool </strong>(<a href="http://onlinecharttool.com" onclick="urchinTracker('/outgoing/onlinecharttool.com?referer=');">onlinecharttool.com</a>) and <strong>ChartGo </strong>(<a href="http://www.chartgo.com" onclick="urchinTracker('/outgoing/www.chartgo.com?referer=');">www.chartgo.com</a>). <strong>FusionCharts </strong>is a piece of visualisation software with a Google Gadget service that publishers may find useful. You can find instructions on how to use it at <a href="http://www.fusioncharts.com/GG/Docs/Index.html" onclick="urchinTracker('/outgoing/www.fusioncharts.com/GG/Docs/Index.html?referer=');">www.fusioncharts.com/GG/Docs</a></p>
<p>If you want more control over your visualisation &#8211; or want it to update dynamically when the source information is updated &#8211; <strong>Google Chart Tools</strong> (<a href="http://code.google.com/apis/charttools" onclick="urchinTracker('/outgoing/code.google.com/apis/charttools?referer=');">code.google.com/apis/charttools</a>) is worth exploring. This requires some technical knowledge, but there is a lot of guidance and help on the site to get you started quickly.</p>
<p><strong>Tableau Public </strong>is a piece of free software you can download (<a href="http://tableausoftware.com/public" onclick="urchinTracker('/outgoing/tableausoftware.com/public?referer=');">tableausoftware.com/public</a>) with some powerful visualisation options. You will also find visualisation options on spreadsheet applications such as <strong>Excel </strong>or the free <strong>Google Docs spreadsheet</strong> service. These are worth exploring as a way to quickly generate charts from your data on the fly.</p>
<h2>Publishing your visualisation</h2>
<p>There will come a point when you&#8217;ve visualised your data and need to publish it somehow. The simplest way to do this is to take an image (screengrab) of the chart or graph. This can be done with a web-based screencapture tool like <strong>Kwout </strong>(<a href="http://kwout.com" onclick="urchinTracker('/outgoing/kwout.com?referer=');">kwout.com</a>), a free desktop application like <strong>Skitch </strong>(<a href="http://skitch.com" onclick="urchinTracker('/outgoing/skitch.com?referer=');">skitch.com</a>) or <strong>Jing </strong>(<a href="http://jingproject.com" onclick="urchinTracker('/outgoing/jingproject.com?referer=');">jingproject.com</a>), or by simply using the &#8216;Print Screen&#8217; button on a PC keyboard (cmd+shift+3 on a Mac) and pasting the screengrab into a graphics package such as <strong>Photoshop</strong>.</p>
<p>The advantage of using a screengrab is that the image can be easily distributed on social networks, image sharing websites (such as Flickr), and blogs &#8211; driving traffic to the page on your site where it is explained.</p>
<p>If you are more technically minded, you can instead choose to embed your chart or graph. Many visualisation tools will give you a piece of code which you can copy and paste into the HTML of an article or blog post in the place you wish to display it (this will not work on most third party blog hosting services, such as WordPress.com). One particular advantage of this approach is that the visualisation can update itself if the source data is updated.</p>
<p>Alternatively, an understanding of JavaScript can allow you to build &#8216;progressively enhanced&#8217; charts which allow users to access the original data or see what happens when it is changed.</p>
<h2>Showing your raw data</h2>
<p>It is generally a good idea to give users access to your raw data alongside its visualisation. This not only allows them to check it against your visualisation but also lets them add insights you may not otherwise gain. It is relatively straightforward to publish a spreadsheet online using Google Docs (see the sidebar on publishing a spreadsheet).</p>
<h2>SIDEBAR: How to: publish a spreadsheet online</h2>
<p><strong>Google Docs</strong> (<a href="http://docs.google.com" onclick="urchinTracker('/outgoing/docs.google.com?referer=');">docs.google.com</a>) is a free website which allows you to create and share documents. You can share them via email, by publishing them as a webpage, or by embedding your document in another webpage, such as a blog post. This is how you share a spreadsheet:</p>
<ol>
<li>Open your spreadsheet in Google Docs. You can upload a spreadsheet into Google Docs if you&#8217;ve created it elsewhere &#8211; there is a size limit, however, so if you are told the file is too big try removing unnecessary sheets or columns.</li>
<li>Look for the &#8216;Share&#8217; button (currently in the top right corner) and click on it.</li>
<li>A drop-down menu should appear. Click on &#8216;Publish as a web page&#8217;.</li>
<li>A new window should appear asking which sheets you want to publish. Select the sheet you want to publish and click &#8216;Start publishing&#8217; (you should also make sure &#8216;Automatically republish when changes are made&#8217; is ticked if you want the public version of the spreadsheet to update with any data you add.)</li>
<li>Now the bottom half of that window &#8211; &#8216;Get a link to the published data&#8217; &#8211; should become active. In the bottom box should be a web address where you can now see the public version of your spreadsheet. If you want to share that, copy the address and test that it works in a web browser. You can now link to it from any webpage.</li>
<li>Alternatively, you can embed your spreadsheet &#8211; or part of it &#8211; in another webpage. To do this click on the first drop-down menu in this area &#8211; it will currently say &#8216;Web page&#8217; &#8211; and change it to &#8216;HTML to embed in a page&#8217;. Now the bottom box on this window should show some HTML that begins with</li>
<li>If you want to embed just part of a spreadsheet, in the box that currently says &#8216;All cells&#8217; type the range of cells you wish to show. For example, typing A1:G10 will select all the cells in your spreadsheet from A1 (the first row of column A) to G10 (the 10th row of column G). Once again, the HTML below will change so that it only displays that section of your spreadsheet.</li>
</ol>
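<p>If you are unsure how much of the sheet a range like A1:G10 covers, this small Python sketch works it out. The function names are my own and it only handles simple ranges of the form shown in the steps above.</p>

```python
import re


def column_number(letters):
    """Convert a column label such as 'A' or 'AB' to its position (A is 1)."""
    number = 0
    for character in letters.upper():
        number = number * 26 + (ord(character) - ord("A") + 1)
    return number


def range_shape(cell_range):
    """Return (rows, columns) covered by a range such as 'A1:G10'."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", cell_range)
    if match is None:
        raise ValueError("expected a range like A1:G10")
    first_column, first_row, last_column, last_row = match.groups()
    rows = int(last_row) - int(first_row) + 1
    columns = column_number(last_column) - column_number(first_column) + 1
    return rows, columns


print(range_shape("A1:G10"))
```

<p>For A1:G10 that is 10 rows by 7 columns, which is the block of cells the embedded table will show.</p>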
<p><em>Once again, I&#8217;d welcome any comments on things I may have missed or tips you can add. <a href="http://onlinejournalismblog.com/2010/05/04/data-journalism-pt5-mashing-data-comments-wanted/">Part 5, on mashups, is now available here</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2010/04/28/data-journalism-pt4-visualising-data-tools-and-publishing-comments-wanted/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
		<item>
		<title>Data journalism pt3: visualising data &#8211; charts and graphs (comments wanted)</title>
		<link>http://onlinejournalismblog.com/2010/04/28/data-journalism-pt3-visualising-data-comments-wanted/</link>
		<comments>http://onlinejournalismblog.com/2010/04/28/data-journalism-pt3-visualising-data-comments-wanted/#comments</comments>
		<pubDate>Wed, 28 Apr 2010 09:49:59 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[online journalism]]></category>
		<category><![CDATA[bar charts]]></category>
		<category><![CDATA[bubble charts]]></category>
		<category><![CDATA[charlie beckett]]></category>
		<category><![CDATA[chartgo]]></category>
		<category><![CDATA[charttool]]></category>
		<category><![CDATA[data journalism]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[factual]]></category>
		<category><![CDATA[fusioncharts]]></category>
		<category><![CDATA[google chart tools]]></category>
		<category><![CDATA[google docs]]></category>
		<category><![CDATA[google fusion tables]]></category>
		<category><![CDATA[histograms]]></category>
		<category><![CDATA[humanisation]]></category>
		<category><![CDATA[icharts]]></category>
		<category><![CDATA[jing]]></category>
		<category><![CDATA[kwout]]></category>
		<category><![CDATA[line graphs]]></category>
		<category><![CDATA[manyeyes]]></category>
		<category><![CDATA[marcos weskamp]]></category>
		<category><![CDATA[newsmap]]></category>
		<category><![CDATA[online journalism book]]></category>
		<category><![CDATA[personalisation]]></category>
		<category><![CDATA[pictograms]]></category>
		<category><![CDATA[pie charts]]></category>
		<category><![CDATA[scattergrams]]></category>
		<category><![CDATA[skitch]]></category>
		<category><![CDATA[small multiples]]></category>
		<category><![CDATA[socrata]]></category>
		<category><![CDATA[swivel]]></category>
		<category><![CDATA[tableau]]></category>
		<category><![CDATA[tableau public]]></category>
		<category><![CDATA[tag clouds]]></category>
		<category><![CDATA[tagxedo]]></category>
		<category><![CDATA[treemaps]]></category>
		<category><![CDATA[verifiable]]></category>
		<category><![CDATA[visualisation]]></category>
		<category><![CDATA[widgenie]]></category>
		<category><![CDATA[word clouds]]></category>
		<category><![CDATA[wordle]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=8407</guid>
		<description><![CDATA[This is a draft from a book chapter on data journalism (the first, on gathering data, is here; the section on interrogating data is here). I’d really appreciate any additions or comments you can make &#8211; particularly around considerations in visualisation. A further section, on visualisation tools, can be found here. UPDATE: It has now been published in The Online Journalism<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2010/04/28/data-journalism-pt3-visualising-data-comments-wanted/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p><em>This is a draft from a book chapter on data journalism (<a href="../2010/04/21/data-journalism-pt1-finding-data-draft-comments-invited/">the first, on gathering data, is here</a>; the <a href="http://onlinejournalismblog.com/2010/04/26/data-journalism-pt2-interrogating-data/">section on interrogating data is here</a>). I’d really appreciate any additions or comments you can make &#8211; particularly around considerations in visualisation. A further section, <a href="http://onlinejournalismblog.com/2010/04/28/data-journalism-pt4-visualising-data-tools-and-publishing-comments-wanted/">on visualisation tools, can be found here</a>.</em></p>
<p><strong>UPDATE: <a href="http://www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?&amp;camp=2486&amp;linkCode=wey&amp;tag=onlijourblog-21&amp;creative=8882" onclick="urchinTracker('/outgoing/www.amazon.co.uk/Online-Journalism-Handbook-Survive-Digital/dp/140587340X/ref=as_li_ss_mfw?_amp_camp=2486_amp_linkCode=wey_amp_tag=onlijourblog-21_amp_creative=8882&amp;referer=');">It has now been published in The Online Journalism Handbook</a>.</strong></p>
<blockquote><p>&#8220;At their best, graphics are instruments for reasoning about quantitative information. Often the most effective way to describe, explore, and summarize a set of numbers &#8211; even a very large set &#8211; is to look at pictures of those numbers.&#8221; (Edward Tufte, <em><a href="http://bit.ly/9rMX9D" onclick="urchinTracker('/outgoing/bit.ly/9rMX9D?referer=');">The Visual Display of Quantitative Information</a>, 2001)</em></p></blockquote>
<p>Visualisation is the process of giving a graphic form to information which is often otherwise dry or impenetrable. Classic examples of visualisation include turning a table into a bar chart, or a series of percentage values into a pie chart &#8211; but the increasing power of both computer analysis and graphic design software has seen the craft of visualisation develop with increasing sophistication. In larger organisations the data journalist may work with a graphic artist to produce an infographic that visualises their story &#8211; but in smaller teams, in the initial stages of a story, or when speed is of the essence, they are likely to need to use visualisation tools to give form to their data.</p>
<p>Broadly speaking there are two typical reasons for visualising data: to find a story; or to tell one. Quite often, it is both.<span id="more-8407"></span></p>
<p>In the parking tickets story <a href="http://onlinejournalismblog.com/2010/04/26/data-journalism-pt2-interrogating-data/">above</a>, for example, it was the process of visualisation that tipped off Adrian Short and Guardian journalist Charles Arthur to the story &#8211; and led to further enquiries.</p>
<p>In most cases, however, the story will not be as immediately visible. Sometimes the data will need to be visualised in different ways before a story becomes clear. And an understanding of the strengths of different types of visualisation can be particularly useful here.</p>
<p>UPDATE (Dec 7, 2010): Visualisation probably needs to be extended to include humanisation and personalisation. <a href="http://onlinejournalismblog.com/2010/12/07/wikileaks-cablegate/">More detail here</a>, and to come.</p>
<h2>Types of visualisation</h2>
<p>Visualisation can take on a range of forms. The most familiar are those we know from maths and statistics: <strong>pie charts</strong>, for example, allow you to show how one thing is divided &#8211; how a budget is spent, say, or how a population is distributed. They are thought to be particularly useful when the proportions represented are large (for example, above 25%), but less useful when lower percentages are involved, because smaller slices are harder to perceive and to compare with one another.</p>
<p>More useful in those circumstances are <strong>bar charts</strong> or <strong>histograms</strong>. Although these look similar there are subtle differences between them: the bars in bar charts represent categories (such as different cities), whereas bars in histograms represent different values on a continuum (for instance: ages, weights or amounts), and histograms do not have gaps between bars. You should avoid using 3D or shadow effects in bar charts as these do not add to the information or clarity. The advantage of both types of chart over pie charts is that users can more easily see the difference between one quantity and another. Bar charts also allow you to show change over time.</p>
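<p>To make the difference between categories and a continuum concrete, here is a minimal sketch in Python of how the numbers behind each chart might be prepared. The function names and the sample data are hypothetical, invented purely for illustration, and the bin width of 10 is an arbitrary choice.</p>

```python
from collections import Counter


def bar_chart_counts(labels):
    """Count how often each category occurs (one bar per category)."""
    return dict(Counter(labels))


def histogram_counts(values, bin_width):
    """Group numeric values into equal-width ranges (one bar per range)."""
    bins = Counter()
    for value in values:
        lower = int(value // bin_width) * bin_width
        bins[(lower, lower + bin_width)] += 1
    return dict(sorted(bins.items()))


# Hypothetical data, invented for illustration only.
cities = ["Leeds", "Bristol", "Leeds", "Cardiff", "Leeds"]
ages = [23, 27, 31, 35, 38, 44, 52]

print(bar_chart_counts(cities))    # one count per city name
print(histogram_counts(ages, 10))  # one count per ten-year age range
```

<p>The bar chart data is keyed by a name, so the bars can appear in any order; the histogram data is keyed by a numeric range, so the order of the bars carries meaning.</p>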
<p><strong>Pictograms</strong> are like bar charts but use an icon to represent quantity &#8211; so a population of 50,000 might be represented by 5 &#8216;person&#8217; icons. It is not advisable to use pictograms if quantities are close together as the user will find it harder to discern the differences.</p>
<p>Also useful for showing change over time are <strong>line graphs</strong>. Lines are &#8220;suited for showing trend, acceleration or deceleration, and volatility, including sudden peaks or troughs&#8221; (<a href="http://astore.amazon.co.uk/onlijourblog-21/detail/0393072959" onclick="urchinTracker('/outgoing/astore.amazon.co.uk/onlijourblog-21/detail/0393072959?referer=');">Wong, 2010</a>, p51). In addition, a series of lines overlaid upon each other can quickly show if any variables change at different points or at simultaneous points, suggesting either relationships or shared causes (but by no means proving them &#8211; these should be taken as starting points for further investigation). For purposes of clarity you should also avoid plotting more than four lines in one chart.</p>
<p>Line graphs should not be used to show unrelated events. As Seth Godin (<a href="http://sethgodin.typepad.com/seths_blog/2009/07/how-to-make-graphs-that-work.html" onclick="urchinTracker('/outgoing/sethgodin.typepad.com/seths_blog/2009/07/how-to-make-graphs-that-work.html?referer=');">2009</a>) puts it: &#8220;A graph of IQs of everyone in your kindergarten class should be a series of unrelated points, not a line graph. On the other hand, your weight loss is in fact a continuous function, so each piece of data should be attached.&#8221;</p>
<p><strong>Scattergrams</strong> are similar to line graphs, showing the distribution of individual elements against two axes, but can be particularly useful in showing up &#8216;outliers&#8217;. Outliers are pieces of data which differ noticeably from the rest. These may be of particular interest journalistically when they show, for example, an MP claiming substantially more (or less) expenses than their peers.</p>
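<p>Outliers can also be flagged numerically before anything is plotted. The following is a rough sketch in Python, assuming one simple rule of thumb: measure each value's distance from the median in units of the median absolute deviation. The threshold of 3.0, the function name and the sample figures are all assumptions made for illustration, not an established standard for expenses data.</p>

```python
from statistics import median


def find_outliers(claims, threshold=3.0):
    """Return the names whose value lies far from the median.

    Distance is measured in units of the median absolute deviation (MAD).
    The default threshold is an arbitrary assumption, not a fixed rule.
    """
    values = list(claims.values())
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        return []  # every value is (almost) identical, nothing stands out
    return [name for name, v in claims.items()
            if abs(v - centre) / mad > threshold]


# Hypothetical figures, invented for illustration only.
claims = {"MP A": 12000, "MP B": 12500, "MP C": 11800,
          "MP D": 12200, "MP E": 23000}

print(find_outliers(claims))
```

<p>A flagged value is only a prompt for further questions: there may be a perfectly good explanation for it.</p>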
<p>A number of charts can be visualised together in what is sometimes called <strong>&#8216;small multiples&#8217;</strong>: the journalist or user displays a number of pie charts, line graphs or other charts alongside each other, allowing comparison, for example, between different populations.</p>
<p>Two increasingly popular forms of visualisation online are treemaps and bubble charts. Unlike other charts which allow you to visualise two aspects of the data (i.e. their place on each axis) <strong>bubble charts</strong> allow you to visualise three aspects of the data &#8211; the third being represented by the size of the bubble itself. A particularly good example of bubble charts in action can be seen in <a href="http://www.youtube.com/watch?v=RUwS1uAdUcI" onclick="urchinTracker('/outgoing/www.youtube.com/watch?v=RUwS1uAdUcI&amp;referer=');">Hans Rosling&#8217;s TED talk on debunking third-world myths</a> &#8211; a presentation which also demonstrates the potential of other forms of visualisation, and animation, in presenting complex information in an easy-to-understand way.</p>
<p>Finally, <strong>treemaps</strong> visualise hierarchical data in a way that could be described as rectangular pie charts-within-pie charts. This is particularly useful for representing different parts of a whole and their relationship to each other, for instance, different budgets within a government.</p>
<p>Perhaps the best-known example of a treemap is <a href="http://newsmap.jp/" onclick="urchinTracker('/outgoing/newsmap.jp/?referer=');">Newsmap</a>, created in 2004 by Marcos Weskamp. This visualises the amount of coverage given to stories by news organisations based on a feed from Google News. Weskamp explains it as follows:</p>
<blockquote><p>&#8220;Google News automatically groups news stories with similar content and places them based on algorithmic results into clusters. In Newsmap, the size of each cell is determined by the amount of related articles that exist inside each news cluster that the Google News Aggregator presents. In that way users can quickly identify which news stories have been given the most coverage, viewing the map by region, topic or time. Through that process it still accentuates the importance of a given article.&#8221; (<a href="http://marumushi.com/projects/newsmap" onclick="urchinTracker('/outgoing/marumushi.com/projects/newsmap?referer=');">Weskamp, 2005</a>)</p></blockquote>
<p>These are just the most common forms of visualisation, but there are dozens more to explore. <a href="http://www.visual-literacy.org/periodic_table/periodic_table.html" onclick="urchinTracker('/outgoing/www.visual-literacy.org/periodic_table/periodic_table.html?referer=');">The Periodic Table of Visualisation</a> is a particularly useful webpage giving an overview of the various forms.</p>
<h2>Considerations in visualisation</h2>
<p>Charlie Beckett <a href="http://www.charliebeckett.org/?p=3930" onclick="urchinTracker('/outgoing/www.charliebeckett.org/?p=3930&amp;referer=');">makes a useful distinction</a> between using visualisation for &#8220;rational understanding (I now get the figures) and emotional understanding (I  now care about the figures and want to do something).&#8221; It is worth deciding which of the two you are aiming for.</p>
<p>When visualising data it is also important to ensure that any comparisons are meaningful, or like-for-like. In one visualisation of how many sales a musician needs to make to earn the minimum wage, for example, a comparison is made between sites selling albums, sites selling individual tracks, and those providing music streams. Clearly this is misleading &#8211; and was criticised for being so (<a href="http://techdirt.com/articles/20100413/1647599007.shtml" onclick="urchinTracker('/outgoing/techdirt.com/articles/20100413/1647599007.shtml?referer=');">Techdirt, 2010</a>).</p>
<p>The <a href="http://astore.amazon.co.uk/onlijourblog-21/detail/0393072959" onclick="urchinTracker('/outgoing/astore.amazon.co.uk/onlijourblog-21/detail/0393072959?referer=');">Wall Street Journal Guide to Information Graphics</a> (2010) offers a wealth of tips on elements to consider and mistakes to avoid in both visualisation and data research, and is well worth reading for more on this area. Here is just a selection:</p>
<ul>
<li>&#8220;Choose the best data series to illustrate your point, e.g. market share vs. total revenue</li>
<li>&#8220;Filter and simplify the data to deliver the essence of the data to your intended audience</li>
<li>&#8220;Make numerical adjustments to the raw data to enhance your point, e.g. absolute values vs. percentage change</li>
<li>&#8220;Choose the appropriate chart settings, e.g. scale, y-axis increments and baseline</li>
<li>&#8220;If the raw data is insufficient to tell the story, do not add decorative elements. Instead, research additional sources and adjust data to stay on point</li>
<li>&#8220;Data is only as good as its source. Getting data from reputable and impartial sources is critical. For example, data should be benchmarked against a third party to avoid bias and add credibility</li>
<li>&#8220;In the research stage, a bigger data set allows more in-depth analysis. In the edit phase, it is important to assess whether all your extra information buries the main point of the story or enhances [it].&#8221;</li>
</ul>
<h2>Visualising large amounts of text</h2>
<p>If you are working with text rather than numbers there are ways to visualise that as well. <strong>Word clouds</strong>, for instance, show which words are used most often in a particular document (such as a speech, bill, or manifesto) or data stream (such as an RSS feed of what people are saying on Twitter or blogs). This can be particularly useful in drawing out the themes of a politician&#8217;s speech, for example, or the reaction from people online to a particular event. They can also be used to draw comparisons &#8211; word clouds have been used in the past to compare the inaugural speeches of Barack Obama with those of Bush and Clinton; and to compare the 2010 UK election manifestos of the Labour and Conservative parties. The <strong>tag cloud</strong> is similar to the word cloud, but typically allows you to click on an individual tag (word or phrase) to see where it has been used.</p>
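<p>Underneath, a word cloud is simply a count of how often each word appears, with the size of each word scaled to its count. Here is a minimal sketch in Python of that counting step; the stopword list is a tiny hypothetical one and the sample text is invented, so treat this as an illustration rather than how any particular word cloud tool works.</p>

```python
import re
from collections import Counter

# A very small, hypothetical stopword list; real tools use much longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "we", "our"}


def word_frequencies(text, top=5):
    """Return the most frequent words in the text, ignoring stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(top)


# Invented sample text, for illustration only.
speech = ("We will invest in schools. Schools matter, and jobs matter. "
          "Jobs, jobs, jobs.")

for word, count in word_frequencies(speech, top=3):
    print(word, count)
```

<p>Running the same count over two documents and comparing the lists is the basis of the speech and manifesto comparisons mentioned above.</p>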
<p>There are other forms of word visualisation too, particularly around showing relationships between words &#8211; when they occur together, or how often. The terminology varies: visualisation tool <strong>ManyEyes</strong>, for example, calls these <strong>word trees</strong> and <strong>phrase nets</strong>, but other tools will have different names.</p>
<p><em>Once again, I&#8217;d welcome any comments on areas I may have missed or things journalists should consider. I&#8217;ve had to split this section into two, so <a href="http://onlinejournalismblog.com/2010/04/28/data-journalism-pt4-visualising-data-tools-and-publishing-comments-wanted/">Part 4 continues to look at visualisation, and focuses on tools and publishing</a>. </em></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2010/04/28/data-journalism-pt3-visualising-data-comments-wanted/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>MPs expenses data: now it&#8217;s The Telegraph&#8217;s turn</title>
		<link>http://onlinejournalismblog.com/2009/06/23/mps-expenses-data-now-its-the-telegraphs-turn/</link>
		<comments>http://onlinejournalismblog.com/2009/06/23/mps-expenses-data-now-its-the-telegraphs-turn/#comments</comments>
		<pubDate>Tue, 23 Jun 2009 12:40:16 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[expenses]]></category>
		<category><![CDATA[google docs]]></category>
		<category><![CDATA[mp expenses]]></category>
		<category><![CDATA[Telegraph]]></category>
		<category><![CDATA[visualisation]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=2880</guid>
		<description><![CDATA[The Telegraph have finally published their MPs&#8217; expenses data online &#8211; and it&#8217;s worth the wait. Here are some initial thoughts and reactions: Firstly, they&#8217;ve made user behaviour an editorial feature. In plain English: they&#8217;re showing the most searched-for MPs and constituencies, which is not only potentially interesting in itself, but also makes it easier for the majority of users<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2009/06/23/mps-expenses-data-now-its-the-telegraphs-turn/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p>The Telegraph have finally <a href="http://parliament.telegraph.co.uk/mpsexpenses/home" onclick="urchinTracker('/outgoing/parliament.telegraph.co.uk/mpsexpenses/home?referer=');">published their MPs&#8217; expenses data online</a> &#8211; and it&#8217;s worth the wait. Here are some initial thoughts and reactions:</p>
<ul>
<li>Firstly, they&#8217;ve made user behaviour an editorial feature. In plain English: they&#8217;re showing the most searched-for MPs and constituencies, which is not only potentially interesting in itself, but also makes it easier for the majority of users who are making those searches (i.e. they can access it with a click rather than by typing)</li>
<li>There&#8217;s also a table for most expensive MPs. As this is going to remain static, it would be good to see a dedicated page with more information &#8211; in the same way the paper did in its weekend supplement.</li>
<li>The results page for a particular MP has a search engine-friendly URL. Very often, database-generated pages have poor search engine optimisation, partly because the URLs are full of digits and symbols, and partly because they are dynamically generated. This appears to avoid both problems &#8211; the URL for the second home allowance of Khalid Mahmood MP, for example, is <a href="http://parliament.telegraph.co.uk/mpsexpenses/second-home/Khalid-Mahmood/mp-11087" onclick="urchinTracker('/outgoing/parliament.telegraph.co.uk/mpsexpenses/second-home/Khalid-Mahmood/mp-11087?referer=');">http://parliament.telegraph.co.uk/mpsexpenses/second-home/Khalid-Mahmood/mp-11087</a></li>
<li>The <a href="http://parliament.telegraph.co.uk/mpsexpenses/uncensored-files/Khalid-Mahmood/mp-11087" onclick="urchinTracker('/outgoing/parliament.telegraph.co.uk/mpsexpenses/uncensored-files/Khalid-Mahmood/mp-11087?referer=');">uncensored expenses files themselves</a> are embedded using Issuu. This seems a strange choice as it doesn&#8217;t allow users to tag or comment &#8211; and the email/embed option is disabled for &#8220;secret documents&#8221;</li>
<li>There&#8217;s some nice subtle animation on the second home part of expenses, and clear visualisation on other parts.</li>
<li>The MP Details page is intelligently related both to the Telegraph site (related articles) and the wider web, with the facility to easily email that MP, go to their Wikipedia entry, and &#8216;bookmark&#8217;.</li>
<li>Joy of joys, you can also download the MPs expenses spreadsheet from here (on Google Docs) &#8211; although this is for all MPs rather than the one being viewed. Curiously, while viewing you can see who else is viewing and even (as I did) attempt to chat (no, they didn&#8217;t chat back).</li>
</ul>
<p>I&#8217;ll most likely update this post later as I get some details from behind the curtain.</p>
<p>And there are broader thoughts around the online treatment of expenses which I&#8217;ll try to blog at another point.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2009/06/23/mps-expenses-data-now-its-the-telegraphs-turn/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Using Google Spreadsheets as a database (no, it really is very interesting, honest)</title>
		<link>http://onlinejournalismblog.com/2009/05/19/using-google-spreadsheets-as-a-database-no-it-really-is-very-interesting-honest/</link>
		<comments>http://onlinejournalismblog.com/2009/05/19/using-google-spreadsheets-as-a-database-no-it-really-is-very-interesting-honest/#comments</comments>
		<pubDate>Tue, 19 May 2009 15:05:24 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[Blogging]]></category>
		<category><![CDATA[computer aided reporting]]></category>
		<category><![CDATA[data store]]></category>
		<category><![CDATA[google docs]]></category>
		<category><![CDATA[Guardian]]></category>
		<category><![CDATA[spreadsheets]]></category>
		<category><![CDATA[tony hirst]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=2708</guid>
		<description><![CDATA[This post by Tony Hirst should be recommended reading for every journalist interested in the potential of computers for reporting. Why? Because it shows you how you can use Google spreadsheets to interrogate data as if it was a database; and because it demonstrates the importance of news organisations releasing data to their users. Put aside any intimidation you might<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2009/05/19/using-google-spreadsheets-as-a-database-no-it-really-is-very-interesting-honest/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p><a href="http://ouseful.wordpress.com/2009/05/18/using-google-spreadsheets-as-a-databace-with-the-google-visualisation-api-query-language/" onclick="urchinTracker('/outgoing/ouseful.wordpress.com/2009/05/18/using-google-spreadsheets-as-a-databace-with-the-google-visualisation-api-query-language/?referer=');">This post by Tony Hirst</a> should be recommended reading for every journalist interested in the potential of computers for reporting.</p>
<p>Why? Because it shows you how you can use Google spreadsheets to interrogate data as if it was a database; and because it demonstrates the importance of news organisations releasing data to their users.</p>
<p>Put aside any intimidation you might feel at the mention of APIs and query languages. What it boils down to is this: <strong>you can alter the web address of a Google spreadsheet to filter the data and find the story.</strong></p>
<p>Simple as that. </p>
<p>Hirst uses the example of the <a href="http://spreadsheets.google.com/ccc?key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/ccc?key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">spreadsheet of MPs expenses</a> recently <a href="http://www.guardian.co.uk/news/datablog/2009/may/15/mps-expenses-houseofcommons" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog/2009/may/15/mps-expenses-houseofcommons?referer=');">released by The Guardian</a> (they&#8217;ve also published <a href="http://www.guardian.co.uk/news/datablog/2009/may/15/lordreform-mps-expenses" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog/2009/may/15/lordreform-mps-expenses?referer=');">Lords expenses</a>). By altering the URLs this is what he generates (I&#8217;m quoting his bullet points):</p>
<ul>
<li>the names of people who have claimed the maximum additional costs allowance (£23,083): fetch just columns B, C and I where the value in column I is 23083: <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20B,C,I%20where%20I=23083&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20B_C_I_20where_20I=23083_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select B,C,I where I=23083</a> (column I is the additional costs allowance column);</li>
<li>How many people did claim the maximum additional costs allowance? Select the people who claimed the maximum amount (23083) and count them: <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20count(I)%20where%20I=23083&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20count_I_20where_20I=23083_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select count(I) where I=23083</a></li>
<li>So which people <em>did not</em> claim the maximum additional costs allowance? Display the people who did not claim total additional allowances of 23083: <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20B,C,I%20where%20I!=23083&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20B_C_I_20where_20I_=23083_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select B,C,I where I!=23083</a> (using &lt;&gt; for ‘not equals’ also works); NB here’s a more refined take on that query: <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20B,C,I%20where%20(I!=23083%20and%20I%3E=0)%20order%20by%20I&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20B_C_I_20where_20_I_=23083_20and_20I_3E=0_20order_20by_20I_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select B,C,I where (I!=23083 and I&gt;=0) order by I</a></li>
<li>search for the name, party (column D) and constituency (column E) of people whose first name is <em>Joan</em> or is recorded as <em>John</em> (rather than “Mr John”, or “Rt Hon John”): <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20B,C,D,E%20where%20(C%20contains%20'Joan'%20or%20C%20matches%20'John')&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20B_C_D_E_20where_20_C_20contains_20_Joan_20or_20C_20matches_20_John_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select B,C,D,E where (C contains 'Joan' or C matches 'John')</a></li>
<li>only show the people who have claimed less than £100,000 in total allowances : <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20*%20where%20F%3C100000&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20_20where_20F_3C100000_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select * where F&lt;100000</a></li>
<li>what is the total amount of expenses claimed? Fetch the summed total of entries in column I (i.e. the total expenses claimed by everyone): <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20sum(I)&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20sum_I_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select sum(I)</a></li>
<li>So how many MPs are there? Count the number of rows in an arbitrary column: <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20count(I)&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20count_I_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select count(I)</a></li>
<li>Find the average amount claimed by the MPs: <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20sum(I)/count(I)&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20sum_I_/count_I_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select sum(I)/count(I)</a></li>
<li>Find out how much has been claimed by each party (column D): <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20D,sum(I)%20where%20I%3E=0%20group%20by%20D&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20D_sum_I_20where_20I_3E=0_20group_20by_20D_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select D,sum(I) where I&gt;=0 group by D</a> (Setting I&gt;=0 just ensures there is something in the column)</li>
<li>For each party, find out how much (on average) each party member claims: <a href="http://spreadsheets.google.com/tq?tqx=out:html&amp;tq=select%20D,sum(I)/count(I)%20where%20I%3E=0%20group%20by%20D&amp;key=phNtm3LmDZEObQ2itmSqHIA" onclick="urchinTracker('/outgoing/spreadsheets.google.com/tq?tqx=out_html_amp_tq=select_20D_sum_I_/count_I_20where_20I_3E=0_20group_20by_20D_amp_key=phNtm3LmDZEObQ2itmSqHIA&amp;referer=');">select D,sum(I)/count(I) where I&gt;=0 group by D</a></li>
</ul>
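<p>If you would rather not edit web addresses by hand, the pattern in the links above can be sketched in a few lines of Python. The endpoint and the three parameters (<em>tqx</em>, <em>tq</em> and <em>key</em>) are taken from Tony&#8217;s links; the function name is my own, and I am not assuming that this address still responds today.</p>

```python
from urllib.parse import urlencode, quote

# Endpoint as it appears in the links quoted above.
QUERY_ENDPOINT = "http://spreadsheets.google.com/tq"


def build_query_url(spreadsheet_key, query, output="html"):
    """Build a query URL for a spreadsheet (hypothetical helper).

    spreadsheet_key: the value of 'key' in the spreadsheet's own address.
    query: the query text, e.g. "select B,C,I where I=23083".
    output: the format requested via the 'tqx' parameter.
    """
    params = {
        "tqx": "out:" + output,
        "tq": query,
        "key": spreadsheet_key,
    }
    # quote() percent-encodes spaces as %20, as in the original links.
    return QUERY_ENDPOINT + "?" + urlencode(params, quote_via=quote)


url = build_query_url("phNtm3LmDZEObQ2itmSqHIA", "select B,C,I where I=23083")
print(url)
```

<p>Letting the library do the encoding means characters such as spaces, commas and brackets in the query cannot break the address.</p>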
<p>OK, you need to know the words to use (and <strong>if you have a link to an easy reference for these let me know*</strong>), but this is still a lot easier than using programming languages and databases.</p>
<p>As I say, this also illustrates the importance of publishing raw data so users can interrogate it in their own ways, which is precisely what <a href="http://www.guardian.co.uk/data-store" onclick="urchinTracker('/outgoing/www.guardian.co.uk/data-store?referer=');">The Guardian&#8217;s Data Store</a> has been doing, meaning that people like Tony can <a href="http://ouseful.open.ac.uk/mpExpensesSearch.html" onclick="urchinTracker('/outgoing/ouseful.open.ac.uk/mpExpensesSearch.html?referer=');">create interfaces like this</a>.</p>
<p>Wonderful.</p>
<p>*Tony has very generously created <a href="http://ouseful.open.ac.uk/datastore/gspreadsheetdb2.php" onclick="urchinTracker('/outgoing/ouseful.open.ac.uk/datastore/gspreadsheetdb2.php?referer=');">this page which helps you formulate your search &#8211; and generates the URL</a>. If you were working on a different spreadsheet you could just replace the spreadsheet URL and change any column references accordingly.</p>
<p>UPDATE: Tony also has <a href="http://ouseful.open.ac.uk/datastore/gspreadsheetdb4.php" onclick="urchinTracker('/outgoing/ouseful.open.ac.uk/datastore/gspreadsheetdb4.php?referer=');">a version which allows you to pick from Guardian datasets</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2009/05/19/using-google-spreadsheets-as-a-database-no-it-really-is-very-interesting-honest/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
	</channel>
</rss>

