<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Online Journalism Blog &#187; tutorial</title>
	<atom:link href="http://onlinejournalismblog.com/tag/tutorial/feed/" rel="self" type="application/rss+xml" />
	<link>http://onlinejournalismblog.com</link>
	<description>A conversation.</description>
	<lastBuildDate>Sat, 11 Feb 2012 12:06:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<cloud domain='onlinejournalismblog.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
		<item>
		<title>SFTW: How to grab useful political data with the They Work For You API</title>
		<link>http://onlinejournalismblog.com/2011/07/22/how-to-grab-useful-political-data-with-the-they-work-for-you-api/</link>
		<comments>http://onlinejournalismblog.com/2011/07/22/how-to-grab-useful-political-data-with-the-they-work-for-you-api/#comments</comments>
		<pubDate>Fri, 22 Jul 2011 08:35:47 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[constituencies]]></category>
		<category><![CDATA[google refine]]></category>
		<category><![CDATA[grel]]></category>
		<category><![CDATA[guardian api]]></category>
		<category><![CDATA[Politics]]></category>
		<category><![CDATA[Something for the weekend]]></category>
		<category><![CDATA[they work for you]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=14930</guid>
		<description><![CDATA[It&#8217;s been over 2 years since I stopped doing the &#8216;Something for the Weekend&#8217; series. I thought I would revive it with a tutorial on They Work For You and Google Refine&#8230; If you want to add political context to a spreadsheet – say you need to know what political parties a list of constituencies voted for, or the MPs<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/07/22/how-to-grab-useful-political-data-with-the-they-work-for-you-api/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p><img src="http://www.theyworkforyou.com/images/logo.png" alt="They Work For You" /></p>
<p><em>It&#8217;s been over 2 years since I stopped doing the &#8216;<a href="http://onlinejournalismblog.com/tag/something-for-the-weekend/">Something for the Weekend&#8217; series</a>. I thought I would revive it with a tutorial on They Work For You and Google Refine&#8230;<br />
</em><br />
If you want to add political context to a spreadsheet – say you need to know what political parties a list of constituencies voted for, or the MPs for those constituencies – the <a href="http://www.theyworkforyou.com/api/" onclick="urchinTracker('/outgoing/www.theyworkforyou.com/api/?referer=');">They Work For You API</a> can save you hours of fiddling &#8211; if you know how to use it.</p>
<p>An API is – for the purposes of journalists – a way of asking questions of reams of data. For example, you can use an API to ask “What constituency is each of these postcodes in?” or “When did these politicians enter office?” or even “Can you show me an image of these people?”</p>
<p>The They Work For You API will give answers to a range of UK political questions on subjects including Lords, MLAs (Members of the Legislative Assembly in Northern Ireland), MPs, MSPs (Members of the Scottish Parliament), select committees, debates, written answers, statements and constituencies.</p>
<p>When you combine that API with <strong>Google Refine</strong> you can fill a whole spreadsheet with additional political data, allowing you to answer questions you might otherwise not be able to.</p>
<p>I’ve written before on <a href="http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/">how to use Google Refine to pull data into a spreadsheet from the Google Maps API</a> and <a href="http://onlinejournalismblog.com/2010/12/16/adding-geographical-information-to-a-spreadsheet-based-on-postcodes-google-refine-and-apis/">the UK Postcodes API</a>, but this post takes things a bit further because the They Work For You API requires something called a ‘key’. This is quite common with APIs so knowing how to use them is &#8211; well &#8211; <em>key</em>. If you need extra help, try those tutorials first.<span id="more-14930"></span></p>
<h2>The They Work For You API key</h2>
<p>Unlike the previous APIs I’ve written about, the They Work For You API requires you to register for a ‘key’ to use it. If you don’t understand how this works the <a href="http://www.theyworkforyou.com/api/" onclick="urchinTracker('/outgoing/www.theyworkforyou.com/api/?referer=');">instructions on the TWFY website</a> can be a little confusing. So here’s how it works:</p>
<p>The key is a password of sorts, used when you ask the API a question.</p>
<p>As your ‘question’ takes the form of a web address (URL) then that key needs to be included at a particular part of that URL.</p>
<p>You’ll see how that works when we get to asking the URL questions. But first, go to <a href="http://www.theyworkforyou.com/api/key" onclick="urchinTracker('/outgoing/www.theyworkforyou.com/api/key?referer=');">http://www.theyworkforyou.com/api/key</a> to get a key.</p>
<p>Got it? OK, now copy it into a text document – or just keep this window open. You’ll need to paste it later.</p>
<h2>Using the TWFY key</h2>
<p>The API has a number of pre-set questions, called ‘functions’. These are listed in the right hand column, and include <a href="http://www.theyworkforyou.com/api/docs/getMPs" onclick="urchinTracker('/outgoing/www.theyworkforyou.com/api/docs/getMPs?referer=');">getMPs</a>, <a href="http://www.theyworkforyou.com/api/docs/getLord" onclick="urchinTracker('/outgoing/www.theyworkforyou.com/api/docs/getLord?referer=');">getLord</a>, <a href="http://www.theyworkforyou.com/api/docs/getDebates" onclick="urchinTracker('/outgoing/www.theyworkforyou.com/api/docs/getDebates?referer=');">getDebates</a> and so on. If you click on any of these you will be given information on how they work, and you can also test the function with the ‘Explorer’.</p>
<p>To demonstrate how to use these functions, <a href="http://www.theyworkforyou.com/api/docs/getConstituency" onclick="urchinTracker('/outgoing/www.theyworkforyou.com/api/docs/getConstituency?referer=');">click on getConstituency</a>.</p>
<p>If you use the ‘Explorer’ to test it (in this case with &#8216;Edinburgh South&#8217;) you will be shown a bunch of results at a URL like this:</p>
<p><strong>http://www.theyworkforyou.com/api/docs/getConstituency?name=edinburgh+south&amp;postcode=&amp;output=js#output</strong></p>
<p>Now you could manually use the Explorer to get information for each of the cells in a spreadsheet, but it&#8217;s much, much quicker to use the API to automate the process instead.</p>
<p>On that front the Explorer can be a little misleading: although it shows you the information you might get from the API, the URL of that results page is not the one you will need.</p>
<p>The URL you really need is shown above the results, and below the word ‘<strong><em>Output</em></strong>’ like so:</p>
<p><strong>http://www.theyworkforyou.com/api/getConstituency?name=edinburgh+south&amp;output=js</strong></p>
<p>If you copy and paste that URL into your browser you will get the following warning:</p>
<p><strong>{</strong><br />
<strong> error: &#8220;No API key provided. Please see http://www.theyworkforyou.com/api/key for more information.&#8221;</strong><br />
<strong> }</strong></p>
<p>So now we need that key.</p>
<h2>Using your key</h2>
<p>Assuming you still have your API key copied somewhere, or still open in another window, you can find instructions on how to use it at <a href="http://www.theyworkforyou.com/api/" onclick="urchinTracker('/outgoing/www.theyworkforyou.com/api/?referer=');">http://www.theyworkforyou.com/api/</a></p>
<p>Here you are told to use the key as part of the following structure:</p>
<p><strong>http://www.theyworkforyou.com/api/function?key=key&amp;output=output&amp;other_variables</strong></p>
<p>The important bit is where it says <strong>key=key&amp;</strong></p>
<p>That is where you need to add your own key, so that that part of the URL looks <em>something</em> like</p>
<p><strong>key=aTh0jklerJaHui7&amp;</strong></p>
<p>(where that random assortment of characters is your key, copied earlier, followed by the <strong>&amp;</strong> sign)</p>
<p>Going back for a moment to the URL that wasn’t working without a key, we can see that it can be split into two parts:</p>
<p><strong>http://www.theyworkforyou.com/api/getConstituency?</strong></p>
<p><em>and</em></p>
<p><strong>name=edinburgh+south&amp;output=js</strong></p>
<p>Adding in the <em>key</em> in the middle makes up a <em>third</em> part, like so:</p>
<p><strong>http://www.theyworkforyou.com/api/getConstituency?</strong></p>
<p><em>and</em></p>
<p><strong>key=key&amp;</strong></p>
<p><em>and</em></p>
<p><strong>name=edinburgh+south&amp;output=js</strong></p>
<p>So, you now need to <em>edit the output URL to include your API key</em>. It should then look something like this:</p>
<p>http://www.theyworkforyou.com/api/getConstituency?<strong>key=AHdajHUShajshaJ&#038;</strong>name=edinburgh+south&#038;output=js</p>
<p><em>UPDATE: Matthew Somerville points out that the key can be used anywhere after the ? so you can tag it on the end if that&#8217;s easier.</em></p>
<h2>The URL broken down further</h2>
<p>Just to clarify, these are the parts:</p>
<p><strong>http://www.theyworkforyou.com/</strong></p>
<p>(The website hosting the API)</p>
<p><strong>api/</strong></p>
<p>(The API)</p>
<p><strong>getConstituency?</strong></p>
<p>(The function – or question being asked)</p>
<p><strong>key=AHdajHUShajshaJ</strong></p>
<p>(Our API key – or password)</p>
<p><strong>&amp;name=edinburgh+south</strong></p>
<p>(and the constituency name that we are asking the API for information on)</p>
<p><strong>&amp;output=js</strong></p>
<p>(and the format we want the answer in &#8211; JSON, in this case)</p>
<p>You should now get a page of JSON code giving data for the question. If your browser doesn&#8217;t display it particularly well, try Chrome or Firefox.</p>
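<p>If it helps to see the same URL put together outside the browser, here is a minimal sketch in Python. It assumes nothing beyond the parts listed above; <strong>YOUR_API_KEY</strong> and the <strong>build_url</strong> helper are made-up names for illustration, not part of the They Work For You API.</p>

```python
from urllib.parse import urlencode

# Hypothetical placeholder: paste in the key you registered for.
API_KEY = "YOUR_API_KEY"

# The website hosting the API, plus the api/ part.
BASE = "http://www.theyworkforyou.com/api/"


def build_url(function, **params):
    """Join the function name, the key, any other variables and the output format."""
    query = urlencode({"key": API_KEY, **params, "output": "js"})
    return BASE + function + "?" + query


# urlencode turns the space in the constituency name into a plus sign.
url = build_url("getConstituency", name="edinburgh south")
print(url)
```

<p>The order of the variables after the ? does not matter, so the key could just as easily go last.</p>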
<h2>Using with Google Refine to get a bunch of results</h2>
<p>Great. But we could get one result by using the ‘Explorer’, so why did we need to do all that? Because we can now use Google Refine to automate the process of asking the same question hundreds of times.</p>
<p>To demonstrate this, <a href="https://spreadsheets.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdDA3V010RUlqTjhYalN6ejh0T2ZGN0E&amp;hl=en_GB" onclick="urchinTracker('/outgoing/spreadsheets.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdDA3V010RUlqTjhYalN6ejh0T2ZGN0E_amp_hl=en_GB&amp;referer=');">here&#8217;s a spreadsheet with 4 constituencies</a>. Open it, and select <strong>File &gt; Download as&#8230; &gt; CSV </strong></p>
<p>Open Google Refine (<a href="http://code.google.com/p/google-refine/wiki/Downloads?tm=2" onclick="urchinTracker('/outgoing/code.google.com/p/google-refine/wiki/Downloads?tm=2&amp;referer=');">download here</a>) and create a new project with that spreadsheet. Create a new column from the one you have by clicking on the arrow at the top of the column and selecting <strong>Edit Column &gt; Add Column by fetching URLs</strong></p>
<p>In the window that appears adapt the following piece of Google Refine Expression Language (GREL) with your own API key (shown in bold):</p>
<p>&quot;http://www.theyworkforyou.com/api/getConstituency?<strong>key=Gr7jUUlKdhB3fsihFnHzab&amp;</strong>name=&quot;+value+&quot;&amp;output=js&quot;</p>
<p>(NOTE: Avoid copying and pasting, as the quotation marks may cause you problems. Instead, try typing it in yourself &#8211; this also helps you remember things.) This generates a URL in each cell based on the value of the original column: the start and end of the URL are in quotation marks; the value is inserted in the middle where it says <strong>+value+</strong>.</p>
<p>Give the column a name and click <strong>OK</strong>. It will now run &#8211; this test example only has 4 rows so you can see the results quickly.</p>
<p>You&#8217;ll see that only one row has actually worked &#8211; Tatton. The others have failed. Why? Because they have more than one word.</p>
<p>Take another look at that URL that the API returned earlier with the test of Edinburgh South:</p>
<p>http://www.theyworkforyou.com/api/getConstituency?key=AHdajHUShajshaJ&#038;<strong>name=edinburgh+south</strong>&#038;output=js</p>
<p>When a constituency has two words the space between them is represented by a plus sign &#8211; so we need to format our data in the same way for it to work.</p>
<h2>Formatting data for the API</h2>
<p>You could use Find and Replace in Excel to replace all spaces in that column with a plus sign but you will still hit problems with unusual constituency names. But this is how to do it in Google Refine:</p>
<p><del>Click on the arrow at the top of the constituency column and selecting <strong>Edit Column &gt; Add column based on this column&#8230;</strong></del></p>
<p><del> </del></p>
<p><del>In the window that appears type the following GREL:</del></p>
<p><del> </del></p>
<p>value.split(&quot; &quot;).join(&quot;+&quot;)</p>
<p>To explain:</p>
<p><em>&#8216;value&#8217; is the value in each cell.</em></p>
<p><em>&#8216;.split(&quot; &quot;)&#8217; splits each value where there is a space (&quot; &quot;).</em></p>
<p><del><em>&#8216;.join(&quot;+&quot;)&#8217; then joins the resulting items together, with a plus sign.</em></del></p>
<p><del>Give it a name and click <strong>OK</strong>. You&#8217;ll see a new column with plus signs replacing the spaces. </del><em>[see comment from Matthew Somerville for explanation]</em></p>
<p>Create a new column from the one you have by clicking on the arrow at the top of the column and selecting <strong>Edit Column &gt; Add Column by fetching URLs</strong></p>
<p>In the window that appears adapt the following piece of Google Refine Expression Language (GREL) with your own API key (shown in bold):</p>
<p>&quot;http://www.theyworkforyou.com/api/getConstituency?name=&quot; + escape(value, &quot;url&quot;) + &quot;&amp;<strong>key=Gr7jUUlKdhB3fsihFnHzab</strong>&amp;output=js&quot;</p>
<p>The key part here is between the + signs. Whereas before we simply inserted the value of each cell, here we <em>escape</em> that value at the same time so that it will work in a URL.</p>
<p>This will change Edinburgh South to &#8220;Edinburgh+South&#8221; but also Normanton, Pontefract and Castleford to &#8220;Normanton%2C+Pontefract+and+Castleford&#8221;, and will handle any other unforeseen characters in similar ways.</p>
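<p>To get a feel for what escaping does, here is a rough Python equivalent. This is only a sketch: it uses Python&#8217;s <strong>quote_plus</strong> rather than GREL&#8217;s escape, on the assumption that the two treat spaces and commas the same way for names like these.</p>

```python
from urllib.parse import quote_plus

# Constituency names as they might appear in the spreadsheet column.
names = ["Tatton", "Edinburgh South", "Normanton, Pontefract and Castleford"]

# quote_plus replaces spaces with plus signs and percent-encodes
# characters such as commas, leaving letters untouched.
escaped = [quote_plus(name) for name in names]

for name, safe in zip(names, escaped):
    print(name, "becomes", safe)
```

<p>Single-word names such as Tatton come through unchanged, which is why that row worked even before escaping.</p>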
<p>Give this new column a name, click <strong>OK</strong> and watch your new column populate itself with the JSON from each URL.</p>
<h2>Creating new columns from the JSON</h2>
<p>Now we can populate new columns with data taken from that JSON as follows:</p>
<p>Click on the arrow at the top of the <em>new </em>JSON column and select Edit Column &gt; Add column based on this column&#8230;</p>
<p>Type this GREL:</p>
<p>value.parseJson().bbc_constituency_id</p>
<p><em>(This looks in the JSON in each cell and pulls out the value after bbc_constituency_id:)</em> Then click <strong>OK</strong>.</p>
<p>Repeat the process for further columns as follows:</p>
<p>value.parseJson().guardian_election_results</p>
<p>value.parseJson().pa_id</p>
<p>value.parseJson().guardian_id</p>
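<p>The same idea can be sketched in Python: parse the JSON text, then pull out one named field at a time. The field names below are the ones used above, but the sample values are invented for illustration and are not real API output.</p>

```python
import json

# Invented sample of what one cell in the JSON column might hold.
# Only the field names come from the tutorial; the values are made up.
sample = (
    '{"bbc_constituency_id": "000", '
    '"guardian_election_results": "http://example.com/results", '
    '"pa_id": "111", "guardian_id": "222"}'
)

# Equivalent of value.parseJson() in GREL.
record = json.loads(sample)

# Equivalent of adding one new column per field.
fields = ["bbc_constituency_id", "guardian_election_results", "pa_id", "guardian_id"]
row = {field: record.get(field) for field in fields}
print(row)
```

<p>Using <strong>.get()</strong> means a field missing from the response gives an empty result rather than an error, which is a design choice for this sketch and not something the API requires.</p>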
<h2>Going further</h2>
<p>That&#8217;s just a demonstration of how to use a small part of the They Work For You API &#8211; there are lots of other functions that you can use to get other information. Have a play with those.</p>
<p>Meanwhile, what about those IDs? Well, the Guardian ID <a href="http://www.guardian.co.uk/open-platform/politics-api/getting-started" onclick="urchinTracker('/outgoing/www.guardian.co.uk/open-platform/politics-api/getting-started?referer=');">will allow you to play with The Guardian&#8217;s API</a> &#8211; which gives lots more information on each constituency. For an example see http://www.guardian.co.uk/politics/api/constituency/664/json</p>
<p>Based on that URL you can repeat the process above to grab more data.</p>
<p><em>Is this useful? Anything you can add? Or other data problems?</em></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/07/22/how-to-grab-useful-political-data-with-the-they-work-for-you-api/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>How to collaborate (or crowdsource) by combining Delicious and Google Docs</title>
		<link>http://onlinejournalismblog.com/2011/07/20/how-to-collaborate-or-crowdsource-by-combining-delicious-and-google-docs/</link>
		<comments>http://onlinejournalismblog.com/2011/07/20/how-to-collaborate-or-crowdsource-by-combining-delicious-and-google-docs/#comments</comments>
		<pubDate>Wed, 20 Jul 2011 14:42:53 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[collaboration]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[delicious]]></category>
		<category><![CDATA[google docs]]></category>
		<category><![CDATA[importfeed]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=14851</guid>
		<description><![CDATA[During some training in open data I was doing recently, I ended up explaining (it&#8217;s a long story) how to pull a feed from Delicious into a Google Docs spreadsheet. I promised I would put it down online, so: here it is. In a Google Docs spreadsheet the formula =importfeed will pull information from an RSS feed and put it<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/07/20/how-to-collaborate-or-crowdsource-by-combining-delicious-and-google-docs/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<div class="wp-caption alignnone" style="width: 250px"><a href="http://www.flickr.com/photos/heatherweaver/2809992904/" onclick="urchinTracker('/outgoing/www.flickr.com/photos/heatherweaver/2809992904/?referer=');"><img title="RSS girl by Heather Weaver" src="http://farm4.static.flickr.com/3211/2809992904_23bbfbccd5.jpg" alt="RSS girl by Heather Weaver" width="240" height="400" /></a><p class="wp-caption-text">RSS girl by HeatherWeaver on Flickr</p></div>
<p>During some training in open data I was doing recently, I ended up explaining (it&#8217;s a long story) how to pull a feed from Delicious into a Google Docs spreadsheet. I promised I would put it down online, so: here it is.</p>
<p>In a Google Docs spreadsheet the formula <strong>=importfeed</strong> will pull information from an RSS feed and put it into that spreadsheet. Titles, links, datestamps and other parts of the feed will each be separated into their own columns.</p>
<p>When combined with Delicious, this can be a useful way to collect together pages that have been bookmarked by a group of people, or any other feed that you want to analyse.</p>
<p>Here&#8217;s how you do it:<span id="more-14851"></span></p>
<h2>1. Decide on your tag, network or user</h2>
<p>The spreadsheet will pull data from an RSS feed. Delicious provides so many of these that you are spoilt for choice. Here are the main three:</p>
<h3><strong>A tag</strong></h3>
<p>Used by various people.</p>
<p><em>Advantages</em>: quick startup &#8211; all you need to do is tell people the tag (make sure this is unique, such as &#8216;unguessable2012&#8217;).</p>
<p><em>Disadvantages</em>: others can hijack the tag &#8211; although this can be cleaned from the resulting data.</p>
<h3><strong>A network</strong></h3>
<p>Consisting of the group of people who are bookmarking:</p>
<p><em>Advantages</em>: group cannot be infiltrated.</p>
<p><em>Disadvantages</em>: setup time &#8211; may need to create a new account to build the network around.</p>
<h3><strong>A user</strong></h3>
<p>Created for this purpose:</p>
<p><em>Advantages</em>: if users are not confident in using Delicious, this can be a useful workaround.</p>
<p><em>Disadvantages</em>: longer set up time &#8211; you&#8217;ll need to create a new account, and work out an easy way for it to automatically capture bookmarks from the group. One way is to pull an RSS feed of any mentions on Twitter and use <a href="http://Twitterfeed.com" onclick="urchinTracker('/outgoing/Twitterfeed.com?referer=');">Twitterfeed</a> to auto-tweet them with a hashtag, and then <a href="http://Packrati.us" onclick="urchinTracker('/outgoing/Packrati.us?referer=');">Packrati.us</a> to auto-bookmark all tweeted links (<a href="http://onlinejournalismblog.com/2011/07/11/an-experiment-in-creating-an-auto-debunker-twitter-account/">a similar process is detailed here</a>).</p>
<p>The RSS feed for each will be found at the bottom of pages, and is consistently formatted like so:</p>
<p>Delicious.com/<strong>tag</strong>/unguessable2012</p>
<p>Delicious.com/<strong>network</strong>/unguessable2012</p>
<p><strong>Delicious.com/</strong>unguessable2012</p>
<h2>2. Create your spreadsheet</h2>
<p>In Google Docs, create a new spreadsheet and in the first cell type the following formula:</p>
<p><strong>=importfeed(&quot;</strong></p>
<p>&#8230;adding your RSS feed after the quotation mark, and then this at the end:</p>
<p><strong>&quot;)</strong></p>
<p>So it looks something like this:</p>
<p><strong>=importfeed(&quot;http://feeds.delicious.com/v2/rss/tag/unguessable2012?count=15&quot;)</strong></p>
<p>Now press enter and after a moment the spreadsheet should populate with data from that feed.</p>
<p>You&#8217;ll note, however, that at most you will have only 15 rows of data here. That&#8217;s because the RSS feed you&#8217;ve copied includes that limitation.</p>
<p>If you look at the RSS feed you&#8217;ll see an easy clue on how to change this&#8230;</p>
<p>So, try editing it so that the <strong>count=15</strong> part of that URL reads <strong>count=20</strong> instead. You can put a higher number &#8211; but Google Docs will limit results to 20 at a time.</p>
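<p>If you have several feeds to adjust, the same edit can be scripted. This is a small Python sketch, assuming only that the feed URL carries a <strong>count</strong> value as shown above; the <strong>set_count</strong> helper is a made-up name.</p>

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_count(feed_url, count):
    """Return feed_url with its count value replaced (or added if missing)."""
    parts = urlsplit(feed_url)
    query = dict(parse_qsl(parts.query))
    query["count"] = str(count)
    return urlunsplit(parts._replace(query=urlencode(query)))


feed = "http://feeds.delicious.com/v2/rss/tag/unguessable2012?count=15"
print(set_count(feed, 20))
```

<p>The result can then be pasted between the quotation marks of the =importfeed formula.</p>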
<h2>3. Collecting contributions</h2>
<p>Technically, you&#8217;re now all set up. The bigger challenge is, of course, in getting people to contribute. It helps if they can see the results &#8211; so think about publishing your spreadsheet.</p>
<p>You&#8217;ll also need to make sure that you check it regularly and copy into a backup spreadsheet so you don&#8217;t miss results after that top 20.</p>
<p>But if you find it doesn&#8217;t work it may be worth thinking of other ways of doing this &#8211; for example, with a <a href="https://docs.google.com/support/bin/answer.py?answer=87809" onclick="urchinTracker('/outgoing/docs.google.com/support/bin/answer.py?answer=87809&amp;referer=');">Google Form</a>, or using =importfeed with the RSS feed for a search on results for a Twitter hashtag containing links (<a href="http://search.twitter.com/advanced" onclick="urchinTracker('/outgoing/search.twitter.com/advanced?referer=');">Twitter&#8217;s advanced search</a> allows you to limit results accordingly &#8211; and all search results come with an RSS feed link <a href="http://search.twitter.com/search.atom?q=+%23murdoch+filter%3Alinks" onclick="urchinTracker('/outgoing/search.twitter.com/search.atom?q=+_23murdoch+filter_3Alinks&amp;referer=');">like this one</a>)</p>
<p>Of course there are far more powerful ways of doing this which are worth exploring once you&#8217;ve understood the basic possibilities.</p>
<h2>Doing more with =importfeed</h2>
<p>The =importfeed formula has some other elements that we haven&#8217;t used.</p>
<p>Another way to do this, for example, is to paste your RSS feed URL into cell A1 and type the following anywhere else:</p>
<p><strong>=importfeed(A1, &quot;Items Title&quot;, FALSE, 20)</strong></p>
<p>This has 4 parts in the parentheses:</p>
<ol>
<li>A1 &#8211; this points at the URL you just pasted in cell A1, and means that you only have to change what&#8217;s in A1 to change the feed being grabbed, rather than having to edit the formula itself</li>
<li>&#8220;Items Title&#8221; &#8211; this is the part of the feed that is being grabbed. If you look in the feed you will see a part that says &lt;item&gt; and within that, an element called &lt;title&gt; &#8211; that&#8217;s it. You could change this to &#8220;Items URL&#8221; to get the &lt;link&gt; part of each &lt;item&gt; instead, for example. Or you could just put &#8220;Items&#8221; and get all 5 parts of each item (title, author, URL, date created, and summary). You can also use &#8220;feed&#8221; to get information about the feed itself, or &#8220;feed URL&#8221; or &#8220;feed title&#8221; or &#8220;feed description&#8221; to get that single piece of information.</li>
<li>FALSE &#8211; this just says whether you want a header row or not. Setting to TRUE will add an extra row saying &#8216;Title&#8217;, for example.</li>
<li>20 &#8211; the number of results you want.</li>
</ol>
<p>You can <a href="https://spreadsheets.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdEo0WDFvZTBQWTdXUTJMRWJ3dTBEVUE&amp;hl=en_GB" onclick="urchinTracker('/outgoing/spreadsheets.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdEo0WDFvZTBQWTdXUTJMRWJ3dTBEVUE_amp_hl=en_GB&amp;referer=');">see an example spreadsheet with 3 sheets demonstrating different uses of this formula here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/07/20/how-to-collaborate-or-crowdsource-by-combining-delicious-and-google-docs/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Postcards from a Text Processing Excursion</title>
		<link>http://blog.ouseful.info/2011/06/03/postcards-from-a-text-processing-excursion/</link>
		<comments>http://blog.ouseful.info/2011/06/03/postcards-from-a-text-processing-excursion/#comments</comments>
		<pubDate>Fri, 03 Jun 2011 11:53:54 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[text processing]]></category>
		<category><![CDATA[text wrangling]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[Uncourse]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=5579</guid>
		<description><![CDATA[It never ceases to amaze me how I lack even the most basic computer skills, but that&#8217;s one of the reasons I started this blog: to demonstrate and record my fumbling learning steps so that others maybe don&#8217;t have to spend so much time being as dazed and confused as I am most of the [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#38;blog=325417&#38;post=5579&#38;subd=ouseful&#38;ref=&#38;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>It never ceases to amaze me how I lack even the most basic computer skills, but that&#8217;s one of the reasons I started this blog: to demonstrate and record my fumbling learning steps so that others maybe don&#8217;t have to spend so much time being as dazed and confused as I am most of the time&#8230;</p>
<p>Anyway, I spent a fair chunk of yesterday trying to find a way of getting started with grappling with CSV data text files that are just a bit too big to comfortably manage in a text editor or simple spreadsheet (so files over 50,000 or so rows, up to low millions) and that should probably be dumped into a database <em>if</em> that option was available, but for whatever reason, isn&#8217;t&#8230; (Not feeling comfortable with setting up and populating a database is one example&#8230;But I doubt I&#8217;ll get round to blogging my SQLite 101 for a bit yet&#8230;)</p>
<p>Note that the following tools are Unix tools &#8211; so they work on Linux and on a Mac, but probably not on Windows unless you install a unix tools package (such as <a href="http://gnuwin32.sourceforge.net/" onclick="urchinTracker('/outgoing/gnuwin32.sourceforge.net/?referer=');">GnuWin</a> &#8211; <em>coreutils</em> and <em>sed</em>, which look good for starters&#8230;). Another alternative would be to download the <a href="http://susegallery.com/a/RQrRBY/data-journalism-developer-studio--2" onclick="urchinTracker('/outgoing/susegallery.com/a/RQrRBY/data-journalism-developer-studio--2?referer=');">Data Journalism Developer Studio</a> and run it either as a bootable CD/DVD, or as a virtual machine using something like <a href="http://www.vmware.com/" onclick="urchinTracker('/outgoing/www.vmware.com/?referer=');">VMWare</a> or <a href="http://www.virtualbox.org/" onclick="urchinTracker('/outgoing/www.virtualbox.org/?referer=');">VirtualBox</a>.</p>
<p>All the tools below are related to the basic mechanics of wrangling with text files, which include CSV (comma separated) and TSV (tab separated) files. Your average unix jockey will look at you with sympathetic eyes if you rave about them, but for us mere mortals, they may make life easier for you than you ever thought possible&#8230;</p>
<p><em>[If you know of simple tricks in the style of what follows that I haven't included here, please feel free to add them in as a comment, and I'll maybe try to work them into a continual updating of this post...]</em></p>
<p>If you want to play along, why not check out this <a href="http://openurl.ac.uk/doc/data/data.html" onclick="urchinTracker('/outgoing/openurl.ac.uk/doc/data/data.html?referer=');">openurl data from EDINA</a> (<a href="http://openurl.ac.uk/doc/data/sample.html" onclick="urchinTracker('/outgoing/openurl.ac.uk/doc/data/sample.html?referer=');">data sample</a>; a more comprehensive set is also available if you&#8217;re feeling brave: <a href="http://openurl.ac.uk/doc/data/thedata.html" onclick="urchinTracker('/outgoing/openurl.ac.uk/doc/data/thedata.html?referer=');">monthly openurl data</a>).</p>
<p>So let&#8217;s start at the beginning and imagine you&#8217;re faced with a large CSV file &#8211; 10MB, 50MB, 100MB, 200MB large &#8211; and when you try to open it in your text editor (the file&#8217;s too big for Google spreadsheets and maybe even for Google Fusion tables) the whole thing just grinds to a halt, if it doesn&#8217;t actually fall over.</p>
<p>What to do?</p>
<p>To begin with, you may want to take a deep breath and find out just what sort of beast you have to contend with. You know the file size, but what else might you learn? (I&#8217;m assuming the file has a csv suffix, <em>L2sample.csv</em> say, so for starters we&#8217;re assuming it&#8217;s a text file&#8230;)</p>
<p>The <tt>wc</tt> (word count) command is a handy little tool that will give you a quick overview of how many rows there are in the file:</p>
<p><tt>wc -l L2sample.csv</tt></p>
<p>I get the response <em>101 L2sample.csv</em>, so there are presumably 100 data rows and 1 header row.</p>
<p>We can learn a little more by taking the <tt>-l</tt> linecount switch off, and getting a report back on the number of words and characters in the file as well:</p>
<p><tt>wc L2sample.csv</tt></p>
<p>Another thing that you might consider doing is just having a look at the structure of the file, by sampling the first few rows of it and having a peek at them. The <tt>head</tt> command can help you here.</p>
<p><tt>head L2sample.csv</tt></p>
<p>By default, it returns the first 10 rows of the file. If we want to change the number of rows displayed, we can use the <tt>-n</tt> switch:</p>
<p><tt>head -n 4 L2sample.csv</tt></p>
<p>As well as the <tt>head</tt> command, there is the <tt>tail</tt> command; this can be used to peek at the lines at the end of the file:</p>
<p><tt>tail L2sample.csv<br />
tail -n 15 L2sample.csv</tt></p>
<p>When I look at the rows, I see they have the form:</p>
<pre>logDate	logTime	encryptedUserIP	institutionResolverID	routerRedirectIdentifier ...
2011-04-04	00:00:03	kJJNjAytJ2eWV+pjbvbZTkJ19bk	715781	ukfed ...
2011-04-04	00:00:14	/DAGaS+tZQBzlje5FKsazNp2lhw	289516	wayf ...
2011-04-04	00:00:15	NJIy8xkJ6kHfW74zd8nU9HJ60Bc	569773	athens ...</pre>
<p>So, not <em>comma</em> separated then; <em>tab</em> separated&#8230;;-)</p>
<p>If you were to upload a tab separated file to something like Google Fusion Tables, which I think currently only parses CSV text files for some reason, it will happily spend the time uploading the data &#8211; and then shove it into a single column.</p>
<p><em>I&#8217;m not sure if there are column splitting tools available in Fusion Tables &#8211; there weren&#8217;t last time I looked, though maybe we might expect a fuller range of import tools to appear at some point; many applications that accept text based data files allow you to specify the separator type, as for example in Google spreadsheets:</p>
<p><img src="http://ouseful.files.wordpress.com/2011/06/googspreadsheetimportdialogue.png?w=345&#038;h=360" alt="" title="googspreadsheetimportdialogue" width="345" height="360" class="alignnone size-full wp-image-5582" /></p>
<p>I&#8217;m personally living in hope that some sort of integration with the <a href="http://code.google.com/p/google-refine/" onclick="urchinTracker('/outgoing/code.google.com/p/google-refine/?referer=');">Google Refine data cleaning tool</a> will appear one day&#8230;</em></p>
<p>If you want to take a sample of a large data file and put it into another smaller file that you can play with or try things out with, the <tt>head</tt> (or <tt>tail</tt>) tool provides one way of doing that thanks to the magic of Unix <em>redirection</em> (which you might like to think of as a &#8220;pipe&#8221;, although that has a slightly different meaning in Unix land&#8230;). The words/jargon may sound confusing, and the syntax may look cryptic, but the effect is really powerful: <strong>take the output from a command and shove it into a file.</strong></p>
<p>So, given a CSV file with a million rows, suppose we want to run a few tests in an application using a couple of hundred rows. <em>This trick will help you generate the file containing the couple of hundred rows.</em></p>
<p>Here&#8217;s an example using <em>L2sample.csv</em> &#8211; we&#8217;ll create a file containing the first 20 rows, plus the header row:</p>
<p><tt>head -n 21 L2sample.csv <strong>&gt;</strong> subSample.csv</tt></p>
<p>See the <strong>&gt;</strong> sign? That says &#8220;take the output from the command on the left, and shove it into the file on the right&#8221;. (Note that if <em>subSample.csv</em> already exists, it will be overwritten, and you will lose the original.)</p>
<p>There&#8217;s probably a better way of doing this, but if you want to generate a CSV file (with headers) containing the last 10 rows, for example, of a file, you can use the <em>cat</em> command to join a file containing the headers with a file containing the last 10 rows:</p>
<p><tt>head -n 1 L2sample.csv &gt; headers.csv<br />
tail -n 10 L2sample.csv &gt; subSample.csv<br />
<strong>cat</strong> headers.csv subSample.csv &gt; subSampleWithHeaders.csv</tt></p>
<p>(Note: don&#8217;t try to <em>cat</em> a file into itself, or Ouroboros may come calling&#8230;)</p>
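<p>One possibly tidier way of getting the same result, sketched here on a made-up file (<em>demo.tsv</em> and <em>lastTenWithHeaders.tsv</em> are just illustrative names), is to group the two commands in curly braces so that their combined output is redirected into a single file, with no intermediate files needed:</p>

```shell
# Build a small tab-separated sample file (hypothetical data) to experiment on:
# one header row followed by 30 data rows.
printf 'id\tname\n' > demo.tsv
i=1
while [ "$i" -le 30 ]; do
  printf '%s\trow%s\n' "$i" "$i" >> demo.tsv
  i=$((i + 1))
done

# Group two commands in braces so their combined output goes to one file:
# first the header row, then the last 10 rows.
{ head -n 1 demo.tsv; tail -n 10 demo.tsv; } > lastTenWithHeaders.tsv

# Check the result: 1 header row plus 10 data rows.
wc -l lastTenWithHeaders.tsv
```

<p>Note the spaces just inside the braces and the semicolon before the closing brace; the shell needs both.</p>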
<p>Another very powerful concept from the Unix command line is the notion of <strong>|</strong> (the <em>pipe</em>). This lets you take the output from one command and direct it to another command (rather than directing it into a file, as &gt; does). So for example, if we want to extract rows 10 to 15 from a file, we can use <em>head</em> to grab the first 15 rows, then <em>tail</em> to grab the last 6 rows of those 15 rows (count them: 10, 11, 12, 13, 14, 15):</p>
<p><tt>head -n 15 L2sample.csv | tail -n 6 &gt; middleSample.csv</tt></p>
<p>Try to read it as an English phrase (the | and &gt; are punctuation): <em>take the first [<strong>head</strong>] 15 rows [<strong>-n 15</strong>] of the file <strong>L2sample.csv</strong> and use them as input [<strong>|</strong>] to the <tt>tail</tt> command; take the last [<strong>tail</strong>] 6 lines [<strong>-n 6</strong>] of the input data and save them [<strong>&gt;</strong>] as the file <strong>middleSample.csv</strong></em>.</p>
<p>If we want to add in the headers, we can use the <em>cat</em> command:</p>
<p><tt>cat headers.csv middleSample.csv &gt; middleSampleWithHeaders.csv</tt></p>
<p>We can use a pipe to join all sorts of commands. If our file only uses a single word for each column header, we can count the number of columns (single words) by grabbing the header row and sending it to <tt>wc</tt>, which will count the words for us:</p>
<p><tt>head -n 1 L2sample.csv | wc</tt></p>
<p>(Take the first row of L2sample.csv and count the lines/words/characters. If there is one word per column header, the word count gives us the column count&#8230;;-)</p>
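<p>If some of the column headers contain spaces, the word count will overcount the columns. A minimal sketch of one workaround, assuming a tab separated file (<em>demo.tsv</em> is a made-up example): use <tt>tr</tt> to turn each tab in the header row into a line break, then count the lines instead of the words:</p>

```shell
# Hypothetical sample: three tab-separated columns, one with a two-word header.
printf 'log date\tlogTime\tencryptedUserIP\n2011-04-04\t00:00:03\tabc\n' > demo.tsv

# Put each tab-separated header on its own line, then count the lines.
head -n 1 demo.tsv | tr '\t' '\n' | wc -l
```

<p>For a comma separated file, swapping <tt>'\t'</tt> for <tt>','</tt> should do the same job, as long as no header contains a comma of its own.</p>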
<p>Sometimes we just want to split a big file into a set of smaller files. The <tt>split</tt> command is our friend here, and lets us split a file into smaller files containing up to a known number of rows/lines:</p>
<p><tt>split -l 15 L2sample.csv subSamples</tt></p>
<p>This will generate a series of files named <em>subSamples<strong>aa</strong></em>, <em>subSamples<strong>ab</strong></em>, &#8230;, each containing 15 lines (except for the last one, which may contain fewer&#8230;).</p>
<p>Note that the first file will contain the header and 14 data rows, and the other files will contain 15 data rows but no column headings. To get round this, you might want to <em>split</em> on a file that doesn&#8217;t contain the header. (So maybe use <em>wc -l</em> to find the number of rows in the original file, create a header free version of the data by using <em>tail</em> on one less than the number of rows in the file, then <em>split</em> the header free version. You might then want to use <em>cat</em> to put the header back in to each of the smaller files&#8230;)</p>
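<p>Here&#8217;s a rough sketch of that recipe on a made-up file (all the file names are hypothetical). Rather than counting the rows first, it uses <tt>tail -n +2</tt>, which means &#8220;everything from line 2 onwards&#8221;, to produce the header free version:</p>

```shell
# Hypothetical sample: a header row plus 7 data rows.
printf 'id\tname\n' > demo.tsv
for i in 1 2 3 4 5 6 7; do
  printf '%s\trow%s\n' "$i" "$i" >> demo.tsv
done

# Keep the header to one side, and write everything from line 2 onwards
# to a header-free copy.
head -n 1 demo.tsv > demoHeader.tsv
tail -n +2 demo.tsv > demoBody.tsv

# Split the header-free copy into chunks of up to 3 rows: chunk_aa, chunk_ab, ...
split -l 3 demoBody.tsv chunk_

# Put the header back on top of each chunk, saving the result as chunk_XX.tsv.
for f in chunk_??; do
  cat demoHeader.tsv "$f" > "$f.tsv"
done

# Each .tsv chunk now has 1 header row plus up to 3 data rows.
wc -l chunk_??.tsv
```

<p>The <tt>??</tt> in <tt>chunk_??</tt> matches the two letter suffixes that <tt>split</tt> adds by default.</p>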
<p>A couple of other Unix text processing tools let us use a CSV file as a crude database. The <tt>grep</tt> command searches a file for a particular term <em>or text pattern</em> (known as a regular expression, which I&#8217;m not going to cover much in this post&#8230; suffice to note for now that you can do real text processing voodoo magic with regular expressions&#8230;;-)</p>
<p>So for example, in our test file, I can search for rows that contain the word <em>mendeley</em>:</p>
<p><tt>grep mendeley L2sample.csv</tt></p>
<p>We can also redirect the output into a file:</p>
<p><tt>grep EBSCO L2sample.csv &gt; rowsContainingEBSCO.csv</tt></p>
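<p>Two standard <tt>grep</tt> switches are worth knowing about at this point: <tt>-i</tt> ignores upper/lower case differences, and <tt>-c</tt> reports how many lines matched rather than printing the lines themselves. A quick illustration on a made-up file (<em>demo.tsv</em> and its contents are hypothetical):</p>

```shell
# Hypothetical sample with the search term in two different capitalisations.
printf 'id\tsource\n1\tEBSCO\n2\tMendeley\n3\tmendeley\n' > demo.tsv

# -i ignores case; -c prints the number of matching lines instead of the lines.
grep -i -c mendeley demo.tsv
```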
<p>If the text file contains columns that are separated by a unique delimiter (that is, some symbol that is <em>only</em> ever used to separate the columns), we can use the <tt>cut</tt> command to just pull out particular columns. The cut command assumes a tab delimiter (we can specify other delimiters explicitly if we need to), so we can use it on our test file to pull out data from the third column:</p>
<p><tt><strong>cut -f 3</strong> L2sample.csv</tt></p>
<p>We can also pull out multiple columns and save them in a file:</p>
<p><tt><strong>cut -f 1,2,14,17</strong> L2sample.csv &gt; columnSample.csv</tt></p>
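<p>For a file that really is comma separated, the <tt>-d</tt> switch tells <tt>cut</tt> which delimiter to use. A small sketch using a made-up file (<em>demo.csv</em> is hypothetical), which assumes that none of the values contain commas of their own, because <tt>cut</tt> knows nothing about quoted CSV fields:</p>

```shell
# Hypothetical comma-separated sample (no quoted fields containing commas).
printf 'name,party,seats\nAlpha,Red,10\nBeta,Blue,7\n' > demo.csv

# -d sets the delimiter; -f picks the columns, here the first and third.
cut -d ',' -f 1,3 demo.csv
```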
<p>If you pull out just a single column, you can sort the entries to see what different entries are included in the column using the <tt>sort</tt> command:</p>
<p><tt>cut -f 40 L2sample.csv | sort</tt></p>
<p>(Take column 40 of the file L2sample.csv and sort the items.)</p>
<p>We can also take this sorted list and identify the unique entries using the <tt>uniq</tt> command; so here are the different entries in column 40 of our test file:</p>
<p><tt>cut -f 40 L2sample.csv | sort | uniq</tt></p>
<p>(Take column 40 of the file L2sample.csv, sort the items, and display the unique values.)</p>
<p>(The <tt>uniq</tt> command only compares adjacent lines, hence the need to sort first.)</p>
<p>The <tt>uniq</tt> command will also count the repeat occurrence of unique entries if we ask it nicely (<tt>-c</tt>):</p>
<p><tt>cut -f 40 L2sample.csv | sort | uniq <strong>-c</strong></tt></p>
<p>(Take column 40 of the file L2sample.csv, sort the items, and display the unique values along with how many times they appear in the column as a whole.)</p>
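<p>A natural next step is to rank the values by how often they appear. The sketch below uses a made-up two column file (<em>demo.tsv</em> is hypothetical), skips the header row, and pipes the counts through <tt>sort</tt> a second time:</p>

```shell
# Hypothetical two-column, tab-separated sample.
printf 'id\tsource\n1\tEBSCO\n2\tmendeley\n3\tEBSCO\n4\tEBSCO\n5\tmendeley\n' > demo.tsv

# Skip the header, pull out column 2, count each distinct value,
# then sort numerically (-n) in reverse (-r) so the biggest count comes first.
tail -n +2 demo.tsv | cut -f 2 | sort | uniq -c | sort -rn
```

<p>Adding <tt>| head -n 5</tt> to the end of the pipeline would show just the five most common values.</p>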
<p>The final command I&#8217;m going to mention here is a magic search and replace operator called <tt>sed</tt>. I&#8217;m aware that this post is already over long, so I&#8217;ll maybe return to this in a later post, aside from giving you a tease of some scary voodoo&#8230; how to convert a tab delimited file to a comma separated file. One recipe is given by Kevin Ashley as follows:</p>
<p><a href="https://twitter.com/#!/kevingashley/statuses/76274081737084928" onclick="urchinTracker('/outgoing/twitter.com/_/kevingashley/statuses/76274081737084928?referer=');"><img src="http://ouseful.files.wordpress.com/2011/06/tab2csvtweet.png?w=609&#038;h=397" alt="" title="tab2csvtweet" width="609" height="397" class="alignnone size-full wp-image-5583" /></a></p>
<p><tt>sed 's/"/\\\"/g; s/^/"/; s/$/"/; s/<em>ctrl-V&lt;TAB&gt;</em>/","/g;' origFile.tsv &gt; newFile.csv</tt></p>
<p>(See also this related question on #getTheData: <a href="http://getthedata.org/questions/642/converting-large-ish-tab-separated-files-to-csv" onclick="urchinTracker('/outgoing/getthedata.org/questions/642/converting-large-ish-tab-separated-files-to-csv?referer=');">Converting large-ish tab separated files to CSV</a>.)</p>
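<p>As a variation on that recipe (not the one from the tweet), here&#8217;s a sketch that assumes whatever reads the CSV expects embedded double quotes to be doubled up rather than escaped with a backslash. It also keeps a literal tab in a shell variable, so there&#8217;s no need to type ctrl-V TAB. The file names and sample data are made up:</p>

```shell
# Hypothetical tab-separated sample; the last row contains double quotes.
printf 'id\ttitle\n1\tplain text\n2\tsay "hi"\n' > demo.tsv

# Hold a literal tab in a variable so it can be dropped into the sed pattern.
tab=$(printf '\t')

# 1) double up any existing quotes, 2) open a quote at the start of the line,
# 3) close it at the end, 4) turn each tab into quote-comma-quote.
sed -e 's/"/""/g' -e 's/^/"/' -e 's/$/"/' -e "s/$tab/\",\"/g" demo.tsv > demo.csv

cat demo.csv
```

<p>The last expression is wrapped in double quotes rather than single quotes so that the shell swaps in the tab for <tt>$tab</tt> before <tt>sed</tt> sees it.</p>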
<p>Note: if you have a small amount of text and need to wrangle it in some way, the <a href="http://textmechanic.com/" onclick="urchinTracker('/outgoing/textmechanic.com/?referer=');">Text Mechanic</a> site might have what you need&#8230;</p>
<p>This lecture note on <a href="http://www.ling.upenn.edu/courses/Spring_2003/ling538/Lecnotes/UnixTools.html" onclick="urchinTracker('/outgoing/www.ling.upenn.edu/courses/Spring_2003/ling538/Lecnotes/UnixTools.html?referer=');">Unix Tools</a> provides a really handy cribsheet of Unix command line text wrangling tools, though the syntax doesn&#8217;t appear to work for me for some of the commands as given there (the important thing is the <em>idea</em> of what&#8217;s possible&#8230;).</p>
<p>If you&#8217;re looking for regular expression helpers (I haven&#8217;t really mentioned these at all in this post, suffice to say they&#8217;re a mechanism for doing pattern based search and replace, and which in the right hands can look like real voodoo text processing magic!), check out <a href="http://txt2re.com/" onclick="urchinTracker('/outgoing/txt2re.com/?referer=');">txt2re</a> and <a href="http://regexpal.com/" onclick="urchinTracker('/outgoing/regexpal.com/?referer=');">Regexpal</a> (<a href="http://blog.stevenlevithan.com/archives/regexpal" onclick="urchinTracker('/outgoing/blog.stevenlevithan.com/archives/regexpal?referer=');">about regexpal</a>).</p>
<p>TO DO: this is a biggie &#8211; the <em>join</em> command will join rows from two files with common elements in specified columns. I can&#8217;t get it working properly with my test files, so I&#8217;m not blogging it just yet, but here&#8217;s a starter for 10 if you want to try&#8230; <a href="http://www.albany.edu/~ig4895/join.htm" onclick="urchinTracker('/outgoing/www.albany.edu/_ig4895/join.htm?referer=');">Unix <em>join</em> examples</a></p>
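<p>In the meantime, here&#8217;s a minimal sketch of the general idea, using two tiny made-up files rather than the openurl data (the file names and their contents are hypothetical). A common stumbling block is that <tt>join</tt> expects both files to already be sorted on the column being matched, so the sketch sorts them first:</p>

```shell
# Two hypothetical tab-separated files that share an id in their first column.
printf '289516\tUniversity A\n715781\tUniversity B\n' > names.tsv
printf '715781\tukfed\n289516\twayf\n' > routes.tsv

# Hold a literal tab in a variable to use as the field separator.
tab=$(printf '\t')

# join expects both inputs to be sorted on the join field, so sort first.
sort -t "$tab" -k 1,1 names.tsv > names.sorted.tsv
sort -t "$tab" -k 1,1 routes.tsv > routes.sorted.tsv

# -t sets the separator; -1 1 and -2 1 say "match column 1 of the first file
# with column 1 of the second file".
join -t "$tab" -1 1 -2 1 names.sorted.tsv routes.sorted.tsv
```

<p>Each output row should contain the shared id followed by the remaining columns from the first file and then the second. If the files have header rows, they would need to be set aside before sorting.</p>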
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2011/06/03/postcards-from-a-text-processing-excursion/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://ouseful.files.wordpress.com/2011/06/googspreadsheetimportdialogue.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/06/tab2csvtweet.png" length="" type="" />
		</item>
		<item>
		<title>Tech Tips: Making Sense of JSON Strings – Follow the Structure</title>
		<link>http://blog.ouseful.info/2011/04/12/tech-tips-making-sense-of-json-strings-follow-the-structure/</link>
		<comments>http://blog.ouseful.info/2011/04/12/tech-tips-making-sense-of-json-strings-follow-the-structure/#comments</comments>
		<pubDate>Tue, 12 Apr 2011 10:25:27 +0000</pubDate>
		<dc:creator>tonyhirst</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[tony hirst]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=5252</guid>
		<description><![CDATA[Reading through the Online Journalism blog post on Getting full addresses for data from an FOI response (using APIs), the following phrase &#8211; relating to the composition of some Google Refine code to parse a JSON string from the Google geocoding API &#8211; jumped out at me: &#8220;This took a bit of trial and error&#8230;&#8221; [...]]]></description>
			<content:encoded><![CDATA[<p>Reading through the Online Journalism blog post on <a href="http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/">Getting full addresses for data from an FOI response (using APIs)</a>, the following phrase &#8211; relating to the composition of some Google Refine code to parse a JSON string from the Google geocoding API &#8211; jumped out at me: &#8220;This took a bit of trial and error&#8230;&#8221;</p>
<p><a href="http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/"><img src="http://ouseful.files.wordpress.com/2011/04/google-refnie-took-a-bit-of-trial-and-error.png?w=700&#038;h=358" alt="" width="700" height="358" class="alignnone size-full wp-image-5255" /></a></p>
<p>Why? Two reasons&#8230; Firstly, because it demonstrates a &#8220;have a go&#8221; attitude which you absolutely need to have if you&#8217;re going to appropriate technology and turn it to your own purposes. Secondly, because it maybe (or maybe not&#8230;) hints at a missed trick or two&#8230;</p>
<p>So what trick&#8217;s missing?</p>
<p>Here&#8217;s <a href="http://maps.googleapis.com/maps/api/geocode/json?sensor=false&amp;address=mk7%206aa,uk" onclick="urchinTracker('/outgoing/maps.googleapis.com/maps/api/geocode/json?sensor=false_amp_address=mk7_206aa_uk&amp;referer=');">an example</a> of the sort of thing you get back from the Google Geocoder:</p>
<blockquote><p><em>{ &quot;status&quot;: &quot;OK&quot;, &quot;results&quot;: [ { &quot;types&quot;: [ &quot;postal_code&quot; ], &quot;formatted_address&quot;: &quot;Milton Keynes, Buckinghamshire MK7 6AA, UK&quot;, &quot;address_components&quot;: [ { &quot;long_name&quot;: &quot;MK7 6AA&quot;, &quot;short_name&quot;: &quot;MK7 6AA&quot;, &quot;types&quot;: [ &quot;postal_code&quot; ] }, { &quot;long_name&quot;: &quot;Milton Keynes&quot;, &quot;short_name&quot;: &quot;Milton Keynes&quot;, &quot;types&quot;: [ &quot;locality&quot;, &quot;political&quot; ] }, { &quot;long_name&quot;: &quot;Buckinghamshire&quot;, &quot;short_name&quot;: &quot;Buckinghamshire&quot;, &quot;types&quot;: [ &quot;administrative_area_level_2&quot;, &quot;political&quot; ] }, { &quot;long_name&quot;: &quot;Milton Keynes&quot;, &quot;short_name&quot;: &quot;Milton Keynes&quot;, &quot;types&quot;: [ &quot;administrative_area_level_2&quot;, &quot;political&quot; ] }, { &quot;long_name&quot;: &quot;United Kingdom&quot;, &quot;short_name&quot;: &quot;GB&quot;, &quot;types&quot;: [ &quot;country&quot;, &quot;political&quot; ] }, { &quot;long_name&quot;: &quot;MK7&quot;, &quot;short_name&quot;: &quot;MK7&quot;, &quot;types&quot;: [ &quot;postal_code_prefix&quot;, &quot;postal_code&quot; ] } ], &quot;geometry&quot;: { &quot;location&quot;: { &quot;lat&quot;: 52.0249136, &quot;lng&quot;: -0.7097474 }, &quot;location_type&quot;: &quot;APPROXIMATE&quot;, &quot;viewport&quot;: { &quot;southwest&quot;: { &quot;lat&quot;: 52.0193722, &quot;lng&quot;: -0.7161451 }, &quot;northeast&quot;: { &quot;lat&quot;: 52.0300728, &quot;lng&quot;: -0.6977000 } }, &quot;bounds&quot;: { &quot;southwest&quot;: { &quot;lat&quot;: 52.0193722, &quot;lng&quot;: -0.7161451 }, &quot;northeast&quot;: { &quot;lat&quot;: 52.0300728, &quot;lng&quot;: -0.6977000 } } } } ] }</em></p></blockquote>
<p>The data represents a Javascript object (JSON = JavaScript Object Notation) and as such has a standard form, a hierarchical form.</p>
<p>Here&#8217;s another way of writing the <em>same</em> object code, only this time laid out in a way that reveals the structure of the object:</p>
<pre>{
  &quot;status&quot;: &quot;OK&quot;,
  &quot;results&quot;: [ {
    &quot;types&quot;: [ &quot;postal_code&quot; ],
    &quot;formatted_address&quot;: &quot;Milton Keynes, Buckinghamshire MK7 6AA, UK&quot;,
    &quot;address_components&quot;: [ {
      &quot;long_name&quot;: &quot;MK7 6AA&quot;,
      &quot;short_name&quot;: &quot;MK7 6AA&quot;,
      &quot;types&quot;: [ &quot;postal_code&quot; ]
    }, {
      &quot;long_name&quot;: &quot;Milton Keynes&quot;,
      &quot;short_name&quot;: &quot;Milton Keynes&quot;,
      &quot;types&quot;: [ &quot;locality&quot;, &quot;political&quot; ]
    }, {
      &quot;long_name&quot;: &quot;Buckinghamshire&quot;,
      &quot;short_name&quot;: &quot;Buckinghamshire&quot;,
      &quot;types&quot;: [ &quot;administrative_area_level_2&quot;, &quot;political&quot; ]
    }, {
      &quot;long_name&quot;: &quot;Milton Keynes&quot;,
      &quot;short_name&quot;: &quot;Milton Keynes&quot;,
      &quot;types&quot;: [ &quot;administrative_area_level_2&quot;, &quot;political&quot; ]
    }, {
      &quot;long_name&quot;: &quot;United Kingdom&quot;,
      &quot;short_name&quot;: &quot;GB&quot;,
      &quot;types&quot;: [ &quot;country&quot;, &quot;political&quot; ]
    }, {
      &quot;long_name&quot;: &quot;MK7&quot;,
      &quot;short_name&quot;: &quot;MK7&quot;,
      &quot;types&quot;: [ &quot;postal_code_prefix&quot;, &quot;postal_code&quot; ]
    } ],
    &quot;geometry&quot;: {
      &quot;location&quot;: {
        &quot;lat&quot;: 52.0249136,
        &quot;lng&quot;: -0.7097474
      },
      &quot;location_type&quot;: &quot;APPROXIMATE&quot;,
      &quot;viewport&quot;: {
        &quot;southwest&quot;: {
          &quot;lat&quot;: 52.0193722,
          &quot;lng&quot;: -0.7161451
        },
        &quot;northeast&quot;: {
          &quot;lat&quot;: 52.0300728,
          &quot;lng&quot;: -0.6977000
        }
      },
      &quot;bounds&quot;: {
        &quot;southwest&quot;: {
          &quot;lat&quot;: 52.0193722,
          &quot;lng&quot;: -0.7161451
        },
        &quot;northeast&quot;: {
          &quot;lat&quot;: 52.0300728,
          &quot;lng&quot;: -0.6977000
        }
      }
    }
  } ]
}</pre>
<h2>Making Sense of the Notation</h2>
<p>At its simplest, the structure has the form: {&quot;attribute&quot;:&quot;value&quot;}</p>
<p>If we parse this object into the <em>jsonObject</em>, we can access the value of the attribute as <em>jsonObject.attribute</em> or <em>jsonObject["attribute"]</em>. The first style of notation is called a <em>dot notation</em>.</p>
<p>We can add more attribute:value pairs into the object by separating them with commas: <em>a={&quot;attr&quot;:&quot;val&quot;,&quot;attr2&quot;:&quot;val2&quot;}</em>  and address them (that is, refer to them) uniquely: <em>a.attr</em>, for example, or <em>a["attr2"]</em>.</p>
<p>Try it out for yourself&#8230; Copy and paste the following into your browser address bar (where the URL goes) and hit return (i.e. &#8220;go to&#8221; that &#8220;location&#8221;):</p>
<p><tt>javascript:a={"attr":"val","attr2":"val2"}; alert(a.attr);alert(a["attr2"])</tt></p>
<p>(As an aside, what might you learn from this? Firstly, you can &#8220;run&#8221; javascript in the browser via the location bar. Secondly, the javascript command <em>alert()</em> pops up an alert box:-)</p>
<p>Note that the value of an attribute might be another object.</p>
<p><em>obj={ attrWithObjectValue: { &quot;childObjAttr&quot;:&quot;foo&quot; } }</em></p>
<p>Another thing we can see in the Google geocoder JSON code are square brackets. These define an <em>array</em> (one might also think of it as an ordered list). Items in the list are addressed numerically. So for example, given:</p>
<p><em>arr=[ "item1", "item2", "item3" ]</em></p>
<p>we can locate &quot;item1&quot; as <em>arr[0]</em> and &quot;item3&quot; as <em>arr[2]</em>. (Note: the index count in the square brackets starts at 0.) Try it in the browser&#8230; (for example, <tt>javascript:list=["apples","bananas","pears"]; alert( list[1] );</tt>).</p>
<p>Arrays can contain objects too:</p>
<p><em>list=[ "item1", {"innerObjectAttr":"innerObjVal"  } ]</em></p>
<p>Can you guess how to get to the <em>innerObjVal</em>? Try this in the browser location bar:</p>
<p><tt>javascript: list=[ "item1", { "innerObjectAttr":"innerObjVal"  } ]; alert( list[1].innerObjectAttr )</tt></p>
<h2>Making Life Easier</h2>
<p>Hopefully, you&#8217;ll now have a sense that there&#8217;s structure in a JSON object, and that that (<em>sic</em>) structure is what we rely on if we want to cut down on the &#8220;trial and error&#8221; when parsing such things. To make life easier, we can also use &#8220;tree widgets&#8221; to display the hierarchical JSON object in a way that makes it far easier to see how to construct the dotted path that leads to the data value we want.</p>
<p>A tool I have appropriated for previewing JSON objects is <a href="http://pipes.yahoo.com" onclick="urchinTracker('/outgoing/pipes.yahoo.com?referer=');">Yahoo Pipes</a>. Rather than necessarily using Pipes to build anything, I simply make use of it as a JSON viewer, loading JSON into the pipe from a URL via the <em>Fetch Data</em> block, and then previewing the result:</p>
<p><a href="http://pipes.yahoo.com" onclick="urchinTracker('/outgoing/pipes.yahoo.com?referer=');"><img src="http://ouseful.files.wordpress.com/2011/04/yahoo-pipes-as-a-json-previewer1.png?w=700&#038;h=455" alt="" width="700" height="455" class="alignnone size-full wp-image-5259" /></a></p>
<p>Another tool (and one I&#8217;ve just discovered) is an Air application called <a href="http://code.google.com/p/json-pad/" onclick="urchinTracker('/outgoing/code.google.com/p/json-pad/?referer=');">JSON-Pad</a>. You can paste in JSON code, or pull it in from a URL, and then preview it again via a tree widget:</p>
<p><a href="http://code.google.com/p/json-pad/" onclick="urchinTracker('/outgoing/code.google.com/p/json-pad/?referer=');"><img src="http://ouseful.files.wordpress.com/2011/04/json-pad.png?w=652&#038;h=693" alt="" width="652" height="693" class="alignnone size-full wp-image-5257" /></a></p>
<p>Clicking on one of the results in the tree widget provides a crib to the path&#8230;</p>
<h2>Summary</h2>
<p>Writing addresses into JSON objects is much easier if you have some idea of the structure of the object. Tree viewers make the structure of an object explicit. By walking down the tree to the part of it you want, and &#8220;dotting&#8221; together* the nodes/attributes you select as you do so, you can quickly and easily construct the path you need.</p>
<p>* If the JSON attributes have spaces or non-alphanumeric characters in them, use the <em>obj["attr"]</em> notation rather than the dotted <em>obj.attr</em> notation&#8230;</p>
<p>PS Via my feeds today, though something I had bookmarked already, this <a href="http://www.shancarter.com/data_converter/index.html" onclick="urchinTracker('/outgoing/www.shancarter.com/data_converter/index.html?referer=');">Data Converter</a> tool may be helpful in going the other way&#8230; (Disclaimer: I haven&#8217;t tried using it&#8230;)</p>
<p><a href="http://www.shancarter.com/data_converter/index.html" onclick="urchinTracker('/outgoing/www.shancarter.com/data_converter/index.html?referer=');"><img src="http://ouseful.files.wordpress.com/2011/04/data-converter.png?w=700&#038;h=313" alt="" width="700" height="313" class="alignnone size-full wp-image-5260" /></a></p>
<p>If you know of any other related tools, please feel free to post a link to them in the comments:-)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2011/04/12/tech-tips-making-sense-of-json-strings-follow-the-structure/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://ouseful.files.wordpress.com/2011/04/yahoo-pipes-as-a-json-previewer1.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/04/json-pad.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/04/data-converter.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/04/google-refnie-took-a-bit-of-trial-and-error.png" length="" type="" />
		</item>
		<item>
		<title>How to create basic mashups with Yahoo! Pipes</title>
		<link>http://onlinejournalismblog.com/2008/07/16/how-to-create-basic-mashups-with-yahoo-pipes/</link>
		<comments>http://onlinejournalismblog.com/2008/07/16/how-to-create-basic-mashups-with-yahoo-pipes/#comments</comments>
		<pubDate>Wed, 16 Jul 2008 11:49:57 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[online journalism]]></category>
		<category><![CDATA[mashup]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[Yahoo! Pipes]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=1175</guid>
		<description><![CDATA[I&#8217;ve blogged previously about what Yahoo! Pipes can do. The following describes some of the basic mashups you can create with Yahoo! Pipes &#8211; please add your own tips in the comments. Signing up First you’ll need to go to pipes.yahoo.com and register with the page. If you already have a Yahoo! or Flickr account you may be able to<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2008/07/16/how-to-create-basic-mashups-with-yahoo-pipes/">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p class="MsoNormal">I&#8217;ve <a href="http://onlinejournalismblog.com/2008/04/25/something-for-the-weekend-6-mashups-with-yahoo-pipes/">blogged previously</a> about what <a class="zem_slink" title="Yahoo! Pipes" rel="homepage" href="http://pipes.yahoo.com/" onclick="urchinTracker('/outgoing/pipes.yahoo.com/?referer=');">Yahoo! Pipes</a> can do. The following describes some of the basic mashups you can create with Yahoo! Pipes &#8211; please add your own tips in the comments.<span id="more-1689"></span></p>
<h2>Signing up</h2>
<p class="MsoNormal">First you’ll need to go to pipes.yahoo.com and register. If you already have a Yahoo! or <a class="zem_slink" title="Flickr" rel="homepage" href="http://www.flickr.com/" onclick="urchinTracker('/outgoing/www.flickr.com/?referer=');">Flickr</a> account you may be able to use that.</p>
<h2>Aggregating feeds into one</h2>
<ol style="margin-top: 0cm" type="1">
<li class="MsoNormal">Log on to Yahoo! Pipes and click on <strong>Create Pipe. </strong>You should be presented with a ‘graph’-style page.</li>
<li class="MsoNormal">In the left column are a number of buttons, called <strong>modules</strong>. These are arranged in categories, the first being Sources. The Sources category should contain a module called <strong>Fetch Feed. </strong>Click and drag this onto the graphed area.</li>
<li class="MsoNormal">Copy the URL of the RSS feed you want to fetch and paste it into the <strong>Fetch Feed </strong>module&#8217;s input box.</li>
<li class="MsoNormal">To add extra feeds, click on the plus (+) icon next to URL and further input boxes will appear. Paste the URL of each extra feed into a new box.</li>
<li class="MsoNormal">Finally, you need to connect the <strong>Fetch Feed </strong>module to the Pipe Output. To do this, click on the circle at the bottom of the Fetch Feed module and drag it to the circle at the top of <strong>Pipe Output. </strong>You should now see a pipe appear connecting the two.</li>
<li class="MsoNormal">Click on Pipe Output to see the results at the bottom of the screen.</li>
<li class="MsoNormal">That’s it. Click <strong>Save </strong>(top right), give the pipe a name, then click <strong>Run Pipe… </strong>at the top of the screen. Note: the results may be displayed as images – click <strong>List </strong>to see the text version.</li>
<li class="MsoNormal">Along the top of the results you will see various options. <strong>Click on the RSS symbol </strong>to get the output of this pipe as a standalone RSS feed. You can also click <strong>Get as a Badge </strong>to get some HTML to put on your blog and display the results more attractively.</li>
</ol>
<p class="MsoNormal">Note: with more than one feed, Pipes will &#8216;cluster&#8217; the items by feed rather than by date. To order the results by date, use the <strong>Sort </strong>module under the <em>Operators </em>category: connect it between Fetch Feed and Pipe Output, and sort by <em>item.pubDate in descending order</em>.</p>
<p class="MsoNormal">If you want to aggregate feeds after filtering or otherwise processing them, you can use the <strong>Union </strong>module under the <em>Operators </em>category.</p>
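<p>For readers who want to see the same idea outside Pipes, here is a minimal Python sketch of the aggregate-then-sort steps. It is not how Pipes itself is implemented: the function names (<em>parse_items</em>, <em>aggregate</em>) and the two sample feeds are made up for illustration, and it assumes every item carries a <em>pubDate</em> in the standard RSS date format.</p>

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two tiny sample feeds (hypothetical data, standing in for fetched RSS).
FEED_A = """<rss version="2.0"><channel>
<item><title>A1</title><link>http://example.com/a1</link><pubDate>Wed, 16 Jul 2008 11:49:57 +0000</pubDate></item>
<item><title>A2</title><link>http://example.com/a2</link><pubDate>Mon, 14 Jul 2008 09:00:00 +0000</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel>
<item><title>B1</title><link>http://example.com/b1</link><pubDate>Tue, 15 Jul 2008 10:00:00 +0000</pubDate></item>
</channel></rss>"""


def parse_items(rss_xml):
    """Return one dict (title, link, pubDate) per <item> in an RSS 2.0 string."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # Assumption: pubDate is present and in RFC 822 format.
            "pubDate": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return items


def aggregate(feeds):
    """Merge the items of several feeds and sort them newest first."""
    merged = [item for rss_xml in feeds for item in parse_items(rss_xml)]
    return sorted(merged, key=lambda item: item["pubDate"], reverse=True)


for item in aggregate([FEED_A, FEED_B]):
    print(item["pubDate"].date(), item["title"])
```

<p>Without the sort, the items would stay grouped by feed (A1, A2, B1); with it, the sample titles come out as A1, B1, A2.</p>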
<h2>Filtering feeds</h2>
<ol style="margin-top: 0cm" type="1">
<li class="MsoNormal">Follow steps 1-4 for <em>Aggregating feeds</em>, above.</li>
<li class="MsoNormal">In the left column, where the <strong>modules </strong>are arranged in categories, expand the Operators category. There should be a module called <strong>Filter. </strong>Click and drag this onto the graphed area.</li>
<li class="MsoNormal">You need to connect the <strong>Fetch Feed </strong>module to the <strong>Filter </strong>module. To do this, click on the circle at the bottom of the Fetch Feed module and drag it to the circle at the top of Filter. You should now see a pipe appear connecting the two.</li>
<li class="MsoNormal">Using the settings in the <strong>Filter </strong>module you can choose to filter the aggregated feed by blocking items (posts) with certain words, or only allowing items with certain words to come through.</li>
<li class="MsoNormal">You will then need to choose from the drop-down menu which field (e.g. ‘title’ or ‘category’) you want the filter to apply to.</li>
<li class="MsoNormal">Finally, you need to connect the <strong>Filter </strong>module to the Pipe Output. To do this, click on the circle at the bottom of the <strong>Filter </strong>module and drag it to the circle at the top of <strong>Pipe Output. </strong>You should now see a pipe appear connecting the two.</li>
<li class="MsoNormal">Follow steps 6-8 for <em>Aggregating feeds</em>, above, to finish.</li>
</ol>
<p class="MsoNormal">Note: you can use the <strong>Unique </strong>module instead to filter out multiple versions of the same post (e.g. when you’re using feeds from search results on different engines).</p>
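<p>The filtering and de-duplication described above can be sketched in Python along these lines. This is only an illustration of the logic, not Pipes&#8217; own code: the names <em>filter_items</em> and <em>unique_items</em>, the &#8216;allow&#8217;/&#8216;block&#8217; mode values and the sample items are all assumptions, and matching here is a simple case-insensitive substring test.</p>

```python
# Hypothetical items, as they might look after fetching two search feeds.
ITEMS = [
    {"title": "Council budget cut", "link": "http://example.com/1"},
    {"title": "Football results", "link": "http://example.com/2"},
    {"title": "Council budget cut", "link": "http://example.com/1"},
]


def filter_items(items, field, words, mode="allow"):
    """Keep (mode='allow') or drop (mode='block') items whose field contains any word."""
    if mode not in ("allow", "block"):
        raise ValueError("mode must be 'allow' or 'block'")
    lowered = [word.lower() for word in words]

    def matches(item):
        text = item.get(field, "").lower()
        return any(word in text for word in lowered)

    if mode == "allow":
        return [item for item in items if matches(item)]
    return [item for item in items if not matches(item)]


def unique_items(items, field="link"):
    """Drop later items that repeat an earlier item's value for the given field."""
    seen = set()
    result = []
    for item in items:
        key = item.get(field)
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result


council_only = unique_items(filter_items(ITEMS, "title", ["council"]))
no_council = filter_items(ITEMS, "title", ["council"], mode="block")
print([item["title"] for item in council_only])
print([item["title"] for item in no_council])
```

<p>Choosing the field first and the words second mirrors the drop-down and text box in the Filter module; de-duplicating on the link rather than the title is a design choice, since two different posts can share a headline.</p>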
<h2>Translating feeds</h2>
<p>The translation tool in Pipes (BabelFish) is pretty clumsy, and you can&#8217;t rely on it to produce a clear and understandable feed for readers &#8211; but it may be useful for highlighting leads in other countries that you otherwise wouldn&#8217;t see, and which you can then follow up.</p>
<ol style="margin-top: 0cm" type="1">
<li class="MsoNormal">Follow steps 1-4 for <em>Aggregating feeds</em>, above.</li>
<li class="MsoNormal">In the left column, where the <strong>modules </strong>are arranged in categories, expand the Deprecated category. There should be a module called <strong>BabelFish. </strong>Click and drag this onto the graphed area.</li>
<li class="MsoNormal">You need to connect the <strong>Fetch Feed </strong>module to the <strong>BabelFish </strong>module. To do this, click on the circle at the bottom of the <strong>Fetch Feed </strong>module and drag it to the circle at the top of <strong>BabelFish. </strong>You should now see a pipe appear connecting the two.</li>
<li class="MsoNormal">Using the settings in the <strong>BabelFish </strong>module you can choose to translate the feed from one language to another.</li>
<li class="MsoNormal">Finally, you need to connect the <strong>BabelFish </strong>module to the <strong>Pipe Output. </strong>To do this, click on the circle at the bottom of the <strong>BabelFish </strong>module and drag it to the circle at the top of <strong>Pipe Output. </strong>You should now see a pipe appear connecting the two.</li>
<li class="MsoNormal">Follow steps 6-8 for <em>Aggregating feeds</em>, above, to finish.</li>
</ol>
<h2>Tips and tools</h2>
<ul style="margin-top: 0cm" type="disc">
<li class="MsoNormal">If the website you want to use doesn’t have an RSS feed you may be able to create one using a service like Page 2 RSS (<a title="blocked::http://page2rss.com/" href="http://page2rss.com/" onclick="urchinTracker('/outgoing/page2rss.com/?referer=');">http://page2rss.com/</a>).</li>
<li class="MsoNormal">Pipes has a specific module to help you pull images from Flickr (under Sources).</li>
<li class="MsoNormal">The Feed Auto-Discovery module allows you to just input the URL of the website, not the RSS feed itself, and will automatically find feeds. This may pull in comment and other feeds, however.</li>
<li class="MsoNormal">You can always search everyone’s pipes to find something that does what you want to do – and clone it to adapt it accordingly.</li>
<li class="MsoNormal">Although Pipes allows you to create email alerts and widgets, <a href="http://Feedburner.com" onclick="urchinTracker('/outgoing/Feedburner.com?referer=');">Feedburner.com</a> is even better at doing the same thing – and will tell you who has subscribed, etc.</li>
</ul>
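<p>To give a feel for what feed auto-discovery involves, here is a rough Python sketch of the common convention: websites advertise their feeds in <em>link rel="alternate"</em> tags in the page head. This is not a description of how the Pipes module works internally; the class and function names and the sample page are invented for the example.</p>

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types that conventionally mark a feed link.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

# Hypothetical page head advertising a posts feed and a comments feed.
PAGE = """<html><head>
<link rel="alternate" type="application/rss+xml" title="Posts" href="/feed/" />
<link rel="alternate" type="application/rss+xml" title="Comments" href="/comments/feed/" />
<link rel="stylesheet" type="text/css" href="/style.css" />
</head><body></body></html>"""


class FeedLinkParser(HTMLParser):
    """Collect the URLs of feeds advertised in <link> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Resolve relative links against the page's own address.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html, base_url):
    """Return every feed URL advertised in the given HTML."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


print(discover_feeds(PAGE, "http://example.com/"))
```

<p>The sample page yields both the posts feed and the comments feed, which is why auto-discovery can pull in more than you wanted.</p>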
<h3>If you liked this&#8230;</h3>
<p><em><a href="http://del.icio.us/paulb/yahoopipes" onclick="urchinTracker('/outgoing/del.icio.us/paulb/yahoopipes?referer=');">Webpages about Yahoo! Pipes I’ve bookmarked</a></em></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2008/07/16/how-to-create-basic-mashups-with-yahoo-pipes/feed/</wfw:commentRss>
		<slash:comments>29</slash:comments>
		</item>
	</channel>
</rss>

