Tag Archives: google docs

How to collaborate (or crowdsource) by combining Delicious and Google Docs

RSS girl by HeatherWeaver on Flickr

During some training in open data I was doing recently, I ended up explaining (it’s a long story) how to pull a feed from Delicious into a Google Docs spreadsheet. I promised I would put it down online, so: here it is.

In a Google Docs spreadsheet the formula =importfeed will pull information from an RSS feed and put it into that spreadsheet. Titles, links, datestamps and other parts of the feed will each be separated into their own columns.
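To make the idea concrete, here is a minimal Python sketch of what a formula like =importfeed is doing behind the scenes: reading each item in a feed and splitting it into columns. The feed below is invented sample data rather than a real Delicious feed, and `feed_to_rows` is a name made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample feed standing in for a real RSS feed (an assumption).
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example bookmarks</title>
    <item>
      <title>First bookmarked page</title>
      <link>http://example.com/one</link>
      <pubDate>Mon, 01 Mar 2010 09:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Second bookmarked page</title>
      <link>http://example.com/two</link>
      <pubDate>Tue, 02 Mar 2010 10:30:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def feed_to_rows(feed_xml):
    """Return one [title, link, date] row per feed item, like spreadsheet columns."""
    root = ET.fromstring(feed_xml)
    rows = []
    for item in root.iter("item"):
        rows.append([
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("pubDate", default=""),
        ])
    return rows


rows = feed_to_rows(SAMPLE_FEED)
for row in rows:
    # Each row corresponds to one spreadsheet row, one value per column.
    print(" | ".join(row))
```

The spreadsheet formula saves you all of this: you give it the feed address and it fills the cells for you.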

When combined with Delicious, this can be a useful way to collect together pages that have been bookmarked by a group of people, or any other feed that you want to analyse.

Here’s how you do it: Continue reading →

Data journalism pt4: visualising data – tools and publishing (comments wanted)

This is a draft from a book chapter on data journalism (here are parts one, two and three, which looks at the charts side of visualisation). I’d really appreciate any additions or comments you can make, particularly around tips and tools.

UPDATE: It has now been published in The Online Journalism Handbook.

Visualisation tools

So if you want to visualise some data or text, how do you do it? Thankfully there are now dozens of free and cheap pieces of software that you can use to quickly turn your tables into charts, graphs and clouds.

The best-known tool for creating word clouds is Wordle (wordle.net). Simply paste a block of text into the site, or the address of an RSS feed, and the site will generate a word cloud whose fonts and colours you can change to your preferences. Similar tools include Tagxedo (tagxedo.com) and Wordlings (http://wordlin.gs), both of which allow you to put your word cloud into a particular shape.
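A word cloud is, at heart, a picture of a word count: the more often a word appears, the bigger it is drawn. As a rough sketch of that counting step (not how Wordle or any of these tools is actually built), here is a short Python example using invented sample text and a deliberately tiny list of words to ignore.

```python
import re
from collections import Counter

# Invented sample text; any pasted block of text would do.
text = "Data journalism turns data into stories. Good data makes good stories."

# A tiny, illustrative stop-word list (real tools use much longer ones).
STOP_WORDS = {"into", "the", "a", "and", "of"}

# Lower-case the text and pull out runs of letters as words.
words = re.findall(r"[a-z']+", text.lower())

# Count every word that is not in the stop-word list.
counts = Counter(w for w in words if w not in STOP_WORDS)

for word, count in counts.most_common(3):
    print(word, count)
```

In a word cloud, the word with the highest count here ("data") would be set in the largest type.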

ManyEyes (manyeyes.alphaworks.ibm.com/manyeyes/) also allows you to create word clouds and tag clouds – as well as word trees and phrase nets that allow you to see common phrases. But it is perhaps most useful in allowing you to easily create scattergrams, bar charts, bubble charts and other forms. The site also contains a raft of existing data that you can play with to get a feel for the site. Similar tools that allow access to other data include Factual (factual.com), Swivel (swivel.com)[see comments], Socrata (socrata.com) and Verifiable.com (verifiable.com). And Google Fusion Tables (tables.googlelabs.com) is particularly useful if you want to collaborate on tables of data, as well as offering visualisation options.

More general visualisation tools include widgenie (widgenie.com), iCharts (icharts.net), ChartTool (onlinecharttool.com) and ChartGo (www.chartgo.com). FusionCharts is a piece of visualisation software with a Google Gadget service that publishers may find useful. You can find instructions on how to use it at www.fusioncharts.com/GG/Docs

If you want more control over your visualisation, or want it to update dynamically when the source information is updated, Google Chart Tools (code.google.com/apis/charttools) is worth exploring. This requires some technical knowledge, but there is a lot of guidance and help on the site to get you started quickly.

Tableau Public is a piece of free software you can download (tableausoftware.com/public) with some powerful visualisation options. You will also find visualisation options on spreadsheet applications such as Excel or the free Google Docs spreadsheet service. These are worth exploring as a way to quickly generate charts from your data on the fly.

Publishing your visualisation

There will come a point when you’ve visualised your data and need to publish it somehow. The simplest way to do this is to take an image (screengrab) of the chart or graph. This can be done with a web-based screencapture tool like Kwout (kwout.com), a free desktop application like Skitch (skitch.com) or Jing (jingproject.com), or by simply using the ‘Print Screen’ button on a PC keyboard (cmd+shift+3 on a Mac) and pasting the screengrab into a graphics package such as Photoshop.

The advantage of using a screengrab is that the image can be easily distributed on social networks, image sharing websites (such as Flickr), and blogs – driving traffic to the page on your site where it is explained.

If you are more technically minded, you can instead choose to embed your chart or graph. Many visualisation tools will give you a piece of code which you can copy and paste into the HTML of an article or blog post in the place you wish to display it (this will not work on most third party blog hosting services, such as WordPress.com). One particular advantage of this approach is that the visualisation can update itself if the source data is updated.

Alternatively, an understanding of Javascript can allow you to build ‘progressively enhanced’ charts which allow users to access the original data or see what happens when it is changed.

Showing your raw data

It is generally a good idea to give users access to your raw data alongside its visualisation. This allows them not only to check the data against your visualisation but also to add insights you might not otherwise gain. It is relatively straightforward to publish a spreadsheet online using Google Docs (see the sidebar on publishing a spreadsheet).

SIDEBAR: How to: publish a spreadsheet online

Google Docs (docs.google.com) is a free website which allows you to create and share documents. You can share them via email, by publishing them as a webpage, or by embedding your document in another webpage, such as a blog post. This is how you share a spreadsheet:

Open your spreadsheet in Google Docs. You can upload a spreadsheet into Google Docs if you’ve created it elsewhere – there is a size limit, however, so if you are told the file is too big try removing unnecessary sheets or columns.
Look for the ‘Share’ button (currently in the top right corner) and click on it.
A drop-down menu should appear. Click on ‘Publish as a web page’.
A new window should appear asking which sheets you want to publish. Select the sheet you want to publish and click ‘Start publishing’ (you should also make sure ‘Automatically republish when changes are made’ is ticked if you want the public version of the spreadsheet to update with any data you add.)
Now the bottom half of that window – ‘Get a link to the published data’ – should become active. In the bottom box should be a web address where you can now see the public version of your spreadsheet. If you want to share that, copy the address and test that it works in a web browser. You can now link to it from any webpage.
Alternatively, you can embed your spreadsheet – or part of it – in another webpage. To do this click on the first drop-down menu in this area – it will currently say ‘Web page’ – and change it to ‘HTML to embed in a page’. Now the bottom box on this window should show some HTML that begins with
If you want to embed just part of a spreadsheet, in the box that currently says ‘All cells’ type the range of cells you wish to show. For example, typing A1:G10 will select all the cells in your spreadsheet from A1 (the first row of column A) to G10 (the 10th row of column G). Once again, the HTML below will change so that it only displays that section of your spreadsheet.
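If the cell range notation is new to you, this small Python sketch shows how a range such as A1:G10 translates into a number of rows and columns. The function names are made up for this illustration, and it only handles simple ranges of the form shown above.

```python
import re


def column_number(letters):
    """Convert a column label such as 'A' or 'AB' to a 1-based number."""
    number = 0
    for letter in letters.upper():
        number = number * 26 + (ord(letter) - ord("A") + 1)
    return number


def range_size(cell_range):
    """Return (rows, columns) covered by a range such as 'A1:G10'."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", cell_range)
    if match is None:
        raise ValueError(f"Not a simple cell range: {cell_range!r}")
    start_col, start_row, end_col, end_row = match.groups()
    rows = int(end_row) - int(start_row) + 1
    columns = column_number(end_col) - column_number(start_col) + 1
    return rows, columns


print(range_size("A1:G10"))  # (10, 7): ten rows, columns A to G
```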

Once again, I’d welcome any comments on things I may have missed or tips you can add. Part 5, on mashups, is now available here.

Data journalism pt3: visualising data – charts and graphs (comments wanted)

This is a draft from a book chapter on data journalism (the first, on gathering data, is here; the section on interrogating data is here). I’d really appreciate any additions or comments you can make, particularly around considerations in visualisation. A further section on visualisation tools can be found here.

UPDATE: It has now been published in The Online Journalism Handbook.

“At their best, graphics are instruments for reasoning about quantitative information. Often the most effective way to describe, explore, and summarize a set of numbers – even a very large set – is to look at pictures of those numbers.” (Edward Tufte, The Visual Display of Quantitative Information, 2001)

Visualisation is the process of giving a graphic form to information which is often otherwise dry or impenetrable. Classic examples of visualisation include turning a table into a bar chart, or a series of percentage values into a pie chart, but the increasing power of both computer analysis and graphic design software has seen the craft of visualisation develop with increasing sophistication. In larger organisations the data journalist may work with a graphic artist to produce an infographic that visualises their story, but in smaller teams, in the initial stages of a story, or when speed is of the essence, they are likely to need to use visualisation tools to give form to their data.

Broadly speaking there are two typical reasons for visualising data: to find a story; or to tell one. Quite often, it is both. Continue reading →

MPs expenses data: now it’s The Telegraph’s turn

The Telegraph have finally published their MPs’ expenses data online – and it’s worth the wait. Here are some initial thoughts and reactions:

Firstly, they’ve made user behaviour an editorial feature. In plain English: they’re showing the most searched-for MPs and constituencies, which is not only potentially interesting in itself, but also makes it easier for the majority of users who are making those searches (i.e. they can access it with a click rather than by typing)
There’s also a table for most expensive MPs. As this is going to remain static, it would be good to see a dedicated page with more information – in the same way the paper did in its weekend supplement.
The results page for a particular MP has a search engine-friendly URL. Very often, database-generated pages have poor search engine optimisation, partly because the URLs are full of digits and symbols, and partly because they are dynamically generated. This appears to avoid both problems – the URL for the second home allowance of Khalid Mahmood MP, for example, is http://parliament.telegraph.co.uk/mpsexpenses/second-home/Khalid-Mahmood/mp-11087
The uncensored expenses files themselves are embedded using Issuu. This seems a strange choice as it doesn’t allow users to tag or comment – and the email/embed option is disabled for “secret documents”
There’s some nice subtle animation on the second home part of expenses, and clear visualisation on other parts.
The MP Details page is intelligently related both to the Telegraph site (related articles) and the wider web, with the facility to easily email that MP, go to their Wikipedia entry, and ‘bookmark’.
Joy of joys, you can also download the MPs expenses spreadsheet from here (on Google Docs) – although this is for all MPs rather than the one being viewed. Curiously, while viewing you can see who else is viewing and even (as I did) attempt to chat (no, they didn’t chat back).
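On the point about search engine-friendly URLs: I have no idea how the Telegraph's system actually generates its addresses, but the general technique is simple enough to sketch. This hypothetical Python function builds a readable address from a section name, an MP's name and an ID number, using the example address above as its model.

```python
import re

# Assumed base path, taken from the example address in the post.
BASE = "http://parliament.telegraph.co.uk/mpsexpenses"


def friendly_url(section, name, mp_id):
    """Build a readable URL from a section, an MP's name and a numeric ID."""
    # Replace runs of characters that are not letters or digits with one hyphen.
    name_slug = re.sub(r"[^A-Za-z0-9]+", "-", name).strip("-")
    return f"{BASE}/{section}/{name_slug}/mp-{mp_id}"


print(friendly_url("second-home", "Khalid Mahmood", 11087))
```

The words in the address (the section and the MP's name) are what make it friendly to both search engines and people; the ID at the end keeps it unique.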

I’ll most likely update this post later as I get some details from behind the curtain.

And there are more general thoughts on the online treatment of expenses, which I’ll try to blog at another point.

Using Google Spreadsheets as a database (no, it really is very interesting, honest)

This post by Tony Hirst should be recommended reading for every journalist interested in the potential of computers for reporting.

Why? Because it shows you how you can use Google spreadsheets to interrogate data as if it was a database; and because it demonstrates the importance of news organisations releasing data to their users.

Put aside any intimidation you might feel at the mention of APIs and query languages. What it boils down to is this: you can alter the web address of a Google spreadsheet to filter the data and find the story.

Simple as that.
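If you want to see what "altering the web address" involves, here is a minimal Python sketch. It takes a query written in plain text and percent-encodes it so it can sit safely inside a URL (spaces become %20, for example). The base address, the parameter names and the spreadsheet key below are assumptions for illustration only, so check the current Google documentation rather than taking them from me.

```python
from urllib.parse import quote

# ASSUMPTION: the endpoint and parameter names are placeholders for
# illustration; they may not match the current Google service.
BASE_URL = "https://spreadsheets.google.com/tq"
SPREADSHEET_KEY = "YOUR_SPREADSHEET_KEY"  # hypothetical placeholder


def query_url(query, key=SPREADSHEET_KEY):
    """Return a web address with the query percent-encoded into it."""
    # Keep commas, brackets, slashes, equals signs and asterisks readable;
    # everything else that is unsafe in a URL (spaces, > and <) is encoded.
    encoded = quote(query, safe=",()/=*")
    return f"{BASE_URL}?tqx=out:html&key={key}&tq={encoded}"


print(query_url("select B,C,I where I=23083"))
```

Change the text of the query and you get a different address, which in turn returns a different slice of the spreadsheet.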

Hirst uses the example of the spreadsheet of MPs expenses recently released by The Guardian (they’ve also published Lords expenses). By altering the URLs this is what he generates (I’m quoting his bullet points):

the names of people who have claimed the maximum additional costs allowance (£23,083): fetch just columns B, C and I where the value in column I is 23083: select B,C,I where I=23083 (column I is the additional costs allowance column);
How many people did claim the maximum additional costs allowance? Select the people who claimed the maximum amount (23083) and count them: select count(I) where I=23083
So which people did not claim the maximum additional costs allowance? Display the people who did not claim total additional allowances of 23083: select B,C,I where I!=23083 (using <> for ‘not equals’ also works); NB here’s a more refined take on that query: select B,C,I where (I!=23083 and I>=0) order by I
search for the name, party (column D) and constituency (column E) of people whose first name contains Joan or is recorded as John (rather than “Mr John”, or “Rt Hon John”): select B,C,D,E where (C contains ‘Joan’ or C matches ‘John’)
only show the people who have claimed less than £100,000 in total allowances : select * where F<100000
what is the total amount of expenses claimed? Fetch the summed total of entries in column I (i.e. the total expenses claimed by everyone): select sum(I)
So how many MPs are there? Count the number of rows in an arbitrary column: select count(I)
Find the average amount claimed by the MPs: select sum(I)/count(I)
Find out how much has been claimed by each party (column D): select D,sum(I) where I>=0 group by D (Setting I>=0 just ensures there is something in the column)
For each party, find out how much (on average) each party member claims: select D,sum(I)/count(I) where I>=0 group by D
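If the query wording above looks cryptic, it may help to see a few of the same questions asked of a tiny table in Python. The rows below are invented sample data, not real expenses figures, and I am assuming column B holds a surname; the other column letters follow the descriptions in the bullet points.

```python
from collections import defaultdict

# Invented sample rows, NOT real expenses data. Keys mirror the column
# letters used in the queries: B = surname (assumed), C = first name,
# D = party, I = additional costs allowance claimed.
rows = [
    {"B": "Smith", "C": "John", "D": "Party X", "I": 23083},
    {"B": "Jones", "C": "Joan", "D": "Party Y", "I": 18000},
    {"B": "Brown", "C": "Mary", "D": "Party X", "I": 23083},
    {"B": "Green", "C": "Alan", "D": "Party Y", "I": 12000},
]

MAX_CLAIM = 23083

# select B,C,I where I=23083
max_claimers = [(r["B"], r["C"], r["I"]) for r in rows if r["I"] == MAX_CLAIM]

# select count(I) where I=23083
max_count = len(max_claimers)

# select sum(I)/count(I)
average_claim = sum(r["I"] for r in rows) / len(rows)

# select D,sum(I) where I>=0 group by D
total_by_party = defaultdict(int)
for r in rows:
    if r["I"] >= 0:
        total_by_party[r["D"]] += r["I"]

print(max_claimers)
print(max_count, average_claim)
print(dict(total_by_party))
```

Each query in the list does the same kind of job in one line: "where" filters the rows, "select" picks the columns, and "group by" adds things up per category.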

OK, you need to know the words to use (and if you have a link to an easy reference for these let me know*), but this is still a lot easier than using programming languages and databases.

As I say, this also illustrates the importance of publishing raw data so users can interrogate it in their own ways, which is precisely what The Guardian’s Data Store has been doing, meaning that people like Tony can create interfaces like this.

Wonderful.

*Tony has very generously created this page which helps you formulate your search – and generates the URL. If you were working on a different spreadsheet you could just replace the spreadsheet URL and change any column references accordingly.

UPDATE: Tony also has a version which allows you to pick from Guardian datasets.

Online Journalism Blog

Comment, analysis and links covering online journalism and online news, citizen journalism, blogging, vlogging, photoblogging, podcasts, vodcasts, interactive storytelling, publishing, Computer Assisted Reporting, User Generated Content, searching and all things internet.