Monthly Archives: April 2010

Data journalism pt4: visualising data – tools and publishing (comments wanted)

This is a draft from a book chapter on data journalism (here are parts one, two and three, which looks at the charts side of visualisation). I’d really appreciate any additions or comments you can make – particularly around tips and tools.

UPDATE: It has now been published in The Online Journalism Handbook.

Visualisation tools

So if you want to visualise some data or text, how do you do it? Thankfully there are now dozens of free and cheap pieces of software that you can use to quickly turn your tables into charts, graphs and clouds.

The best-known tool for creating word clouds is Wordle (wordle.net). Simply paste a block of text into the site, or the address of an RSS feed, and the site will generate a word cloud whose fonts and colours you can change to your preferences. Similar tools include Tagxedo (tagxedo.com) and Wordlings (http://wordlin.gs), both of which allow you to put your word cloud into a particular shape.
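Incidentally, all of these tools are doing much the same thing under the hood: counting how often each word appears (usually ignoring common ‘stopwords’ such as ‘the’ and ‘of’) and sizing the words accordingly. For the curious, here is a minimal sketch of that counting step in Python – the sample text and stopword list are purely illustrative, not anything Wordle itself uses:

    import re
    from collections import Counter

    # Illustrative sample: paste in whatever text you want to visualise
    text = "Data journalism is journalism done with data: finding data, interrogating data and visualising data."

    # Split into lowercase words, then count them, skipping common 'stopwords'
    words = re.findall(r"[a-z']+", text.lower())
    stopwords = {"the", "a", "an", "and", "of", "to", "in", "is", "with"}
    counts = Counter(w for w in words if w not in stopwords)

    # The most frequent words would be drawn largest in the cloud
    for word, n in counts.most_common(10):
        print(word, n)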

ManyEyes (manyeyes.alphaworks.ibm.com/manyeyes/) also allows you to create word clouds and tag clouds – as well as word trees and phrase nets that let you see common phrases. But it is perhaps most useful in allowing you to easily create scattergrams, bar charts, bubble charts and other forms. The site also contains a raft of existing data that you can play with to get a feel for how it works. Similar tools that allow access to other data include Factual (factual.com), Swivel (swivel.com) [see comments], Socrata (socrata.com) and Verifiable.com (verifiable.com). And Google Fusion Tables (tables.googlelabs.com) is particularly useful if you want to collaborate on tables of data, as well as offering visualisation options.

More general visualisation tools include widgenie (widgenie.com), iCharts (icharts.net), ChartTool (onlinecharttool.com) and ChartGo (www.chartgo.com). FusionCharts is a piece of visualisation software with a Google Gadget service that publishers may find useful. You can find instructions on how to use it at www.fusioncharts.com/GG/Docs.

If you want more control over your visualisation – or want it to update dynamically when the source information is updated – Google Chart Tools (code.google.com/apis/charttools) is worth exploring. This requires some technical knowledge, but there is a lot of guidance and help on the site to get you started quickly.
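To give a flavour of how it works: the simplest part of Google Chart Tools is its image charts service, which draws a chart from nothing more than a carefully constructed web address – so regenerating the address with fresh numbers updates the chart wherever it is used. Here is a minimal sketch of building such an address in Python; the figures are invented, but the parameter names (cht for chart type, chs for size, chd for data, chl for labels) are those documented on the Google site:

    import urllib.parse

    # Invented figures: share of responses to a survey question
    labels = ["Yes", "No", "Don't know"]
    values = [52, 31, 17]

    params = {
        "cht": "p",                                      # chart type: pie
        "chs": "400x200",                                # size in pixels
        "chd": "t:" + ",".join(str(v) for v in values),  # text-encoded data
        "chl": "|".join(labels),                         # slice labels
    }

    url = "http://chart.apis.google.com/chart?" + urllib.parse.urlencode(params)
    print(url)  # open in a browser, or use as the src of an img tag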

Tableau Public is a piece of free software you can download (tableausoftware.com/public) with some powerful visualisation options. You will also find visualisation options on spreadsheet applications such as Excel or the free Google Docs spreadsheet service. These are worth exploring as a way to quickly generate charts from your data on the fly.
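If you are comfortable with a little scripting, the same quick-chart job can also be done in code. Here is a minimal sketch using Python and the matplotlib charting library – the figures are invented, and matplotlib is simply one free option among several, not something any of the tools above require:

    import matplotlib.pyplot as plt

    # Invented figures: complaints received per council department
    departments = ["Housing", "Transport", "Waste", "Planning"]
    complaints = [214, 90, 153, 47]

    plt.bar(departments, complaints)                      # a simple bar chart
    plt.ylabel("Complaints received")
    plt.title("Complaints by department (invented data)")
    plt.savefig("complaints.png")                         # or plt.show() to view it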

Publishing your visualisation

There will come a point when you’ve visualised your data and need to publish it somehow. The simplest way to do this is to take an image (screengrab) of the chart or graph. This can be done with a web-based screencapture tool like Kwout (kwout.com), a free desktop application like Skitch (skitch.com) or Jing (jingproject.com), or by simply using the ‘Print Screen’ button on a PC keyboard (cmd+shift+3 on a Mac) and pasting the screengrab into a graphics package such as Photoshop.

The advantage of using a screengrab is that the image can be easily distributed on social networks, image sharing websites (such as Flickr), and blogs – driving traffic to the page on your site where it is explained.

If you are more technically minded, you can instead choose to embed your chart or graph. Many visualisation tools will give you a piece of code which you can copy and paste into the HTML of an article or blog post in the place you wish to display it (this will not work on most third-party blog hosting services, such as WordPress.com). One particular advantage of this approach is that the visualisation can update itself if the source data is updated.

Alternatively, an understanding of JavaScript can allow you to build ‘progressively enhanced’ charts which let users access the original data or see what happens when it is changed.

Showing your raw data

It is generally a good idea to give users access to your raw data alongside its visualisation. This not only allows them to check it against your visualisation, but also to contribute insights you might not otherwise gain. It is relatively straightforward to publish a spreadsheet online using Google Docs (see the sidebar on publishing a spreadsheet).

SIDEBAR: How to: publish a spreadsheet online

Google Docs (docs.google.com) is a free website which allows you to create and share documents. You can share them via email, by publishing them as a webpage, or by embedding your document in another webpage, such as a blog post. This is how you share a spreadsheet:

  1. Open your spreadsheet in Google Docs. You can upload a spreadsheet into Google Docs if you’ve created it elsewhere – there is a size limit, however, so if you are told the file is too big, try removing unnecessary sheets or columns.
  2. Look for the ‘Share’ button (currently in the top right corner) and click on it.
  3. A drop-down menu should appear. Click on ‘Publish as a web page’.
  4. A new window should appear asking which sheets you want to publish. Select the sheet you want to publish and click ‘Start publishing’ (you should also make sure ‘Automatically republish when changes are made’ is ticked if you want the public version of the spreadsheet to update with any data you add).
  5. Now the bottom half of that window – ‘Get a link to the published data’ – should become active. In the bottom box should be a web address where you can now see the public version of your spreadsheet. If you want to share that, copy the address and test that it works in a web browser. You can now link to it from any webpage (or fetch the data from it programmatically – see the sketch at the end of this sidebar).
  6. Alternatively, you can embed your spreadsheet – or part of it – in another webpage. To do this click on the first drop-down menu in this area – it will currently say ‘Web page’ – and change it to ‘HTML to embed in a page’. Now the bottom box on this window should show some HTML that begins with ‘<iframe’. Copy this code and paste it into the HTML of the page where you want the spreadsheet to appear.
  7. If you want to embed just part of a spreadsheet, in the box that currently says ‘All cells’ type the range of cells you wish to show. For example, typing A1:G10 will select all the cells in your spreadsheet from A1 (the first row of column A) to G10 (the 10th row of column G). Once again, the HTML below will change so that it only displays that section of your spreadsheet.
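One advantage of publishing a spreadsheet this way is that the data becomes available to programs as well as people: the same drop-down menu (in step 6) currently also offers formats such as CSV, which give you a web address for a plain-text version of the sheet. Here is a minimal sketch of fetching that in Python – the address is a placeholder, so substitute the key of your own published spreadsheet:

    import csv
    import urllib.request

    # Placeholder address: swap YOUR_KEY for your published spreadsheet's key
    URL = "https://spreadsheets.google.com/pub?key=YOUR_KEY&output=csv"

    with urllib.request.urlopen(URL) as response:
        text = response.read().decode("utf-8")

    # Print each row of the sheet as a list of cell values
    for row in csv.reader(text.splitlines()):
        print(row)

Being able to pull the live data like this is also what makes the ‘Automatically republish when changes are made’ option in step 4 so useful.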

Once again, I’d welcome any comments on things I may have missed or tips you can add. Part 5, on mashups, is now available here.

Data journalism pt3: visualising data – charts and graphs (comments wanted)

This is a draft from a book chapter on data journalism (the first, on gathering data, is here; the section on interrogating data is here). I’d really appreciate any additions or comments you can make – particularly around considerations in visualisation. A further section, on visualisation tools, can be found here.

UPDATE: It has now been published in The Online Journalism Handbook.

“At their best, graphics are instruments for reasoning about quantitative information. Often the most effective way to describe, explore, and summarize a set of numbers – even a very large set – is to look at pictures of those numbers.” (Edward Tufte, The Visual Display of Quantitative Information, 2001)

Visualisation is the process of giving a graphic form to information which is often otherwise dry or impenetrable. Classic examples of visualisation include turning a table into a bar chart, or a series of percentage values into a pie chart – but the increasing power of computer analysis and graphic design software has seen the craft of visualisation grow ever more sophisticated. In larger organisations the data journalist may work with a graphic artist to produce an infographic that visualises their story – but in smaller teams, in the initial stages of a story, or when speed is of the essence, they are likely to need to use visualisation tools to give form to their data.

Broadly speaking, there are two typical reasons for visualising data: to find a story, or to tell one. Quite often, it is both. Continue reading

Why do people read online news? (Research summary)

Ioana Epure summarises “Harnessing the potential of online news: Suggestions from a study on the relationship between online news advantages and its post-adoption consequences”, a study by An Nguyen (University of Stirling).

In the last decade journalism has entered a stage in which news organisations are less reluctant to invest in online operations, but An Nguyen’s study starts from the premise that they do so driven not by a desire to innovate and fully exploit the potential of online news, but by a fear that the internet will replace traditional media in the news market.

As a consequence, they haven’t actually tried to understand what users want from online news and how what they want will affect their behaviour after receiving it.

Surprisingly, the results of Nguyen’s study show that the traditional press still has a fighting chance, provided that practitioners understand why people have turned to online news and try to offer them something similar. Continue reading

Data journalism pt2: Interrogating data

This is a draft from a book chapter on data journalism (the first, on gathering data, is here). I’d really appreciate any additions or comments you can make – particularly around ways of spotting stories in data, and mistakes to avoid.

UPDATE: It has now been published in The Online Journalism Handbook.

“One of the most important (and least technical) skills in understanding data is asking good questions. An appropriate question shares an interest you have in the data, tries to convey it to others, and is curiosity-oriented rather than math-oriented. Visualizing data is just like any other type of communication: success is defined by your audience’s ability to pick up on, and be excited about, your insight.” (Fry, 2008, p4)

Once you have the data you need to see if there is a story buried within it. The great advantage of computer processing is that it makes it easier to sort, filter, compare and search information in different ways to get to the heart of what – if anything – it reveals. Continue reading

4 uses for Foursquare for journalists

I’ve been fiddling with the mobile location-based social networking game Foursquare for a few months now. The concept is simple: as you move around a city you ‘check in’ to locations. You can see where your friends last checked in, and you can add comments as you go. But does it have journalistic uses? I think it does. Here are just 4:

1. Finding contacts

Until recently I refrained from pressing the ‘Tell Twitter’ or ‘Tell Facebook’ buttons when I checked into a location. However, that changed when I realised what happens when you do.

In one example, David Nikel, a political candidate in Birmingham, ‘checked in’ at Birmingham New Street train station 5 minutes after I had. I hadn’t ‘shared’ my check-in via Twitter, but because Nikel did, his automatically generated tweet mentioned that I was there too. This alerted me and led to us meeting.

2. Social capital

Foursquare plugs into your existing social networks but adds an extra layer of information. If you know that John spends a lot of time at Urban Coffee Co you can make a point of going there yourself more often, or at least have it as a potential conversation-opener.

3. Tips

Users can add ‘tips’ to locations – a feature which is currently underused but has potential for leads as well as…

4. Distribution

Foursquare has already signed deals with Metro in Canada, Bravo TV and the FT. The potential is obvious: content directly relevant to your location. The big issue for Foursquare is whether it can achieve the scale that most publishers need.

How about you? Are you using Foursquare or one of the other location-based social networks, such as Brightkite or Gowalla – and how has it been useful?

Data journalism pt1: Finding data (draft – comments invited)

The following is a draft from a book about online journalism that I’ve been working on. I’d really appreciate any additions or comments you can make – particularly around sources of data and legal considerations.

The first stage in data journalism is sourcing the data itself. Often you will be seeking out data based on a particular question or hypothesis (for a good guide to forming a journalistic hypothesis see Mark Hunter’s free ebook Story-Based Inquiry (2010)). On other occasions, it may be that the release or discovery of data itself kicks off your investigation.

There are a range of sources available to the data journalist, both online and offline, public and hidden. Typical sources include:

Continue reading

Sutton and Cheam Guardian skewered by blogger for misreporting Election

This is a guest article from political blogger Anna Raccoon about questionable election reporting by the Sutton and Cheam Guardian, where all the candidates but one received objective biographies; it’s also a case study of one way that bloggers can hold journalists to account, and of how the media is becoming more permeable.

For me, the more cross-scrutiny we get between journalists and bloggers, the better it will be for our media. The article first appeared on Anna’s blog.

The interesting, and potentially damning, points of this story are that one candidate was included with a joke biography in a survey of all candidates; that the audit trail seems to show conclusively that the newspaper had a full biography available – having used the photo which was attached to the same email; and that the journalist concerned has been shown to be a supporter of the candidate who would probably benefit most from damage to the Libertarian Party candidate. I’ll be pleased to run a reply from the Sutton and Cheam Guardian; I think it needs one. Continue reading

I've moved my blog – here's why

In the past few days the Online Journalism Blog has moved to hosting on Journal Local, a platform primarily aimed at hyperlocal publishers.

I’ve moved the blog for a number of reasons. Firstly, the platform offers specialist support that doesn’t appear to be available anywhere else. Philip John, who built Journal Local, is an experienced hyperlocal publisher (of the Lichfield Blog) himself, and he knows his stuff. He has already been able to provide technical assistance on all sorts of things I don’t always have the time to look into, from themes and plugins to sorting things out when the blog has been the target of hackers.

In fact, just having someone around who knows when the blog is being targeted by hackers is going to give me a bit more peace of mind.

Secondly, I want to support what Philip is trying to do. Journal Local is an attempt to find one sustainable business model for hyperlocal publishing. It’s not only well thought-out and executed but, for me, could make it easier for hyperlocal publishers generally to continue to operate both editorially and commercially.

It’s a freemium service, with a free, bespoke platform for those who are trying out hyperlocal publishing, but also – in the premium version – more control and support for existing publishers who are looking to make their operations more professional. Both are expanding markets.

And although Journal Local hasn’t yet officially launched, already North West Sheffield News and Inside the M60 have signed up, and the Future of News website is using the platform too.

A key element for me is that Journal Local isn’t just a technical service but an information service as well. If you’ve met Philip, you’ll know he’s an important part of the hyperlocal movement and always ready to offer help to other bloggers and publishers. I think that’s key in any new media business – that it’s a vocation for the founder.

Particularly interesting are the features tailored to hyperlocal site owners and online journalists. The basic setup comes with plugins that pull from TheyWorkForYou.com, WriteToThem.com and Opening Times – as well as an Addiply plugin that allows publishers to instantly sell advertising. The service will also be bolstered in the near future by features that take advantage of such great tools as OpenlyLocal and Patient Opinion, among others.

In that context, I’d much rather give the money I currently spend on hosting and domain name registration to Journal Local. It’s a no-brainer.

And I may well start recommending that students running their own hyperlocal operations use the free version of the service.

In the meantime, I guess if you want to use it yourself you’d need to contact Philip John on Twitter or something.

Telegraph launches powerful election database

The Telegraph have finally launched – in beta – the election database I’ve been waiting for since the expenses scandal broke. And it’s rather lovely.

Starting with the obvious part (skip to the next section for the really interesting bit): the database allows you to search by postcode, candidate or constituency, or to navigate by zooming, moving and clicking on a political map of the UK.

Searches take you to a page on an individual candidate or a constituency. For the former you get a biography, details on their profession and education (for instance, private or state; Oxbridge, redbrick or neither), as well as email, website and Twitter page. Not only is there a link to their place in the Telegraph’s ‘Expenses Files’, but also a link to their allowances page on Parliament.uk. Continue reading

Open Data in Spain: AbreDatos

I come from Argentina, where the government isn’t obliged by law to release public information to citizens or NGOs that request it. There are, though, some access-to-information bills ready to be discussed in Congress in the next few days. Perhaps this is why I’m always amazed by all the open data initiatives in the USA and UK.

But now I can show you an open data project from Spain called Desafío AbreDatos, organized by the ProBonoPúblico association and supported by the Basque Government.

AbreDatos 2010 consists of two days’ programming by teams of four developers, building websites, apps, widgets or mashups with at least one source coming from a public organisation in digital format (APIs, XML, CSV, SPARQL/RDF, HTML, PDF, scanned images). Many of those sources can be found at datospublicos.jottit.com.

Of course, the initiative aims to encourage the opening up of public data and transparency in administrations, and some of the projects are very interesting (my favourite is a website that shows whether Congress staff really earn their salaries).

One to keep an eye on.