Online Journalism Blog


Category Archive

The following is a list of all entries from the databases category.

Charities data opened up – journalists: say thanks.

Having made significant inroads in opening up council and local election data, Chris Taggart has now opened up charities data from the less-than-open Charity Commission website. The result: a new website – Open Charities.

The man deserves a round of applause. Charity data is enormously important in all sorts of ways – and is likely to become more so as the government leans on the third sector to take on a bigger role in providing public services. Making it easier to join the dots between charitable organisations, the private and public sector, contracts and individuals – which is what Open Charities does – will help journalists and bloggers enormously.

A blog post by Chris explains the site and its background in more depth. In it he explains that:

“For now, it’s just the simplest of things, a web application with a unique URL for every charity based on its charity number, and with the basic information for each charity available as data (XML, JSON and RDF). It’s also searchable, and sortable by most recent income and spending, and for linked data people there are dereferenceable Resource URIs.

“The entire database is available to download and reuse (under an open, share-alike attribution licence). It’s a compressed CSV file, weighing in at just under 20MB for the compressed version, and should probably only be attempted by those familiar with manipulating large datasets (don’t try opening it up in your spreadsheet, for example). I’m also in the process of importing it into Google Fusion Tables (it’s still churning away in the background) and will post a link when it’s done.”
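For anyone wondering what "manipulating large datasets" might look like without a spreadsheet, here is a minimal sketch in Python that reads a compressed CSV one row at a time and keeps only the highest-income rows in memory. It makes several assumptions that are not confirmed by Chris's post: that the download is gzip-compressed, that it is UTF-8 encoded, and that there is a column called `income` (a hypothetical name). Treat it as an illustration of the streaming approach rather than a description of the actual file.

```python
import csv
import gzip
import heapq


def top_by_income(lines, n=5, income_field="income"):
    """Return the n rows with the largest income, reading one row at a time.

    `income_field` is a hypothetical column name; check the real header first.
    """
    def income(row):
        try:
            return float(row[income_field])
        except (KeyError, TypeError, ValueError):
            # Missing or malformed figures sort below every real number.
            return float("-inf")

    # nlargest keeps only n rows in memory, however long the input is.
    return heapq.nlargest(n, csv.DictReader(lines), key=income)


def top_from_gzip(path, n=5, income_field="income"):
    """Same as top_by_income, but reads straight from a gzip-compressed file.

    Assumes gzip compression and UTF-8 text; adjust if the file differs.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return top_by_income(handle, n, income_field)
```

Because the rows are never loaded all at once, the same function should cope with a file far larger than a spreadsheet will open.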

Chris promises to add more features “if there’s any interest”.

Well, go on…


5 tips on data journalism projects from ProPublica

A few months ago I heard ProPublica’s Olga Pierce and Jeff Larson speak at the Digital Editors Network Data Meet, giving their advice on data journalism projects. Here, for the record, are my notes on five of their tips:

1. Three-quarters of the top 10 stories on the site were news apps

Online applications prove very popular with users – but they are more often a landing page for further exploration via stories.

2. When you publish your story, ask for data

Publication is not the end of the process. If you invite users to submit their own information, it can lead to follow-ups and useful contacts.

3. Have both quantitative and qualitative fields in your forms

In other words, ask for basic details such as location, age, etc. but also ask for ‘their story’ if they have one.

4. Aim for a maximum of 12 questions

That seems to be the limit that people will realistically respond to. Use radio buttons and dropdown menus to make the form easier to complete. At the end, ask whether it is okay for the organisation to contact them, so that you meet data protection regulations.

5. Share data left over from your investigation

Just because you didn’t use it doesn’t mean someone else can’t find something interesting in it.
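Tips 3 and 4 are concrete enough to turn into a quick pre-publication check. The sketch below is my own illustration, not anything ProPublica described: the `Question` class, the `kind` labels and the choice to count the consent question towards the limit of 12 are all assumptions.

```python
from dataclasses import dataclass

MAX_QUESTIONS = 12  # the limit suggested in tip 4


@dataclass
class Question:
    label: str
    kind: str  # assumed labels: "choice", "number", "free_text" or "consent"
    options: tuple = ()


def check_form(questions):
    """Return a list of problems with a form definition (empty if none found)."""
    problems = []
    if len(questions) > MAX_QUESTIONS:
        problems.append(f"{len(questions)} questions; aim for at most {MAX_QUESTIONS}")
    kinds = {q.kind for q in questions}
    if not kinds & {"choice", "number"}:
        problems.append("no quantitative field (e.g. location, age)")
    if "free_text" not in kinds:
        problems.append("no qualitative field (e.g. 'your story')")
    for q in questions:
        # Radio buttons and dropdowns need something to choose from.
        if q.kind == "choice" and not q.options:
            problems.append(f"choice question without options: {q.label}")
    if not questions or questions[-1].kind != "consent":
        problems.append("form should end by asking permission to make contact")
    return problems
```

A form with a dropdown for location, a number field for age, a free-text "your story" box and a final consent question would pass with no problems reported.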


The New Online Journalists #6: Conrad Quilty-Harper

As part of an ongoing series on recent graduates who have gone into online journalism, The Telegraph’s new Data Mapping Reporter Conrad Quilty-Harper talks about what got him the job, what it involves, and what skills he feels online journalists need today.

I got my job thanks to Twitter. Chris Brauer, head of online journalism at City University, was impressed by my tweets and my experience, and referred me to the Telegraph when they said they were looking for people to help build the UK Political database.

I spent six weeks working on the database, at first manually creating candidate entries, and later mocking up design elements and cleaning the data using Freebase Gridworks, Excel and Dabble DB. At the time the Telegraph was advertising for a “data juggler” role, and I interviewed for the job and was offered it.


Video: BBC at the 2012 Olympics: visualisations, maps and augmented reality

With two years to go to the 2012 Olympics, the BBC are already starting to plan their online coverage of the event. With a large, creative team at hand who have experimented with maps, visualisations and interactive content in the past, the pressure is on them to keep the standards high.

At the recent News:Rewired event, OJB caught up with Olympics Reporter Ollie Williams, himself a visualisation guru, to find out exactly what they were planning for 2012.


Get used to reading this…

“We have a team of developers going through the data now – and we’ll let you know here what we learn as and when we learn it.”

If you had any doubt over the concept of ‘programmer as journalist’, that quote above from The Guardian’s liveblog of the opening of the COINS database gives you a preview of things to come. While you’re at it, you might as well add in ‘statistician as journalist’ and ‘information designer as journalist’ – or look at my post from 2008 on New Journalists for New Information Flows. Are we there yet?


COINS Expenditure Database Published by Government – Open Data

This looks like an excellent start. The Coalition Government has just published the COINS database, which is the detailed database of Government spending:

The release of COINS data is just the first step in the Government’s commitment to data transparency on Government spending.

You can get the database from the data.gov.uk website here. There are explanations to help you get to grips with it here.

Tim Almond notes (via chat) that it is a 68MB zipped file which extracts to 4GB, i.e. huge. Getting to grips with it will require significant database tools, but I’m predicting that someone will create easier ways of querying it within 48 hours.
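As a rough idea of what "database tools" could mean in practice, here is a minimal sketch that streams a delimited text file into SQLite in batches, so that a multi-gigabyte extract never has to fit in memory. The table name, the delimiter and the column names in the usage note are assumptions for illustration; check the published explanations for the real layout before relying on any of them.

```python
import csv
import itertools
import sqlite3


def load_rows(conn, lines, table="spending", delimiter=",", batch_size=10_000):
    """Copy delimited text into a SQLite table, one batch at a time.

    The first line is treated as the header. `table` is a hypothetical name
    and is inserted into the SQL directly, so only pass trusted values.
    Returns the number of rows loaded.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    columns = ", ".join('"{}"'.format(name.replace('"', '""')) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, columns))
    insert = 'INSERT INTO "{}" VALUES ({})'.format(table, placeholders)

    total = 0
    while True:
        chunk = list(itertools.islice(reader, batch_size))
        if not chunk:
            break
        # Skip rows whose field count does not match the header.
        rows = [row for row in chunk if len(row) == len(header)]
        conn.executemany(insert, rows)
        total += len(rows)
    conn.commit()
    return total
```

Once loaded, ordinary SQL does the querying, for example `SELECT dept, SUM(CAST(amount AS REAL)) FROM spending GROUP BY dept`, where `dept` and `amount` are again made-up column names.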


Dealing with live data and sentiment analysis: Q&A with The Guardian’s Martyn Inglis

As part of the research for my book on online journalism, I interviewed Martyn Inglis about The Guardian’s Blairometer, which measured a live stream of data from Twitter as Tony Blair appeared before the Chilcot inquiry. I’m reproducing it in full here, with permission:

How did you prepare for dealing with live data and sentiment analysis?

I think it was important to be aware of our limitations. We can process a limited amount of data – due to Twitter quotas and so on. This is not a definitive sample. Once we accept that (a) we are not going to rank every tweet and (b) this is therefore going to be a limited exercise it frees us to make concessions that provide an easier technology solution.

Sentiment analysis is hard programmatically, but given the short time span of the event we can do this manually. We had an interface view onto incoming tweets which we had pulled from a Twitter search. This allows us to be really accurate in our assessment. This does not work over a long period of time – the Chilcot inquiry is one thing, but you couldn’t do it for an event lasting a week or so.
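To make the manual approach concrete, here is a minimal sketch of how human ratings of incoming tweets might be rolled into a single live figure. This is not The Guardian's code: the three-point scale (-1, 0, +1), the window size and the `RollingSentiment` name are all assumptions for illustration.

```python
from collections import deque


class RollingSentiment:
    """Average of the most recent human ratings.

    Assumed scale: -1 negative, 0 neutral, +1 positive.
    """

    def __init__(self, window=50):
        # A bounded deque silently drops the oldest rating once it is full.
        self.ratings = deque(maxlen=window)

    def rate(self, score):
        if score not in (-1, 0, 1):
            raise ValueError("score must be -1, 0 or 1")
        self.ratings.append(score)

    def average(self):
        """Mean of the ratings in the window, or 0.0 before any rating."""
        if not self.ratings:
            return 0.0
        return sum(self.ratings) / len(self.ratings)
```

Keeping only a recent window means the figure tracks the current mood rather than the whole session, which suits a short event where a person is rating each tweet by hand.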


UK general election 2010 – online journalism is ordinary

Has online journalism become ordinary? Are the approaches starting to standardise? Little has stood out in the online journalism coverage of this election – the innovation of previous years has been replaced by consolidation.

Here are a few observations on how the media approached their online coverage: Continue reading this entry »


Data journalism pt2: Interrogating data

This is a draft from a book chapter on data journalism (the first, on gathering data, is here). I’d really appreciate any additions or comments you can make – particularly around ways of spotting stories in data, and mistakes to avoid.

“One of the most important (and least technical) skills in understanding data is asking good questions. An appropriate question shares an interest you have in the data, tries to convey it to others, and is curiosity-oriented rather than math-oriented. Visualizing data is just like any other type of communication: success is defined by your audience’s ability to pick up on, and be excited about, your insight.” (Fry, 2008, p4)

Once you have the data you need to see if there is a story buried within it. The great advantage of computer processing is that it makes it easier to sort, filter, compare and search information in different ways to get to the heart of what – if anything – it reveals.


Data journalism pt1: Finding data (draft – comments invited)

The following is a draft from a book about online journalism that I’ve been working on. I’d really appreciate any additions or comments you can make – particularly around sources of data and legal considerations.

The first stage in data journalism is sourcing the data itself. Often you will be seeking out data based on a particular question or hypothesis (for a good guide to forming a journalistic hypothesis see Mark Hunter’s free ebook Story-Based Inquiry (2010)). On other occasions, it may be that the release or discovery of data itself kicks off your investigation.

There are a range of sources available to the data journalist, both online and offline, public and hidden. Typical sources include:

Continue reading this entry »