Category Archives: data journalism

Why we need open courts data – and newspapers need to improve too


Justice photo by mira66

Few things sum up the division of the UK around the riots like the sentencing of those involved. Some think courts are too lenient, while others gape at six-month sentences for people who stole a bottle of water.

These judgments are often made on the basis of a single case, rather than any overall view. And you might think, in such a situation, that a journalist’s role would be to find out just how harsh or lenient sentencing has been – not just across the 1,600 or more people who have been arrested during the riots, but also in comparison to previous civil disturbances – or indeed, to similar crimes outside of a riot situation.

As Martin Belam argues:

“Really good data journalism will help us untangle the truth from those prejudiced assumptions. But this is data journalism that needs to stay the course, and seems like an ideal opportunity to do “long-form data journalism”. How long will these looters serve? What is the ethnic make-up and age range of those convicted? How many other criminals will get an early release because our jails are newly full of looters? How many people convicted this week will go on to re-offend?”

And yet, amazingly, we cannot reliably answer these questions – because it is still not possible to get raw data on sentencing in UK courts, not even through FOI.

INFOGRAPHIC: UK riots – Gauging the Columnists Blame Game

Here’s a quick experiment in data visualisation to provide an instant insight into a story on how the blame game is being played by columnists.

The data is taken from a Liberal Conspiracy blog post – I’ve transferred that into a spreadsheet with limited categories and used the Gauges gadget to visualise the totals.

A screengrab is below – but there is also an embed code that provides a gauge that will be updated whenever a new columnist is added. See the spreadsheet for both the gauge and the raw data.

Columnist Blame Game Gauge - UK Riots


How to: convert easting/northing into lat/long for an interactive map


A map generated in Google Fusion Tables from a dataset cleaned using these methods

Google Fusion Tables is great for creating interactive maps from a spreadsheet – but it isn’t too keen on easting and northing. That can be a problem as many government and local authority datasets use easting and northing to describe the geographical position of things – for example, speed cameras.

So you’ll need a way to convert easting and northing into something that Fusion Tables does like – such as latitude and longitude.
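As a rough illustration of what such a conversion involves, here is a minimal Python sketch. It assumes the eastings and northings are British National Grid coordinates in metres (the excerpt does not say which grid a given dataset uses) and applies the series formulae published by Ordnance Survey. The function name `grid_to_osgb36` is made up for this example, and this is not necessarily the method used in the full post. Note that the result is latitude/longitude on the OSGB36 datum; web maps generally expect WGS84, which can differ by something like 100 metres, so a further datum transformation is needed for precise plotting.

```python
import math

# Airy 1830 ellipsoid and National Grid projection constants (OSGB36).
A = 6377563.396            # semi-major axis, metres
B = 6356256.909            # semi-minor axis, metres
F0 = 0.9996012717          # scale factor on the central meridian
LAT0 = math.radians(49.0)  # latitude of true origin
LON0 = math.radians(-2.0)  # longitude of true origin
E0, N0 = 400000.0, -100000.0  # grid coordinates of true origin, metres
E2 = 1 - (B * B) / (A * A)    # eccentricity squared
N_RATIO = (A - B) / (A + B)


def _meridional_arc(lat):
    """Developed arc of the meridian from LAT0 to lat (radians), in metres."""
    n, n2, n3 = N_RATIO, N_RATIO ** 2, N_RATIO ** 3
    d, s = lat - LAT0, lat + LAT0
    return B * F0 * (
        (1 + n + 1.25 * n2 + 1.25 * n3) * d
        - (3 * n + 3 * n2 + 2.625 * n3) * math.sin(d) * math.cos(s)
        + (1.875 * n2 + 1.875 * n3) * math.sin(2 * d) * math.cos(2 * s)
        - (35 / 24) * n3 * math.sin(3 * d) * math.cos(3 * s)
    )


def grid_to_osgb36(easting, northing):
    """Convert National Grid easting/northing to OSGB36 (lat, lon) in degrees."""
    # Iterate to find the latitude whose meridional arc matches the northing.
    lat = LAT0 + (northing - N0) / (A * F0)
    arc = _meridional_arc(lat)
    while abs(northing - N0 - arc) >= 1e-5:
        lat += (northing - N0 - arc) / (A * F0)
        arc = _meridional_arc(lat)

    sin2 = math.sin(lat) ** 2
    nu = A * F0 / math.sqrt(1 - E2 * sin2)
    rho = A * F0 * (1 - E2) / (1 - E2 * sin2) ** 1.5
    eta2 = nu / rho - 1
    t = math.tan(lat)
    t2, t4, t6 = t ** 2, t ** 4, t ** 6
    sec = 1 / math.cos(lat)

    # Series coefficients for the offset from the central meridian.
    vii = t / (2 * rho * nu)
    viii = t / (24 * rho * nu ** 3) * (5 + 3 * t2 + eta2 - 9 * t2 * eta2)
    ix = t / (720 * rho * nu ** 5) * (61 + 90 * t2 + 45 * t4)
    x = sec / nu
    xi = sec / (6 * nu ** 3) * (nu / rho + 2 * t2)
    xii = sec / (120 * nu ** 5) * (5 + 28 * t2 + 24 * t4)
    xiia = sec / (5040 * nu ** 7) * (61 + 662 * t2 + 1320 * t4 + 720 * t6)

    de = easting - E0
    lat_out = lat - vii * de ** 2 + viii * de ** 4 - ix * de ** 6
    lon_out = LON0 + x * de - xi * de ** 3 + xii * de ** 5 - xiia * de ** 7
    return math.degrees(lat_out), math.degrees(lon_out)


# Example point in Norfolk, used here purely as an illustration.
lat, lon = grid_to_osgb36(651409.903, 313177.270)
print(round(lat, 5), round(lon, 5))
```

In practice a maintained geodesy library is a safer choice than hand-rolled formulae, particularly because it can also handle the shift to WGS84.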

Here’s how I did it – quickly.

SFTW: Asking questions of a webpage – and finding out when those answers change

Previously I wrote on how to use the =importXML formula in Google Docs to pull information from an XML page into a conventional spreadsheet. In this Something For The Weekend post I’ll show how to take that formula further to grab information from webpages – and get updates when that information changes.

Animation from Digital Inspiration


Asking questions of a webpage – or finding out when the answer changes

Despite its name, the =importXML formula can be used to grab information from HTML pages as well. This post on SEO Gadget, for example, gives a series of examples ranging from grabbing information on Twitter users to price information and web analytics (it also has some further guidance on using these techniques, and is well worth a read for that).

Asking questions of webpages typically requires more advanced use of XPath than I outlined previously – and more trial and error.

This is because, while XML is a language designed to provide structure around data, HTML – used as it is for a much wider range of purposes – isn’t quite so tidy.
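To make that difference concrete, here is a small Python sketch rather than a spreadsheet formula. The page fragment, the `job` class name and the `JobLinkParser` class are all invented for illustration and do not reflect any real site's markup. A strict XML parser rejects the fragment because of its unclosed tags, while a more forgiving HTML parser can still pull out the links.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical, deliberately untidy HTML: the <li> and <br> tags are never closed.
PAGE = """<html><body>
<ul>
<li class="job"><a href="/jobs/1">Data reporter</a>
<li class="job"><a href="/jobs/2">Sub-editor</a>
</ul>
<br>
</body></html>"""


def is_well_formed_xml(text):
    """Return True only if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class JobLinkParser(HTMLParser):
    """Collect (link text, href) pairs for links inside <li class="job">."""

    def __init__(self):
        super().__init__()
        self.in_job = False
        self.current_href = None
        self.jobs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            # A new list item starts; remember whether it is a job entry.
            self.in_job = attrs.get("class") == "job"
        elif tag == "a" and self.in_job:
            self.current_href = attrs.get("href")

    def handle_data(self, data):
        # Only keep text that sits inside a link we are tracking.
        if self.current_href is not None and data.strip():
            self.jobs.append((data.strip(), self.current_href))

    def handle_endtag(self, tag):
        if tag == "a":
            self.current_href = None


parser = JobLinkParser()
parser.feed(PAGE)
print(is_well_formed_xml(PAGE))  # False: the strict parser gives up
print(parser.jobs)
```

The trial and error mentioned above shows up here too: the class names and nesting have to be worked out by inspecting each page, and they can change without notice.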

Finding the structure

To illustrate how you can use =importXML to grab data from a webpage, I’m going to grab data from Gorkana, a job ads site.


Time for UK media organisations to use some lobbying muscle

There are two Cabinet Office consultations on open data taking place at the moment: one on data policy for the new Public Data Corporation (PDC), and another on the government’s transparency and open data strategy.

This should be of enormous interest to any media organisation – a key opportunity to influence the availability of information of public interest.

For example, the issues under consideration (summed up by Tony Hirst) include charging for PDC information, licensing and regulation.

These will all be vital elements in the future of journalism – news organisations and journalists should be vocal in shaping them.

The deadline for both consultations is October 27.

SFTW: How to scrape webpages and ask questions with Google Docs and =importXML

XML puzzle cube

Image by dullhunk on Flickr

Here’s another Something for the Weekend post. Last week I wrote a post on how to use the =importFeed formula in Google Docs spreadsheets to pull an RSS feed (or part of one) into a spreadsheet, and split it into columns. Another formula which performs a similar function more powerfully is =importXML.

There are at least 2 distinct journalistic uses for =importXML:

  1. You have found information that is only available in XML format and need to put it into a standard spreadsheet to interrogate it or combine it with other data.
  2. You want to extract some information from a webpage – perhaps on a regular basis – and put that in a structured format (a spreadsheet) so you can more easily ask questions of it.
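For readers working outside Google Docs, the first use case can be sketched in Python with the standard library. The sample XML, its element names and the helper names `xml_to_rows` and `rows_to_csv` are invented for this example; the path expression passed to `findall` plays a role similar to the XPath query that =importXML takes.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML document standing in for a real data source.
SAMPLE_XML = """<councils>
  <council region="West Midlands"><name>Birmingham</name><spend>1200</spend></council>
  <council region="North West"><name>Manchester</name><spend>950</spend></council>
</councils>"""


def xml_to_rows(xml_text, record_path, fields):
    """Return one list per matching record, with one value per requested field."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.findall(record_path):
        # findtext returns the default when a child element is missing.
        rows.append([record.findtext(field, default="") for field in fields])
    return rows


def rows_to_csv(header, rows):
    """Serialise the rows as CSV text that any spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


rows = xml_to_rows(SAMPLE_XML, ".//council", ["name", "spend"])
print(rows_to_csv(["name", "spend"], rows))
```

Each record becomes a row and each requested child element becomes a column, which is the shape needed to sort, filter or combine the data with other sources.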

The first task is the easiest, so I’ll explain how to do that in this post. I’ll use a separate post to explain the latter.

SFTW: How to grab useful political data with the They Work For You API

They Work For You

It’s been over 2 years since I stopped doing the ‘Something for the Weekend’ series. I thought I would revive it with a tutorial on They Work For You and Google Refine…

If you want to add political context to a spreadsheet – say you need to know what political parties a list of constituencies voted for, or the MPs for those constituencies – the They Work For You API can save you hours of fiddling – if you know how to use it.

An API is – for the purposes of journalists – a way of asking questions of reams of data. For example, you can use an API to ask “What constituency is each of these postcodes in?” or “When did these politicians enter office?” or even “Can you show me an image of these people?”

The They Work For You API will give answers to a range of UK political questions on subjects including Lords, MLAs (Members of the Legislative Assembly in Northern Ireland), MPs, MSPs (Members of the Scottish Parliament), select committees, debates, written answers, statements and constituencies.

When you combine that API with Google Refine you can fill a whole spreadsheet with additional political data, allowing you to answer questions you might otherwise not be able to.

I’ve written before on how to use Google Refine to pull data into a spreadsheet from the Google Maps API and the UK Postcodes API, but this post takes things a bit further because the They Work For You API requires something called a ‘key’. This is quite common with APIs so knowing how to use them is – well – key. If you need extra help, try those tutorials first.
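To show roughly what a keyed request looks like, here is a short Python sketch that only builds the request URLs and does not call the API. The base URL, the function name `getConstituency` and the `postcode`, `key` and `output` parameters are assumptions recalled from the public They Work For You documentation and should be checked there before use; `build_api_url` is a hypothetical helper and `YOUR_API_KEY` is a placeholder.

```python
from urllib.parse import urlencode

# Assumed base URL for the API; confirm against the official documentation.
API_BASE = "https://www.theyworkforyou.com/api/"


def build_api_url(function, key, **params):
    """Build a request URL: the key travels as just another query parameter."""
    query = {"key": key, "output": "js"}  # 'output' value is an assumption
    query.update(params)
    return API_BASE + function + "?" + urlencode(query)


# One URL per row, much as Google Refine builds one request per cell.
postcodes = ["B4 7XG", "SW1A 0AA"]
urls = [build_api_url("getConstituency", "YOUR_API_KEY", postcode=pc) for pc in postcodes]
for url in urls:
    print(url)
```

The point of the sketch is the pattern: the question is encoded in the URL, the key identifies who is asking, and the same template is repeated for every row in the spreadsheet.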

How to collaborate (or crowdsource) by combining Delicious and Google Docs


RSS girl by HeatherWeaver on Flickr

During some training in open data I was doing recently, I ended up explaining (it’s a long story) how to pull a feed from Delicious into a Google Docs spreadsheet. I promised I would put it down online, so: here it is.

In a Google Docs spreadsheet the formula =importfeed will pull information from an RSS feed and put it into that spreadsheet. Titles, links, datestamps and other parts of the feed will each be separated into their own columns.

When combined with Delicious, this can be a useful way to collect together pages that have been bookmarked by a group of people, or any other feed that you want to analyse.
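The column-splitting behaviour described above can be approximated outside Google Docs with a few lines of Python. This is only a sketch: the sample feed is invented, `feed_to_rows` is a hypothetical helper, and a real feed would be downloaded first rather than embedded as a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed standing in for a shared bookmarks feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Shared bookmarks</title>
<item><title>Open data consultation</title><link>http://example.com/a</link><pubDate>Mon, 01 Aug 2011 09:00:00 +0000</pubDate></item>
<item><title>Court sentencing figures</title><link>http://example.com/b</link><pubDate>Tue, 02 Aug 2011 10:30:00 +0000</pubDate></item>
</channel></rss>"""


def feed_to_rows(feed_xml):
    """Return one (title, link, date) tuple per feed item, like spreadsheet rows."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [
        (
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("pubDate", default=""),
        )
        for item in channel.findall("item")
    ]


for row in feed_to_rows(SAMPLE_FEED):
    print(row)
```

Each bookmark ends up as a row with the title, link and datestamp in separate columns, ready to be sorted or counted.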

Here’s how you do it: Continue reading

When information is power, these are the questions we should be asking

Various commentators over the past year have made the observation that “Data is the new oil”. If that’s the case, journalists should be following the money. But they’re not.

Instead it’s falling to the likes of Tony Hirst (an Open University academic), Dan Herbert (an Oxford Brookes academic) and Chris Taggart (a developer who used to be a magazine publisher) to fill the scrutiny gap. Recently all three have shone a light on the move towards transparency and open data, in work that anyone with an interest in information would be well advised to read.

Hirst wrote a particularly detailed post breaking down the results of a consultation about higher education data.

Herbert wrote about the publication of the first Whole of Government Accounts for the UK.

And Taggart made one of the best presentations I’ve seen on the relationship between information and democracy.

What all three highlight is how control of information still represents the exercise of power, and how shifts in that control as a result of the transparency/open data/linked data agenda are open to abuse, gaming, or spin.

In Spanish: The inverted pyramid of data journalism part 2

Mauro Accurso has followed up his rapid translation of last week’s inverted pyramid of data journalism with a Spanish version of part 2: the 6 C’s of communicating data journalism. It’s copied in full below.

Last week I translated for you the first part of Paul Bradshaw’s Inverted Pyramid of Data Journalism, which he promised to extend to cover the communication side of the extensive process that data journalism involves.

In this second part Paul goes through 6 different ways of communicating in data journalism, which you can see in the chart above, and at the end you will find a graphic that summarises the whole theory (which is still in development, and Bradshaw is asking for contributions, comments and suggestions).