Category Archives: data journalism

Why we need open courts data – and newspapers need to improve too

Justice photo by mira66

Few things sum up the division of the UK around the riots like the sentencing of those involved. Some think courts are too lenient, while others gape at six-month sentences for people who stole a bottle of water.

These judgments are often made on the basis of a single case, rather than any overall view. And you might think, in such a situation, that a journalist’s role would be to find out just how harsh or lenient sentencing has been – not just across the 1,600 or more people who have been arrested during the riots, but also in comparison to previous civil disturbances – or indeed, to similar crimes outside of a riot situation.

As Martin Belam argues:

“Really good data journalism will help us untangle the truth from those prejudiced assumptions. But this is data journalism that needs to stay the course, and seems like an ideal opportunity to do ‘long-form data journalism’. How long will these looters serve? What is the ethnic make-up and age range of those convicted? How many other criminals will get an early release because our jails are newly full of looters? How many people convicted this week will go on to re-offend?”

And yet, amazingly, we cannot reliably answer these questions – because it is still not possible to get raw data on sentencing in UK courts, not even through FOI.

INFOGRAPHIC: UK riots – Gauging the Columnists Blame Game

Here’s a quick experiment in data visualisation to provide an instant insight into a story on how the blame game is being played by columnists.

The data is taken from a Liberal Conspiracy blog post – I’ve transferred that into a spreadsheet with limited categories and used the Gauges gadget to visualise the totals.

A screengrab is below – but there is also an embed code that provides a gauge that will be updated whenever a new columnist is added. See the spreadsheet for both the gauge and the raw data.

Columnist Blame Game Gauge

How to: convert easting/northing into lat/long for an interactive map

A map generated in Google Fusion Tables from a geocoded dataset

A map generated in Google Fusion Tables from a dataset cleaned using these methods

Google Fusion Tables is great for creating interactive maps from a spreadsheet – but it isn’t too keen on easting and northing. That can be a problem as many government and local authority datasets use easting and northing to describe the geographical position of things – for example, speed cameras.

So you’ll need a way to convert easting and northing into something that Fusion Tables does like – such as latitude and longitude.
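For readers who would rather script the conversion, here is a minimal Python sketch. It assumes the eastings and northings are Ordnance Survey National Grid coordinates in metres, which is typical for UK government data but is an assumption about your dataset. It follows the standard published National Grid projection formulae; the function name `grid_to_latlon` is made up for this illustration. Note that the result is on the OSGB36 datum, which can differ from the WGS84 coordinates used by web maps by roughly 100 metres, so a further datum shift is needed if you want precise placement.

```python
import math

# Airy 1830 ellipsoid and National Grid projection constants (OSGB36).
A = 6377563.396               # semi-major axis, metres
B = 6356256.909               # semi-minor axis, metres
F0 = 0.9996012717             # scale factor on the central meridian
LAT0 = math.radians(49.0)     # latitude of the true origin
LON0 = math.radians(-2.0)     # longitude of the true origin
E0, N0 = 400000.0, -100000.0  # grid coordinates of the true origin, metres
E2 = 1 - (B * B) / (A * A)    # eccentricity squared
N_RATIO = (A - B) / (A + B)


def _meridional_arc(lat):
    """Developed arc of the meridian from LAT0 to lat (radians), in metres."""
    n = N_RATIO
    d, s = lat - LAT0, lat + LAT0
    return B * F0 * (
        (1 + n + 1.25 * n**2 + 1.25 * n**3) * d
        - (3 * n + 3 * n**2 + 2.625 * n**3) * math.sin(d) * math.cos(s)
        + (1.875 * n**2 + 1.875 * n**3) * math.sin(2 * d) * math.cos(2 * s)
        - (35 / 24) * n**3 * math.sin(3 * d) * math.cos(3 * s)
    )


def grid_to_latlon(easting, northing):
    """Return (lat, lon) in decimal degrees on the OSGB36 datum (not WGS84)."""
    # Iterate until the meridional arc matches the northing to 0.01 mm.
    lat = LAT0
    m = 0.0
    while True:
        lat += (northing - N0 - m) / (A * F0)
        m = _meridional_arc(lat)
        if abs(northing - N0 - m) < 1e-5:
            break

    sin_lat = math.sin(lat)
    nu = A * F0 / math.sqrt(1 - E2 * sin_lat**2)
    rho = A * F0 * (1 - E2) * (1 - E2 * sin_lat**2) ** -1.5
    eta2 = nu / rho - 1
    t = math.tan(lat)
    sec = 1 / math.cos(lat)
    de = easting - E0

    # Series terms, named after the Roman numerals used in the OS formulae.
    vii = t / (2 * rho * nu)
    viii = t / (24 * rho * nu**3) * (5 + 3 * t**2 + eta2 - 9 * t**2 * eta2)
    ix = t / (720 * rho * nu**5) * (61 + 90 * t**2 + 45 * t**4)
    x = sec / nu
    xi = sec / (6 * nu**3) * (nu / rho + 2 * t**2)
    xii = sec / (120 * nu**5) * (5 + 28 * t**2 + 24 * t**4)
    xiia = sec / (5040 * nu**7) * (61 + 662 * t**2 + 1320 * t**4 + 720 * t**6)

    lat_out = lat - vii * de**2 + viii * de**4 - ix * de**6
    lon_out = LON0 + x * de - xi * de**3 + xii * de**5 - xiia * de**7
    return math.degrees(lat_out), math.degrees(lon_out)
```

Calling `grid_to_latlon(easting, northing)` on each row of a dataset gives two new columns that can be added to the spreadsheet before uploading it.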

Here’s how I did it – quickly.

SFTW: Asking questions of a webpage – and finding out when those answers change

Previously I wrote on how to use the =importXML formula in Google Docs to pull information from an XML page into a conventional spreadsheet. In this Something For The Weekend post I’ll show how to take that formula further to grab information from webpages – and get updates when that information changes.

Animation from Digital Inspiration

Asking questions of a webpage – or find out when the answer changes

Despite its name, the =importXML formula can be used to grab information from HTML pages as well. This post on SEO Gadget, for example, gives a series of examples ranging from grabbing information on Twitter users to price information and web analytics (it also has some further guidance on using these techniques, and is well worth a read for that).

Asking questions of webpages typically requires more advanced use of XPath than I outlined previously – and more trial and error.

This is because, while XML is a language designed to provide structure around data, HTML – used as it is for a much wider range of purposes – isn’t quite so tidy.
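The difference can be illustrated outside a spreadsheet with a small Python sketch. It is not how =importXML works internally; it simply uses the standard library's strict XML parser to show an XPath-style question being asked of tidy markup, and the same parser rejecting markup that a browser would happily display. Both snippets of markup and the function names are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a job listings page.
TIDY = """
<html><body>
  <div class="job"><a href="/jobs/1">Reporter</a></div>
  <div class="job"><a href="/jobs/2">Data editor</a></div>
  <div class="advert"><a href="/ads/9">Sponsored</a></div>
</body></html>
"""

# Similar markup with an unclosed <br>: acceptable as HTML, but not valid XML.
UNTIDY = "<html><body><div class='job'>Reporter<br></div></body></html>"


def job_titles(markup):
    """Ask: what is the link text inside every div with class 'job'?"""
    root = ET.fromstring(markup)
    return [link.text for link in root.findall(".//div[@class='job']/a")]


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

Here `job_titles(TIDY)` picks out only the two job links and ignores the advert, while `parses_as_xml(UNTIDY)` is false, which is the kind of untidiness that makes XPath on real webpages a matter of trial and error.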

Finding the structure

To illustrate how you can use =importXML to grab data from a webpage, I’m going to grab data from Gorkana, a job ads site.

Time for UK media organisations to use some lobbying muscle

SFTW: How to scrape webpages and ask questions with Google Docs and =importXML

Image by dullhunk on Flickr

Here’s another Something for the Weekend post. Last week I wrote a post on how to use the =importFeed formula in Google Docs spreadsheets to pull an RSS feed (or part of one) into a spreadsheet, and split it into columns. Another formula which performs a similar function more powerfully is =importXML.

There are at least 2 distinct journalistic uses for =importXML:

1. You have found information that is only available in XML format and need to put it into a standard spreadsheet to interrogate it or combine it with other data.
2. You want to extract some information from a webpage – perhaps on a regular basis – and put that in a structured format (a spreadsheet) so you can more easily ask questions of it.

The first task is the easier of the two, so I’ll explain how to do that in this post. I’ll use a separate post to explain the second.
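As a rough illustration of the first task outside Google Docs, the Python sketch below flattens XML records into spreadsheet-style rows and CSV text. The sample XML, its tag names and the function names are all invented for the example; a real file will have its own structure, which you would need to inspect first.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up sample data: real XML files will use their own tag names.
SAMPLE_XML = """<members>
  <member><name>Example Person A</name><party>Party X</party></member>
  <member><name>Example Person B</name><party>Party Y</party></member>
</members>"""


def xml_to_rows(xml_text, record_tag, fields):
    """Return a header row plus one row per record_tag element."""
    root = ET.fromstring(xml_text)
    rows = [list(fields)]
    for record in root.iter(record_tag):
        # A missing child element becomes an empty cell.
        rows.append([record.findtext(field, default="") for field in fields])
    return rows


def rows_to_csv(rows):
    """Serialise rows as CSV text that any spreadsheet can open."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()
```

For the sample above, `xml_to_rows(SAMPLE_XML, "member", ["name", "party"])` gives a header and two data rows, and `rows_to_csv` turns those into text ready to save as a .csv file.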

SFTW: How to grab useful political data with the They Work For You API

It’s been over 2 years since I stopped doing the ‘Something for the Weekend’ series. I thought I would revive it with a tutorial on They Work For You and Google Refine…

If you want to add political context to a spreadsheet – say you need to know what political parties a list of constituencies voted for, or the MPs for those constituencies – the They Work For You API can save you hours of fiddling – if you know how to use it.

An API is – for the purposes of journalists – a way of asking questions of reams of data. For example, you can use an API to ask “What constituency is each of these postcodes in?” or “When did these politicians enter office?” or even “Can you show me an image of these people?”

The They Work For You API will give answers to a range of UK political questions on subjects including Lords, MLAs (Members of the Legislative Assembly in Northern Ireland), MPs, MSPs (Members of the Scottish Parliament), select committees, debates, written answers, statements and constituencies.

When you combine that API with Google Refine you can fill a whole spreadsheet with additional political data, allowing you to answer questions you might otherwise not be able to.

I’ve written before on how to use Google Refine to pull data into a spreadsheet from the Google Maps API and the UK Postcodes API, but this post takes things a bit further because the They Work For You API requires something called a ‘key’. This is quite common with APIs so knowing how to use keys is – well – key. If you need extra help, try those tutorials first.
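To make the idea of a key concrete, here is a small Python sketch that builds one request URL per postcode, with the key included as a parameter. The endpoint pattern, the `getMP` function name and the `key`, `postcode` and `output` parameters are assumptions based on how the They Work For You API is commonly called, so check them against the API's own documentation; `YOUR_KEY` is a placeholder, and the helper names are invented for the illustration. The sketch only builds URLs and does not fetch anything.

```python
from urllib.parse import urlencode

# Assumed base address; confirm against the They Work For You API docs.
API_BASE = "https://www.theyworkforyou.com/api/"


def build_api_url(function, api_key, **params):
    """Build a request URL for one API function, including the key."""
    query = {"key": api_key, "output": "js"}  # 'js' output is an assumption
    query.update(params)
    return API_BASE + function + "?" + urlencode(query)


def urls_for_postcodes(postcodes, api_key):
    """One URL per postcode, e.g. to fill a new spreadsheet column."""
    return [build_api_url("getMP", api_key, postcode=pc) for pc in postcodes]
```

Each URL asks one question for one row of the spreadsheet, which is the same pattern used when a tool such as Google Refine adds a column by fetching a URL built from an existing column.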

How to collaborate (or crowdsource) by combining Delicious and Google Docs

RSS girl by HeatherWeaver on Flickr

During some training in open data I was doing recently, I ended up explaining (it’s a long story) how to pull a feed from Delicious into a Google Docs spreadsheet. I promised I would put it down online, so: here it is.

In a Google Docs spreadsheet the formula =importfeed will pull information from an RSS feed and put it into that spreadsheet. Titles, links, datestamps and other parts of the feed will each be separated into their own columns.
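What that splitting into columns amounts to can be sketched in Python. This is not the =importfeed formula itself, only an illustration of the same idea using the standard library; the sample feed, its URLs and the function name `feed_to_rows` are made up, and the sketch assumes a plain RSS 2.0 feed.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed standing in for a real bookmarks feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example bookmarks</title>
  <item>
    <title>Open courts data</title>
    <link>http://example.com/courts</link>
    <pubDate>Mon, 15 Aug 2011 09:00:00 GMT</pubDate>
  </item>
  <item>
    <title>Riot sentencing figures</title>
    <link>http://example.com/sentencing</link>
  </item>
</channel></rss>"""


def feed_to_rows(feed_xml):
    """Return one (title, link, date) row per item in an RSS 2.0 feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    rows = []
    for item in channel.findall("item"):
        rows.append((
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("pubDate", default=""),  # empty if the item has no date
        ))
    return rows
```

Each tuple corresponds to one spreadsheet row, with the title, link and datestamp in separate columns.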

When combined with Delicious, this can be a useful way to collect together pages that have been bookmarked by a group of people, or any other feed that you want to analyse.

Here’s how you do it:

When information is power, these are the questions we should be asking

In Spanish: The inverted pyramid of data journalism part 2

Mauro Accurso has followed up his rapid translation of last week’s inverted pyramid of data journalism with a Spanish version of part 2: the 6 C’s of communicating data journalism. It’s copied in full below.

Last week I translated for you the first part of Paul Bradshaw’s Inverted Pyramid of Data Journalism, which he promised to extend to cover the communication side of the extensive process that data journalism involves.

In this second part Paul goes through 6 different ways of communicating in data journalism, which you can see in the chart above, and at the end you will find a graphic that summarises the whole theory (which is still in development, and Bradshaw is asking for contributions, comments and suggestions):