Author Archives: johnnymates

About johnnymates

Photojournalist from Birmingham.

Leveraging music to help people understand data

In a guest post for OJB, Ion Mates interviews Tom Levine and Roman Heindorff about the role of audio in data journalism.

Audiolisation (sometimes called ‘auralization’ or ‘sonification’) is the process of turning complex data into sound.

Instead of using graphics and bar charts, one can represent the contents of a spreadsheet by assigning sounds to different kinds of data.

In an example by Thomas Levine, the activity of newsrooms is represented by verses, phrases and different rhythms.
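The post does not say how Levine built his piece, so what follows is only a minimal sketch of the general idea, not his method. It assumes the simplest possible mapping: each value in a series becomes one short sine-wave note, and the pitch rises linearly with the value. The function names, the 220 to 880 Hz pitch range and the example data are all illustrative assumptions. It uses only Python's standard library (`math`, `struct`, `wave`).

```python
import math
import struct
import wave


def values_to_frequencies(values, low_hz=220.0, high_hz=880.0):
    """Map each data value linearly onto a pitch between low_hz and high_hz."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    return [low_hz + (v - lo) / span * (high_hz - low_hz) for v in values]


def render_tones(freqs, note_seconds=0.25, sample_rate=44100, volume=0.5):
    """Turn each frequency into a short sine-wave note; return 16-bit samples."""
    samples = []
    samples_per_note = int(note_seconds * sample_rate)
    for freq in freqs:
        for i in range(samples_per_note):
            level = volume * math.sin(2 * math.pi * freq * i / sample_rate)
            samples.append(int(level * 32767))  # scale to the int16 range
    return samples


def write_wav(path, samples, sample_rate=44100):
    """Save the samples as a mono, 16-bit WAV file."""
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 2 bytes = 16-bit samples
        out.setframerate(sample_rate)
        out.writeframes(struct.pack(f"<{len(samples)}h", *samples))


if __name__ == "__main__":
    # Hypothetical data, e.g. the number of stories a newsroom publishes each day.
    daily_counts = [3, 7, 2, 9, 5]
    freqs = values_to_frequencies(daily_counts)
    print([round(f, 1) for f in freqs])
```

Calling `write_wav("sonification.wav", render_tones(freqs))` would save the result as a playable file. A linear pitch scale is the easiest mapping to reason about; richer pieces, like the verses and phrases described above, typically map data onto musical scales, rhythms or instruments instead.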

Beginning to represent data as audio

Tom started playing with computers at an early age. His main interest was in designing things to make them easier to use.


How to: clean a converted PDF using OpenRefine

Image: Our initial table. This spreadsheet, sent in response to an FOI request, appeared to have been converted from PDF format.

In a guest post for OJB, Ion Mates explains how he used OpenRefine to clean up a spreadsheet which had been converted from PDF format. An earlier version of this post was published on his blog.

Journalists rarely get their hands on nice, tidy data: public bodies don’t have an interest in providing information in a structured form. So it is increasingly part of a journalist’s job to get that information into the right state before extracting patterns and stories.

A few months ago I sent a Freedom of Information request asking for the locations of all litter bins in Birmingham. But instead of sending a spreadsheet exported directly from their database, they sent one that appeared to have been converted from a multiple-page PDF.

This meant all sorts of problems, from rows containing page numbers and repeated header rows, to information split across multiple rows and even pages.

In this post I’ll take you through how I used the free data cleaning tool OpenRefine (formerly Google Refine) to tackle all these problems and create a clean version of the data.
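This is not the OpenRefine workflow the post describes. As a rough illustration of the same clean-up logic outside OpenRefine, here is a minimal Python sketch, assuming the spreadsheet has already been read into a list of rows (each a list of strings). The page-number pattern, the rule that an empty first column marks a continuation row, and the sample rows are hypothetical assumptions; adjust them to match your own file.

```python
import re

# Hypothetical footer format, e.g. "Page 3" or "Page 3 of 12"; adjust to your file.
PAGE_NUMBER = re.compile(r"^page\s+\d+(\s+of\s+\d+)?$", re.IGNORECASE)


def clean_rows(rows, header):
    """Return data records with PDF-conversion debris removed.

    rows: list of lists of strings read from the converted spreadsheet
    header: the expected column names, used to spot repeated header rows
    """
    width = len(header)
    header_key = [h.strip().lower() for h in header]
    records = []
    for row in rows:
        # Strip whitespace, pad short rows and truncate long ones to the header width.
        cells = ([c.strip() for c in row] + [""] * width)[:width]
        text = " ".join(c for c in cells if c)
        if not text:
            continue  # blank spacer row
        if PAGE_NUMBER.match(text):
            continue  # page-number row left over from the PDF
        if [c.lower() for c in cells] == header_key:
            continue  # header repeated at the top of each page
        if not cells[0] and records:
            # Assumption: an empty first column means this row continues the
            # previous record, so append each non-empty cell to it. Because
            # page-number and header rows were skipped above, this also
            # rejoins records split across a page break.
            previous = records[-1]
            for i, cell in enumerate(cells):
                if cell:
                    previous[i] = f"{previous[i]} {cell}".strip()
            continue
        records.append(cells)
    return records


if __name__ == "__main__":
    # Made-up rows imitating the problems described in the post.
    header = ["Bin ID", "Street", "Ward"]
    raw = [
        ["Bin ID", "Street", "Ward"],
        ["B001", "High Street", "Ward A"],
        ["B002", "Station", "Ward A"],
        ["Page 1 of 2", "", ""],
        ["Bin ID", "Street", "Ward"],
        ["", "Road", ""],
        ["B003", "Mill Lane", "Ward B"],
    ]
    for record in clean_rows(raw, header):
        print(record)
```

Run on the made-up rows, this keeps three records and rejoins "Station" and "Road", which had been split across a page break, into "Station Road".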