Tag Archives: Freebase Gridworks

Data cleaning tool relaunches: Freebase Gridworks becomes Google Refine

When I first saw Freebase Gridworks I was a very happy man. Here was a tool that tackled one of the biggest problems in data journalism: cleaning dirty data (and data is invariably dirty). The tool made it easy to identify variations of a single term and clean them up, to link one set of data to another, and much more besides.
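To give a flavour of what "identifying variations of a single term" involves, here is a minimal Python sketch of one common approach, sometimes called key collision: each value is boiled down to a normalised "fingerprint", and values that share a fingerprint are treated as likely variants. This is an illustration only, not the tool's own code; the function names and the sample list of names are made up for the example.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a string to a normalised key so near-identical spellings collide."""
    # Strip accents, trim surrounding whitespace, and lowercase.
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.strip().lower()
    # Drop punctuation, then sort the unique words so word order is ignored.
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster_variants(values):
    """Group values by fingerprint; keep only groups with more than one spelling."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical dirty column (invented for this example).
names = [
    "Daily Telegraph",
    "daily telegraph ",
    "Telegraph, Daily",
    "The Guardian",
    "Guardian",
]
clusters = cluster_variants(names)
print(clusters)
```

Note that this simple key groups the three spellings of the first title together but leaves "The Guardian" and "Guardian" apart, because the extra word changes the fingerprint. Catching cases like that needs a looser match and a human to confirm each merge.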

Then Google bought the company that made Gridworks, and now it’s released a new version of the tool under a new name: Google Refine.

It’s notable that Google are explicitly positioning Refine in their video (above) as a “data journalism” tool.

You can download Google Refine here.

Further videos below. The first explains how to take a list on a webpage and convert it into a cleaned-up dataset – a useful alternative to scraping:

The second video explains how to link your data to data from elsewhere, a process known as “reconciliation”, for example to pull in latitude and longitude or language.

The New Online Journalists #6: Conrad Quilty-Harper

As part of an ongoing series on recent graduates who have gone into online journalism, The Telegraph’s new Data Mapping Reporter Conrad Quilty-Harper talks about what got him the job, what it involves, and what skills he feels online journalists need today.

I got my job thanks to Twitter. Chris Brauer, head of online journalism at City University, was impressed by my tweets and my experience, and referred me to the Telegraph when they said they were looking for people to help build the UK Political database.

I spent six weeks working on the database, at first manually creating candidate entries, and later mocking up design elements and cleaning the data using Freebase Gridworks, Excel and Dabble DB. At the time the Telegraph was advertising for a “data juggler” role, and I interviewed for the job and was offered it.