When I first saw Freebase Gridworks I was a very happy man. Here was a tool that tackled one of the biggest problems in data journalism: cleaning dirty data (and data is invariably dirty). The tool made it easy to identify variations of a single term and clean them up, to link one set of data to another, and much more besides.
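To give a flavour of how a tool can spot "variations of a single term", here is a minimal Python sketch loosely modelled on the "fingerprint" key-collision idea that Gridworks/Refine offers for clustering. It is an illustration of the general technique, not Refine's actual code: the function names `fingerprint` and `cluster` and the sample names are hypothetical. Each value is normalised (accents stripped, lowercased, punctuation removed, words sorted), and values that end up with the same key are grouped as likely variants of one another.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Hypothetical keying function: reduce a string to a normalised key."""
    # Strip accents by decomposing characters and dropping non-ASCII marks
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Trim surrounding whitespace and lowercase
    value = value.strip().lower()
    # Remove punctuation
    value = re.sub(r"[^\w\s]", "", value)
    # Split into words, drop duplicates, sort so word order doesn't matter
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group values whose fingerprints collide; keep only real clusters."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    # A key with only one distinct spelling has nothing to clean up
    return {key: vs for key, vs in groups.items() if len(set(vs)) > 1}


names = [
    "Department of Health",
    "department of health ",
    "Health, Department of",
    "Home Office",
]
print(cluster(names))
```

Here the three spellings of the health department share the key `department health of` and come back as one cluster, while "Home Office" is left alone. In a tool like Refine the journalist would then pick one spelling for the whole cluster; key collision is cheap but only catches variations that normalise to the same string, so it will miss typos like "Departmnet".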
Then Google bought Metaweb, the company that made Gridworks, and has now released a new version of the tool under a new name: Google Refine.
It’s notable that Google are explicitly positioning Refine in their launch video as a “data journalism” tool.
You can download Google Refine here.
Google have also released further videos. The first explains how to take a list on a webpage and convert it into a cleaned-up dataset, a useful alternative to scraping.
The second explains how to link your data to data from elsewhere, a process known as “reconciliation”: for example, pulling in latitude and longitude or language.