Archive for the google refine Tag

SFTW: Scraping data with Google Refine

For the first Something For The Weekend of 2012 I want to tackle a common problem when you’re trying to scrape a collection of webpage: they have some sort of structure in their URL like this, where part of the URL refers to the name or code of an entity: http://www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID=5237521 http://www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID=5237629 http://www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID=5237823 In this instance, you can see that
Read more…

Social Interest Positioning – Visualising Facebook Friends’Likes With Data Grabbed Using Google Refine

What do my Facebook friends have in common in terms of the things they have Liked, or in terms of their music or movie preferences? (And does this say anything about me?!) Here’s a recipe for visualising that data… After discovering via Martin Hawksey that the recent (December, 2011) 2.5 release of Google Refine allows [...]

How to use the CableSearch API to quickly reference names against Wikileaks cables (SFTW)

CableSearch is a neat project by the European Centre for Computer Assisted Research and VVOJ (the Dutch-Flemish association for investigative journalists) which aims to make it easier for journalists to interrogate the Wikileaks cables. Although it’s been around for some time, I’ve only just noticed the site’s API, so I thought I’d show how such an API can be useful as a
Read more…

How to: convert easting/northing into lat/long for an interactive map

Google Fusion Tables is great for creating interactive maps from a spreadsheet – but it isn’t too keen on easting and northing. That can be a problem as many government and local authority datasets use easting and northing to describe the geographical position of things – for example, speed cameras. So you’ll need a way to convert easting and northing
Read more…

SFTW: How to grab useful political data with the They Work For You API

It’s been over 2 years since I stopped doing the ‘Something for the Weekend’ series. I thought I would revive it with a tutorial on They Work For You and Google Refine… If you want to add political context to a spreadsheet – say you need to know what political parties a list of constituencies voted for, or the MPs
Read more…

The inverted pyramid of data journalism

I’ve been working for some time on picking apart the many processes which make up what we call data journalism. Indeed, if you read the chapter on data journalism (blogged draft) in my Online Journalism Handbook, or seen me speak on the subject, you’ll have seen my previous diagram that tries to explain those processes. I’ve now revised that considerably,
Read more…

Cleaning data using Google Refine: a quick guide

I’ve been focusing so much on blogging the bells and whistles stuff that Google Refine does that I’ve never actually written about its most simple function: cleaning data. So, here’s what it does and how to do it: Download and install Google Refine if you haven’t already done so. It’s free. Run it – it uses your default browser. In
Read more…

Merging Datasets with Common Columns in Google Refine

It’s an often encountered situation, but one that can be a pain to address – merging data from two sources around a common column. Here’s a way of doing it in Google Refine… Here are a couple of example datasets to import into separate Google Refine projects if you want to play along, both courtesy [...]

Fragments: Glueing Different Data Sources Together With Google Refine

I’m working on a new pattern using Google Refine as the hub for a data fusion experiment pulling together data from different sources. I’m not sure how it’ll play out in the end, but here are some fragments…. Grab Data into Google Refine as CSV from a URL (Proxied Google Spreadsheet Query via Yahoo Pipes) [...]

Data for journalists: JSON for beginners

Following the post earlier this week on XML and RSS for journalists I wanted to look at another important format for journalists working with data: JSON. JSON is a data format which has been rising in popularity over the past few years. Quite often it is offered alongside – or instead of – XML by various information services, such as
Read more…