If you have a spreadsheet containing geographical data such as postcodes you may want to know what constituency they are in, or convert them to local authority. That was a question that Bill Thompson asked on Twitter this week – and this is how I used Google Refine to do that: adding extra columns to a spreadsheet with geographic information.
You can watch a video tutorial of this here.
1. Find a website that gives information based on a postcode
First, I needed to find an API which would return a page of information on any postcode in JSON…
If that sounds like double-dutch, don’t worry, try this instead.
Translation: First, I needed either of these websites: http://www.uk-postcodes.com/ or http://mapit.mysociety.org/
Both of these will generate a page giving you details about any given postcode. The formatting of these pages is consistent, e.g.
(The first removes the space between the two parts of the postcode, and adds .json; the second replaces the space with %20 – although I’m told by Matthew Somerville that it will work with spaces and postcodes without spaces)
This information will be important when we start to use Google Refine…
2. Create a new column that has text in the same format as the webpages you want to fetch
In Google Refine click on the arrow at the top of your postcode column and follow the instructions here to create a new column which has the same postcode information, but with no spaces. To replace the space with %20 instead you would replace the express with
Let’s name this column ‘SpacesRemoved’ and click OK.
Now that we’ve got postcodes in the same format as the webpages above, we can start to fetch a bunch of code giving us extra information on those postcodes.
3. Write some code that goes to a webpage and fetches information about each postcode
In Google Refine click on the arrow at the top of your ‘SpacesRemoved’ column and create a new column by selecting ‘Edit column’ > ‘Add column by fetching URLs…’
You can read more about this functionality here.
This time you will type the expression:
That basically creates a URL that inserts ‘value’ (the value in the previous column) where you want it.
Call this column ‘JSON for postcode’ and click OK.
Each cell will now be filled with the results of that webpage. This might take a while.
4. Write some code that pulls out a specific piece of information from that
In Google Refine click on the arrow at the top of your ‘SpacesRemoved’ column and create a new column by selecting ‘Edit column’ > ‘Add column based on this column…’
Write the following expression:
Look at the preview as you type this and you’ll see information become more specific as you add each term in square brackets.
Call this ‘Council’ and click OK.
This column will now be populated with the council names for each postcode. You can repeat this process for other information, adapting the expression for different pieces of information such as constituency, easting and northing, and so on.
5. Export as a standard spreadsheet
Click Export in the top right corner and save your spreadsheet in the format you prefer. You can then upload this to Google Docs and share it publicly.
Although this post is about postcode data you can use the same principles to add information based on any data that you can find an API for. For example if you had a column of charities you could use the Open Charities API to pull further details (http://opencharities.org/info/about). For local authority data you could pull from the OpenlyLocal API (http://openlylocal.com/info/api).
If you know of other similarly useful APIs let me know.