Adding geographical information to a spreadsheet based on postcodes – Google Refine and APIs

If you have a spreadsheet containing geographical data such as postcodes you may want to know what constituency they are in, or convert them to local authority. That was a question that Bill Thompson asked on Twitter this week – and this is how I used Google Refine to do that: adding extra columns to a spreadsheet with geographic information.

You can watch a video tutorial of this here.

1. Find a website that gives information based on a postcode

First, I needed to find an API which would return a page of information on any postcode in JSON…

If that sounds like double-dutch, don’t worry, try this instead.

Translation: First, I needed either of these websites: http://www.uk-postcodes.com/ or http://mapit.mysociety.org/

Both of these will generate a page giving you details about any given postcode. The formatting of these pages is consistent, e.g.

(The first removes the space between the two parts of the postcode, and adds .json; the second replaces the space with %20 – although I’m told by Matthew Somerville that it will work with spaces and postcodes without spaces)

This information will be important when we start to use Google Refine…

2. Create a new column that has text in the same format as the webpages you want to fetch

In Google Refine click on the arrow at the top of your postcode column and follow the instructions here to create a new column which has the same postcode information, but with no spaces. To replace the space with %20 instead you would replace the express with

value.split(" ").join("%20")

Let’s name this column ‘SpacesRemoved’ and click OK.

Now that we’ve got postcodes in the same format as the webpages above, we can start to fetch a bunch of code giving us extra information on those postcodes.

3. Write some code that goes to a webpage and fetches information about each postcode

In Google Refine click on the arrow at the top of your ‘SpacesRemoved’ column and create a new column by selecting ‘Edit column’ > ‘Add column by fetching URLs…’

You can read more about this functionality here.

This time you will type the expression:

"http://www.uk-postcodes.com/postcode/"+value+".json"

That basically creates a URL that inserts ‘value’ (the value in the previous column) where you want it.

Call this column ‘JSON for postcode’ and click OK.

Each cell will now be filled with the results of that webpage. This might take a while.

4. Write some code that pulls out a specific piece of information from that

In Google Refine click on the arrow at the top of your ‘SpacesRemoved’ column and create a new column by selecting ‘Edit column’ > ‘Add column based on this column…’

Write the following expression:

value.parseJson()["administrative"]["district"]["title"]

Look at the preview as you type this and you’ll see information become more specific as you add each term in square brackets.

Call this ‘Council’ and click OK.

This column will now be populated with the council names for each postcode. You can repeat this process for other information, adapting the expression for different pieces of information such as constituency, easting and northing, and so on.

5. Export as a standard spreadsheet

Click Export in the top right corner and save your spreadsheet in the format you prefer. You can then upload this to Google Docs and share it publicly.

Other possibilities

Although this post is about postcode data you can use the same principles to add information based on any data that you can find an API for. For example if you had a column of charities you could use the Open Charities API to pull further details (http://opencharities.org/info/about). For local authority data you could pull from the OpenlyLocal API (http://openlylocal.com/info/api).

If you know of other similarly useful APIs let me know.

10 thoughts on “Adding geographical information to a spreadsheet based on postcodes – Google Refine and APIs

  1. reseau vdi

    The investigation afterward did too. But what
    used to be that special anyway cool and fun but
    nothing to fight over. Credit: CBSStar power still sells Celebrity endorsements have been part of advertising culture
    since the dawn of industry. Investors were hoping that the Lumia 920 and 820 have been generally positive.

    Reply
  2. Pingback: How to: convert easting/northing into lat/long for an interactive map | Online Journalism Blog

  3. Pingback: The inverted pyramid of data journalism – DATOVÁ ŽURNALISTIKA

  4. Pingback: Data tutorial: Using OutWit Hub and Open Refine to scrape data | Sophie Murray-Morris: Freelance Journalist

  5. Pingback: API:How can it help you in data journalism? | DATA QUEEN

  6. Pingback: SFTW: How to grab useful political data with the They Work For You API | Online Journalism Blog

  7. Pingback: Data for journalists: JSON for beginners | Online Journalism Blog

  8. Pingback: Free school meals in Scottish primary schools – data visualisation | Help Me Investigate… Education

  9. Pingback: How to: convert easting/northing into lat/long for an interactive map – Matthews' Blog

  10. Pingback: The inverted pyramid of data journalism | Online Journalism Blog

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.