Last night OpenHeatMap creator Pete Warden announced that the tool now allowed you to visualise UK data. I’ve been gleefully playing with the heat-mapping tool today and thought I’d share some pointers on visualising data on a map.
This is not a tutorial for OpenHeatMap – Pete’s done a great job of that himself (video below) – but rather an outline of the steps to get some map-ready data in the first place.[youtube:http://www.youtube.com/watch?v=yFy80kMcaRo%5D
1. Find a dataset to visualise.
You firstly need data that fits the geographical areas supported by OpenHeatMap (countries, constituencies, local authorities, districts and counties), and which suits geographical visualisation.
My first stop was the data.gov.uk RSS feed to see what recent datasets had been released, but you could also do advanced searches for “unemployment by county” etc. if you are looking for something specific to visualise.
Helpfully, each dataset description includes a field on “Geographical granularity”. This helps you quickly rule out those that only give country-level data (most of them). I ended up discovering the Scottish Neighbourhood Statistics page which allows you to download very specific data. Once on the site’s very helpful data download page I clicked on “Up to 100 indicators” and selected ‘Local authority’ from the first drop-down menu to ensure it matched up to the councils on OpenHeatMap.
I decided to map smoking data, downloaded it, and unzipped it. I then uploaded it to Google Spreadsheets (thankfully, the dataset was not too large).
2. Clarify the data
It’s at this point that the real work begins. Most datasets use a lot of jargon and codes, and you’ll have to spend time decoding those. In this case the zipped file contained a spreadsheet of “meta-data” that explained some of the codes used in the main spreadsheets. That cleared up part of my dataset, but there was a problem. Instead of local authority names, the spreadsheet (as is often the case) used codes.
I started scouring the web for a spreadsheet that might have those codes and what they meant. Eventually I found a page that listed them. Had the information been in a table, or even a list, I could have used the importHTML function in Google Docs (or Excel) to import the list. But it wasn’t, so I had 3 choices:
- Convert the codes manually into authority names
- Keep searching to see if I could find the same list in a better format
- Write a scraper to convert the list on that page into a dataset
I tweeted an appeal for someone to scrape the page (the data was relatively structured – each item was prefixed by a 3-digit number, for example – which should make it easier) and Tom Smith quickly did just that. However, the scraper had for some reason not managed to capture the Scottish codes (possibly because they used the same numbers as the English ones).
Once those codes and associated local authority names were added to my spreadsheet I could convert my own codes to local authority names by using the VLOOKUP formula.
In Google Spreadsheets this means typing something like this into an empty cell to the right of your data: =VLOOKUP(D2,A:B,2)
You need to adapt the formula as follows:
- D2 is the cell with the unknown code in it
- A:B is the range of cells which contain both the code your are looking for and the associated word (A:B is all cells in A and B columns; you might also use something like A2:B340)
- 2 (the “index”) is which of those columns contains the word you want to ‘pull’. 1 would be column A, because it is the first in your range. 2 is B.
- As a whole, then, the code looks at the code in D2, then looks across all the cells in A and B to find the same code. If it finds it, then it will ‘copy across’ whatever is in the B column next to that code.
3. Create a spreadsheet for OpenHeatMap to use
With all my codes converted to local authority names (which OpenHeatMap should recognise), I’ve got all the data I need.
I create a new spreadsheet that only contains the relevant data: local authority, and a value (in this case, the percentage of the population that smokes).
I change the header for the local authorities council to uk_council and the other column to value
OpenHeatMap will use the names in uk_council to decide what areas of a UK map to colour in – and it will use the numbers in value to paint them according to whether they are high, low, or near the middle ground.
Now, I publish the spreadsheet as a webpage by going to Share > Publish as webpage. And copy the URL that is generated.
Then it’s off to openheatmap.com and follow the instructions there.
At this point, I hit a problem – one of the local authorities is not recognised: Eilean Siar. That’s okay – I thought this might happen. On the webpage of local authority codes that one was followed by the parenthesis: (formerly Western Isles). This sort of change of name or alternative name is worth looking out for when gathering data as it’s the sort of thing that does tend to trip up scripts.
In my spreadsheet, I change Eilean Siar to Western Isles, go to Share > Publish as a webpage again and click ‘Republish now‘. Returning to OpenHeatMap, I try again, and it seems to have solved the problem. The map works.
It’s a wonderfully simple way to achieve something that would have required a lot of programming knowhow 6 months ago. The UK is notoriously problematic in the way that it uses a range of different geographical reference points (health authorities, police forces and other data sources use others still), so this is a massive step forward in being able to present regional differences in stories.
If you use OpenHeatMap or mapping data yourself, please let me know how you get on and any other tips you pick up.