SFTW: How to grab useful political data with the They Work For You API

They Work For You

It’s been over 2 years since I stopped doing the ‘Something for the Weekend’ series. I thought I would revive it with a tutorial on They Work For You and Google Refine…

 

If you want to add political context to a spreadsheet – say you need to know what political parties a list of constituencies voted for, or the MPs for those constituencies – the They Work For You API can save you hours of fiddling – if you know how to use it.

An API is – for the purposes of journalists – a way of asking questions for reams of data. For example, you can use an API to ask “What constituency is each of these postcodes in?” or “When did these politicians enter office?” or even “Can you show me an image of these people?”

The They Work For You API will give answers to a range of UK political questions on subjects including Lords, MLAs (Members of the Legislative Assembly in Northern Ireland), MPs, MSPs (Members of the Scottish Parliament), select committees, debates, written answers, statements and constituencies.

When you combine that API with Google Refine you can fill a whole spreadsheet with additional political data, allowing you to answer questions you might otherwise not be able to.

I’ve written before on how to use Google Refine to pull data into a spreadsheet from the Google Maps API and the UK Postcodes API, but this post takes things a bit further because the They Work For You API requires something called a ‘key’. This is quite common with APIs so knowing how to use them is – well – key. If you need extra help, try those tutorials first.

The They Work For You API key

Unlike the previous APIs I’ve written about, the They Work For You API requires you to register for a ‘key’ to use it. If you don’t understand how this works the instructions on the TWFY website can be a little confusing. So here’s how it works:

The key is a password of sorts, used when you ask the API a question.

As your ‘question’ takes the form of a web address (URL) then that key needs to be included at a particular part of that URL.

You’ll see how that works when we get to asking the URL questions. But first, go to http://www.theyworkforyou.com/api/key to get a key.

Got it? OK, now copy it into a text document – or just keep this window open. You’ll need to paste it later.

Using the TWFY key

The API has a number of pre-set questions, called ‘functions’. These are listed in the right hand column, and include getMPs, getLord, getDebates and so on. If you click on any of these you will be given information on how they work, and you can also test the function with the ‘Explorer’.

To demonstrate how to use these functions, click on getConstituency.

If you use the ‘Explorer’ to test it (in this case with ‘Edinburgh South”) you will be shown a bunch of results at a URL like this:

http://www.theyworkforyou.com/api/docs/getConstituency?name=edinburgh+south&postcode=&output=js#output

Now you could manually use the Explorer to get information for each of the cells in a spreadsheet, but it’s much, much quicker to use the API to automate the process instead.

On that front the Explorer can be a little misleading. Because although it shows you the information you might get from the API, this is not the URL that you will need.

The URL you really need is shown above the results, and below the word ‘Output’ like so:

http://www.theyworkforyou.com/api/getConstituency?name=edinburgh+south&output=js

If you copy and paste that URL into your browser you will get the following warning:

{

error: “No API key provided. Please see http://www.theyworkforyou.com/api/key for more information.”

}

So now we need that key.

Using your key

Assuming you still have your API key copied somewhere, or still open in another window, you can find instructions on how to use it at http://www.theyworkforyou.com/api/

Here you are told to use the key as part of the following structure:

http://www.theyworkforyou.com/api/function?key=key&output=output&other_variables

The important bit is where it says key=key&

That is where you need to add your own key, so that that part of the URL looks something like

key=aTh0jklerJaHui7&

(where that random assortment of characters is your key, copied earlier, followed by the & sign)

Going back for a moment to the URL that wasn’t working without a key, we can see that it can be split into two parts:

http://www.theyworkforyou.com/api/getConstituency?

 

and

name=edinburgh+south&output=js

 

Adding in the key in the middle makes up a third part, like so:

http://www.theyworkforyou.com/api/getConstituency?

 

and

key=key&

and

name=edinburgh+south&output=js

So, you now need to edit the output URL to include your API key. It should then look something like this:

http://www.theyworkforyou.com/api/getConstituency?key=AHdajHUShajshaJ&name=edinburgh+south&output=js

UPDATE: Matthew Somerville points out that the key can be used anywhere after the ? so you can tag it on the end if that’s easier.

The URL broken down further

Just to clarify, these are the parts:

http://www.theyworkforyou.com/

(The website hosting the API)

api/

(The API)

getConstituency?

(The function – or question being asked)

key=AHdajHUShajshaJ

(Our API key – or password)

&name=edinburgh+south

(and the constituency name that we are asking the API for information on)

&output=js

(and the format we want the answer in – JSON, in this case)

You should now get a page of JSON code giving data for the question. If your browser doesn’t display it particularly well, try Chrome or Firefox.

Using with Google Refine to get a bunch of results

Great. But we could get one result by using the ‘Explorer’, so why did we need to do all that? Because we can now use Google Refine to automate the process of asking the same question hundreds of times.

To demonstrate this, here’s a spreadsheet with 4 constituencies. Open it, and select File > Download as… > CSV

Open Google Refine (download here) and create a new project with that spreadsheet. Create a new column from the one you have by clicking on the arrow at the top of the column and selecting Edit Column > Add Column by fetching URLs

In the window that appears adapt the following piece of Google Refine Expression Language (GREL) with your own API key (shown in bold):

This generates a URL in each cell based on the value of the original column: the start and end of the URL are in quotation marks; the value is inserted in the middle where it says +value+

(NOTE: Avoid copying and pasting as quotation marks may cause you problems. Instead try typing it in yourself – this also helps you remember things) This generates a URL in each cell based on the value of the original column: the start and end of the URL are in quotation marks; the value is inserted in the middle where it says +value+

Give the column a name and click OK. It will now run – this test example only has 4 rows so you can see the results quickly.

You’ll see that only one row has actually worked – Tatton. The others have failed. Why? Because they have more than one word.

Take another look at that URL that the API returned earlier with the test of Edinburgh South:

http://www.theyworkforyou.com/api/getConstituency?key=AHdajHUShajshaJ&name=edinburgh+south&output=js

When a constituency has two words the space between them is represented by a plus sign – so we need to format our data in the same way for it to work.

Formatting data for the API

You could use Find and Replace in Excel to replace all spaces in that column with a plus sign but you will still hit problems with unusual constituency names. But this is how to do it in Google Refine:

Click on the arrow at the top of the constituency column and selecting Edit Column > Add column based on this column…

In the window that appears type the following GREL:

value.split(” “).join(“+”)

To explain:

‘Value’ is the value in each cell.

‘.split(” “)’ splits each value where there is a space (” “).

‘.join(“+”) then joins the resulting items together, with a plus sign.

Give it a name and click OK. You’ll see a new column with plus signs replacing the spaces. [see comment from Matthew Somerville for explanation]

Create a new column from the one you have by clicking on the arrow at the top of the column and selecting Edit Column > Add Column by fetching URLs

In the window that appears adapt the following piece of Google Refine Expression Language (GREL) with your own API key (shown in bold):

“http://www.theyworkforyou.com/api/getConstituency?name=” + escape(value, “url”) + “&key=Gr7jUUlKdhB3fsihFnHzab&output=js”

The key part here is between the + signs. Whereas before we simply inserted the value of each cell, here we escape that value at the same time so that it will work in a URL.

This will change Edinburgh South to “edinburgh+south” but also Normanton, Pontefract and Castleford to ”Normanton%2C+Pontefract+and+Castleford” and any other unforeseen characters in similar ways.

Give this new column a name, click OK and watch your new column populate itself with the JSON from each URL.

Creating new columns from the JSON

Now we can populate new columns with data taken from that JSON as follows:

Click on the arrow at the top of the new JSON column and select Edit Column > Add column based on this column…

Type this GREL:

value.parseJson().bbc_constituency_id

(This looks in the JSON in each cell and pulls out the bit after bbc_constituency_id :) And click OK.

Repeat the process for further columns as follows:

value.parseJson().guardian_election_results

value.parseJson().pa_id

value.parseJson().guardian_id

Going further

That’s just a demonstration of how to use a small part of the They Work For You API – there are lots of other functions that you can use to get other information. Have a play with those.

Meanwhile, what about those IDs? Well, the Guardian ID will allow you to play with The Guardian’s API – which gives lots more information on each constituency. For an example see http://www.guardian.co.uk/politics/api/constituency/664/json

Based on that URL you can repeat the process above to grab more data.

Is this useful? Anything you can add? Or other data problems?

 
Advertisements

One thought on “SFTW: How to grab useful political data with the They Work For You API

  1. Pingback: Should journalists learn how to code? They already do. (And yes, they should) | Online Journalism Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s