Data for journalists: JSON for beginners

Following the post earlier this week on XML and RSS for journalists I wanted to look at another important format for journalists working with data: JSON.

JSON is a data format which has been rising in popularity over the past few years. Quite often it is offered alongside – or instead of – XML by various information services, such as Google Maps, the UK Postcodes API and the Facebook Graph API.

Because of this, in practice JSON is more likely to be provided in response to a specific query (“Give me geographical and political data about this location”) than a general file that you access (“Give me all geographical data about everywhere”).

I’ll describe how you supply that query below.

Not as easy on the eye as XML

Like XML, JSON provides structured data about information. But whereas XML uses tags in angled brackets followed by the value – e.g. <name>Paul Bradshaw</name>, JSON separates the tag (called an attribute) and the value with a colon, both of which are contained within a curly bracket like so –

{"name":"Paul Bradshaw"}

As a result, visually, JSON can be harder to get a hold on, as both tag and value look the same.

This, for example, is JSON provided by the UK Postcodes API for just one postcode:

{"postcode":"B42  2SU","geo":{"lat":"52.517269","lng":"-1.89729","easting":407066,"northing":291047,"geohash":"http://geohash.org/gcqdub6rmbw0"},"administrative":{"constituency":{"title":"Birmingham, Perry Barr","uri":"http://statistics.data.gov.uk/id/parliamentary-constituency/038","code":"038"},"district":{"title":"Birmingham City Council","uri":"http://statistics.data.gov.uk/id/local-authority/00CN","snac":"00CN"},"ward":{"title":"Perry Barr","uri":"http://statistics.data.gov.uk/id/electoral-ward/00CNHR","snac":"00CNHR"}}}

The key is to copy that JSON into a simple text editor and break it down at every comma and new bracket, like so:

{
"postcode":"B42  2SU",
"geo":
    {
    "lat":"52.517269",
    "lng":"-1.89729",
    "easting":407066,
    "northing": 291047,
    "geohash":"http://geohash.org/gcqdub6rmbw0"
    },
"administrative":
    {
    "constituency":
        {
        "title":"Birmingham, Perry Barr",
        "uri":"http://statistics.data.gov.uk/id/parliamentary-constituency/038",
        "code":"038"
        },
    "district":
        {
        "title":"Birmingham City Council",
        "uri":"http://statistics.data.gov.uk/id/local-authority/00CN",
        "snac":"00CN"
        },
    "ward":
        {
        "title":"Perry Barr",
        "uri":"http://statistics.data.gov.uk/id/electoral-ward/00CNHR",
        "snac":"00CNHR"
        }
    }
}

(Coincidentally, Tony Hirst wrote a post this week giving more on how to do this, including useful tools such as JSON-Pad)

Once you do that you can see that apart from the postcode itself our postcode JSON has:

  • 2 root elements of data: ‘geo’, and ‘administrative’.
  • Each of those has child elements, so ‘geo’ has latitude, longitude, and so on. ‘Administrative’ has constituency, district and ward.
  • But constituency, district and ward also all have child elements – not just the name (title) but an administrative code (‘snac’) and a URI (a place online where further details are stored).

Any of these might be useful to us as journalists: we may need latitude and longitude to map the data, a constituency so we can know which MP is responsible (which may involve more JSON or XML), or a link to find statistics about that area.

And breaking it down makes it easier for us to write the code that pulls out that specific piece of data.

You can use Google Refine, for example, to pull JSON for each postcode in a dataset. But to then extract a specific piece of information from that JSON requires a bit of code using Google Refine Expression Language (GREL).

You don’t need to learn the entire language, just particular phrases that are useful to you. In this case, the phrase is

value.parseJson()
  • followed by the ‘path’ to the information you want.

This path consists of each element you want to follow, in quotes and square brackets like so:

["administrative"]["district"]["title"]

Looking back at the broken down JSON above you can see that this should go to the ‘administrative’ element of your postcode JSON, then the ‘district’ part within that, and finally give you the ‘title’ of that district.

The full code then will look like this:

value.parseJson()["administrative"]["district"]["title"]

Or you can write it like this:

value.parseJson().administrative.district.title

Again, Tony Hirst’s post goes into a bit more detail on other permutations and exercises to get to grips with this.

For completeness sake, I should explain that ‘value’ is the value in the cell that you are performing this operation on, i.e. the one containing the JSON. The ‘parseJSON()’ function ‘parses’ that value – it parses the JSON in that cell – to find the information along the path that you describe.

So here’s a question to test whether this made sense: if you wanted the latitude instead how would you change the code above?

If you want to explore these ideas further, here’s a tutorial on how to do this with the Facebook Open Graph API. And here’s a post about how to use a Google Gadget to publish a spreadsheet as JSON.

11 thoughts on “Data for journalists: JSON for beginners

  1. Pingback: EXTJS – JsonWriter not respecting DateFormat used with JsonReader | Prodromus

  2. Pingback: links for 2011-04-18 | Joanna Geary

  3. Pingback: How to use the CableSearch API to quickly reference names against Wikileaks cables (SFTW) | Online Journalism Blog

  4. Pingback: Hand-outs and plans for data track at Global Investigative Journalism Conference in Kiev

  5. Pingback: How to: convert easting/northing into lat/long for an interactive map | Online Journalism Blog

  6. Pingback: How to: convert easting/northing into lat/long for an interactive map | Online Journalism Blog

  7. Pingback: API:How can it help you in data journalism? | DATA QUEEN

  8. Pingback: Coding for journalists: 10 programming concepts it helps to understand | Online Journalism Blog

  9. Pingback: EXTJS – JsonWriter not respecting DateFormat used with JsonReader | Z-Car

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s