In my last post I wrote about how using feeds and social bookmarking can make for a quicker data journalism workflow. In this second part I look at how to anticipate and prevent problems; and how collaboration can improve data work.
Workflow tip 3. Anticipate problems
A particularly useful habit of successful data journalists is thinking ahead when requesting data. For example, you might request basic datasets now that you think you’ll need in future, such as demographic details for local patches.
You might also want to request the ‘data dictionary’ for key datasets. This lists all the fields used in a particular database. For example, did you know that the police have a database for storing descriptions of suspects? And that one of the fields is shoe size? That could make for quite a quirky story.
Likewise, if you know that the gifts and hospitality register requires politicians to say whether they took a partner to a particular event, that adds an extra angle you might not have considered.
Cost codes are another useful dataset to have. Birmingham City Council’s monthly published data on spending, for example, includes which directorate made each payment, but also the cost centre code for each one. Knowing what those codes mean gives you a much more specific idea of who spent the money and what it was for.
You can also use that information to request further data. If you know the cost code for the Mayor’s travel expenses, for example, you might make a Freedom of Information request for entries against that code, and so on.
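To make the cost code idea concrete, here is a minimal Python sketch of how you might total payments by cost centre once you have a list explaining the codes. Everything in it is made up for illustration: the column names, the codes, the suppliers and the lookup table are all assumptions, and a real council spending file will be laid out differently.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Hypothetical extract of a council spending file.
# The column names and values are invented for this example.
SPENDING_CSV = """directorate,cost_centre,supplier,amount
Corporate Resources,RA001,Acme Travel,120.50
Corporate Resources,RA001,City Cabs,35.00
Children and Families,CF210,Bright Books,980.00
Corporate Resources,RB450,Office Supplies Ltd,64.25
"""

# Hypothetical lookup of the kind a cost code list might give you.
COST_CENTRE_NAMES = {
    "RA001": "Mayor's office: travel",
    "CF210": "School library service",
}


def total_by_cost_centre(rows):
    """Add up the payment amounts recorded against each cost centre code."""
    totals = defaultdict(Decimal)
    for row in rows:
        totals[row["cost_centre"]] += Decimal(row["amount"])
    return dict(totals)


def describe(code):
    """Return the plain-English name for a code, or flag it as one to ask about."""
    return COST_CENTRE_NAMES.get(code, "unknown (ask for the code list)")


rows = list(csv.DictReader(io.StringIO(SPENDING_CSV)))
totals = total_by_cost_centre(rows)

for code, total in sorted(totals.items()):
    print(f"{code}  {describe(code)}: GBP {total}")
```

Any code that comes back as unknown is a prompt for a follow-up question, and any code you can name is one you could quote in a more specific request.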
Another problem you can try to anticipate is possible objections to a request that you might make. The Freedom of Information Act is full of exemptions for everything from confidentiality to official secrets. Some of these are absolute and cannot be challenged, while for others the authority must pass a public interest test or make a strong case that disclosure would cause harm. One very common reason for refusal is that the information would cost too much to collect.
If you think any of these exemptions might be used to refuse your request then try to anticipate them in your request. If you think cost might be raised, point out that they are obliged to help you rephrase the request to get it under the cost limit (for example by narrowing the scope or time period covered).
If you think privacy issues might be raised, offer to have the data with names and identifying information removed (if that’s not crucial to your story). If you think commercial interests might be an issue, read up on the official guidance on that exemption and remind them of their obligations under it (for example, they need to demonstrate the harm that might result), or suggest that they might still supply some of the information.
Books like Heather Brooke’s Your Right to Know, and Montague and Amin’s FOIA Without the Lawyer are excellent references on these exemptions – but also look at the FOI requests being made on WhatDoTheyKnow.com and pick up tips from the good ones.
Workflow tip 4. Lower the barriers to collaboration – and seek out collaborators
There is a quiet cultural battle taking place in journalism right now between journalists who want to ‘own’ a story, and the ‘open’ culture of collaborative online networks.
Data journalism – with its reliance on a range of skills – lends itself particularly to collaborative methods. The Guardian’s use of crowdsourcing to invite users to look at MPs’ expenses; its use of a Flickr photo pool to allow designers to share their visualisations of data shared on the datablog; and its creation of an API which over 8,000 web developers have registered to use are all examples of creating value by opening up assets which could never be fully exploited in-house.
Ask yourself how you might do the most with a particular data story. Can you publish the data the story is based on, and invite others to find things you might have missed? Or invite people to contribute their own visualisations? The more generous you are with your own resources, the more likely others are going to be generous with their skills and time.
Think about communities of people who might help you do better journalism. If you need data scraping, for example, the ScraperWiki mailing list is useful to follow and engage with. The NICAR-L mailing list is particularly useful for asking questions about computer-assisted reporting, FOI and spreadsheets. If you’re really geeky you can try contributing to the forums on Stack Overflow.
Is there a statistician at a local university you can turn to when you are not sure of the validity of data? Do you have the contact details for the person who deals with data at the local fire service? Is there a non-government body that might be collecting data, like a charity, or academic? Contacts have always been vital in journalism, and data journalism is no different.
Can you add any other tips? Let us know in the comments or @paulbradshaw on Twitter.
In the final part I look at how to think like a computer and speed up things further.