Tag Archives: foi

Getting full addresses for data from an FOI response (using APIs)


Here’s an example of how APIs can be useful to journalists when they need to combine two sets of data.

I recently spoke to Lincoln investigative journalism student Sean McGrath who had obtained some information via FOI that he needed to combine with other data to answer a question (sorry to be so cryptic).

He had spent three days cleaning up the data and manually adding postcodes to it. This seemed a good example of where using an API might cut down the work considerably, so in this post I explain how you can make a start on the same problem in less than an hour using Excel, Google Refine and the Google Maps API.

Step 1: Get the data in the right format to work with an API

APIs can do all sorts of things, but one of the things they do which is particularly useful for journalists is answer questions.
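As a rough illustration of "asking a question" of an API, the sketch below builds a request URL for the Google Maps Geocoding API and pulls a postcode out of a response. It is only a sketch under stated assumptions: the function names, the placeholder API key and the sample address are made up for illustration, and the sample response is hand-written in the shape of a geocoding reply rather than real output. It is written in Python rather than the Excel and Google Refine workflow the post goes on to describe.

```python
from urllib.parse import urlencode

# Public endpoint of the Google Maps Geocoding API (JSON output).
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Return the URL that 'asks' the geocoder about one address."""
    query = urlencode({"address": address, "key": api_key})
    return GEOCODE_ENDPOINT + "?" + query


def extract_postcode(response):
    """Return the first postcode found in a parsed JSON response, else None."""
    for result in response.get("results", []):
        for component in result.get("address_components", []):
            if "postal_code" in component.get("types", []):
                return component.get("long_name")
    return None


# Hand-written sample in the shape of a geocoding response.
# The address and postcode are invented for illustration.
sample_response = {
    "status": "OK",
    "results": [
        {
            "formatted_address": "Example Street, Lincoln LN1 1AA, UK",
            "address_components": [
                {"long_name": "Lincoln", "types": ["postal_town"]},
                {"long_name": "LN1 1AA", "types": ["postal_code"]},
            ],
        }
    ],
}

# "YOUR_API_KEY" is a placeholder, not a working key.
url = build_geocode_url("Example Street, Lincoln", "YOUR_API_KEY")
postcode = extract_postcode(sample_response)
```

In practice you would fetch each URL, parse the JSON that comes back, and repeat for every row in the FOI spreadsheet, which is the repetitive step that tools such as Google Refine can automate.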

Why journalists should be lobbying over police.uk’s crime data

UK police crime maps

Conrad Quilty-Harper writes about the new crime data from the UK police force – and in the process adds another straw to the groaning camel’s back of the government’s so-called transparency agenda:

“It’s useless to residents wanting to find out what was going on at the house around the corner at 3am last night, and it’s useless to individuals who want to build mobile phone applications on top of the data (perhaps to get a chunk of that £6 billion industry open data is supposed to create).

“The site’s limitations are as follows:

  • No IDs for crimes: what if I want to check whether real life crimes have made it onto the map? Sorry.
  • Six crime categories: including “other crimes”, everything from drug dealing to bank robberies in one handy, impossible to understand category.
  • No live data: you mean I have to wait until the end of the next month to see this month’s criminality?!
  • No dates or times: funny how without dates and times I can’t tell which police manager was in charge.
  • Case status: the police know how many crimes go solved or unsolved, why not tell us this?”

This is why people are so concerned about the Public Data Corporation. This is why we need to be monitoring exactly what spending data councils release, and in what format. And this is why we need to continue to press for the expansion of FOI laws. This is what we should be doing. Are we?

UPDATE: Will Perrin has FOI’d all correspondence relating to ICO advice on the crime maps. Jonathan Raper has a list of further flaws including:

  • Some data such as sexual offences and murder is removed – even though it would be easy to discover and locate from other police reports.
  • Data covers reported crimes rather than convictions, so some of it may turn out not to be crime.
  • The levels of policing are not provided, so that two areas with the “same” crime levels may in fact have “radically different” experiences of crime and policing.

Charles Arthur notes that: “Police forces have indicated that whenever a new set of data is uploaded – probably each month – the previous set will be removed from public view, making comparisons impossible unless outside developers actively store it.”

Louise Kidney says:

“What we’ve actually got with http://www.police.uk is neither one nor the other. Ruth looks like a crime overlord cos of all the crimes happening in her garden and we haven’t got exact point data, but we haven’t got first part of postcode data either e.g. BB5 crimes or NW1 crimes. Instead, we’ve got this weird halfway house thing where it’s not accurate, but its inaccuracy almost renders it useless because we don’t have any idea if every force uses the same parameters when picking these points, we don’t know how they pick their points, we don’t know what we don’t know in terms of whether one house in particular is causing a considerable issue with anti-social behaviour for example, allowing me to go to my local Council and demand they do something about it.”

Adrian Short argues that “What we’re looking at here isn’t a value-neutral scientific exercise in helping people to live their daily lives a little more easily, it’s an explicitly political attempt to shape the terms of a debate around the most fundamental changes in British policing in our lifetimes.”

He adds:

“It’s derived data that’s already been classified, rounded and lumped together in various ways, with a bit of location anonymising thrown in for good measure. I haven’t had a detailed look at it yet but I would caution against trying to use it for anything serious. A whole set of decisions have already transformed the raw source data (individual crime reports) into this derived dataset and you can’t undo them. You’ll just have to work within those decisions and stay extremely conscious that everything you produce with it will be prefixed, “as far as we can tell”.

“£300K for this? There ought to be a law against it.”

UPDATE 2: One frustrated developer has launched CrimeSearch.co.uk to provide “helpful information about crime and policing in your area, without costing 300k of tax payers’ money”

Open data meets FOI via some nifty automation

OpenlyLocal generated FOI request

Now this is an example of what’s possible with open data and some very clever thinking. Chris Taggart blogs about a new tool on his OpenlyLocal platform that allows you to send a Freedom of Information (FOI) request based on a particular item of spending. “This further lowers the barriers to armchair auditors wanting to understand where the money goes, and the request even includes all the usual ‘boilerplate’ to help avoid specious refusals.”

It takes around a minute to generate an FOI request.

The function is limited to items of spending above £10,000. Cleverly, it’s also all linked so you can see if an FOI request has already been generated and answered.
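To make the idea concrete, here is a minimal sketch of how a request might be generated from a single spending item. It is not OpenlyLocal's code: the function name, the field names, the example council and supplier, and the wording of the request are all assumptions made for illustration. Only the £10,000 threshold comes from the description above.

```python
# Threshold taken from the post: only items above £10,000 qualify.
THRESHOLD_GBP = 10_000


def build_foi_request(council, supplier, amount_gbp, payment_date):
    """Return FOI request text for one spending item, or None if too small.

    All parameter names and the wording below are hypothetical.
    """
    if amount_gbp <= THRESHOLD_GBP:
        return None
    return (
        f"Dear {council},\n\n"
        "Under the Freedom of Information Act 2000, please provide any "
        "contracts, invoices and tender documents relating to the payment "
        f"of £{amount_gbp:,.2f} to {supplier} on {payment_date}.\n\n"
        "If any part of this request is refused, please state which "
        "exemption applies and release the remainder.\n\n"
        "Yours faithfully,"
    )


# Invented example item, for illustration only.
request_text = build_foi_request(
    "Example Council", "Example Ltd", 12_500.0, "2010-06-01"
)
too_small = build_foi_request(
    "Example Council", "Example Ltd", 9_999.0, "2010-06-01"
)
```

The point of templating the request this way is that the details come straight from the published spending data, so the person sending it only has to check the text and press send.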

Although the tool sits on OpenlyLocal, Francis Irving at WhatDoTheyKnow gets enormous credit for making their side of the operation work with it.

Once again you have to ask why a media organisation isn’t creating these sorts of tools to help generate journalism beyond the walls of its newsroom.

Data journalism pt1: Finding data (draft – comments invited)

The following is a draft from a book about online journalism that I’ve been working on. I’d really appreciate any additions or comments you can make – particularly around sources of data and legal considerations.

The first stage in data journalism is sourcing the data itself. Often you will be seeking out data based on a particular question or hypothesis (for a good guide to forming a journalistic hypothesis see Mark Hunter’s free ebook Story-Based Inquiry (2010)). On other occasions, it may be that the release or discovery of data itself kicks off your investigation.

There are a range of sources available to the data journalist, both online and offline, public and hidden. Typical sources include:


Review: Heather Brooke – The Silent State


In the week that a general election is called, Heather Brooke’s latest book couldn’t have been better timed. The Silent State is a staggeringly ambitious piece of work that pierces through the fog of the UK’s bureaucracies of power to show how they work, what is being hidden, and the inconsistencies underlying the way public money is spent.

Like her previous book, Your Right To Know, Brooke structures the book into chapters looking at different parts of the power system in the UK – making it a particularly usable reference work when you want to get your head around a particular aspect of our political systems.

Chapter by chapter

Chapter 1 lists the various databases that have been created to maintain information on citizens – paying particular focus to the little-publicised rack of databases holding subjective data on children. The story of how an old unpopular policy was rebranded to ride into existence on the back of the Victoria Climbie bandwagon is particularly illustrative of government’s hunger for data for data’s sake.

Picking up that thread further, Chapter 2 explores how much public money is spent on PR and how public servants are increasingly prevented from speaking directly to the media. It’s this trend which made The Times’ outing of police blogger Nightjack particularly loathsome and why we need to ensure we fight hard to protect those who provide an insight into their work on the ground.

Chapter 3 looks at how the misuse of statistics led to the independence of the head of the Office for National Statistics – but not the staff that he manages – and how the statistics given to the media can differ quite significantly from those provided when requested by a Select Committee (the lesson being that these can be useful sources to check). It’s a key chapter for anyone interested in the future of public data and data journalism.

Bureaucracy itself is the subject of the fourth chapter. Most of this is a plea for good bureaucracy and the end of unnamed sources, but there is still space for illustrative and useful anecdotes about acquiring information from the Ministry of Defence.

And in Chapter 5 we get a potted history of MySociety’s struggle to make politicians accountable for their votes, and an overview of how data gathered with public money – from The Royal Mail’s postcodes to Ordnance Survey – is sold back to the public at a monopolistic premium.

The justice system and the police are scrutinised in the sixth and seventh chapters – from the twisted logic that decreed audio recordings are less reliable than written records to the criminalisation of complaint.

Then finally we end with a personal story in Chapter 8: a reflection on the MPs’ expenses saga that Brooke is best known for. You can understand the publishers – and indeed, many readers – wanting to read the story first-hand, but it’s also the least informative of all the chapters for journalists (which is a credit to all that Brooke has achieved on that front in wider society).

With a final ‘manifesto’ section Brooke summarises the main demands running across the book and leaves you ready to storm every institution in this country demanding change. It’s an experience reminiscent of finishing Franz Kafka’s The Trial – we have just been taken on a tour through the faceless, logic-deprived halls of power. And it’s a disconcerting, disorientating feeling.

Journalism 2.0

But this is not fiction. It is great journalism. And the victims caught in expensive paper trails and logical dead ends are real people.

Because although the book is designed to be dipped into as a reference work, it is also written as an eminently readable page-turner – indeed, the page-turning gets faster as the reader gets angrier. Throughout, Brooke illustrates her findings with anecdotes that not only put a human face on the victims of bureaucracy, but also pass on the valuable experience of those who have managed to get results.

For that reason, the book is not a pessimistic or sensationalist piece of writing. There is hope – and the likes of Brooke, and MySociety, and others in this book are testament to the fact that this can be changed.

The Silent State is journalism 2.0 at its best – not just exposing injustice and waste, but providing a platform for others to hold power to account. It’s not content for content’s sake, but a tool. I strongly recommend not just buying it – but using it. Because there’s some serious work to be done.

An open letter to Tim Berners-Lee about open government

Following the tone set so succinctly by Glyn Moody, I thought I would add my own thoughts on what Sir Tim should say to the government when he bends their ear on transparency.

Firstly, I would second everything that Glyn says.

But I’m going to be cynical and strategic, and urge Sir Tim to emphasise the importance of open data in a couple of areas that are close to the government’s heart.

1. Stimulating growth in the economy.

You could compare a genuinely significant release of public data to an economic stimulus.

Like cutting VAT, only cheaper.

At minimal cost you could have a new raw material from which startups and established media organisations alike could create new value. Some of that would create commercial value far exceeding any revenue generated within government (as research recently suggested in relation to the comparably valuable Ordnance Survey data).

Repeat after me: jobs and money, jobs and money.

2. Efficiencies and passing on costs in the public sector

Samuel Butler’s Erewhon puts it particularly well:

You will sooner gain your end by “appealing to men’s pockets, in which they have generally something of their own, than to their heads, which contain for the most part little but borrowed or stolen property”

Public sector spending is going to drop whichever party is in power. Let’s play to that.

By opening up public data the government will effectively be able to pass on some development costs to willing volunteers who mash up the data in their own ways. The difference is that people will do this according to their own agendas and for their own benefit.

But more importantly, the results of this experimentation – if supported and encouraged – should produce work that makes it more efficient to interact with public data and therefore public bodies. If I can use a slider to find out which schools are within 3 miles, that saves 20 minutes of someone answering a phone call in the local education department. If I can have a Facebook app which tells other users how much money alcohol abuse is costing my local hospital, it might save the NHS a bob or two. You get the picture.

Oh yes, and it’s important for democracy, civic engagement and digital literacy

The limited data that’s available in the UK is an embarrassment. Imagine what MySociety could do with what’s available in the US.

Likewise, for all the talk of transparency, the recent announcement that Cabinet Papers and information relating to the Royal Family would be exempt from the Freedom of Information Act is a backward step. Heather Brooke’s concerns proved right.

The cynic in me sees the appointment of Berners-Lee as an action intended to generate the illusion of movement – “We’re working on it”. But the Freedom of Information Act is possibly the most positive contribution the Labour government has made to this country’s political health since it came to power, and not to follow through on promises made would be an enormous political mistake.

So I will add one request to my advice above: any discussion of transparency should acknowledge the importance of requiring any organisation using public funds to make its data public too. So much public work is outsourced to the private sector that it is particularly difficult to see whether public money is spent responsibly.

More at Podnosh, BBC, Emma Mulqueeny, Simon Dickson and Amused Cynicism.