Tag Archives: foi

Staffordshire Council just made a very persuasive case for the value of Freedom of Information

Yesterday Staffordshire County Council controversially published details of “the cost of Freedom of Information” to local people. The titling of that page gives some clue to its intent: FOI is a ‘cost’, and it’s you, local people, who pay.

But I think the list – despite its obvious agenda and related weaknesses – is actually rather brilliant.

Why? Because it shows just how flexible a tool FOI is, how widely it is used, and perhaps raises questions to be answered about why it has to be used in the first place.


Staffordshire’s top 10 list of FOI requesters includes a parish council, an elected MP, and local and national media

The top ten requesters, for example, include not just news organisations but a politician, a parish council and the right-wing campaign group the TaxPayers’ Alliance. Continue reading

Ethics in data journalism: mass data gathering – scraping, FOI and deception


Automated mapping of data – ChicagoCrime.org – image from Source

This is the third in a series of extracts from a draft book chapter on ethics in data journalism. The first looked at how the ethics of accuracy play out in data journalism projects, and the second at culture clashes, privacy, user data and collaboration. This is a work in progress, so if you have examples of ethical dilemmas, best practice, or guidance, I’d be happy to include them with an acknowledgement.

Mass data gathering – scraping, FOI, deception and harm

The data journalism practice of ‘scraping’ – getting a computer to capture information from online sources – raises some ethical issues around deception and minimisation of harm. Some scrapers, for example, ‘pretend’ to be a particular web browser, or pace their scraping activity more slowly to avoid detection. But the deception is practised on another computer, not a human – so is it deception at all? And if the ‘victim’ is a computer, is there harm? Continue reading
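To make that concrete, here is a minimal sketch – my own illustration, not code from the chapter – of the two practices described above: sending a browser-like User-Agent header and pacing requests so as to minimise load on the target site. The URLs are hypothetical.

```python
import time

import requests

# A browser-like User-Agent: the scraper 'pretends' to be an ordinary browser
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
DELAY_SECONDS = 5  # pause between requests rather than hammering the server

# Hypothetical pages to scrape
urls = [
    "https://example.gov.uk/spending?page=1",
    "https://example.gov.uk/spending?page=2",
]

for url in urls:
    response = requests.get(url, headers=HEADERS, timeout=30)
    print(url, response.status_code, len(response.text))
    time.sleep(DELAY_SECONDS)  # pace the scrape to avoid harming the site
```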

Three book reviews: leaks, FOI, and surveillance

Secret Manoeuvres in the Dark book cover
This Machine Kills Secrets book cover

If you’re interested in leaks, surveillance or FOI, three book reviews I wrote on the Help Me Investigate blog over the last two months might interest you:

Hyperlocal Voices: Simon Pipe, St Helena Online

After a short summer break, our Hyperlocal Voices series returns. In this issue we visit the tiny South Atlantic island of Saint Helena. Perhaps best known as the home of an exiled Napoleon, it is frequently described as one of the world’s most isolated islands. Measuring just 10 by 5 miles and home to 4,255 people, the island is covered by Simon Pipe’s St Helena Online, which offered Damian Radcliffe an insight into a very different type of hyperlocal site. Continue reading

Video: Heather Brooke’s tips on investigating, and using the FOI and Data Protection Acts

The following three videos first appeared on the Help Me Investigate blog, Help Me Investigate: Health, and Help Me Investigate: Welfare. I thought I’d collect them together here too. As always, these are published under a Creative Commons licence, so you are welcome to re-use, edit and combine them with other video, with attribution (and a link!).

First, Heather Brooke’s tips for starting to investigate public bodies:

Her advice on investigating health, welfare and crime:

And on using the Data Protection Act:

Crowdsourcing investigative journalism: a case study (part 3)

Continuing the serialisation of the research underpinning a new Help Me Investigate project, in this third part I describe how the focus of the site was shaped by the interests of its users and staff, and how site functionality was changed to react to user needs. I also identify some areas where the site could have been further developed and improved. (Part 1 is available here; Part 2 is here)

Reflections on the proof of concept phase

By the end of the 12-week proof of concept phase the site had also completed a number of investigations that were not ‘headline-makers’ but fulfilled the objective of informing users: in particular ‘Why is a new bus company allowed on an existing route with the same number, but higher prices?’; ‘What is the tracking process for petitions handed in to Birmingham City Council?’; and ‘The DVLA and misrepresented number plates’.

The site had also unearthed some promising information that could provide the basis for more stories, such as Birmingham City Council receiving over £160,000 in payments for vehicle removals, and ‘Which councils in the UK (that use Civil Enforcement) make the most from parking tickets?’ (as a byproduct, this also unearthed how well different councils responded to Freedom of Information requests).

A number of news organisations expressed an interest in working with the site, but practical contributions took place largely at an individual rather than organisational level. Journalist Tom Scotney, who was involved in one of the investigations, commented: “Get it right and you’re becoming part of an investigative team that’s bigger, more diverse and more skilled than any newsroom could ever be” (Scotney, 2009, n.p.) – but it was becoming clear that most journalists were not culturally prepared to engage with the site – or did not have the time – unless there was a story ‘ready made’ for them to use. Once there were stories to be had, however, they played a valuable role in writing those stories up, obtaining official reactions, and raising visibility.

After 12 weeks the site had around 275 users (whose backgrounds ranged from journalism and web development to locally active citizens) and 71 investigations, exceeding project targets. It is difficult to measure ‘success’ or ‘failure’ but at least eight investigations had resulted in coherent stories, representing a success rate of at least 11%: the target figure before launch had been 1-5%. That figure rose to around 21% if other promising investigations were included, and the sample included recently initiated investigations which were yet to get off the ground.

‘Success’ was an interesting metric which deserves further elaboration. In his reflection on The Guardian’s crowdsourcing experiment, for example, developer Martin Belam (2011a, n.p.) noted a tendency to evaluate success “not purely editorially, but with a technology mindset in terms of the ‘100% – Achievement unlocked!’ games mechanic”. In other words, success might be measured in terms of degrees of ‘completion’ rather than results.

In contrast, the newspaper’s journalist Paul Lewis saw success in terms of something other than pure percentages: getting 27,000 people to look at expense claims was, he felt, a successful outcome, regardless of the percentage of claims that those represented. And BBC Special Reports Editor Bella Hurrell – who oversaw a similar but less ambitious crowdsourcing project on the same subject on the broadcaster’s website – felt that they had also succeeded in producing genuine ‘public service journalism’ in the process (personal interview).

A third measure of success is noted by Belam – that of implementation and iteration (being able to improve the service based on how it is used):

“It demonstrated that as a team our tech guys could, in the space of around a week, get an application deployed into the cloud but appear integrated into our site, using a technology stack that was not our regular infrastructure.

“Secondly, it showed that as a business we could bring people together from editorial, design, technology and QA to deliver a rapid turnaround project in a multi-disciplinary way, based on a topical news story.

“And thirdly, we learned from and improved upon it.” (Belam, 2010, n.p.)

A percentage ‘success’ rate for Help Me Investigate, then, represents a similarly ‘game-oriented’ perspective on the site, and it is important to draw on other frameworks to measure its success.

For example, it was clear that the site did very well in producing raw material for ‘journalism’, but it was less successful in generating more general civic information, such as how to find out who owned a piece of land. Returning to the ideas of Actor-Network Theory outlined above, the behaviour of two principal actors – and one investigation – had a particular influence on this, and on how the site more generally developed over time. Site user Neil Houston was an early adopter of the site and one of its heaviest contributors. His interest in interrogating data helped shape the path of many of the site’s most active investigations, which in turn set the editorial ‘tone’ of the site. This attracted users with similar interests to Neil’s, but may have discouraged others who did not share them – further research would be needed to establish this.

Likewise, while Birmingham City Council staff contributed to the site in its earliest days, when the council became the subject of an investigation, staff involvement was actively discouraged (personal interview with contributor). This left the site short of particular expertise in answering civic questions.

At least one user commented that the site was very ‘FOI [Freedom of Information request]-heavy’ and risked excluding users who were interested in different types of investigation, or who saw Freedom of Information requests as too difficult for them. This could be traced directly to the appointment of Heather Brooke as the site’s support journalist. Heather is a leading Freedom of Information activist and user of FOI requests: this was an enormous strength in supporting relevant investigations, but it should also be recognised that it served to set the editorial tone of the site.

This narrowing of tone was addressed by bringing in a second support journalist with a consumer background, Colin Meek, and by a strategic shift in community management towards actively engaging users with other types of investigation. As more users came onto the site, these broadened into consumer, property and legal areas.

However, a further ‘actor’ then came into play: the legal and insurance systems. When the proof of concept funding – and the legal insurance associated with it – came to an end, the team had to close the investigations unrelated to the public sector, as these left the site most vulnerable legally.

A final example of Actor-Network Theory in action was a difference between the intentions of the site designers and its users. The founders wanted Help Me Investigate to be a place for consensus, not discussion, but it quickly became apparent that users did not want to have to go elsewhere to have their discussions. Users needed to – and did – have conversations around the updates that they posted.

The initial challenge-and-result model (breaking investigations down into challenges, with entry fields for the subsequent results, which were required to include a link to the source of the information) was therefore changed very early on to challenge-and-update: people could now post an update without a link, simply to make a point about a previous result, or to explain their efforts in failing to obtain one.

One of the challenges least likely to be accepted by users was to ‘Write the story up’. It seemed that those who knew the investigation had no need to write it up: the story existed in their heads. Instead it was either site staff or professional journalists who would normally write up the results. Similarly, when an investigation was complete, it required site staff to update the investigation description to include a link to any write-up. There was no evidence of a desire among users to ‘be a journalist’; the overriding objective appeared instead to be to ‘be a citizen’.

In contrast, a challenge to write ‘the story so far’ seemed more appealing in investigations that had gathered data but no resolution as yet. The site founders underestimated the need for narrative in designing a site that allowed users to join investigations while they were in progress.

As was to be expected with a ‘proof of concept’ site (one testing whether an idea could work), there were a number of frustrations with the site’s limitations – and a number of opportunities identified. When looking to crowdfund small amounts for an investigation, for example, there were no third-party tools available that would allow this without going through a nonprofit organisation. And when an investigation involved a large crowdsourcing operation, the connection to activity conducted on other platforms needed to be stronger so users could more easily see what needed doing (e.g. a live feed of changes to a Google spreadsheet, or documents bookmarked using Delicious).

Finally, investigations often evolved into new questions but had to stay with an old title or risk losing the team and resources that had been built up. The option to ‘export’ an investigation’s team and resources into a fresh question or investigation was one possible future solution.

‘Failure for free’ was part of the design of the site in order to allow investigations to succeed on the efforts of its members rather than as a result of any top-down editorial agenda – although naturally journalist users would concentrate their efforts on the most newsworthy investigations. In practice it was hard to ‘let failure happen’, especially when almost all investigations had some public interest value.

Although the failure itself was not an issue (and indeed the failure rate was lower than expected), a ‘safety net’ was needed that would more proactively suggest ways investigators could make their investigation a success, including features such as investigation ‘mentors’ who could pass on their experience; ‘expiry dates’ on challenges, with reminders; an improved ability to find other investigators with relevant skills or experience; a ‘sandbox’ investigation for new users to find their feet; and a metric to identify successful and failing investigations.

Communication was central to successful investigations, and two areas required more attention: staff time spent communicating with users, and technical infrastructure to automate and facilitate communication (such as alerts to new updates, or the ability to email all investigation members).

The much-feared legal issues did not particularly materialise. Out of over 70 investigations in the first 12 weeks, only four needed rephrasing to avoid being potentially libellous. Two involved minor tweaks; the other two were more significant, partly because of a related need for clarity in the question.

Individual updates within investigations, which were post-moderated, presented even less of a legal problem. Only two updates were referred for legal advice, and only one of those was rephrased. One was flagged and removed because it was ‘flamey’ and did not contribute to the investigation.

There was a lack of involvement by users across investigations. Users tended to stick to their own investigation, and the idea of ‘helping another so they help you’ did not take root. Further research is needed to see if there was a power law distribution at work here – often seen on the internet – of a few people being involved in many investigations, most being involved in just one, and a steep curve in between.

In the next part I look at one particular investigation in an attempt to identify the qualities that made it successful.

If you want to get involved in the latest Help Me Investigate project, get in touch at paul@helpmeinvestigate.com

Getting full addresses for data from an FOI response (using APIs)


Here’s an example of how APIs can be useful to journalists when they need to combine two sets of data.

I recently spoke to Lincoln investigative journalism student Sean McGrath who had obtained some information via FOI that he needed to combine with other data to answer a question (sorry to be so cryptic).

He had spent three days cleaning up the data and manually adding postcodes to it. This seemed a good example of where using an API might cut down your work considerably, so in this post I explain how to make a start on the same problem in less than an hour using Excel, Google Refine and the Google Maps API.

Step 1: Get the data in the right format to work with an API

APIs can do all sorts of things, but one of the things they do which is particularly useful for journalists is answer questions. Continue reading
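To give a rough idea of where this is heading – this is my own sketch, not the exact Excel/Google Refine workflow from the post – the snippet below asks the Google Maps Geocoding API for a full address for each postcode in a CSV file. The file names and the API key are placeholders.

```python
import csv

import requests

API_KEY = "YOUR_GOOGLE_API_KEY"  # placeholder: obtain your own key from Google
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

# 'foi_response.csv' is a hypothetical file with a 'postcode' column
with open("foi_response.csv", newline="") as infile, \
        open("foi_response_with_addresses.csv", "w", newline="") as outfile:
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames + ["formatted_address"])
    writer.writeheader()
    for row in reader:
        params = {"address": row["postcode"], "region": "uk", "key": API_KEY}
        results = requests.get(GEOCODE_URL, params=params).json().get("results", [])
        # the first result, if any, carries the API's best guess at the full address
        row["formatted_address"] = results[0]["formatted_address"] if results else ""
        writer.writerow(row)
```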

Why journalists should be lobbying over police.uk’s crime data

UK police crime maps

Conrad Quilty-Harper writes about the new crime data from the UK police force – and in the process adds another straw to the groaning camel’s back of the government’s so-called transparency agenda:

“It’s useless to residents wanting to find out what was going on at the house around the corner at 3am last night, and it’s useless to individuals who want to build mobile phone applications on top of the data (perhaps to get a chunk of that £6 billion industry open data is supposed to create).

“The site’s limitations are as follows:

  • No IDs for crimes: what if I want to check whether real life crimes have made it onto the map? Sorry.
  • Six crime categories: including “other crimes”, everything from drug dealing to bank robberies in one handy, impossible to understand category.
  • No live data: you mean I have to wait until the end of the next month to see this month’s criminality?!
  • No dates or times: funny how without dates and times I can’t tell which police manager was in charge.
  • Case status: the police know how many crimes go solved or unsolved, why not tell us this?”

This is why people are so concerned about the Public Data Corporation. This is why we need to be monitoring exactly what spending data councils release, and in what format. And this is why we need to continue to press for the expansion of FOI laws. This is what we should be doing. Are we?

UPDATE: Will Perrin has FOI’d all correspondence relating to ICO advice on the crime maps. Jonathan Raper has a list of further flaws including:

  • Some data such as sexual offences and murder is removed – even though it would be easy to discover and locate from other police reports.
  • Data covers reported crimes rather than convictions, so some of it may turn out not to be crime.
  • The levels of policing are not provided, so that two areas with the “same” crime levels may in fact have “radically different” experiences of crime and policing.

Charles Arthur notes that: “Police forces have indicated that whenever a new set of data is uploaded – probably each month – the previous set will be removed from public view, making comparisons impossible unless outside developers actively store it.”
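For any developer minded to do that storing, a minimal sketch of a monthly snapshot script might look like this (the download URL is a placeholder – substitute whatever file the site actually publishes for your force):

```python
import datetime
import pathlib

import requests

DATA_URL = "https://www.police.uk/path/to/monthly-crime-data.csv"  # hypothetical

archive_dir = pathlib.Path("crime_archive")
archive_dir.mkdir(exist_ok=True)

# save each month's file under a dated name so month-on-month comparison stays possible
snapshot = archive_dir / f"crime-{datetime.date.today():%Y-%m}.csv"
response = requests.get(DATA_URL, timeout=60)
response.raise_for_status()
snapshot.write_bytes(response.content)
print(f"Saved {len(response.content)} bytes to {snapshot}")
```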

Louise Kidney says:

“What we’ve actually got with http://www.police.uk is neither one nor the other. Ruth looks like a crime overlord cos of all the crimes happening in her garden and we haven’t got exact point data, but we haven’t got first part of postcode data either e.g. BB5 crimes or NW1 crimes. Instead, we’ve got this weird halfway house thing where it’s not accurate, but its inaccuracy almost renders it useless because we don’t have any idea if every force uses the same parameters when picking these points, we don’t know how they pick their points, we don’t know what we don’t know in terms of whether one house in particular is causing a considerable issue with anti-social behaviour for example, allowing me to go to my local Council and demand they do something about it.”

Adrian Short argues that “What we’re looking at here isn’t a value-neutral scientific exercise in helping people to live their daily lives a little more easily, it’s an explicitly political attempt to shape the terms of a debate around the most fundamental changes in British policing in our lifetimes.”

He adds:

“It’s derived data that’s already been classified, rounded and lumped together in various ways, with a bit of location anonymising thrown in for good measure. I haven’t had a detailed look at it yet but I would caution against trying to use it for anything serious. A whole set of decisions have already transformed the raw source data (individual crime reports) into this derived dataset and you can’t undo them. You’ll just have to work within those decisions and stay extremely conscious that everything you produce with it will be prefixed, “as far as we can tell”.

“£300K for this? There ought to be a law against it.”

UPDATE 2: One frustrated developer has launched CrimeSearch.co.uk to provide “helpful information about crime and policing in your area, without costing 300k of tax payers’ money”

Open data meets FOI via some nifty automation

OpenlyLocal generated FOI request

Now this is an example of what’s possible with open data and some very clever thinking. Chris Taggart blogs about a new tool on his OpenlyLocal platform that allows you to send a Freedom of Information (FOI) request based on a particular item of spending. “This further lowers the barriers to armchair auditors wanting to understand where the money goes, and the request even includes all the usual ‘boilerplate’ to help avoid specious refusals.”

It takes around a minute to generate an FOI request.

The function is limited to items of spending above £10,000. Cleverly, it’s also all linked so you can see if an FOI request has already been generated and answered.
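To give a flavour of what is being automated – this is my own sketch of the general idea, not OpenlyLocal’s actual code – generating the request is essentially a matter of filling a boilerplate template from a single spending record (the field names below are hypothetical):

```python
# Fill a standard FOI request template from one spending record,
# boilerplate included, ready to send via WhatDoTheyKnow.
FOI_TEMPLATE = """Dear {council},

Under the Freedom of Information Act 2000, please provide details of the
payment of £{amount:,.2f} made to {supplier} on {date}, including the goods
or services supplied and the relevant contract or purchase order.

If any part of this request is unclear, please contact me for clarification
rather than refusing it. I look forward to your response within 20 working days.

Yours faithfully,
"""

spending_item = {
    "council": "Anytown Borough Council",   # hypothetical record
    "supplier": "Example Services Ltd",
    "amount": 12500.00,
    "date": "12 January 2011",
}

print(FOI_TEMPLATE.format(**spending_item))
```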

Although the tool sits on OpenlyLocal, Francis Irving at WhatDoTheyKnow gets enormous credit for making their side of the operation work with it.

Once again you have to ask why a media organisation isn’t creating these sorts of tools to help generate journalism beyond the walls of its newsroom.

Data journalism pt1: Finding data (draft – comments invited)

The following is a draft from a book about online journalism that I’ve been working on. I’d really appreciate any additions or comments you can make – particularly around sources of data and legal considerations.

The first stage in data journalism is sourcing the data itself. Often you will be seeking out data based on a particular question or hypothesis (for a good guide to forming a journalistic hypothesis see Mark Hunter’s free ebook Story-Based Inquiry (2010)). On other occasions, it may be that the release or discovery of data itself kicks off your investigation.

There are a range of sources available to the data journalist, both online and offline, public and hidden. Typical sources include:

Continue reading