Category Archives: data journalism

Tell the government what you want from the Public Data Corporation

Public Data Corporation consultation

If you're excited about the prospect of open data but frustrated by its execution (or just one of those people who complain that data doesn't change anything), the government are inviting comments on what shape the Public Data Corporation should take.

It’s a refreshingly simple execution: a WordPress blog with each question as a separate blog post – presumably it cost a lot less than £300,000. But of course the questions are theirs, and they are:

1.      Which public sector datasets do you currently make use of?

2.      How easy is it to find out what datasets are held by public sector organisations?

3.      How do you, or would you, decide whether a dataset has value for you or for your organisation? What affects how valuable they are, for example timeliness, granularity, format?

4.      Which datasets are of most value to you or your organisation? Why?

5.      What methods of access to datasets would most benefit you or your organisation?

6.      What gets in the way of you or your organisation accessing datasets or data products?

7.      What are the most exciting applications of datasets or data products you are aware of – here or internationally? We are, again, particularly interested in the following areas: registration activities, environmental science, critical infrastructure and the built environment.

8.      Are there any datasets or products you’d like to see generated? How would you or your organisation use them, and what social or economic benefits do you think they would deliver?

9.      From your perspective, what would success look like for the Public Data Corporation?

10.  Have we got the name for this organisation right?  Do you have any suggestions on naming that might better convey our aims?

It’s a shame that there isn’t any space for more open discussion – and that so many of the questions resemble market research. But still, the more journalists who pile in, the more justifiably we can moan later. So go ahead.

Post your responses here.

3 things that BBC Online has given to online journalism

It’s now 3 weeks since the BBC announced 360 online staff were to lose their jobs as part of a 25% cut to the online budget. It’s a sad but unsurprising part of a number of cuts which John Naughton summarises as: “It’s not television”, a sign that “The past has won” in the internal battle between those who saw consumers as passive vessels for TV content, and those who credited them with some creativity.

Dee Harvey likewise poses the question: “In the same way that openness is written into the design of the Internet, could it be that closedness is written into the very concept of the BBC?”

If it is, I don’t think it can remain that way for ever. Those who have been part of the BBC’s work online will feel rightly proud of what has been achieved since the corporation went online in 1997. Here are just 3 ways that the corporation has helped to define online journalism as we know it – please add others that spring to mind:

1. Web writing style

The BBC’s way of writing for the web has always been a template for good web writing, not least because of the BBC’s experience with having to meet similar challenges with Ceefax – the two shared a content management system and journalists writing for the website would see the first few pars of their content cross-published on Ceefax too.

Even now it is difficult to find an online publisher who writes better for the web.

2. Editors’ blogs

Thanks to the likes of Robin Hamman, Martin Belam, Jem Stone and Tom Coates – to name just a few – when the BBC did begin to adopt blogs (it was not an early adopter) it did so with a spirit that other news organisations lacked.

In particular, the Editors’ Blogs demonstrated a desire for transparency that many other news organisations have yet to repeat, while the likes of Robert Peston, Kevin Anderson and Rory Cellan-Jones have played a key role in showing skeptical journalists how engaging with the former audience on blogs can form a key part of the newsgathering process.

Unfortunately, many of those innovators later left the BBC, and the earlier experimentation was replaced with due process.

3. Backstage

While so many sing and dance about the APIs of The Guardian and The New York Times, Ian Forrester’s BBC Backstage project was well ahead of the game when it opened up the corporation’s API and started hosting hack days and meetups way back in 2005.

Backstage closed at the end of last year, just as the rest of the UK’s media were starting to catch up. You can read an e-book on its history here.

What else?

I’m sure you can add others – the iPlayer and their on-demand team; Special Reports; the UGC hub (the biggest in the world as far as I know); and even their continually evolving approach to linking (still not ideal, but at least they think about it) are just some that spring to mind. What parts of BBC Online have influenced or inspired you?

3 new resources for data journalists

There have been a raft of new sites for data launched in the past couple of months which I haven’t had time to blog about, so here’s a quick round-up:

  • Tim Davies’ Open Data Cookbook aims to collect “step by step recipes for practical ways to use open data” – a useful complement to GetTheData. The recipes are currently aimed at the more technically minded but you know what to do to address that…
  • Is It Open Data? aims to “make it easy for people to make enquiries of data holders, about the openness of the data they hold — and to record publicly the results of those efforts.”
  • And for those wishing to publish open data, The Open Data Manual provides information on what open data is, why you should publish open data, and how to do it. If you come up against an organisation that does not know how to publish their data in an open format, or needs convincing of why they should do so, this is a good place to point them to (or learn the arguments from).

If you’ve seen any other useful resources of late, please post a link in the comments.

Why journalists should be lobbying over police.uk’s crime data

UK police crime maps

Conrad Quilty-Harper writes about the new crime data from the UK police force – and in the process adds another straw to the groaning camel’s back of the government’s so-called transparency agenda:

“It’s useless to residents wanting to find out what was going on at the house around the corner at 3am last night, and it’s useless to individuals who want to build mobile phone applications on top of the data (perhaps to get a chunk of that £6 billion industry open data is supposed to create).

“The site’s limitations are as follows:

  • No IDs for crimes: what if I want to check whether real life crimes have made it onto the map? Sorry.
  • Six crime categories: including “other crimes”, everything from drug dealing to bank robberies in one handy, impossible to understand category.
  • No live data: you mean I have to wait until the end of the next month to see this month’s criminality?!
  • No dates or times: funny how without dates and times I can’t tell which police manager was in charge.
  • Case status: the police know how many crimes go solved or unsolved, why not tell us this?”

This is why people are so concerned about the Public Data Corporation. This is why we need to be monitoring exactly what spending data councils release, and in what format. And this is why we need to continue to press for the expansion of FOI laws. This is what we should be doing. Are we?

UPDATE: Will Perrin has FOI’d all correspondence relating to ICO advice on the crime maps. Jonathan Raper has a list of further flaws including:

  • Some data such as sexual offences and murder is removed – even though it would be easy to discover and locate from other police reports.
  • Data covers reported crimes rather than convictions, so some of it may turn out not to be crime.
  • The levels of policing are not provided, so that two areas with the “same” crime levels may in fact have “radically different” experiences of crime and policing.

Charles Arthur notes that: “Police forces have indicated that whenever a new set of data is uploaded – probably each month – the previous set will be removed from public view, making comparisons impossible unless outside developers actively store it.”
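Charles Arthur’s point suggests an obvious mitigation: grab each monthly release as it appears and keep it under a dated filename, so comparisons remain possible once the site replaces the data. Here’s a minimal sketch – the filename scheme is my own invention, and it assumes you’ve already fetched the month’s release as raw CSV bytes (police.uk offered no bulk download at the time, so how you obtain the bytes is left open):

```python
from pathlib import Path

def archive_snapshot(csv_bytes: bytes, month: str, archive_dir: str = "crime-archive") -> Path:
    """Save one month's crime data under a dated filename so the snapshot
    survives after the source site removes it from public view.

    `month` is expected in YYYY-MM form (a hypothetical convention).
    """
    target = Path(archive_dir)
    target.mkdir(parents=True, exist_ok=True)
    dest = target / f"crime-{month}.csv"
    if not dest.exists():  # never overwrite an earlier snapshot
        dest.write_bytes(csv_bytes)
    return dest
```

Run monthly (by hand or from a scheduler) this builds exactly the archive Arthur says “outside developers” would need.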

Louise Kidney says:

“What we’ve actually got with http://www.police.uk is neither one nor the other. Ruth looks like a crime overlord cos of all the crimes happening in her garden and we haven’t got exact point data, but we haven’t got first part of postcode data either e.g. BB5 crimes or NW1 crimes. Instead, we’ve got this weird halfway house thing where it’s not accurate, but its inaccuracy almost renders it useless because we don’t have any idea if every force uses the same parameters when picking these points, we don’t know how they pick their points, we don’t know what we don’t know in terms of whether one house in particular is causing a considerable issue with anti-social behaviour for example, allowing me to go to my local Council and demand they do something about it.”

Adrian Short argues that “What we’re looking at here isn’t a value-neutral scientific exercise in helping people to live their daily lives a little more easily, it’s an explicitly political attempt to shape the terms of a debate around the most fundamental changes in British policing in our lifetimes.”

He adds:

“It’s derived data that’s already been classified, rounded and lumped together in various ways, with a bit of location anonymising thrown in for good measure. I haven’t had a detailed look at it yet but I would caution against trying to use it for anything serious. A whole set of decisions have already transformed the raw source data (individual crime reports) into this derived dataset and you can’t undo them. You’ll just have to work within those decisions and stay extremely conscious that everything you produce with it will be prefixed, “as far as we can tell”.

“£300K for this? There ought to be a law against it.”

UPDATE 2: One frustrated developer has launched CrimeSearch.co.uk to provide “helpful information about crime and policing in your area, without costing 300k of tax payers’ money”

Getting Started With Local Council Spending Data

With more and more councils doing as they were told and opening up their spending data in the name of transparency, it’s maybe worth a quick review of how the data is currently being made available.

To start with, I’m going to consider the Isle of Wight Council’s data, which was opened up earlier this week. The first data release can be found (though not easily?!) as a pair of Excel spreadsheets, both of which are just over 1 MB large, at http://www.iwight.com/council/transparency/ (This URL reminds me that it might be time to review my post on “Top Level” URL Conventions in Local Council Open Data Websites!)

The data has also been released via Spikes Cavell at Spotlight on Spend: Isle of Wight.

The Spotlight on Spend site offers a hierarchical table based view of the data; value add comes from the ability to compare spend with national averages and that of other councils. Links are also provided to monthly datasets available as a CSV download.

Uploading these datasets to Google Fusion tables shows the following columns are included in the CSV files available from Spotlight on Spend (click through the image to see the data):

Note that the Expense Area column appears to be empty, and that “clumped” transaction dates are used. Also note that each row, column and cell can be commented upon.

The Excel spreadsheets on the Isle of Wight Council website are a little more complete – here’s the data in Google Fusion tables again (click through the image to see the data):

(It would maybe be worth comparing these columns with those identified as Mandatory or Desirable in the Local Spending Data Guidance? A comparison with the format the esd use for their Linked Data cross-council local spending data demo might also be interesting?)

Note that because the Excel files on the Isle of Wight Council website were larger than the 1MB size limit on XLS spreadsheet uploads to Google Fusion Tables, I had to open the spreadsheets in Excel and then export them as CSV documents. (Google Fusion Tables accepts CSV uploads for files up to 100MB.) So if you’re writing an open data sabotage manual, this may be something worth bearing in mind (i.e. publish data in very large Excel spreadsheets)!

It’s also worth noting that if different councils use similar column headings and CSV file formats, and include a column stating the name of the council, it should be trivial to upload all their data to a common Google Fusion Table allowing comparisons to be made across councils, contractors with similar names to be identified across councils, and so on… (i.e. Google Fusion tables would probably let you do as much as Spotlight on Spend, though in a rather clunkier interface… but then again, I think there is a fusion table API…?;-)
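To show how trivial that cross-council aggregation is, here’s a sketch that merges several councils’ spending CSVs into one table, adding a council-name column to keep rows attributable. The “Supplier”/“Amount” headings are hypothetical – real releases vary, which is exactly why common column headings would matter:

```python
import csv
import io

def combine_council_spending(csv_texts, council_names):
    """Merge several councils' spending CSVs into one list of rows,
    tagging each row with a 'Council' column so data from different
    councils can be compared in a single table.

    Assumes every file shares the same column headings (here the
    hypothetical 'Supplier' and 'Amount').
    """
    rows = []
    for text, council in zip(csv_texts, council_names):
        for row in csv.DictReader(io.StringIO(text)):
            row["Council"] = council
            rows.append(row)
    return rows
```

The combined rows could then be written back out as one CSV and uploaded to a single Fusion Table (or any other tool) for cross-council comparison.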

Although the data hasn’t appeared there yet, I’m sure it won’t be long before it’s made available on OpenlyLocal:

However, the Isle of Wight’s hyperlocal news site, Ventnorblog, teamed up with a local developer to revise Adrian Short’s Armchair Auditor code and released the OnTheWight Armchair Auditor site:

So that’s a round up of where the data is, and how it’s presented. If I get a chance, the next step is to:
– compare the offerings with each other in more detail, e.g. the columns each view provides;
– compare the offerings with the guidance on release of council spending data;
– see what interesting Google Fusion table views we can come up with as “top level” reports on the Isle of Wight data;
– explore the extent to which Google Fusion Tables can be used to aggregate and compare data from across different councils.

PS related – Nodalities blog: Linked Spending Data – How and Why Bother Pt2

PPS for a list of local councils and the data they have released, see Guardian datastore: Local council spending over £500, OpenlyLocal Council Spending Dashboard

Investigations tool DocumentCloud goes public (PS: documents drive traffic)

The rather lovely DocumentCloud – a tool that allows journalists to share, annotate, connect and organise documents – has finally emerged from its closet and made itself available to public searches.

This means that anyone can now search the powerful database (some tips here) of newsworthy documents. If you want to add your own, however, you still need approval.

If you do end up on this list you’ll find it’s quite a powerful tool, with quick conversion of PDFs into text files, analytic tools and semantic tagging (so you can connect all documents with a particular person, or organisation) among its best features. The site is open source and has an API too.

I asked Program Director Amanda B Hickman what she’s learned on the project so far. Her response suggests that documents have a particular appeal for online readers:

“If we’ve learned anything, it is that people really love documents. It is pretty clear that when there’s something interesting going on in the news, plenty of people want to dig a little deeper. When Arizona Republic posted an annotated version of that state’s new immigration law, it got more traffic than their weekly entertainment round up. WNYC told us that the page listing the indictments in last week’s mob roundup was still getting more traffic than any other single news story even a week later.

“These were big news documents, to be sure, but it still seems pretty clear that people do want to dig deeper and explore the documents behind the news, which is great for us and great for news.”

Where do I get that data? New Q&A site launched

Get the Data

Well here’s another gap in the data journalism process ever-so-slightly plugged: Tony Hirst blogs about a new Q&A site that Rufus Pollock has built. Get the Data allows you to “ask your data related questions, including, but not limited to, the following:

  • “where to find data relating to a particular issue;
  • “how to query Linked Data sources to get just the data set you require;
  • “what tools to use to explore a data set in a visual way;
  • “how to cleanse data or get it into a format you can work with using third party visualisation or analysis tools.”

As Tony explains (the site came out of a conversation between him and Rufus):

“In some cases the data will exist in a queryable and machine readable form somewhere, if only you knew where to look. In other cases, you might have found a data source but lack the query writing expertise to get hold of just the data you want in a format you can make use of.”

He also invites people to help populate the site:

“If you publish data via some sort of API or queryable interface, why not consider posting self-answered questions using examples from your FAQ?

“If you’re running a hackday, why not use GetTheData.org to post questions arising in the scoping of the hacks, tweet a link to the question to your event backchannel and give the remote participants a chance to contribute back, at the same time adding to the online legacy of your event.”

Off you go then.

Bootstrapping GetTheData.org for All Your Public Open Data Questions and Answers

Where can I find a list of hospitals in the UK along with their location data? Or historical weather data for the UK? Or how do I find the county from a postcode, or a book title from its ISBN? And is there any way you can give me RDF Linked Data in a format I can actually use?!

With increasing amounts of data available, it can still be hard to:

– find the data you want;
– query a datasource to return just the data you want;
– get the data from a datasource in a particular format;
– convert data from one format to another (Excel to RDF, for example, or CSV to JSON);
– get data into a representation that means it can be easily visualised using a pre-existing tool.
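The format-conversion step in particular is often simpler than people expect. As an illustration of the CSV-to-JSON case from the list above, here’s a minimal sketch using only the Python standard library (the sample column names are made up):

```python
import csv
import io
import json

def csv_to_json(csv_text: str) -> str:
    """Convert CSV text into a JSON array of objects,
    one object per data row, keyed by the header line."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)
```

Excel-to-RDF is a harder problem (you need to decide on a vocabulary, not just a syntax), but syntactic conversions like this one are a few lines in most languages.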

In some cases the data will exist in a queryable and machine readable form somewhere, if only you knew where to look. In other cases, you might have found a data source but lack the query writing expertise to get hold of just the data you want in a format you can make use of. Or maybe you know the data is in Linked Data store on data.gov.uk, but you just can’t figure how to get it out?

This is where GetTheData.org comes in. Get The Data arose out of a conversation between myself and Rufus Pollock at the end of last year, which resulted in Rufus setting up the site now known as getTheData.org.

getTheData.org

The idea behind the site is to field questions and answers relating to the practicalities of working with public open data: from discovering data sets, to combining data from different sources in appropriate ways, getting data into formats you can happily work with, or that will play nicely with visualisation or analysis tools you already have, and so on.

At the moment, the site is in its startup/bootstrapping phase, although there is already some handy information up there. What we need now are your questions and answers…

So, if you publish data via some sort of API or queryable interface, why not consider posting self-answered questions using examples from your FAQ?

If you’re running a hackday, why not use GetTheData.org to post questions arising in the scoping of the hacks, tweet a link to the question to your event backchannel and give the remote participants a chance to contribute back, at the same time adding to the online legacy of your event.

If you’re looking for data as part of a research project, but can’t find it or can’t get it in an appropriate form that lets you link it to another data set, post a question to GetTheData.

If you want to do some graphical analysis on a data set, but don’t know what tool to use, or how to get the data in the right format for a particular tool, that’d be a good question to ask too.

Which is to say: if you want to GetTheData, but can’t do so for whatever reason, just ask… GetTheData.org

A portal for European government data: PublicData.eu plans

The Open Knowledge Foundation have published a blog post with notes on a site they’re developing to gather together data from across Europe. The post notes the growth of data catalogues at both a national level (mentioning the Digitalisér.dk data portal run by the Danish National IT and Telecom Agency) and “countless city level initiatives across Europe as well – from Helsinki to Munich, Paris to Zaragoza”, with many more initiatives “in the pipeline with plans to launch in the next 6 to 12 months.”

PublicData.eu will, it says:

“Provide a single point of access to open, freely reusable datasets from numerous national, regional and local public bodies throughout Europe.

“[It] will harvest and federate this information to enable users to search, query, process, cache and perform other automated tasks on the data from a single place. This helps to solve the “discoverability problem” of finding interesting data across many different government websites, at many different levels of government, and across the many governments in Europe.”

What is perhaps even more interesting for journalists is that the site plans to:

“Capture (proposed) edits, annotations, comments and uploads from the broader community of public data users.”

That might include anything from cleaner versions of data, to instances where developers match datasets together, or where users add annotations that add context to a particular piece of information.

Finally there’s a general indication that the site hopes to further lower the bar for data and collaborative journalism by:

“Providing basic data analysis and visualisation tools together with more in-depth resources for those looking to dig deeper into the data. Users will be able to personalise their data browsing experience by being able to save links and create notes and comments on datasets.”

More in the post itself. Worth keeping an eye on.