Tag Archives: Charles Arthur

Let’s explode the myth that data journalism is ‘resource intensive’

"Data Journalism is very time consuming, needs experts, is hard to do with shrinking news rooms" Eva Linsinger, Profil

Is data journalism ‘time consuming’ or ‘resource intensive’? The excuse – and I think it is an excuse – seems to come up at an increasing number of events whenever data journalism is discussed. “It’s OK for the New York Times/Guardian/BBC,” goes the argument. “But how can our small team justify the resources – especially in a time of cutbacks?”

The idea that data journalism inherently requires extra resources is flawed – but understandable. Spectacular interactives, large scale datasets and investigative projects are the headliners of data journalism’s recent history. We have oohed and aahed over what has been achieved by programmer-journalists and data sleuths…

But that’s not all there is.


2 guest posts: 2012 predictions and “Social media and the evolution of the fourth estate”


I’ve written a couple of guest posts for Nieman Journalism Lab and the tech news site Memeburn. The Nieman post is part of a series looking forward to 2012. I’m never a fan of futurology so I’ve cheated a little and talked about developments already in progress: new interface conventions in news websites; the rise of collaboration; and the skilling up of journalists in data.

Memeburn asked me a few months ago to write about social media’s impact on journalism’s role as the Fourth Estate, and it took me until this month to find the time to do so. Here’s the salient passage:

“But the power of the former audience is a power that needs to be held to account too, and the rise of liveblogging is teaching reporters how to do that: reacting not just to events on the ground, but the reporting of those events by the people taking part: demonstrators and police, parents and politicians all publishing their own version of events — leaving journalists to go beyond documenting what is happening, and instead confirming or debunking the rumours surrounding that.

“So the role of journalist is moving away from that of gatekeeper and — as Axel Bruns argues — towards that of gatewatcher: amplifying the voices that need to be heard, factchecking the MPs whose blogs are 70% fiction or the Facebook users scaremongering about paedophiles.

“But while we are still adapting to this power shift, we should also recognise that that power is still being fiercely fought over. Old laws are being used in new ways; new laws are being proposed to reaffirm previous relationships. Some of these may benefit journalists — but ultimately not journalism, nor its fourth estate role. The journalists most keenly aware of this — Heather Brooke in her pursuit of freedom of information; Charles Arthur in his campaign to ‘Free Our Data’ — recognise that journalists’ biggest role as part of the fourth estate may well be to ensure that everyone has access to information that is of public interest, that we are free to discuss it and what it means, and that — in the words of Eric S. Raymond — ‘Given enough eyeballs, all bugs are shallow’.”

Comments, as always, very welcome.

Is Ice Cream Strawberry? Part 4: Human Capital

This is the fourth part of my inaugural lecture at City University London, ‘Is Ice Cream Strawberry?’. You can find part one here, part two here, and part three here.

Human capital

So here’s person number 4: Gary Becker, a Nobel prize-winning economist.

Fifty years ago he used the phrase ‘human capital’ to refer to the economic value that companies should ascribe to their employees.

These days, of course, it is common sense to invest time in recruiting, training and retaining good employees. But at the time employees were seen as a cost.

We need a similar change in the way we see our readers – not as a cost on our time but as a valuable part of our operations that we should invest in recruiting, developing and retaining.

Why journalists should be lobbying over police.uk’s crime data

UK police crime maps

Conrad Quilty-Harper writes about the new crime data from the UK police force – and in the process adds another straw to the groaning camel’s back of the government’s so-called transparency agenda:

“It’s useless to residents wanting to find out what was going on at the house around the corner at 3am last night, and it’s useless to individuals who want to build mobile phone applications on top of the data (perhaps to get a chunk of that £6 billion industry open data is supposed to create).

“The site’s limitations are as follows:

  • No IDs for crimes: what if I want to check whether real life crimes have made it onto the map? Sorry.
  • Six crime categories: including “other crimes”, everything from drug dealing to bank robberies in one handy, impossible to understand category.
  • No live data: you mean I have to wait until the end of the next month to see this month’s criminality?!
  • No dates or times: funny how without dates and times I can’t tell which police manager was in charge.
  • Case status: the police know how many crimes go solved or unsolved, why not tell us this?”

This is why people are so concerned about the Public Data Corporation. This is why we need to be monitoring exactly what spending data councils release, and in what format. And this is why we need to continue to press for the expansion of FOI laws. This is what we should be doing. Are we?

UPDATE: Will Perrin has FOI’d all correspondence relating to ICO advice on the crime maps. Jonathan Raper has a list of further flaws including:

  • Some data such as sexual offences and murder is removed – even though it would be easy to discover and locate from other police reports.
  • Data covers reported crimes rather than convictions, so some of it may turn out not to be crime.
  • The levels of policing are not provided, so that two areas with the “same” crime levels may in fact have “radically different” experiences of crime and policing.

Charles Arthur notes that: “Police forces have indicated that whenever a new set of data is uploaded – probably each month – the previous set will be removed from public view, making comparisons impossible unless outside developers actively store it.”
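Arthur's point is worth dwelling on: if each monthly upload replaces the last, the only way to make comparisons over time is for outside developers to snapshot every release themselves. A minimal sketch of such an archiver follows — the URL and file layout are invented for illustration, not the actual police.uk endpoints:

```python
# Sketch: snapshot each monthly crime-data release before it is replaced.
# DATA_URL is a hypothetical endpoint; police.uk's real download
# locations may differ.
import datetime
import urllib.request
from pathlib import Path

DATA_URL = "https://example.police.uk/data/latest.csv"  # hypothetical

def dated_filename(today=None):
    """Name each snapshot by year and month so releases never overwrite."""
    today = today or datetime.date.today()
    return f"crime-data-{today:%Y-%m}.csv"

def archive(url=DATA_URL, folder="archive"):
    """Download this month's release into a dated file for later comparison."""
    Path(folder).mkdir(exist_ok=True)
    target = Path(folder) / dated_filename()
    urllib.request.urlretrieve(url, target)
    return target
```

Run monthly (from cron, say) this builds exactly the independent archive Arthur says comparisons would depend on.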

Louise Kidney says:

“What we’ve actually got with http://www.police.uk is neither one nor the other. Ruth looks like a crime overlord cos of all the crimes happening in her garden and we haven’t got exact point data, but we haven’t got first part of postcode data either e.g. BB5 crimes or NW1 crimes. Instead, we’ve got this weird halfway house thing where it’s not accurate, but its inaccuracy almost renders it useless because we don’t have any idea if every force uses the same parameters when picking these points, we don’t know how they pick their points, we don’t know what we don’t know in terms of whether one house in particular is causing a considerable issue with anti-social behaviour for example, allowing me to go to my local Council and demand they do something about it.”

Adrian Short argues that “What we’re looking at here isn’t a value-neutral scientific exercise in helping people to live their daily lives a little more easily, it’s an explicitly political attempt to shape the terms of a debate around the most fundamental changes in British policing in our lifetimes.”

He adds:

“It’s derived data that’s already been classified, rounded and lumped together in various ways, with a bit of location anonymising thrown in for good measure. I haven’t had a detailed look at it yet but I would caution against trying to use it for anything serious. A whole set of decisions have already transformed the raw source data (individual crime reports) into this derived dataset and you can’t undo them. You’ll just have to work within those decisions and stay extremely conscious that everything you produce with it will be prefixed, “as far as we can tell”.

“£300K for this? There ought to be a law against it.”

UPDATE 2: One frustrated developer has launched CrimeSearch.co.uk to provide “helpful information about crime and policing in your area, without costing 300k of tax payers’ money”.

Where should an aspiring data journalist start?

In writing last week’s Guardian Data Blog piece on How to be a data journalist I asked various people involved in data journalism where they would recommend starting. The answers are so useful that I thought I’d publish them in full here.

The Telegraph’s Conrad Quilty-Harper:

Start reading:

http://www.google.com/reader/bundle/user%2F06076274130681848419%2Fbundle%2Fdatavizfeeds

Keep adding to your knowledge and follow other data journalists/people who work with data on Twitter.

Look for sources of data:

The ONS stats release calendar is a good start: http://www.statistics.gov.uk/hub/release-calendar/index.html. Look at the government data stores (Data.gov, Data.gov.uk, Data.london.gov.uk etc).

Check out WhatDoTheyKnow, Freebase, Wikileaks, Many Eyes, Google Fusion Tables.

Why did you get into data journalism?

In researching my book chapter (UPDATE: now published) I asked a group of journalists who worked with data what led them to do so. Here are their answers:

Jonathon Richards, The Times:

The flood of information online presents an amazing opportunity for journalists, but also a challenge: how on earth does one keep up with, and make sense of, it? You could go about it in the traditional way, fossicking in individual sites, but much of the journalistic value in this outpouring, it seems, comes in aggregation: in processing large amounts of data, distilling them, and exploring them for patterns. To do that – unless you’re superhuman, or have a small army of volunteers – you need the help of a computer.

I ‘got into’ data journalism because I find this mix exciting. It appeals to the traditional journalistic instinct, but also calls for a new skill which, once harnessed, dramatically expands the realm of ‘stories I could possibly investigate…’

Data journalism pt2: Interrogating data

This is a draft from a book chapter on data journalism (the first, on gathering data, is here). I’d really appreciate any additions or comments you can make – particularly around ways of spotting stories in data, and mistakes to avoid.

UPDATE: It has now been published in The Online Journalism Handbook.

“One of the most important (and least technical) skills in understanding data is asking good questions. An appropriate question shares an interest you have in the data, tries to convey it to others, and is curiosity-oriented rather than math-oriented. Visualizing data is just like any other type of communication: success is defined by your audience’s ability to pick up on, and be excited about, your insight.” (Fry, 2008, p4)

Once you have the data you need to see if there is a story buried within it. The great advantage of computer processing is that it makes it easier to sort, filter, compare and search information in different ways to get to the heart of what – if anything – it reveals.
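To make "sort, filter, compare" concrete, here is a minimal sketch using nothing but Python's standard library. The dataset and field names are invented for illustration; the pattern — load, sort to rank, filter against a baseline — is the crude first pass a reporter might run on any small table:

```python
# Sketch: rank areas by reported crime, then flag those above average -
# the kind of quick interrogation that surfaces a candidate story.
# The data below is invented for illustration.
import csv
from io import StringIO

raw = """area,crimes
Northfield,12
Eastside,91
Riverton,8
Midtown,45
"""

rows = list(csv.DictReader(StringIO(raw)))
for r in rows:
    r["crimes"] = int(r["crimes"])  # CSV fields arrive as strings

# Sort: which areas report the most crime?
by_crime = sorted(rows, key=lambda r: r["crimes"], reverse=True)

# Filter: which areas sit above the average - a rough baseline for "unusual"?
average = sum(r["crimes"] for r in rows) / len(rows)
outliers = [r["area"] for r in rows if r["crimes"] > average]

print(by_crime[0]["area"])  # highest-crime area
print(outliers)             # areas above the average
```

The numbers themselves are not the story, of course — as the next step, a reporter would ask why Eastside-style outliers stand out: a reporting quirk, a boundary effect, or something real.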