Tag Archives: Charles Arthur

Let’s explode the myth that data journalism is ‘resource intensive’

"Data Journalism is very time consuming, needs experts, is hard to do with shrinking news rooms" Eva Linsinger, Profil

Is data journalism ‘time consuming’ or ‘resource intensive’? The excuse – and I think it is an excuse – seems to come up at an increasing number of events whenever data journalism is discussed. “It’s OK for the New York Times/Guardian/BBC,” goes the argument. “But how can our small team justify the resources – especially in a time of cutbacks?”

The idea that data journalism inherently requires extra resources is flawed – but understandable. Spectacular interactives, large-scale datasets and investigative projects are the headliners of data journalism’s recent history. We have oohed and aahed over what has been achieved by programmer-journalists and data sleuths…

But that’s not all there is.

Continue reading

2 guest posts: 2012 predictions and “Social media and the evolution of the fourth estate”

Memeburn logo

I’ve written a couple of guest posts for Nieman Journalism Lab and the tech news site Memeburn. The Nieman post is part of a series looking forward to 2012. I’m never a fan of futurology so I’ve cheated a little and talked about developments already in progress: new interface conventions in news websites; the rise of collaboration; and the skilling up of journalists in data.

Memeburn asked me a few months ago to write about social media’s impact on journalism’s role as the Fourth Estate, and it took me until this month to find the time to do so. Here’s the salient passage:

“But the power of the former audience is a power that needs to be held to account too, and the rise of liveblogging is teaching reporters how to do that: reacting not just to events on the ground, but the reporting of those events by the people taking part: demonstrators and police, parents and politicians all publishing their own version of events — leaving journalists to go beyond documenting what is happening, and instead confirming or debunking the rumours surrounding that.

“So the role of journalist is moving away from that of gatekeeper and — as Axel Bruns argues — towards that of gatewatcher: amplifying the voices that need to be heard, factchecking the MPs whose blogs are 70% fiction or the Facebook users scaremongering about paedophiles.

“But while we are still adapting to this power shift, we should also recognise that that power is still being fiercely fought over. Old laws are being used in new ways; new laws are being proposed to reaffirm previous relationships. Some of these may benefit journalists — but ultimately not journalism, nor its fourth estate role. The journalists most keenly aware of this — Heather Brooke in her pursuit of freedom of information; Charles Arthur in his campaign to ‘Free Our Data’ — recognise that journalists’ biggest role as part of the fourth estate may well be to ensure that everyone has access to information that is of public interest, that we are free to discuss it and what it means, and that — in the words of Eric S. Raymond — “Given enough eyeballs, all bugs are shallow”.”

Comments, as always, very welcome.

Is Ice Cream Strawberry? Part 4: Human Capital

This is the fourth part of my inaugural lecture at City University London, ‘Is Ice Cream Strawberry?’. You can find part one here, part two here, and part three here.

Human capital

So here’s person number 4: Gary Becker, a Nobel prize-winning economist.

Fifty years ago he used the phrase ‘human capital’ to refer to the economic value that companies should ascribe to their employees.

These days, of course, it is common sense to invest time in recruiting, training and retaining good employees. But at the time employees were seen as a cost.

We need a similar change in the way we see our readers – not as a cost on our time but as a valuable part of our operations that we should invest in recruiting, developing and retaining. Continue reading

Why journalists should be lobbying over police.uk’s crime data

UK police crime maps

Conrad Quilty-Harper writes about the new crime data from UK police forces – and in the process adds another straw to the groaning camel’s back of the government’s so-called transparency agenda:

“It’s useless to residents wanting to find out what was going on at the house around the corner at 3am last night, and it’s useless to individuals who want to build mobile phone applications on top of the data (perhaps to get a chunk of that £6 billion industry open data is supposed to create).

“The site’s limitations are as follows:

  • No IDs for crimes: what if I want to check whether real life crimes have made it onto the map? Sorry.
  • Six crime categories: including “other crimes”, everything from drug dealing to bank robberies in one handy, impossible to understand category.
  • No live data: you mean I have to wait until the end of the next month to see this month’s criminality?!
  • No dates or times: funny how without dates and times I can’t tell which police manager was in charge.
  • Case status: the police know how many crimes go solved or unsolved, why not tell us this?”

This is why people are so concerned about the Public Data Corporation. This is why we need to be monitoring exactly what spending data councils release, and in what format. And this is why we need to continue to press for the expansion of FOI laws. This is what we should be doing. Are we?

UPDATE: Will Perrin has FOI’d all correspondence relating to ICO advice on the crime maps. Jonathan Raper has a list of further flaws including:

  • Some data such as sexual offences and murder is removed – even though it would be easy to discover and locate from other police reports.
  • Data covers reported crimes rather than convictions, so some of it may turn out not to be crime.
  • The levels of policing are not provided, so that two areas with the “same” crime levels may in fact have “radically different” experiences of crime and policing.

Charles Arthur notes that: “Police forces have indicated that whenever a new set of data is uploaded – probably each month – the previous set will be removed from public view, making comparisons impossible unless outside developers actively store it.”
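(That archiving point is easy to act on. Below is a minimal sketch of what “actively storing it” might look like – the download URL is a placeholder, since police.uk offered no stable bulk-download link at the time – which simply keeps a dated copy of each monthly release so comparisons stay possible.)

```python
import datetime
import pathlib
import urllib.request

# Keep a dated copy of each monthly crime-data release so month-on-month
# comparisons remain possible even after the site replaces the previous set.
# The URL is a placeholder, not a real police.uk endpoint.
DATA_URL = "https://example.police.uk/data/latest.csv"

archive = pathlib.Path("crime-archive")
archive.mkdir(exist_ok=True)

stamp = datetime.date.today().strftime("%Y-%m")
destination = archive / f"crime-{stamp}.csv"
urllib.request.urlretrieve(DATA_URL, str(destination))
print(f"Saved this month's release to {destination}")
```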

Louise Kidney says:

“What we’ve actually got with http://www.police.uk is neither one nor the other. Ruth looks like a crime overlord cos of all the crimes happening in her garden and we haven’t got exact point data, but we haven’t got first part of postcode data either e.g. BB5 crimes or NW1 crimes. Instead, we’ve got this weird halfway house thing where it’s not accurate, but its inaccuracy almost renders it useless because we don’t have any idea if every force uses the same parameters when picking these points, we don’t know how they pick their points, we don’t know what we don’t know in terms of whether one house in particular is causing a considerable issue with anti-social behaviour for example, allowing me to go to my local Council and demand they do something about it.”

Adrian Short argues that “What we’re looking at here isn’t a value-neutral scientific exercise in helping people to live their daily lives a little more easily, it’s an explicitly political attempt to shape the terms of a debate around the most fundamental changes in British policing in our lifetimes.”

He adds:

“It’s derived data that’s already been classified, rounded and lumped together in various ways, with a bit of location anonymising thrown in for good measure. I haven’t had a detailed look at it yet but I would caution against trying to use it for anything serious. A whole set of decisions have already transformed the raw source data (individual crime reports) into this derived dataset and you can’t undo them. You’ll just have to work within those decisions and stay extremely conscious that everything you produce with it will be prefixed, “as far as we can tell”.

“£300K for this? There ought to be a law against it.”

UPDATE 2: One frustrated developer has launched CrimeSearch.co.uk to provide “helpful information about crime and policing in your area, without costing 300k of tax payers’ money”.

Where should an aspiring data journalist start?

In writing last week’s Guardian Data Blog piece on How to be a data journalist I asked various people involved in data journalism where they would recommend starting. The answers are so useful that I thought I’d publish them in full here.

The Telegraph’s Conrad Quilty-Harper:

Start reading:

http://www.google.com/reader/bundle/user%2F06076274130681848419%2Fbundle%2Fdatavizfeeds

Keep adding to your knowledge and follow other data journalists/people who work with data on Twitter.

Look for sources of data:

The ONS stats release calendar is a good start (http://www.statistics.gov.uk/hub/release-calendar/index.html). Look at the Government data stores (Data.gov, Data.gov.uk, Data.london.gov.uk etc).

Check out What do they know, Freebase, Wikileaks, Manyeyes, Google Fusion charts. Continue reading
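(None of this is part of Quilty-Harper’s advice, but it is worth seeing how low the barrier is once you have found a source: a few lines of Python will pull a published CSV straight into something you can interrogate. The URL below is a placeholder – swap in a real dataset from data.gov.uk or the ONS.)

```python
import pandas as pd

# Placeholder URL — substitute a real CSV from data.gov.uk, the ONS release
# calendar, or a council spending page.
URL = "https://example.gov.uk/datasets/road-casualties.csv"

data = pd.read_csv(URL)
print(data.columns.tolist())  # check what fields you actually have first
print(data.head())            # eyeball the first few rows before doing anything else
```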

Why did you get into data journalism?

In researching my book chapter (UPDATE: now published) I asked a group of journalists who worked with data what led them to do so. Here are their answers:

Jonathon Richards, The Times:

The flood of information online presents an amazing opportunity for journalists, but also a challenge: how on earth does one keep up with – and make sense of – it? You could go about it in the traditional way, fossicking in individual sites, but much of the journalistic value in this outpouring, it seems, comes in aggregation: in processing large amounts of data, distilling them, and exploring them for patterns. To do that – unless you’re superhuman, or have a small army of volunteers – you need the help of a computer.

I ‘got into’ data journalism because I find this mix exciting. It appeals to the traditional journalistic instinct, but also calls for a new skill which, once harnessed, dramatically expands the realm of ‘stories I could possibly investigate…’ Continue reading

Data journalism pt2: Interrogating data

This is a draft from a book chapter on data journalism (the first, on gathering data, is here). I’d really appreciate any additions or comments you can make – particularly around ways of spotting stories in data, and mistakes to avoid.

UPDATE: It has now been published in The Online Journalism Handbook.

“One of the most important (and least technical) skills in understanding data is asking good questions. An appropriate question shares an interest you have in the data, tries to convey it to others, and is curiosity-oriented rather than math-oriented. Visualizing data is just like any other type of communication: success is defined by your audience’s ability to pick up on, and be excited about, your insight.” (Fry, 2008, p4)

Once you have the data you need to see if there is a story buried within it. The great advantage of computer processing is that it makes it easier to sort, filter, compare and search information in different ways to get to the heart of what – if anything – it reveals. Continue reading
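(To make that concrete, here is a minimal sketch – not from the chapter, and with invented file and column names – of the kind of sorting, filtering and comparing that a few lines of Python make easy.)

```python
import pandas as pd

# Hypothetical council spending file with 'year', 'supplier' and 'amount' columns.
spending = pd.read_csv("council_spending.csv")

# Sort: which single payments were the largest?
biggest = spending.sort_values("amount", ascending=False).head(10)

# Filter: payments over £50,000 to a particular supplier.
flagged = spending[(spending["amount"] > 50_000) &
                   (spending["supplier"] == "Acme Consulting Ltd")]

# Compare: total spend per supplier, year on year.
by_supplier = spending.groupby(["supplier", "year"])["amount"].sum().unstack("year")

print(biggest)
print(flagged)
print(by_supplier.head())
```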

Every news organisation should have a Datastore

You may know about The Guardian’s Datastore: a compilation of “publicly-available data for you to use free” that’s been around for a few months now. You know the sort of thing: university tables; MPs’ expenses; tax paid by the FTSE 100.

It has already produced some great work from what I once described as the “Technician” variant of distributed journalism.

But a recent column by Charles Arthur was the first example I’ve seen of the Datastore being used for, well, more ordinary data – the sort of information journalists deal with every week. Here’s how it appeared in print:

“I dug up the figures from the UK music industry: the British record industry’s trade association (the BPI), and the UK games industry (via its trade body, Elspa) as well as the DVD industry (through the UK Film Council and the British Video Association). The results are over on the Guardian Data Store (http://bit.ly/data01), because they are the sort of numbers that should be available to everyone to chew over.

“What did I find? Total spending has grown – but music spending is being squeezed. The games industry – hardware and software – has grown from £1.4bn in 1999 (the year Napster started, and the music business stood rabbit-transfixed) to £4.04bn in 2008. That’s 12% annual compound growth. You’d kill for an endowment like that. Even DVD sales and rental take a £2.5bn bite out of consumers’ available funds, double that of 1999.

“So the music industry’s deadliest enemy isn’t filesharing – it’s the likes of Nintendo, Microsoft and Sony, and a zillion games publishers.”
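(A quick aside: that 12% figure is easy to sanity-check. Reapplying the standard compound annual growth rate calculation to the numbers Arthur quotes gives roughly 12.5%, consistent with his rounding.)

```python
# Sanity check of the compound annual growth rate quoted above:
# £1.4bn in 1999 to £4.04bn in 2008, i.e. nine growth periods.
start_value = 1.4   # £bn, 1999
end_value = 4.04    # £bn, 2008
years = 2008 - 1999

cagr = (end_value / start_value) ** (1 / years) - 1
print(f"Compound annual growth: {cagr:.1%}")  # ≈ 12.5%, in line with the ~12% quoted
```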

The bit.ly link in that passage (which frustratingly isn’t active in the online article) takes you to a Datablog post by Arthur, which in turn links to a rather simple spreadsheet.

And it’s the simplicity that I think is important.

It’s one thing to link to huge datasets that benefit from lots of eyeballs looking for stories, or perming the data in different ways.

But it’s something else to link to the more everyday figures journalists deal with; to show your sums, in short.

Is this a natural extension of the blogging culture of linking to your sources? I think it is. And the more journalists get used to publishing their work on the likes of Google Spreadsheets, the better journalism we will get.

So why aren’t more journalists doing it? And why aren’t more news organisations providing a place for them to do it? Or are they? I’d love to know of any other individual or organisational examples.

‘Journalists: learn to code’ says Guardian’s Arthur

Charles Arthur of The Guardian makes his point pretty plain: “If I had one piece of advice to a journalist starting out now, it would be: learn to code”.

“Let’s be clear that I’m not saying “code” as in “get deep into C++ or Java” … I mean it in the sense of having a nodding acquaintance with methods of programming, and perhaps a few languages, so that when something comes along where you’ll need, say, to transform data from one form to another, you can. Or where you need to make your own life easier by automating some process or other.

” … None of which is saying you shouldn’t be talking to your sources, and questioning what you’re told, and trying to find other means of finding stuff out from people. But nowadays, computers are a sort of primary source too. You’ve got to learn to interrogate them effectively – and quote them meaningfully – too.”

Amen to that.
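To give a sense of the scale of “code” Arthur is talking about, here is a hedged sketch – file and column names invented – of the sort of five-minute transformation he describes: turning a CSV of figures into JSON that a web page or visualisation tool can reuse.

```python
import csv
import json

# Hypothetical example of the chore Arthur describes: transforming data
# from one form (a CSV of figures) to another (JSON for reuse on the web).
with open("spending.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Convert the 'amount' column from text to numbers along the way.
for row in rows:
    row["amount"] = float(row["amount"])

with open("spending.json", "w") as f:
    json.dump(rows, f, indent=2)

print(f"Converted {len(rows)} rows from CSV to JSON")
```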