"The mass market was a hack": Data and the future of journalism

The following is an unedited version of an article written for the International Press Institute report ‘Brave News Worlds’.

For the past two centuries journalists have dealt in the currency of information: we transmuted base metals into narrative gold. But information is changing.

At first, the base metals were eyewitness accounts and interviews. Later we learned to melt down official reports, research papers, and balance sheets. And most recently our alloys have been diluted by statements and press releases.

But now journalists are having to get to grips with a new type of information: data. And this is a very rich seam indeed.

Data: what, how and why

Data is a broad term, so I should define it: I am not talking here about statistics or numbers in general, because those are nothing new to journalists. When I talk about data I mean information that can be processed by computers.

This is a crucial distinction: it is one thing for a journalist to look at a balance sheet on paper; it is quite another to be able to dig through those figures on a spreadsheet, or to write a programming script to analyse that data, and match it to other sources of information. We can also more easily analyse new types of data, such as live data, large amounts of text, user behaviour patterns, and network connections.
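To make "match it to other sources of information" concrete, here is a minimal Python sketch. It assumes two hypothetical CSV extracts, one of council payments to suppliers and one of company directors; every name and figure in it is invented for illustration. Once the figures are machine-readable, totalling them and joining them to a second source takes a few lines of code.

```python
import csv
import io

# Hypothetical spending records, as a council might publish them in a spreadsheet.
SPENDING_CSV = """supplier,amount
Acme Ltd,12000
Widget Co,4500
Acme Ltd,3000
"""

# Hypothetical second source: a register of companies and their directors.
DIRECTORS_CSV = """company,director
Acme Ltd,J. Smith
Widget Co,A. Jones
"""


def total_by_supplier(spending_text):
    """Sum every payment made to each supplier."""
    totals = {}
    for row in csv.DictReader(io.StringIO(spending_text)):
        totals[row["supplier"]] = totals.get(row["supplier"], 0) + int(row["amount"])
    return totals


def match_directors(totals, directors_text):
    """Attach each supplier's director from the second source, largest total first."""
    directors = {row["company"]: row["director"]
                 for row in csv.DictReader(io.StringIO(directors_text))}
    return [(supplier, amount, directors.get(supplier, "unknown"))
            for supplier, amount in sorted(totals.items(), key=lambda kv: -kv[1])]


for supplier, amount, director in match_directors(total_by_supplier(SPENDING_CSV),
                                                  DIRECTORS_CSV):
    print(f"{supplier}: {amount} (director: {director})")
```

The join here is on a company name typed as plain text, which is fragile in practice: a single spelling difference between the two sources breaks the match.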

And that, for me, is hugely important. Indeed, it is potentially transformational. Adding computer processing power to our journalistic arsenal allows us to do more, faster, more accurately, and with others. All of which opens up new opportunities – and new dangers. Things are going to change.

We’ve had over 40 years to see this coming. The growth of the spreadsheet and the database from the 1960s onwards kicked things off by making it much easier for organisations – including governments – to digitise information, from what they spent our money on to how many people were being treated for which diseases, and where.

In the 1990s the arrival of the world wide web multiplied the data at journalists’ disposal by providing a platform for those spreadsheets and databases to be published and accessed by both humans and computer programs – and a network to distribute them.

And now two cultural movements have combined to add a political dimension to the spread of data: the open data movement, and the linked data movement. Journalists should be familiar with these movements: the arguments that they have developed in holding power to account are a lesson in dealing with entrenched interests, while their experiments with the possibilities of data journalism show the way forward.

The open data movement campaigns for important information – such as government spending, scientific information and maps – to be made publicly available for the benefit of society, both democratically and economically. The linked data movement (championed by the inventor of the web, Sir Tim Berners-Lee) campaigns for that data to be published in a way that allows it to be linked to other datasets so that, for instance, a computer can see that the director of a company named in a particular government contract is the same person who was paid as a consultant on a related government policy document. Advocates argue that this will also bring economic and social benefits.
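The director-and-consultant example of linked data can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any real government dataset is structured: the two datasets are written as plain (subject, predicate, object) triples, the basic shape of RDF, and all URIs and predicate names (`awardedTo`, `hasDirector`, `consultant`) are hypothetical placeholders. Real linked data would be published as RDF and queried with SPARQL; the essential point survives the simplification: the link is only found because both datasets use the same identifier for the same person.

```python
# Hypothetical base URI; real datasets would use their publishers' own URIs.
EX = "http://example.org/"

# Dataset 1: a government contract and the company that won it.
CONTRACTS = {
    (EX + "contract/42", "awardedTo", EX + "company/acme"),
    (EX + "company/acme", "hasDirector", EX + "person/jsmith"),
}

# Dataset 2, published separately: consultants paid on a policy document.
POLICY_DOCS = {
    (EX + "policy/7", "consultant", EX + "person/jsmith"),
    (EX + "policy/7", "consultant", EX + "person/ajones"),
}


def director_consultant_links(contracts, policy_docs):
    """Find (contract, company, person, policy document) chains where the director
    of a contracted company also consulted on a policy document. The join works
    only because both datasets use the same URI for the same person."""
    awarded = {(s, o) for (s, p, o) in contracts if p == "awardedTo"}
    directors = {(s, o) for (s, p, o) in contracts if p == "hasDirector"}
    consultants = {(s, o) for (s, p, o) in policy_docs if p == "consultant"}
    return sorted(
        (contract, company, person, doc)
        for contract, company in awarded
        for firm, person in directors if firm == company
        for doc, adviser in consultants if adviser == person
    )


for link in director_consultant_links(CONTRACTS, POLICY_DOCS):
    print(link)
```

Had the second dataset identified the consultant only as "John Smith" in free text, the computer would have nothing reliable to match on; shared identifiers are what the linked data movement is asking publishers to provide.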

Concrete results of both movements can be seen in the US and UK – most visibly with the launch of government data repositories Data.gov and Data.gov.uk in 2009 and 2010 respectively – but also in less publicised experiments such as Where Does My Money Go? – which uses data to show how public expenditure is distributed – and Mapumental – which combines travel data, property prices and public ratings of ‘scenicness’ to help you see at a glance which areas of a city might be the best place to live based on your requirements.

But there are dozens if not hundreds of similar examples in industries from health and science to culture and sport. We are experiencing an unprecedented release of data – some have named it ‘Big Data’ – and yet for the most part, media organisations have been slow to react.

That is about to change.

The data journalist

Over the last year an increasing number of news organisations have started to wake from their story-centric production lines and see the value of data. In the UK the MPs’ expenses story was seminal: when a newspaper dictates the news agenda for six weeks, the rest of Fleet Street pays attention – and at the core of this story was a million pieces of data on a disc. Since then every serious news organisation has expanded its data operations.

In the US the journalist-programmer Adrian Holovaty has pioneered the form with the data mashup ChicagoCrime.org and its open source offspring Everyblock, while Aron Pilhofer has innovated at the interactive unit at The New York Times, and new entrants from Talking Points Memo to ProPublica have used data as a launchpad for interrogating the workings of government.

To those involved, it feels like heady days. In reality, it’s very early days indeed. Data journalism takes in a huge range of disciplines, from Computer Assisted Reporting (CAR) and programming, to visualisation and statistics. If you are a journalist with a strength in one of those areas, you are currently exceptional. This cannot last for long: the industry will have to skill up, or it will have nothing left to sell.

That is because, while news organisations for years made a business out of being a middleman processing content between commerce and consumers, and between government and citizens, the internet has made that business model obsolete. It is not enough any more for a journalist to simply be good at writing – or rewriting. There are a million others out there who can write better – large numbers of them working in PR, marketing, or government. While we will always need professional storytellers, many journalists are simply factory line workers.

So on a commercial level, if nothing else, publishers will need to establish where the value lies in this new environment – and find the new efficiencies that make journalism viable.

Data journalism is one of those areas. With a surfeit of public data being made available, there is a rich supply of raw material. The scarcity lies in the skills to locate and make sense of it – whether the programming skills to scrape it and compare it with other sources in the first place, the design flair to visualise it, or the statistical understanding to unpick it.

“The mass market was a hack”: opportunities for the new economy

The technological opportunity is massive. As processing power continues to grow, so does our ability to interrogate, combine and present data. The development of augmented reality provides a particularly attractive publishing opportunity: imagine being able to see local data-based stories through your mobile phone, or indeed add data to the picture through your own activity. The experiments of the past five years will come to seem crude in comparison.

And then there is the commercial opportunity. For most publishers, after all, publishing is not about selling content but about selling advertising. And here too data has taken on increasing importance. The mass market was a hack. As the saying goes: “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.”

But Google, Facebook and others have used the measurability of the web to reduce the margin of error, and publishers will have to follow suit. It makes sense to put data at the centre of that: if you allow users to drill into the data you have gathered on automotive safety, your offer to advertisers might be “We can display different adverts based on what information the user is interested in”, or “We can point the user to their local dealership based on their location”.

A collaborative future

I’m sceptical of the ability of established publishers to adapt to such a future but, whether they do or not, others will. And the backgrounds of journalists will have to change. The profession has a history of arts graduates who are highly literate but not typically numerate. That has already been a source of ongoing embarrassment for the profession, as expert bloggers have highlighted basic errors in the way journalists cover science, health and finance – and it cannot continue.

We will need more journalists who can write a killer Freedom of Information request; more researchers with a knowledge of the hidden corners of the web where databases – the ‘invisible web’ – reside. We will need programmer-journalists who can write a screen scraper to acquire, sort, filter and store that information, and combine or compare it with other sources. We will need designers who can visualise that data in the clearest way possible – not just for editorial reasons but distribution too: infographics are an increasingly significant source of news site traffic.
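What a screen scraper does (acquire, sort, filter, store) can be shown in a short Python sketch using only the standard library. It is a teaching example under assumptions, not a production scraper: the HTML table and its figures are hypothetical, and the page is held in a string rather than downloaded (a real scraper would fetch it first, for example with `urllib.request.urlopen`). The 50-unit threshold is likewise arbitrary.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical page content standing in for a downloaded web page.
PAGE = """
<table>
  <tr><th>Department</th><th>Spend</th></tr>
  <tr><td>Health</td><td>250</td></tr>
  <tr><td>Transport</td><td>90</td></tr>
  <tr><td>Culture</td><td>40</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Acquire: collect the text of every <td> cell, one tuple per table row."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row:  # header row uses <th>, so it stays empty
            self.rows.append(tuple(self._row))

    def handle_data(self, data):
        if self._in_cell and data.strip():
            self._row.append(data.strip())


scraper = TableScraper()
scraper.feed(PAGE)

# Filter (keep spend of at least 50) and sort (largest first).
rows = sorted(((dept, int(spend)) for dept, spend in scraper.rows if int(spend) >= 50),
              key=lambda r: r[1], reverse=True)

# Store in a database, where it can later be combined or compared with other sources.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE spend (department TEXT, amount INTEGER)")
db.executemany("INSERT INTO spend VALUES (?, ?)", rows)
print(db.execute("SELECT * FROM spend ORDER BY amount DESC").fetchall())
```

Real pages are rarely this tidy, which is why scraping remains a skill rather than a button: the parsing logic has to be adapted to each site's markup, and rechecked whenever that markup changes.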

There is a danger of ‘data churnalism’ – taking public statistics and visualising them in a spectacular way that lacks insight or context. Editors will need the statistical literacy to guard against this, or they will be found out.

And it is not just in editorial that innovation will be needed. Advertising sales teams will need to go through the same revolution that journalists have experienced, learning the language of web metrics and behavioural advertising, and how to sell their benefits to advertisers.

And as publishers of data too, executives will need to adopt the philosophies of the open data and linked data movements to take advantage of the efficiencies that they provide. The New York Times and The Guardian have both published APIs that allow others to build web services with their content. In return they get access to otherwise unaffordable technical, mathematical and design expertise, and benefit from new products and new audiences, as (in the Guardian’s case) advertising is bundled in with the service. As these benefits become more widely recognised, other publishers will follow.

I have a hope that this will lead to a more collaborative form of journalism. The biggest resource a publisher has is its audience. Until now publishers have simply packaged up that resource for advertisers. But now that the audience is able to access the same information and tools as journalists, to interact with publishers and with each other, they are valuable in different ways.

At the same time the value of the newsroom has diminished: its size has shrunk and its competitive advantage has been reduced; and no single journalist has the depth and breadth of skills across statistics, CAR, programming and design that data journalism requires. A new medium – and a new market – demands new rules. The more networked and iterative form of journalism that we’ve already seen emerge online is likely to become even more common as publishers move from a model that sees the story as the unit of production to a model that starts with data.


9 thoughts on “‘The mass market was a hack’: Data and the future of journalism”

  1. David Dunkley Gyimah

    Well smelted Paul and congrats on your City appointment.

    I shared a podium with Adrian in 2005 at the National Press Club in DC for the Batten Awards. He won, I was runner-up, and it was my first chance to see how transformative, and how highly interpretive, grabbing raw data and finding a visual methodology could be.

    It’s not that we hadn’t seen data-interpreted graphics before; I’m still haunted by Newsweek’s illustration of the neutron bomb from the 80s. But here, now, the numbers empirically wrote the story, avoiding guesswork dressed up as good judgement.

    FOI in the UK, notwithstanding individual investigative work, should, as you allude to, provide the data, but devising a tangible and measurable way to use it as the recipient might, I believe, be an issue for a while.

    Primarily, I concur that journalism is not a vocation/career that traditionally draws in programmers or those with statistical data-reading skills. I have probably counted a handful on courses I have been privy to.

    Granted my degree was a while ago, but as an Applied Chemist in a room full of Lit grads? You can see why the Adrians are special.

    You probably have your own tales of what it can sometimes be like getting students to set up a blog. Though that’s changing, with some uni courses mooting (and doing??) the idea of combining computing and journalism.

    But I believe the marketplace will pick up this challenge, not least because there’s revenue in it. So just as bespoke templates for blogs, apps etc. have become available that spare the user any dense code or even scripting language, there may well be a simplification of the data-in, decipherable-info-out process to cause a melting point.

    The availability of the Ushahidi platform veers towards that example.

    In your journeys, though, I wonder what feedback you have had from editors about allowing programmers et al, nominally the techs upstairs, to share the same space as journalists from the get-go, including editorial meetings.

    My experience is that it happens sparingly; request forms are still the norm.
    And once an outfit undertakes drilling into data, it provides a level of transparency and sourcing for others to interrogate at will.

    I guess guidelines are still being drawn up.

    Happy days 🙂

    Cheers David
    ps Had a good natter with George (City) today when your name cropped up

  2. Pingback: links for 2010-09-24 « Köszönjük, Emese!

  3. Kingsley Idehen

    Awesome post! Absolutely spot on!

    I would extend the concept of “data journalist” to include “citizen analyst”. Like the blogosphere inflection, a new era is upon us that will combine the skills of data analysts and journalists — individuals or loosely coupled cooperatives on the InterWeb devoid of any geographic boundaries.

    Help me remind your colleagues in the newspaper business that they have always been database curators, so the model tweak comes down to dispatching their value via high-fidelity Linked Data Spaces rather than Paper Cups 🙂

    The hack is getting a major fix!


  4. Pingback: Are journalists forgetting the telephone? « The ContentETC Blog

  5. Pingback: The Alchemy of Information « (Re)Structuring Journalism

  6. Pingback: Jordanguy's Blog

  7. Pingback: links for 2010-10-18 « pabwall_local

  8. Pingback: links for 2010-09-24 « pabwall_local

  9. Pingback: Bad data PR: how the NSPCC sunk to a new low in data churnalism | Online Journalism Blog
