10 principles for data journalism in its second decade


In 2007 Bill Kovach and Tom Rosenstiel published The Elements of Journalism. With the concept of ‘journalism’ increasingly challenged by the fact that anyone could now publish to mass audiences, their principles represented a welcome platform-neutral attempt to articulate exactly how journalism could be untangled from the vehicles that carried it and the audiences it commanded.

In this extract from a forthcoming book chapter* I attempt to use Kovach and Rosenstiel’s principles (outlined in part 1 here) as the starting point for a set that might form a basis for (modern) data journalism as it enters its second and third decades.

Principle 1: Data journalists should strive to interrogate data as a power in its own right

When data journalist Jean-Marc Manach set out to find out how many people had died while migrating to Europe, he discovered that no EU member state held any data on migrants’ deaths. As one public official put it, dead migrants “aren’t migrating anymore, so why care?”

Similarly, when the BBC sent Freedom of Information requests to mental health trusts about their use of face-down restraint, six replied saying they could not say how often any form of restraint was used — despite being statutorily obliged to “document and review every episode of physical restraint which should include a detailed account of the restraint” under the Mental Health Act 1983.

The collection of data, the definitions used, and the ways that data informs decision making are all exercises of power in their own right. The availability, accuracy and employment of data should all be particular focuses for data journalism as we see the expansion of smart cities and wearable technology.

Principle 2: Editorial independence includes technological independence

I wrote in 2013 about the role of coding in ensuring editorial independence, quoting Lawrence Lessig’s point, made over a decade ago, that code is law:

“Ours is the age of cyberspace. It, too, has a regulator. This regulator, too, threatens liberty. But so obsessed are we with the idea that liberty means “freedom from government” that we don’t even see the regulation in this new space. We therefore don’t see the threat to liberty that this regulation presents.

“This regulator is code—the software and hardware that make cyberspace as it is. This code, or architecture, sets the terms on which life in cyberspace is experienced. It determines how easy it is to protect privacy, or how easy it is to censor speech. It determines whether access to information is general or whether information is zoned. It affects who sees what, or what is monitored. In a host of ways that one cannot begin to see unless one begins to understand the nature of this code, the code of cyberspace regulates.” (Lessig 2006)

The independence of the journalist is traditionally portrayed as possessing the power to resist pressure from our sources, our bosses and business models, and the government and law enforcement. But in a networked age it will also mean independence from the biases inherent in the tools that we use.

From the content management systems that we use, to the mobile devices that record our every move, independence in the 21st century will be increasingly facilitated by being able to ‘hack’ our tools or build our own.

Code affects what information you can access, your ability to verify it, your ability to protect sources — and your ability to empower them. Finally, code affects your ability to engage users.

Code is a key infrastructure that we work in as journalists: if we understand it, we can move across it much more effectively. If it is invisible to us, we cannot adapt it, we cannot scrutinise it. We are, in short, subject to it.

Principle 3: We should strive for objectivity not just in the sources and language that we use, but also the way that we design our tools


Mapping tools assume a default point of view. Image: Time Magazine via Newberry Library via Jake Ptacek

In data journalism right now we are at a crucial stage: the era during which we move from making stories and tools for other people, to making our own tools.

As John Culkin, in writing about Marshall McLuhan, said:

“We shape our tools, and thereafter they shape us”.

The values which we embed in those tools, the truths we take for granted, will have implications beyond our own generation.

The work of Lawrence Lessig and Nicholas Diakopoulos highlights the role that code plays in shaping the public lives that we can lead; we need to apply the same scrutiny to our own processes.

When we build tools on maps do we embed the prejudices that have been identified by critical cartographers?

Do we seek objectivity in the visual language we use as well as the words that we choose?

But it is not just the tools which will shape our practice: the reorganisation of newsrooms and the creation of data desks and the data journalist’s routine will also begin to draw the borders of what is considered normal in – and what is considered outside of – the role of the data journalist.

Uskali and Kuutti, for example, already identify at least three different models for organising data journalism work practices: data desks, flexible data projects, and the entrepreneur or sub-contractor model. To what extent these models circumscribe or provide opportunities for new ways of doing journalism bears some reflection.

If we are to augment our journalism, we must do so critically.

Principle 4: Impartiality means not relying only on stories where data exists and is easy to obtain

The increasing abundance of data brings with it a new danger: that we do not look beyond what is already accessible, or that we give up too easily if a story does not seem practical.

Just as the expansion of the PR industry in the 20th century led to accusations of ‘churnalism’ in the media, the expansion of data in the 21st century risks leading to ‘data churnalism’ instead of data journalism, including the use of automation and dashboards as a way of dealing with those accessible sources.

Principle 5: We should strive to give a voice to those who are voiceless in data by seeking to create or open up data which would do so


When The Guardian’s ‘The Counted’ project sought to report on people killed by police in the US, it was literally seeking to ‘give a voice to the voiceless’ — because those people were dead; they could not speak.

The Bureau of Investigative Journalism’s Naming the Dead project had a similar objective: tracking and investigating US covert drone strikes since 2011 and seeking to identify those killed.

Neither is an example of data journalism that uses coding: the skills are as basic as keeping a record of every report you can find. And yet this basic process has an important role at the heart of modern journalism: digitising that which did not exist in digital form before, moving from zeroes to ones. You can find more examples in this post about the approach in 2015:

“Too often data journalism is described in ways that focus on the technical act of working with existing data. But to be voiceless often means that no data exists about your experience.”

Principle 6: We retain editorial responsibility for context and breadth of coverage where we provide personalisation

If journalism must provide a forum for public criticism and compromise, what role does personalisation — which gives each person a different experience of the story — play in that?

Some would argue that it contributes to ‘filter bubbles’ whereby people are unaware of the experiences and opinions of people outside of their social circle. But it can also bring people into stories that they would otherwise not read at all, because those stories would otherwise have no relevance to their lives.

As data journalists, then, we have a responsibility to consider the impact of personalisation and interactivity both in making news relevant to readers, and providing insights into other dimensions of the same story which might not be so directly relevant.

This, of course, has always been journalism’s skill: after all, human interest stories are the ‘universal’ hook that often draws people in to the significant and important.

Principle 7: We should strive to keep the significant interesting and relevant by seeking to find and tell the human story that the data shines a spotlight on

For the same reason, we should ensure that our stories are not merely about numbers, but people. I always tell my MA Data Journalism students that a good story should do two things: tell us why we should care, and tell us why it matters.

Data helps us to establish why a story matters: it connects one person’s story to 100 others like it; without data, a bad experience is merely an anecdote. But without a human story, data becomes just another statistic.

Principle 8: The algorithms in our work – both human and computational – should be open to scrutiny and iteration

The more that journalism becomes augmented by automation, or facilitated by scripts, the more that we should consider being open to public scrutiny.

If we are unable to explain how we arrived at a particular result, that undermines the credibility of the conclusion.

Diakopoulos and Koliska, who have explored algorithmic transparency in the news media, conclude that it is an area much in need of research, development and experimentation:

“There are aspects of transparency information that are irrelevant to an immediate individual user context, but which are nonetheless of importance in media accountability for a broad public such as fair and uncensored access to information, bias in attention patterns, and other aggregate metrics of, for instance, error rates. In other words, some factors may have bearing on an individual whereas others have import for a larger public. Different disclosure mechanisms, such as periodic reports by ombudspeople may be more appropriate for factors like benchmarks, error analysis, or the methodology of data collection and processing, since they may not be of interest to or even comprehensible for many users yet demanded by those who value an expert account and explanation.”

Principle 9: Sharing our code also allows us to work more efficiently and raise standards


It has often been said that transparency is the new objectivity in this new world of limitless publishing. This recognises that, while true objectivity does not exist, transparency can help establish what steps we have taken towards coming as close to it as we can.

The AP Stylebook’s section on data journalism has formally recognised this with its reference to reproducible analysis:

“Providing readers with a roadmap to replicate the analysis we’ve done is an essential element of transparency in our reporting. We can accomplish this transparency in many ways, depending on the data set and the story”

But what is transparency for data journalists? Jennifer Stark and Nicholas Diakopoulos outline principles from scientific research that can be adapted – specifically reproducibility and replicability.

Reproducibility involves making code and data available so a user can rerun the original analysis on the original data. “This is the most basic requirement for checking code and verifying results”.

Replicability, on the other hand, “requires achieving the same outcome with independent data collection, code and analysis. If the same outcome can be achieved with a different sample, experimenters and analysis software, then it is more likely to be true.”
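To make the reproducibility half of that distinction concrete, here is a minimal sketch in Python of what “rerunning the original analysis on the original data” might look like. Everything in it is an assumption for illustration: the three-row dataset is invented, and the function names `checksum` and `analyse` are hypothetical rather than taken from any newsroom’s code. The idea is simply that a reader who holds the same data (confirmed by a fingerprint) and the same script should arrive at the same numbers.

```python
import csv
import hashlib
import io
import statistics

# Invented example data: in a real project this would be the published data file.
RAW_CSV = """area,incidents
A,12
B,7
C,30
"""


def checksum(text: str) -> str:
    """Return a SHA-256 fingerprint so readers can confirm they hold the same data."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def analyse(text: str) -> dict:
    """Deterministic analysis: the same input always yields the same summary."""
    rows = list(csv.DictReader(io.StringIO(text)))
    counts = [int(row["incidents"]) for row in rows]
    return {
        "areas": len(counts),
        "total": sum(counts),
        "median": statistics.median(counts),
    }


# Publishing the fingerprint alongside the result lets others check both.
print(checksum(RAW_CSV))
print(analyse(RAW_CSV))
```

Replicability would go a step further than this sketch: someone else collecting their own data and writing their own analysis, and still reaching the same conclusion.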

Currently the code-sharing site GitHub is the place where many data teams share their code so that others can reproduce their analysis. It is incredible to look across the GitHub repositories of FiveThirtyEight or BuzzFeed and understand how the journalism was done. It also acts as a way to train and attract future talent into the industry, either formally as employees, or informally as contributors.

Principle 10: We should seek to empower citizens to exercise their rights and responsibilities


The New York Times You Draw It challenges users to take an active role

The final principle mirrors Kovach and Rosenstiel’s: the obligation on the public to take some responsibility for journalism too. And it is here, perhaps, where data journalism has the most significant role to play.

Because where Kovach and Rosenstiel put the onus on the public, I believe that data journalism is well positioned to do more, and to actively empower that public to exercise those rights and responsibilities.

A New York Times interactive which invites the user to draw a line chart before revealing how close they were to the true trend is precisely the sort of journalism which helps users engage with their own role in negotiating information.

A tool which allows you to write to your local representative, or to submit a Freedom of Information request, is one which positions the reader not as a passive consumer of news, but as an active participant in the world that they are reading about.

In print and on air we could only arm our audiences with information, and hope that they use it wisely. Online we can do much more — and we’ve only just begun.

*You can read all three extended extracts from the book chapter under the tag ‘Next Wave of Data Journalism’ here.
