In an extended extract from the forthcoming second edition of the Data Journalism Handbook, I look at the different types of impact that data journalism can have, and how we can better think about it.
If you’ve not seen Spotlight, the film about the Boston Globe’s investigation into institutional silence over child abuse, then you should watch it now. More to the point, you should watch right through to the title cards at the end.
In an epilogue to the film — this is a story about old-school-style data journalism, by the way — a list scrolls down the screen. It details the dozens and dozens of places where abuse scandals have been uncovered since the events of the film, from Akute, Nigeria, to Wollongong, Australia.
But the title cards also cause us to pause in our celebrations: one of the key figures involved in the scandal, it says, was reassigned to “one of the highest ranking Roman Catholic churches in the world.”
This is the challenge of impact in data journalism: is raising awareness of a problem “impact”? A mass audience, a feature film? Does the story have to result in penalties for those responsible for bad things? Or visible policy change? Is all impact good impact?
Data journalism’s history is riddled with tales of impact: on the world around it, and on journalism itself (summed up best in Nate Silver’s 2012 charge, made after he correctly predicted the outcome in every state of the US presidential election, that political punditry was mostly “fundamentally useless”).
But what sort of impact are we talking about?
Within the industry, impact is often difficult to quantify — even for a discipline built on quantification. Perhaps the easiest measure is sheer reach: data-driven interactives like the New York Times’s ‘How Y’All, Youse and You Guys Talk’ and the BBC’s ‘7 billion people and you: What’s your number?’ both went viral and engaged millions of readers; while at one point in 2012 Nate Silver’s data journalism was reaching one in five visitors to the New York Times.
Some journalists will sneer at such crude measures — but they are important. If traditional journalism was sometimes criticised for aiming to impress its peers with overwritten features, modern journalism is at least expected to prove that it can connect with its audience, and when a story ‘touches a nerve’ (we call it ‘going viral’ these days) it can tell us something about that audience too.
Beyond reach, data journalism has a track record of high engagement. Reach (formerly Trinity Mirror) publications have found that simply adding a piece of data visualisation to a page increased dwell time by a third. The interactivity that data makes possible can also entice users to do more than passively scan an article: in 2015 David Higgerson noted that more than 200,000 people put their postcodes into an interactive widget built by his data team around deprivation statistics — far higher, he said, “than I would imagine [for] a straight-forward ‘data tells us x’ story.”
But while reach and engagement might give us quick, easy fixes in the newsroom and generate helpful headlines for the company roundup, as journalists we want to do more — we want to report stories that result in real change.
Change as impact
That change can take any number of forms. And the most obvious is explicitly political — a policy change.
In the UK perhaps the best known example is the series of stories around the UK’s expenses scandal. Not only did this result in enormous reach and dominate the news agenda for weeks in the UK, it also led to the formation of a new body, the Independent Parliamentary Standards Authority (IPSA), which in addition to setting the level of politicians’ salaries publishes open data on politicians’ expense claims. This new data source has in turn led to further stories.
But policy can be much broader than politics. The lending policies of banks affect millions of people, and were famously held to account in the late 1980s in the US by Bill Dedman in his Pulitzer-winning “The Color of Money” series of articles.
In identifying racially divided loan practices (“redlining”) the data-driven investigation also led to political, financial and legal change, with probes, new financing, lawsuits and the passing of new laws among the follow-ups.
Fast-forward 30 years and you can see a very modern version of this approach: ProPublica’s Machine Bias series shines a light on algorithmic accountability, while the Bureau Local tapped into its network to crowdsource information on algorithmically targeted ‘dark ads’ on social media.
Both have helped contribute to change in a number of Facebook’s policies, while ProPublica’s methods were adopted by a fair housing group in establishing the basis for a lawsuit against the social network.
As the policies of algorithms become increasingly powerful in our lives — from influencing the allocation of police to Uber pricing in non-white areas — holding these to account has already become as important as holding governments to account.
Changing what we count, how we count it, and whether we get it right
Advanced technical skills are not necessarily required to create a story with impact. One of the longest-running data journalism projects, the Bureau of Investigative Journalism’s Drone Warfare project, has been tracking US drone strikes for over five years. Its core methodology boils down to one word: persistence. Bureau reporters monitor news reports, press releases and documents, and over time have turned those ‘free text’ reports into a structured dataset that can be analysed, searched, and queried.
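As a rough illustration of that methodology — and purely as a sketch, with an invented report and invented field names rather than the Bureau’s actual records — free-text monitoring notes can be parsed into structured rows like this:

```python
import re

# Hypothetical free-text monitoring note; real records are far richer
# (sources, casualty ranges, cross-references).
report = "12 March 2015: drone strike reported near Marib, Yemen; 4 killed."

# Named groups pull the structured fields out of the sentence.
pattern = re.compile(
    r"(?P<date>\d{1,2} \w+ \d{4}): .*near (?P<location>[\w ]+), "
    r"(?P<country>\w+); (?P<killed>\d+) killed"
)

m = pattern.search(report)
row = m.groupdict()  # one structured record, ready for a dataset
print(row)
# → {'date': '12 March 2015', 'location': 'Marib', 'country': 'Yemen', 'killed': '4'}
```

Each parsed row can then be appended to a growing table, which is what makes the dataset searchable and queryable over time.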
That data — complemented by interviews with sources — has been used by NGOs and the Bureau has submitted written evidence to the UK’s Defence Committee.
Counting the uncounted is a particularly important way that data journalism can make an impact — indeed, it is probably fair to say that it is data journalism’s equivalent of ‘giving a voice to the voiceless’.
The Migrants Files, a project involving journalists from over 15 countries, was started after data journalists noted that there was “no usable database of people who died in their attempt to reach or stay in Europe.” Its impact has been to force other agencies into action: “Since we published our first results in 2014,” they report, “the International Organization for Migration and others have started their own data collection operations.”
Even when a government appears to be counting something, it can be worth investigating. While working with the BBC England Data Unit on an investigation into the scale of library cuts, for example, I experienced a moment of panic when I saw that a question was being asked in Parliament for data about the issue. Would the response scoop the months of work we had been doing?
In fact, it didn’t — instead, it established that the government itself knew less than we did about the true scale of those cuts, because they hadn’t undertaken the depth of investigation that we had.
And sometimes the impact lies not in the mere existence of data, but in its representation: one project by Mexican newspaper El Universal, Ausencias Ignoradas (Ignored Absences), puts a face to over 4,500 women who have gone missing in the country in a decade. The data was there, but it hadn’t been broken down to a ‘human’ level.
Libération’s Meurtres conjugaux, des vies derrière les chiffres does the same thing for domestic murders of women, and Ceyda Ulukaya’s Kadin Cinayetleri project has mapped femicides in Turkey.
Sometimes impact is cultural: when I launched the experimental crowdsourced investigative journalism project Help Me Investigate in 2009, it was with the aim of establishing that collaborative methods could be used routinely in the field (one senior US journalist insisted to me that an early crowdsourcing investigation at the Florida News-Press was “lucky” and “would never happen again”).
Almost a decade on, collaborative models have become widely accepted, and networks are involved in some of the biggest stories — from the ICIJ’s work on the Panama Papers at an international level, to the Bureau Local and BBC Shared Data Unit at a local level in the UK.
The impact of these projects comes not just in the effect of their stories on the policies of governments and businesses, but in the policies of newsrooms themselves, which are becoming increasingly open to outside contributors who bring technical expertise, subject insights, and insider knowledge.
Those contributors bring an impact of their own, too, encouraging journalists to adopt an open-source ethos of working with others and sharing their methodology — recognised both in the annual Data Journalism Awards’ open data category and the AP Stylebook’s decision to include a passage on reproducible analysis.
Collaborative practices have also helped inspire a new wave of stakeholder-driven media, made possible by new technologies and business models.
Some of my favourite projects as a data journalist have been those which highlighted, or led to the identification of, flawed or missing data. In 2016 I suggested that the BBC England Data Unit look at how many academy schools were following rules on transparency: we picked a random sample of 100 academies and checked whether they published a register of all their governors’ interests, as required by official rules. One in five academies failed to do so — and as a result the regulator Ofsted took action against those we’d identified. But would that scrutiny continue? Returning to the story in later years would be important in establishing how seriously the rules were being enforced.
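The sampling step itself is simple. A minimal sketch — with an invented list of schools and simulated check results, since the real checks were done by hand — might look like:

```python
import random

# Invented list standing in for the full register of academy schools.
academies = [f"Academy {i}" for i in range(1, 3001)]

random.seed(42)  # fixed seed so the sample is reproducible
sample = random.sample(academies, 100)  # 100 schools, no repeats

# In the real investigation each school's website was checked manually;
# here the outcome of that check is simulated purely for illustration.
publishes_register = {school: random.random() > 0.2 for school in sample}

non_compliant = [s for s, ok in publishes_register.items() if not ok]
rate = len(non_compliant) / len(sample)
print(f"{len(non_compliant)}/{len(sample)} sampled academies "
      f"did not publish a register ({rate:.0%})")
```

Drawing the sample without replacement, and fixing the seed so colleagues can reproduce it, keeps the method defensible when the regulator comes asking.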
Registers of interests were also the focus of a project by the Ciudadano Inteligente Fundacion in Chile, which decided to check the Chilean MPs’ register of assets and interests by building a database: they found that nearly 40% of MPs were not disclosing their assets fully, and opened their database up to the public to help identify conflicts of interest.
Similarly, the British Medical Journal’s TheyCareForYou project published the proportion of NHS trusts that provided gifts and hospitality registers and identified that only 6% published them online.
In some cases data journalism can shine a light on bodies not adhering to the law on the information they should be holding: when the BBC sent Freedom of Information requests to mental health bodies about their use of face-down restraint, six replied saying they could not say how often any form of restraint was used — despite being legally required to “document and review every episode of physical restraint which should include a detailed account of the restraint”.
As journalists we need to be prepared to make this a story too, rather than only report on the figures that were provided.
Lack of data is one thing; organisations getting away with incorrect data is another. In 2018, as companies began to publish gender pay gap figures, the BBC’s Colin George noticed something odd: a number of employers were claiming to have exactly 50% male and 50% female staff — and a gender pay gap of zero. He spoke to the body responsible for collecting the data, the Government Equalities Office, who said such figures were “statistically improbable” and called on firms to check their information.
What was amazing was that they hadn’t noticed this themselves — especially given that the Financial Times’s Billy Ehrenberg, Aleksandra Wisniewska and Sarah Gordon had already reported “statistically improbable” submissions by 1 in 20 companies some weeks earlier. By the time of the deadline, the Telegraph was pointing out that some companies were still reporting an improbable pay gap of precisely 0.0.
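A basic screen for such submissions is straightforward to build. This sketch uses invented employers and field names — not the official reporting schema — simply to show the shape of the check:

```python
# Each submission records an employer's mean hourly pay gap (%) and
# the male share of staff (%). Names and fields are illustrative only.
submissions = [
    {"employer": "Acme Ltd",   "pay_gap_pct": 12.4, "male_pct": 61.0},
    {"employer": "Globex Plc", "pay_gap_pct": 0.0,  "male_pct": 50.0},
    {"employer": "Initech",    "pay_gap_pct": -3.1, "male_pct": 44.5},
]

def looks_improbable(row):
    # A pay gap of exactly 0.0 combined with an exact 50/50 gender
    # split is statistically unlikely in a real workforce.
    return row["pay_gap_pct"] == 0.0 and row["male_pct"] == 50.0

flagged = [r["employer"] for r in submissions if looks_improbable(r)]
print(flagged)  # → ['Globex Plc']
```

Flagged employers are leads, not findings: as with the BBC story, the next step is putting the anomaly to the company and the regulator.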
Impact as a byproduct
Sometimes the impact of a data journalism project is a byproduct — only identified when the story is ready and responses are being sought. In 2017 I helped the BBC England Data Unit analyse over 37 million rows of crime data to establish the scale of unsolved crime in the country. When seeking a right of reply from those police forces whose data showed high levels of unsolved crime one force admitted that its data was wrong, and was forced to resubmit it.
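At that scale the analysis is essentially a grouped count. A toy sketch — with invented force names and a single simplified outcome field standing in for the real dataset’s many categories — looks like this:

```python
from collections import Counter

# Simplified crime records: (police force, recorded outcome).
# The real dataset runs to tens of millions of rows.
records = [
    ("Force A", "charged"),
    ("Force A", "no suspect identified"),
    ("Force A", "no suspect identified"),
    ("Force B", "charged"),
    ("Force B", "charged"),
    ("Force B", "no suspect identified"),
]

totals = Counter(force for force, _ in records)
unsolved = Counter(force for force, outcome in records
                   if outcome == "no suspect identified")

for force in sorted(totals):
    rate = unsolved[force] / totals[force]
    print(f"{force}: {rate:.0%} closed with no suspect identified")
```

It was exactly this kind of per-force breakdown, put to the forces for a right of reply, that surfaced the faulty submission.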
Similarly, when the Bureau Local appeared to find that 18 councils in England had nothing held over in their reserves to protect against financial uncertainty, and sought a response, it turned out the data was wrong:
“What gave the story credence was the source. It was not a tip-off or the Bureau’s own calculations, but data supplied by local authorities and published by the government … No-one noticed [the incorrect data]. Not the councils that compiled the figures, nor the Ministry of Housing, Communities and Local Government, which vetted and then released [them]”.
Their investigation has added to a growing campaign for local bodies to publish data more consistently, more openly, and more accurately.
Now, as data journalism becomes more routine, and more integrated into ever more complex business models, its impact has shifted from innovation to delivery. As data editor David Ottewell says:
“Innovation is getting data journalism on a front page. Delivery is getting it on the front page day after day. Innovation is building a snazzy interactive that allows readers to explore and understand an important issue. Delivery is doing that, and getting large numbers of people to actually use it; then building another one the next day, and another the day after that.”
Delivery is also, of course, about impact beyond our peers, beyond the ‘wow’ factor of a striking datavis or interactive map — on the real world. It may be immediate, obvious and measurable, or it may be slow-burning, under the radar and diffuse.
Change might appear to be correlated with our work, but data journalists know better than anyone that correlation does not equal causation. Sometimes we can feel like we didn’t make a difference — as with the reassigned church figure in the Boston Globe’s story — but change can take time: reporting can sow the seeds of change, with results coming years or decades later.
Ultimately, data journalism with impact sets the agenda. It reaches audiences that other journalism does not reach, and engages them in ways that other journalism does not. It gives a voice to the voiceless, and shines a light on that which is unjustifiably obscure. It holds data to account, and speaks truth to its power.
If you can do any of these things — and do them consistently, and persistently — then you should try to measure the impact in any way you can. After all, you’re a data journalist 😊.