FAQ: La Nacion interview in English

Paul Bradshaw: “The lack of data can be the story itself”
A British journalist and educator, he suggests looking where others don't: statistical silences, official gaps and excluded voices can be key to revealing systemic problems

Earlier this month I was interviewed for a feature about data journalism in the Argentine newspaper La Nacion. Here are the full questions and answers, in English, published as part of the FAQ series.

Data journalism is often associated with numbers and statistics, but how can journalists ensure that the human element remains at the core of their storytelling?

Data and the human element of stories go hand in hand: data can often give us a lead on how things are changing, or the scale of an issue, which places are falling behind, or where there is unfair variation – then we need humans to tell us why, and what the impact of those events is. Data tells us that something matters; humans tell us why we should care.

Without data, a human story can end up as an anecdote that can be dismissed by those in power as bad luck or the result of a ‘bad apple’. Seeking out data on how many other people are affected, or whether there is an increase in the events being described, makes for a much more engaging story about systemic issues, which is much harder to dismiss.

And if there’s no data — or flawed data — then that’s important to report too: no data means people have no voice, and flawed data means they’re not being clearly heard.

Data helps us identify where we should be looking for interviews or case studies (for example the worst areas or most successful projects), and it empowers us to ask the right questions of those in power: instead of asking them what is happening and merely reporting what they say, we can ask them what they are doing about a problem or trend we have already identified.

Put another way, it allows us to move past “he said, she said” reporting to establishing which person’s version of events is better supported by the data. 

How should journalists approach AI-driven events, such as algorithmic decisions in finance or automated political propaganda?

Many news organisations’ earliest experiments with generative AI have been around using it to produce news, with generally awful results. A common mistake is to misunderstand AI’s inherently probabilistic nature: what most AI does is predict a sequence of words. It is very good at those predictions: in 95% of cases that sequence of words may turn out to also be a true statement. But if 5% of statements are not, that’s a problem if your main value proposition is 100% accuracy.
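A back-of-envelope illustration of why that matters, assuming for simplicity that each statement is independently correct 95% of the time:

```python
# If each statement in a story is independently correct with
# probability 0.95, the chance the whole story is error-free
# drops quickly as the story gets longer.
p_correct = 0.95
for n_statements in (1, 5, 10, 20):
    print(n_statements, round(p_correct ** n_statements, 2))
# 1 -> 0.95, 5 -> 0.77, 10 -> 0.6, 20 -> 0.36
```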

It doesn’t mean that AI can’t have a role, but it does mean additional costs in engineering and editing.

Even basic translation and summarisation tasks face this challenge, so we are likely to see a lot of job displacement as organisations shift resources from gathering information and writing, to roles more focused on design, adaptation and verification.

In terms of AI-driven events, algorithmic decision making has been subject to journalistic scrutiny for over a decade, with organisations like ProPublica pioneering the field of Algorithmic Accountability.

It’s clear that the design and implementation of AI-based systems is an exercise of power which should be treated like any other. What’s interesting for me is that in many ways AI makes systemic problems much more visible, because an organisation is essentially codifying what might previously have been more implicit.

Law enforcement might always have been biased, for example, but an algorithm encodes that bias and makes it easier to highlight (it also makes it easier for people to blame the algorithm, rather than the systemic biases it was trained on, so we need to push back on that more).

Automatically generated content is another fast-growing frontier where perhaps the AI arms race between journalists and propagandists is most pronounced. It’s one reason why I believe journalists can’t ignore AI: if you are in an information war and don’t at least understand the weapons your opponents are using, then you are at a marked disadvantage. Operating at the same scale as misinformation agents, and detecting the patterns of their work, is going to require some form of AI-augmented journalism.

More journalists are becoming freelancers or forming independent investigative teams. How does this impact the future of data journalism?

Data journalism requires a lot of flexibility and there is constant technical experimentation, so independence can open up opportunities to innovate and develop. But obviously the freedom of operating outside a large legacy media organisation means you have less protection, reduced access to sources, and it can be harder to get impact with your work. 

The future of data journalism is tied in with the future of journalism more generally: for some decades now we have seen business models changing, as you would expect with falling costs and rising competition, and data journalism is no exception to that.

Data journalism’s growth can be attributed partly to those same forces: the costs of learning the skills of data analysis, and the tools needed to do it, dropped, while increased competition drove a demand for more in-depth, exclusive reporting that generated better engagement from readers.

The increasing datafication of society is another factor in the growth of data journalism: data is not only increasingly key source material for journalists, but also acts as a mediator in audiences’ interactions with journalism (data about a person often determines the story that they see, or the focus that it has).

We are only really at the beginning of that increasing personalisation of news, and the issues that raises for ideas of the public sphere and what we might call the ‘national conversation’.

And just as journalism has moved increasingly out of newsrooms — with the rise of citizen journalism and podcasters, of charities and NGOs setting up investigative teams — data journalism increasingly operates outside of traditional news organisations. 

Not all important issues have clear data available. What are the biggest blind spots in data journalism today, and how can journalists investigate stories that lack structured datasets?

A major blind spot is private companies generally: there is vastly more data about public bodies, and public bodies are also subject to Right to Information laws like the Freedom of Information Act. 

So it’s really important that anyone working with data recognises this, and makes an extra effort to consider story ideas outside of the public sector.

One common technique for addressing this is to look at where the private sector intersects with the public sector. For example, if companies have contracts with public bodies then those bodies may hold data on their activities, contracts, correspondence, and so on which can be obtained through FOI requests. Regulators may be collecting information on private bodies or requiring them to publish certain information on their website.  

The same techniques that might be used in an authoritarian country with little public data can also be used for tackling data blind spots: compiling data from public sources, for example (such as companies’ annual reports or public policies); scraping; leaks; and text documents as data are all options for the data-starved reporter. 
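To make one of those options concrete, here is a minimal scraping sketch in Python (the URL and page structure are hypothetical) that pulls a table from a public web page into a CSV:

```python
# Minimal scraping sketch (hypothetical URL and page structure):
# pull every table row from a public web page into a CSV file.
import csv

import requests
from bs4 import BeautifulSoup

url = "https://example.com/annual-report"  # hypothetical source page
html = requests.get(url, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

with open("figures.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for row in soup.select("table tr"):
        cells = [cell.get_text(strip=True) for cell in row.select("td, th")]
        if cells:
            writer.writerow(cells)
```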

At a more granular level, we tend to lack data that has a breakdown by ethnicity, and there are also blind spots around gender, disability, sexuality, and social class. That makes it especially difficult to identify variation in outcomes for people from different backgrounds, or variations in how they are treated. The result is that many data stories can be ‘colour-blind’ or blind to other differences in experience. 

In some cases the best we can do is to match up datasets to try to give some sort of indication of this. In one recent story, for example, I looked at library closures, and while the library data contained no measures of social class, I was able to use the libraries’ locations to match with data on deprivation to show that more deprived areas were much more likely to have their library closed.
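A minimal sketch of that kind of matching in Python (the file and column names are hypothetical, and I assume a lower rank means a more deprived area):

```python
import pandas as pd

# Hypothetical inputs: one row per library (with an area code and a
# closed/open flag), and a deprivation rank for each area code.
libraries = pd.read_csv("libraries.csv")
deprivation = pd.read_csv("deprivation.csv")

merged = libraries.merge(deprivation, on="area_code", how="left")

# Split areas into five equal groups by deprivation rank.
merged["quintile"] = pd.qcut(merged["deprivation_rank"], 5, labels=False) + 1

# Share of libraries closed in each quintile (1 = most deprived).
print(merged.groupby("quintile")["closed"].mean())
```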

On other occasions it might be that data is available but not actively published, and needs to be asked for via FOI. Or if it’s not collected, there’s always the story that asks why.

Journalists as a group have blind spots too: the lack of diversity in our newsrooms means we fail to think about the people who might be silent in our data, or whose experiences might need closer attention. The broader trends identified by data might hide smaller trends being experienced by minority groups, and it’s only by having those experiences ourselves, or by seeking them out systematically (asking “Is this the same for every group?”), that we can address that.

Many media outlets use algorithms to detect trends or even identify potential investigative leads. What are the ethical risks of algorithmic bias in journalism, and how can reporters ensure that AI-driven insights don’t reinforce existing prejudices?

There is a danger of anthropomorphising AI as “biased”, but it’s not biased in the same way a person is. It is better to talk of its plural biases, or its biased training. The same biases exist in surveys (where certain groups are under-represented because of lower response rates, or where smaller populations result in greater uncertainty) and we can see a number of techniques being used in survey data to compensate for that – but we don’t say the survey itself is biased, because we don’t anthropomorphise surveys in the same way.
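One of those survey techniques, post-stratification weighting, is simple to sketch (the shares below are illustrative, not real data):

```python
# Post-stratification weighting: scale each group's responses so the
# sample matches the known population (illustrative shares only).
population_share = {"group_a": 0.60, "group_b": 0.30, "group_c": 0.10}
sample_share = {"group_a": 0.75, "group_b": 0.20, "group_c": 0.05}

weights = {g: population_share[g] / sample_share[g] for g in population_share}
print(weights)  # roughly: group_a 0.8, group_b 1.5, group_c 2.0
# Under-represented groups get weights above 1, so their answers
# count for more when averages are calculated.
```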

As in surveys, the important thing is not whether an algorithm is ‘biased’ or not – all methods are imperfect, including and especially humans – it’s what steps we are taking to reduce the impact of that bias.

For example, if we train algorithms to detect trends or leads, based on previous stories, then we might ask how journalists choose to cover some stories and not others. We might ask what leads or trends journalists tend to miss, or if a group of journalists is classifying some training data, we might ensure that group is representative of the diversity of perspectives we want to codify in the algorithm.

Put another way, the bias is ours, not the AI’s, and dismissing AI as biased leaves the cause unaddressed.

That’s a particular problem for journalism, which has a strong myth of being objective, and this will require us to be critical about our own profession in the same way we are about other forms of power.

We can even use AI as a check on our own biases. At the idea generation and development stage reporters tend towards groupthink and availability bias, so there are template prompts that we can design to counter that.

When we identify sources we tend to rely on a small pool of contacts, and forget to consider diversity — but asking AI at that stage to make other suggestions can unlock our thinking.

There are plenty of guidelines around avoiding bias in writing, but those are inconsistently applied by journalists. We should be routinely asking AI to look at our work with those guidelines in mind and provide advice. 
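A sketch of what such a template prompt might look like (the wording is illustrative, not a tested prompt):

```python
# Illustrative template prompt: ask an AI assistant to review a draft
# against bias guidelines before publication. The guideline text and
# function name are hypothetical examples.
GUIDELINES = (
    "Avoid loaded adjectives; attribute every claim to a source; "
    "ask whose voices are missing from the story."
)

def bias_check_prompt(draft: str) -> str:
    return (
        "Review the draft story below against these writing guidelines:\n"
        f"{GUIDELINES}\n\n"
        "List any passages that may breach a guideline, explain why, "
        "and suggest a more neutral rewording.\n\n"
        f"DRAFT:\n{draft}"
    )

print(bias_check_prompt("The reckless council wasted millions..."))
```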

At each stage, what we are really introducing is a moment of reflection and feedback into the reporting process, rather than thinking of ourselves as somehow superior to AI. 

What are the most common ways in which data is misrepresented in journalism, and how can audiences become more critical consumers of data-driven reports?

The most common way is not that numbers are manipulated, but that other important numbers are left out.

For example an opposition politician might mention a number of crimes that looks large — but they don’t mention that the previous years’ numbers were even larger. Or a campaign group might say that one area has more crimes than any other area, but they don’t mention that the area also has more people than any other area.

Another technique is to highlight a large amount of money being spent on something that looks frivolous – but not to mention the much larger full budget it’s only a tiny part of.

A common one is for a government to claim that they are spending “more than ever” on education, but fail to mention that there are also more pupils than ever, and that equipment and wages cost more than they ever did before, because of inflation. In real terms – adjusted for inflation – they may be spending less, or spending less per pupil.

So one key technique for audiences and journalists is to ask “what context is missing here?” Are the numbers given per person? Do we know if those numbers are getting better or worse? Are financial figures adjusted for inflation? Do we know what proportion of the full budget, or all people, that seemingly big number is?
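A worked example of the “more than ever” claim, with illustrative numbers:

```python
# Illustrative numbers only: nominal spending is up 10%, but once
# inflation and pupil numbers are taken into account, real spending
# per pupil has actually fallen.
spend_then, spend_now = 100.0, 110.0   # nominal budget (millions)
pupils_then, pupils_now = 1.00, 1.08   # pupils (millions)
inflation = 0.15                       # 15% price rises since "then"

real_now = spend_now / (1 + inflation)        # ~95.7m in old prices
per_pupil_then = spend_then / pupils_then     # 100 per pupil (old prices)
per_pupil_now = real_now / pupils_now         # ~88.6 per pupil
print(round(real_now, 1), round(per_pupil_now, 1))
```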

[Image] A map that claimed to show “immigrant crime” but included crimes where there was no evidence anyone involved was an immigrant. Source: Bureau of Investigative Journalism

Another area to focus on is classification: numbers can be exaggerated by making definitions so broad that they aren’t really measuring what they claim to be. In Germany many years ago, for example, a map claiming to show ‘refugee crime’ was being shared online, but its definition covered any crimes with descriptions of people as “dark-skinned” or “southern”, and in at least one case the person was the victim.

Misclassification can happen at an institutional level, too, as in the case of the Los Angeles Police Department falsely classifying people as gang members.

Look out for definitions that change partway through a dataset – the World Bank changed its definition of extreme poverty in 2015, for example. And find out about the methodology used to collect the data: is it based on a sample or the whole population, for example? If that information isn’t given, you should be wary of trusting it.

Do you think data-heavy reporting on topics like climate change, inequality, or crime can cause news fatigue or even apathy among readers?

I am not aware of any research that links news fatigue or news avoidance to data journalism specifically. These behaviours tend to be linked to a broader sense of powerlessness and being overloaded with bad news.

So if anything, it may be that reporting which focuses only on tragic events you cannot change (and doesn’t include any data context) is more likely to cause news fatigue, rather than reporting informed by data which provides a bigger picture and puts things into context. It would be good to see research on this.

Some of the strategies for addressing news avoidance require the use of data. Solutions journalism, for example, which has been found to increase engagement and help readers feel more empowered, needs data to identify where to focus reporting (e.g. which areas appear to be bucking a negative trend).

News fatigue is about how we choose to report on a subject – not whether data is involved or not. As always, the more variety within a story the better: if we only focus on data then we risk dehumanising the issues; and if we only focus on human stories we risk de-contextualising people’s experiences and reducing them to anecdotes.

As attention spans shrink and audiences seek more engaging formats, do you see a future where traditional written journalism could be replaced by fully interactive, data-driven narratives?

No. With any new technology there’s always a temptation to think in ‘all or nothing’ terms, but we know in journalism it’s rarely black and white, and mostly shades of grey.

The history of technology shows us that new technologies rarely wipe out old technologies: print wasn’t replaced by radio, and radio wasn’t replaced by TV, and none of those were replaced by online news. 

Different technologies suit different situations, audiences, and budgets. Publishers will be driven by that rather than a strict ideology about one format or approach being inherently better than all others. 

When I wrote the ‘Model for the 21st century newsroom’ it was all about how different formats serve different needs in the journey of a story: interactivity works well once a news event has happened and there’s a need to dig beyond the simple facts of that event, but before that point it’s often just about who, what, where, when – and speed, rather than depth.

Some investigative projects rely on data provided by ordinary citizens (e.g., mapping environmental damage or tracking misinformation). How can newsrooms ensure the reliability of crowdsourced data while still encouraging public participation?

It all depends on the specific circumstances of the story, but there needs to be a clear assessment of risks and then of what steps can be taken to reduce those risks. For example, if it’s an issue that trolls are likely to be attracted to, then you might take steps to block submissions that match particular behaviours (e.g. bot-like behaviour, certain words, IP addresses, etc.).
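A sketch of that screening step in Python (the thresholds and blocklist are hypothetical), flagging submissions for manual review rather than rejecting them automatically:

```python
from collections import Counter

# Hypothetical blocklist and rate threshold: flag submissions that
# match troll- or bot-like patterns for manual review instead of
# publishing them automatically.
BLOCKED_WORDS = {"spamword1", "spamword2"}
MAX_PER_IP = 5
ip_counts: Counter = Counter()

def needs_review(submission: dict) -> bool:
    ip_counts[submission["ip"]] += 1
    text = submission["text"].lower()
    return (
        ip_counts[submission["ip"]] > MAX_PER_IP        # bot-like volume
        or any(word in text for word in BLOCKED_WORDS)  # blocked terms
        or len(text) < 10                               # low-effort content
    )

print(needs_review({"ip": "203.0.113.9", "text": "Flooding near the bridge"}))
```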

Crowdsourcing is just one stage in an editorial process, and just one tool in a larger toolkit, so it’s also about what other stages you have: how are you preparing the ground, what are you doing with the information after it’s submitted? What other tools are you using to find and tell the story? 

Often crowdsourcing is as much a way of engaging and empowering a community as it is about information gathering. There can be an arrogance about journalism which actually excludes the very communities it’s supposed to be for, so inviting audiences to contribute information is one way to acknowledge that those audiences may also have knowledge of value. And we also have to add value through verification and following up: if we don’t, then asking the audience for information can be interpreted as laziness or exploitation.

If you had unlimited resources, what kind of large-scale investigation would you launch today? Are there any underreported systemic issues that you believe data journalism is uniquely positioned to uncover?

With unlimited resources I’d probably want to launch a project which sought to fill the gaps in data that exist around ethnicity, gender, disability, social class and so on, and the activities of the private sector – and provide a platform for holding power to account at the most local level.

Data journalism is uniquely positioned to empower reporters, and audiences, in a way that individual reports often can’t, so that’s where I’d be focusing my efforts.

If pushed on a systemic issue, I’d say the areas where private companies are providing public services. This is where public money is being spent, but without the accountability and scrutiny that public services face. And I’d say the inequalities in who benefits from that money, in terms of different groups within society.

Prisons and incarceration (including, for example, the detention of asylum seekers) are another massively underreported area: how do we engage audiences in issues affecting people they are least likely to care about? Just because we don’t think our audience will care, and that makes it hard, doesn’t mean we shouldn’t be covering those things.

How can data reporters balance the need for speed with the depth and verification required for high-quality reporting?

Quick, simple stories are more valuable than we often give ourselves credit for. They allow reporters to build an understanding of a field and build trust, to make contacts, and keep an eye on the news agenda.

A high-quality investigation doesn’t have to be a longform piece – it can take the shape of many stories, some simple, and some longer, over a period of time.

So I don’t think it’s a choice between one or the other: we can be strategic in identifying the quick stories that we can do as part of our journey in search of a deeper truth.

There are simple principles we can remember when working at speed, though.

Always being honest – with ourselves as well as our audience – about what we don’t know, for example, is vital. It’s easy to make assumptions about data, so check you understand exactly what is being measured, and how, why and by whom. Look for what’s missing in the data, and why it might have been left out. And, as with any story, look for other perspectives and use other sources for confirmation.

I also think we forget that our stories are part of a wider conversation. Every story that we tell takes our audiences, and our fellow journalists, a step closer to the truth, and a step deeper towards understanding.

A single reporter doesn’t have to do all the work on a story; they can build on the work of their peers and predecessors. There’s a lot of ego in wanting to do it all ourselves. If we saved another reporter having to do even a little work on a story, then we’ve contributed to that story regardless of whether our name is on it.

What is the one principle or mindset that journalists should never lose sight of, no matter how the profession evolves?

We are here to serve our audiences, not to impress our fellow journalists, or build relationships with interviewees, or even to advance a cause.

We can serve those audiences in simple ways. We can serve them by empowering them or by connecting them. We can tell them things they don’t want to hear, about things they didn’t think they were interested in, because serving an audience is not the same thing as telling them what they want to hear.

Not all of what we do is about stories, either – it might be about tools, or platforms, or simply listening. Awards and metrics of engagement are meaningless if they don’t ultimately serve our audiences.

About Paul Bradshaw

Paul teaches data journalism at Birmingham City University and is the author of a number of books and book chapters about online journalism and the internet, including the Online Journalism Handbook, Mobile-First Journalism, Finding Stories in Spreadsheets, Data Journalism Heist and Scraping for Journalists. From 2010 to 2015 he was a Visiting Professor in Online Journalism at City University London, and from 2009 to 2014 he ran Help Me Investigate, an award-winning platform for collaborative investigative journalism. Since 2015 he has worked with the BBC England and BBC Shared Data Units, based in Birmingham, UK. He also advises and delivers training to a number of media organisations.
