FAQ: On data journalism and open data

In the second part of this FAQ (first part here), I answer more questions from a Turkish PR company (published on LinkedIn here)…

Q: What skills do you think a journalist must absolutely have when working with data?

There are three core skills I always begin with: sorting, filtering, and calculating percentages (proportion and change). You can do most data journalism stories with those alone.
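For readers who want to see those three skills outside a spreadsheet, here is a minimal Python sketch. The areas, column names and figures are invented purely for illustration, and the helper names `proportion_pct` and `change_pct` are hypothetical.

```python
# Invented figures for illustration only; not real data.
rows = [
    {"area": "North", "incidents_2022": 120, "incidents_2023": 150},
    {"area": "South", "incidents_2022": 200, "incidents_2023": 180},
    {"area": "East", "incidents_2022": 80, "incidents_2023": 100},
]


def proportion_pct(part, total):
    """Return part as a percentage of total (proportion)."""
    return part / total * 100


def change_pct(old, new):
    """Return the percentage change from old to new."""
    return (new - old) / old * 100


# Sorting: rank areas by the latest figure, highest first.
ranked = sorted(rows, key=lambda r: r["incidents_2023"], reverse=True)

# Filtering: keep only the areas where the figure went up.
rising = [r for r in rows if r["incidents_2023"] > r["incidents_2022"]]

# Proportion: what share of all 2023 incidents happened in North?
total_2023 = sum(r["incidents_2023"] for r in rows)
north_share = proportion_pct(rows[0]["incidents_2023"], total_2023)

# Change: how much did North rise or fall between the two years?
north_change = change_pct(rows[0]["incidents_2022"], rows[0]["incidents_2023"])

print([r["area"] for r in ranked])
print([r["area"] for r in rising])
print(round(north_share, 1), round(north_change, 1))
```

The two percentage calculations answer different questions: proportion divides a part by the total, while change divides the difference by the starting value.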

Alongside those basic technical skills it’s important to have the basic editorial skills of checking a source against other sources (following up your data by getting quotes or interviews), and being able to communicate what you’ve found clearly for a particular audience.

The next two technical skills are probably visualisation and pivot tables.

Pivot tables allow you to quickly turn detailed data into a summary table. That table might allow you to rank the worst or best areas or categories, and it can allow you to look at how those have changed over time. Those summary tables are also exactly what you need to create a bar, pie or line chart.
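The summarising step a pivot table performs can be sketched in a few lines of Python. This is only an illustration of the idea, not how any spreadsheet implements it; the records are invented and `pivot_sum` is a hypothetical helper name.

```python
from collections import defaultdict

# Invented detail rows: (area, year, count). Not real data.
records = [
    ("North", 2022, 5), ("North", 2022, 7), ("North", 2023, 9),
    ("South", 2022, 4), ("South", 2023, 3), ("South", 2023, 2),
]


def pivot_sum(records):
    """Sum values into a nested table: one row per area, one column per year."""
    table = defaultdict(lambda: defaultdict(int))
    for row_key, col_key, value in records:
        table[row_key][col_key] += value
    # Convert to plain dicts so the result is easy to print and compare.
    return {row: dict(cols) for row, cols in table.items()}


summary = pivot_sum(records)

# Rank areas by their 2023 total, highest first.
ranking = sorted(summary, key=lambda area: summary[area][2023], reverse=True)

for area in ranking:
    print(area, summary[area])
```

Each row of `summary` is one category with one total per year, which is the shape a bar or line chart needs.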

There’s a lot of talk about coding, but in practice I would say it is much less important than those core spreadsheet skills, and it is probably used in no more than 5% of data-driven stories.

It becomes more important once you start to move into stories that require scraping, or dealing with large datasets, or advance planning or time-saving (i.e. writing code that will analyse the data when it’s released, or analysis that you are doing regularly).

Q: Are governments and institutions transparent enough in regard to open data?

I’m always going to say governments and institutions can be doing a lot more in relation to open data and transparency. A big problem is that, in general, the choice of what to release lies with the organisation and so what is released can potentially be a distraction from what is important. As journalists we need to be aware of that.

To some extent it’s not about organisations being more transparent — it’s about government and regulators passing laws and enforcing them to ensure that the most important data is required to be published.

Q: As you emphasised in the Data Journalism Handbook, storytelling plays a crucial role in the analysis of datasets. What do you see as the most critical step in transforming data into a “story”?

I always advise my students and trainees to look at the list of columns in a dataset and identify the ones that they think are most important or newsworthy — and then think about that column in terms of the 7 angles of data journalism: what stories can they tell about things getting better or worse? About who/what is ranked worst or best? Or how well your audience’s area or country is performing? Or the scale of a problem? 

If you feel there are a number of stories to tell, you might choose to do something more in depth with each of those stories forming one chapter in a feature that offers to ‘explore’ the issue or present a summary ‘in numbers’.

People often look for relationships but this is generally a mistake, especially for beginners: it is rare that we can prove a causal relationship, and trying to do so can result in an overcomplicated and unsatisfying story. Relationship stories require more time and experience.

Q: Data journalism has become increasingly important not only for professional journalists but also for activists and citizen reporters. What kind of future do you think this widespread adoption promises?

I’ve always emphasised that data is just another source, and should be treated with scepticism. Its adoption outside journalism is probably already making it clearer that ‘data’ doesn’t necessarily mean ‘fact’, and should encourage us to always ask questions of data just as we would question a human source: what are the motivations of those who collect it? How was it collected? How are key concepts defined or measured?

Classification is often where people either jump to the wrong conclusion or misrepresent what data is actually showing. One thinktank, for example, misrepresented people charged with crimes as people who had committed those crimes. Other reports have confused irregular migrants with illegal migrants, or made calculations based on a wrong or out of date population estimate.
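To see why the population estimate matters, consider a rate per 100,000 people. The sketch below uses invented numbers and a hypothetical helper, `rate_per_100k`, to show the same count producing different rates depending on which estimate is used as the denominator.

```python
def rate_per_100k(count, population):
    """Return the number of cases per 100,000 people."""
    return count * 100_000 / population


# All figures below are invented for illustration; not real data.
cases = 450
outdated_population = 900_000   # hypothetical older estimate
current_population = 1_000_000  # hypothetical newer estimate

rate_outdated = rate_per_100k(cases, outdated_population)
rate_current = rate_per_100k(cases, current_population)

print(rate_outdated, rate_current)
```

The count has not changed, but the older denominator makes the rate look higher, so it is worth checking which population figure a calculation relies on.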

We have also seen campaigners deciding to (mis)classify anyone described as having dark skin in crime reports as an ‘immigrant’, or failing to distinguish between crimes and criminals (one person can commit multiple crimes). 
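The difference between counting crimes and counting criminals is easy to check when the data includes some kind of person identifier. In this minimal sketch the records and the `person_id` field are invented; real datasets may not include such a field at all.

```python
# Invented records: each row is one recorded offence. Not real data.
offences = [
    {"person_id": "P1", "offence": "theft"},
    {"person_id": "P1", "offence": "burglary"},
    {"person_id": "P1", "offence": "theft"},
    {"person_id": "P2", "offence": "fraud"},
]

# Counting rows gives the number of offences...
crime_count = len(offences)

# ...while counting distinct IDs gives the number of people involved.
offender_count = len({row["person_id"] for row in offences})

print(crime_count, offender_count)
```

Reporting the first number as if it were the second would overstate how many people are involved.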

Confirmation bias and access to almost unlimited information mean it is very easy to find something that supports your position, so “cherry picking” data is widespread, with many people unaware that this is what they are doing.

We need to train people early to develop a habit of challenging their own assumptions — and as journalists we need to have the skills to check the claims made based on any dataset. 
