FAQ: Big data and journalism

The latest in the series of Frequently Asked Questions comes from a UK student, who has questions about big data.

How can data journalists make sense of such quantities of data and filter out what’s meaningful?

In the same way they always have. Journalists’ role has always been to make choices about which information to prioritise, what extra information they need, and what information to include in the story they communicate.

Data is just another type of information. So journalists decide: what is the story I’m reporting here? What parts of the data will help me to find that story? Which parts will flesh out the context, or detail? Which will lead me to interesting human aspects?

Different journalists will find different things interesting – or ‘meaningful’ – in the same piece of data. Often there are many different stories to tell, so it’s a case of prioritising and focusing.

What current challenges do data journalists face?

There are technical challenges, political challenges, and ethical challenges, to name just three. Technically there are a number of tools and techniques that journalists need to master – which ones depends on the nature of the data in their field. It might be scraping, or data visualisation, or creating apps and platforms.

Politically there’s an ongoing struggle taking place over freedom of information and open data – the latter is being used to restrict the former, in some cases, and vice versa. We are seeing increasing numbers of public services being delivered by private companies which are not subject to scrutiny under FOI acts. Journalists need to master the laws to gain access to information – but also support their expansion.

Ethically we’re having to balance competing values: public interest versus privacy, for example; ensuring accuracy is another ongoing challenge.

Are there any potentially ‘game changing’ technologies or methods that you are aware of which would enhance data journalism?

I think network analysis tools have enormous potential for exposing conflicts of interest – or vested interests – among those in power.
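To give a flavour of what that looks like in practice, here is a minimal sketch in Python using the networkx library. The names, organisations and connections are entirely invented for illustration – in real reporting they would come from registers of interests, company filings and similar sources.

import networkx as nx

# Build a graph of (hypothetical) relationships between officials,
# organisations and decisions. Every name below is made up.
G = nx.Graph()
G.add_edges_from([
    ("Cllr Smith", "Acme Housing Ltd"),         # e.g. a directorship
    ("Cllr Smith", "Planning Committee"),       # committee membership
    ("Acme Housing Ltd", "Riverside Scheme"),   # company bid for the scheme
    ("Planning Committee", "Riverside Scheme")  # committee approved it
])

# A short path between an official and a decision they were involved in
# is a prompt for further reporting, not proof of wrongdoing.
print(" -> ".join(nx.shortest_path(G, "Cllr Smith", "Riverside Scheme")))

On real data the interesting output is usually the connection you did not know was there.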

Semantic analysis is fascinating: DocumentCloud, for example, can take 300 pages of documents and show you who and what is mentioned on what dates in those documents, allowing you to drill down to the most newsworthy facets.
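DocumentCloud does that work for you, but the underlying idea – named-entity extraction – can be sketched in a few lines of Python with the spaCy library. This is not DocumentCloud’s own code, and the document text is invented:

import spacy

# Load spaCy's small English model (installed separately with
# `python -m spacy download en_core_web_sm`).
nlp = spacy.load("en_core_web_sm")

text = "On 12 March the committee met representatives of Acme Housing Ltd."
doc = nlp(text)

# Pull out the people, organisations and dates the model recognises.
for ent in doc.ents:
    if ent.label_ in ("PERSON", "ORG", "DATE"):
        print(ent.label_, "->", ent.text)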

Obviously the increasingly public nature of information makes scraping a particularly powerful way of gaining insights which were not previously possible – that’s why I wrote a book about it.
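As a taste of what that involves, here is a bare-bones scraper in Python using requests and BeautifulSoup. The address and the table structure are placeholders – you would adapt the selectors to the page you are actually scraping (and check its terms of use first):

import requests
from bs4 import BeautifulSoup

url = "https://example.gov/spending"  # placeholder address
html = requests.get(url, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

# Collect the text of every cell in every table row on the page.
rows = []
for tr in soup.select("table tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:
        rows.append(cells)

print(len(rows), "rows scraped")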

Then there’s the ability to connect a person’s social media profile to a story – allowing for genuine personalisation of a story: how it affects you, based on the details you’ve already provided such as your school, age, marital status, etc.
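In code terms the personalisation itself can be very simple – the hard part is having the underlying data. A toy Python sketch, with invented schools and figures, keyed to a detail a reader might already have shared:

# Toy personalisation sketch: the schools and figures are invented.
funding_per_pupil = {
    "Riverside Academy": 4850,
    "Hillside School": 5210,
}

def personalised_line(school, national_average=5000):
    amount = funding_per_pupil.get(school)
    if amount is None:
        return "We don't have figures for your school yet."
    diff = amount - national_average
    direction = "above" if diff >= 0 else "below"
    return (f"{school} receives £{amount} per pupil, "
            f"£{abs(diff)} {direction} the national average.")

print(personalised_line("Riverside Academy"))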

What should young journalists train for specifically regarding data journalism? (techniques/software etc.)

There are some generic base skills, such as the ability to use spreadsheets, but I think focusing on tools is the wrong approach. It’s always best to be led by your editorial needs: if you are interested in a field which involves public institutions then FOI is useful; if not, then scraping.

If you’re dealing with science, finance or health stories, statistical literacy is more important (although some statistical literacy is a good idea in any field). If you want to make useful tools, then learn programming. If your story is about striking facts and figures then perhaps learn visualisation tools and languages to show those in an easy-to-understand form.
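For example, a basic chart takes only a few lines with Python’s matplotlib – the spending figures below are made up purely to show the shape of the code:

import matplotlib.pyplot as plt

# Invented figures, purely for illustration.
categories = ["Housing", "Transport", "Health", "Education"]
spending = [12.4, 8.1, 20.3, 15.6]  # £ millions

plt.bar(categories, spending)
plt.ylabel("Spending (£ millions)")
plt.title("Council spending by category (example data)")
plt.tight_layout()
plt.show()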

The most important skill is the ability to learn new things: data journalism will always involve new challenges, new tools, and new opportunities – and you will always be learning. Thankfully there are always communities and resources out there to help you. It never gets boring.
