Bárbara Maseda has dedicated the last four years to publishing data where none exists. “In Cuba we use investigative journalism tools to search for information that elsewhere in the world would be in a press release,” she says. Other journalists’ data problems, such as receiving data in formats that are difficult to analyse, “are my highest aspirations”.
This is what you’ll look like after reading all of these books… (“Study of a Man Reading” by Alphonse Legros)
This latest in the frequently asked questions series is an answer to an aspiring data journalism student who asks “Would you be able to direct me to any resources or text books that might help [prepare]?” Here are some recommendations I give to students on my MA in Data Journalism…
Books on data journalism as a profession
Data journalism isn’t just the application of a practical skill, but a profession with a culture, a history, and non-technical practices.
We needed to be able to take a collection of words such as “11 years and 5 months’ imprisonment” and convert that into something that could be used in spreadsheet calculations (specifically, comparing the lengths of time represented by two different phrases).
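As a rough illustration of that conversion, here is a minimal sketch in Python rather than a spreadsheet formula. It assumes each phrase only mentions whole years and months written as digits. The function name `sentence_to_months` and the `UNIT_MONTHS` lookup are hypothetical, chosen for this example.

```python
import re

# Assumption: every duration is reduced to months so two phrases can be compared.
UNIT_MONTHS = {"year": 12, "month": 1}


def sentence_to_months(phrase: str) -> int:
    """Return the total number of months mentioned in a sentencing phrase."""
    total = 0
    # Find each "<number> year(s)" or "<number> month(s)" pair in the text.
    for number, unit in re.findall(r"(\d+)\s*(year|month)s?", phrase.lower()):
        total += int(number) * UNIT_MONTHS[unit]
    return total


# Comparing the lengths of time represented by two different phrases.
longer = sentence_to_months("11 years and 5 months' imprisonment")  # 137
shorter = sentence_to_months("9 years' imprisonment")               # 108
print(longer - shorter)  # difference in months: 29
```

Phrases that spell numbers out as words ("eleven years") or use other units such as weeks would return a total that is too low, so a real dataset would need checking for those cases first.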
This latest group of frequently asked questions comes from an interview with Source, published here in full just in case it’s — you know — useful or something…
1. What are the essential computational skills that a journalist should develop?
Firstly, an ability to recognise patterns, or structured information. Spreadsheets are explicitly ‘data’ but some of the most interesting applications of computational journalism are where someone has seen data where others don’t.
Earlier this month I held a special open taster class at Birmingham City University for anyone interested in my full-time MA and part-time PGCert courses in Data Journalism. As some people couldn’t get to the UK to attend the event, I put together two video screencasts recapping some of the material covered in the session.
There’s a story out this week on the BBC website about dialogue and gender in Game of Thrones. It uses data generated by artificial intelligence (AI), specifically machine learning, and it’s a good example of some of the challenges that journalists are increasingly going to face as they come to deal with more and more algorithmically generated data.
Information and decisions generated by AI are qualitatively different from the sort of data you might find in an official report, but journalists may fall back on treating data as inherently factual.
Here, then, are some of the ways the article dealt with that — and what else we can do as journalists to adapt.
Margins of error: journalism doesn’t like vagueness
The story draws on data from an external organisation, Ceretai, which “uses machine learning to analyse diversity in popular culture.” The organisation claims to have created an algorithm which “has learned to identify the difference between male and female voices in video and provides the speaking time lengths in seconds and percentages per gender.”
Crucially, the piece notes that:
“Like most automatic systems, it doesn’t make the right decision every time. The accuracy of this algorithm is about 85%, so figures could be slightly higher or lower than reported.”
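To see how an accuracy figure can shift a reported percentage, here is a small Python sketch. It rests on a strong assumption that the article does not state: that the 85% applies to each second of audio and is the same for male and female voices. The function names and the 25% input are invented for illustration and are not figures from the story.

```python
def observed_share(true_share: float, accuracy: float) -> float:
    """Share of speaking time a classifier would report for one gender,
    assuming it mislabels each gender at the same rate (1 - accuracy)."""
    correctly_labelled = accuracy * true_share
    wrongly_added = (1 - accuracy) * (1 - true_share)
    return correctly_labelled + wrongly_added


def corrected_share(observed: float, accuracy: float) -> float:
    """Invert observed_share to estimate the true share under the same assumption."""
    if accuracy == 0.5:
        raise ValueError("An accuracy of 50% carries no information to correct with.")
    return (observed - (1 - accuracy)) / (2 * accuracy - 1)


# Hypothetical example: a true share of 25% at 85% accuracy.
reported = observed_share(0.25, 0.85)
print(round(reported, 3))                         # 0.325
print(round(corrected_share(reported, 0.85), 3))  # 0.25
```

Under these assumptions the errors do not simply cancel out: they pull the reported figure towards 50%, which is one reason to ask the data provider how the accuracy was measured before reporting a single number.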
It’s the 8th year of the awards. This year the “Best data journalism team” category has been split into two, one for small teams and one for large teams, with the “Small newsrooms (one or more winners)” category making way for the change.