There’s a story out this week on the BBC website about dialogue and gender in Game of Thrones. It uses data generated by artificial intelligence (AI) — specifically, machine learning — and it’s a good example of some of the challenges that journalists are increasingly going to face as they come to deal with more and more algorithmically generated data.
Information and decisions generated by AI are qualitatively different from the sort of data you might find in an official report, but journalists may be tempted to fall back on treating all data as inherently factual.
Here, then, are some of the ways the article dealt with that — and what else we can do as journalists to adapt.
Margins of error: journalism doesn’t like vagueness
The story draws on data from an external organisation, Ceretai, which “uses machine learning to analyse diversity in popular culture.” The organisation claims to have created an algorithm which “has learned to identify the difference between male and female voices in video and provides the speaking time lengths in seconds and percentages per gender.”
Crucially, the piece notes that:
“Like most automatic systems, it doesn’t make the right decision every time. The accuracy of this algorithm is about 85%, so figures could be slightly higher or lower than reported.”
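To see why an 85% accuracy rate means "figures could be slightly higher or lower than reported", it helps to sketch the arithmetic. The snippet below is a back-of-the-envelope illustration, not Ceretai's actual error model: it assumes the classifier mislabels male and female segments at the same 15% rate, and shows how that pulls reported shares towards 50%.

```python
# Illustrative sketch: how symmetric classification error distorts
# a reported speaking-time share. Assumes a 15% misclassification
# rate in both directions — a simplification for demonstration only.

def reported_share(true_share, accuracy=0.85):
    """Expected reported share of one gender, given symmetric errors.

    Correctly labelled segments contribute true_share * accuracy;
    mislabelled segments of the other gender add (1 - true_share) * error.
    """
    error = 1 - accuracy
    return true_share * accuracy + (1 - true_share) * error

# A true 30% female speaking share would, on average, be reported
# as 36% under these assumptions — errors drag extremes to the middle.
print(reported_share(0.30))
```

Under this (assumed) symmetric model, only a genuine 50/50 split is reported without distortion — a useful intuition for why the article's caveat matters most when one gender dominates the dialogue.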
Last week saw the third Data Journalism UK conference, an opportunity for the country’s data journalists to gather, take stock of the state of the industry and look at what’s ahead.
The BBC Shared Data Unit’s Pete Sherlock kicked off the event, looking back at the first 18 months of the unit’s existence. In that period the unit has trained 15 secondees and helped generate over 600 stories across more than 250 titles in the regional press.
Both stories resulted in strong pushback – from the Ministry of Justice and the electric car industry respectively – but their new data journalism skills gave them the confidence to persist with their stories. Continue reading →
The latest in my series of FAQ posts follows on from the last one, in response to an MA student at City University who asked: “Do you think that an increase in algorithmic input is leading to a decline in human judgement?”. Here’s my response.
Does an increase in computation lead to a decline in human input?
Firstly, it’s important to emphasise that the vast majority of data journalism involves no algorithms or automation at all: it’s journalists making calculations, which historically they would have done manually.
You mention the possibility that “an increase in computation leads to a decline in human input”. An analogy would be to ask whether an increase in pencils leads to a decline in human input in art. Continue reading →
I’ve now been teaching data journalism for over a decade — from one-off guest classes at universities with no internal data journalism expertise, to entire courses dedicated to the field. In the first of two extracts from a commentary I was asked to write for Asia Pacific Media Educator I reflect on the lessons I’ve learned, and the differences between what I describe (after Daniel Kahneman) as “teaching data journalism fast” and “teaching data journalism slow”. First up, ‘teaching data journalism fast’ — techniques for one-off data journalism classes aimed at general journalism students.
In this commentary, I outline the different pedagogical approaches I have adopted in teaching data journalism within different contexts over the last decade. In each case, there was more than enough data journalism to fill the space — the question was how to decide which bits to leave out, and how to engage students in the process. Continue reading →
Local data journalism in the UK has been undergoing a quiet revolution over the last 12 months, and the first few months of 2018 in particular have already seen a number of landmarks. Here are some of the highlights from just its first 12 and a half weeks…
January: BBC Shared Data Unit publishes its first secondee-led investigation
The BBC Shared Data Unit had already been producing stories before it took on its first three-month secondees from the news industry in late 2017. Over the next 12 weeks they received training in data journalism and worked on a joint investigation. Continue reading →
Women represent 49.5% of the world’s population, but they do not have a corresponding public, political and social influence. In recent years, more and more women have raised their voices, making society aware of their challenges — data journalists included. To commemorate International Women’s Day, Carla Pedret presents a list of data journalism projects that detail the sacrifices, injustices and prejudices that women still have to face in the 21st century.
In a guest post for OJB, Barbara Maseda looks at how the media has used text-as-data to cover State of the Union addresses over the last decade.
State of the Union (SOTU) addresses are amply covered by the media — from traditional news reports and full transcripts, to summaries and highlights. But like other events involving speeches, SOTU addresses can also be analysed using natural language processing (NLP) techniques to identify and extract newsworthy patterns.
Every year, a new speech is added to this small collection of texts, which some newsrooms process to add a fresh angle to the avalanche of coverage.
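The kind of text-as-data processing described above can start very simply. Here is a minimal sketch, using only Python's standard library, of comparing word frequencies between two speeches to surface a possible angle — the speech strings and stopword list are invented placeholders, not real SOTU transcripts or any newsroom's actual pipeline.

```python
# Minimal text-as-data sketch: compare word frequencies across two
# speeches to find terms that distinguish one from the other.
# The texts below are made-up placeholders for illustration.
from collections import Counter
import re

STOPWORDS = frozenset({"the", "a", "of", "and", "to", "we", "will"})

def word_counts(text):
    """Lowercase the text, split into words, and count non-stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

speech_a = "We will invest in jobs, jobs and clean energy."
speech_b = "We will defend democracy and protect jobs."

# Counter subtraction keeps only words more frequent in speech_a —
# a crude but quick way to spot what one speech emphasises.
distinctive = word_counts(speech_a) - word_counts(speech_b)
print(distinctive.most_common(5))
```

Real analyses of this kind typically go further — tracking themes across a decade of transcripts, or measuring sentence length and reading level — but the underlying idea is the same: treat the speech as structured data rather than only as quotes.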