This BBC story involved converting phrases into numbers that could be used in calculations
Earlier this month the BBC Data Unit published a story on unduly lenient sentences which involved working with data that was trapped in phrases.
We needed to be able to take a collection of words such as “11 years and 5 months’ imprisonment” and convert that into something that could be used in spreadsheet calculations (specifically, comparing the lengths of time represented by two different phrases).
It’s a problem you come across every so often as a journalist — especially with FOI requests — so in this post — taken from the book Finding Stories in Spreadsheets — I’ll explain how to do that. Continue reading
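The full method is in the post itself, but the core idea of pulling the numbers out of a sentence-length phrase can be sketched in a few lines of Python. This is a minimal illustration, not the book's spreadsheet method: it assumes phrases follow the "N years and M months" pattern, and the function name is mine.

```python
import re

def phrase_to_months(phrase):
    """Convert a phrase such as "11 years and 5 months' imprisonment"
    into a total number of months that can be compared numerically."""
    years = re.search(r"(\d+)\s*year", phrase)
    months = re.search(r"(\d+)\s*month", phrase)
    total = 0
    if years:
        total += int(years.group(1)) * 12  # 12 months per year
    if months:
        total += int(months.group(1))
    return total

# "11 years and 5 months' imprisonment" -> 11*12 + 5 = 137 months
```

Once every phrase is reduced to a single number like this, comparing the lengths of two sentences becomes a simple subtraction.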
Earlier this month I held a special open taster class at Birmingham City University for anyone interested in my full time MA and part time PGCert courses in Data Journalism. As some people couldn’t get to the UK to attend the event I put together two video screencasts recapping some of the material covered in the session.
I’ve embedded the two videos — and slides from the day — below.
And if you want to try out some of the hands-on activities from the class, you can find them here.
The BBC’s chart mentions a margin of error
There’s a story out this week on the BBC website about dialogue and gender in Game of Thrones. It uses data generated by artificial intelligence (AI) — specifically, machine learning — and it’s a good example of some of the challenges that journalists are increasingly going to face as they come to deal with more and more algorithmically generated data.
Information and decisions generated by AI are qualitatively different from the sort of data you might find in an official report, but journalists may fall back on treating data as inherently factual.
Here, then, are some of the ways the article dealt with that — and what else we can do as journalists to adapt.
Margins of error: journalism doesn’t like vagueness
The story draws on data from an external organisation, Ceretai, which “uses machine learning to analyse diversity in popular culture.” The organisation claims to have created an algorithm which “has learned to identify the difference between male and female voices in video and provides the speaking time lengths in seconds and percentages per gender.”
Crucially, the piece notes that:
“Like most automatic systems, it doesn’t make the right decision every time. The accuracy of this algorithm is about 85%, so figures could be slightly higher or lower than reported.”
And this is the first problem. Continue reading
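The article doesn't spell out how that 85% figure propagates into the reported percentages, but a toy model shows why it matters. Assuming, purely for illustration, that each unit of speaking time is labelled correctly with probability 0.85 and flipped to the other gender otherwise (a simplifying symmetric-error assumption, not a claim about Ceretai's actual algorithm):

```python
def observed_share(true_share, accuracy=0.85):
    """Expected reported share of one class when each item is
    labelled correctly with probability `accuracy` and assigned
    to the other class otherwise (symmetric errors assumed)."""
    error = 1 - accuracy
    return true_share * accuracy + (1 - true_share) * error

# If women truly account for 25% of speaking time, this toy model
# expects the classifier to report 0.25*0.85 + 0.75*0.15 = 0.325,
# i.e. about 32.5% - a shift of several percentage points.
```

The point is not that these are the real numbers, but that "about 85% accurate" can move reported figures by more than a rounding error, which is exactly why the hedge in the article matters.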
The Data Journalism Awards is now accepting entries for its 2019 awards.
It’s the 8th year of the awards. This year the “Best data journalism team” category has been divided into two categories: small and large teams, with the “Small newsrooms (one or more winners)” category making way for the change.
The awards website has also been revamped to include a range of resources for data journalists, a “Community” section (in addition to the existing Slack group) and news on data journalism developments.
The deadline to enter is 7 April 2019. Winners get an all-expenses-covered trip to June’s Global Editors Network (GEN) Summit and Data Journalism Awards ceremony.
In Hungary, not-for-profit news site Átlátszó has launched a full-time data team to create a wide range of data visualisations and data-driven stories. Amanda Loviza spoke to data journalist Attila Bátorfy about his plans to have Átló raise the quality of data journalism in Hungary.
Átlátszó was created in 2011 as Hungary’s first crowd-funded independent investigative news site, with a stated goal of holding the powerful accountable.
Data journalist Attila Bátorfy joined the site two and a half years ago. It was not long before he told editor-in-chief Tamás Bodoky that the site needed a whole separate team to produce higher quality data visualisations. Continue reading
The latest in my series of FAQ posts follows on from the last one, responding to a question from an MA student at City University: “Do you think that an increase in algorithmic input is leading to a decline in human judgement?” Here’s my response.
Does an increase in computation lead to a decline in human input?
Firstly, it’s important to emphasise that the vast majority of data journalism involves no algorithms or automation at all: it’s journalists making calculations, which historically they would have done manually.
You mention the possibility that “an increase in computation leads to a decline in human input”. An analogy would be to ask whether an increase in pencils leads to a decline in human input in art. Continue reading