Category Archives: data journalism

When you get data in sentences: how to use a spreadsheet to extract numbers from phrases

Unduly lenient sentences review scheme inadequate

This BBC story involved converting phrases into numbers that could be used in calculations

Earlier this month the BBC Data Unit published a story on unduly lenient sentences which involved working with data that was trapped in phrases.

We needed to be able to take a collection of words such as “11 years and 5 months’ imprisonment” and convert that into something that could be used in spreadsheet calculations (specifically, comparing the lengths of time represented by two different phrases).

It’s a problem you come across every so often as a journalist, especially with FOI requests, so in this post (taken from the book Finding Stories in Spreadsheets) I’ll explain how to do that.
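For readers who prefer to see the idea in code, here is a minimal Python sketch of the same conversion. It is not the spreadsheet method the post and book describe, just an illustration of the underlying step. The function name `sentence_to_months` and the pattern `_PART` are hypothetical, and the sketch assumes the phrases only ever mention whole numbers of years and months (weeks, days, "life" and similar would need extra handling).

```python
import re

# Hypothetical pattern: finds "<number> <unit>" pairs such as "11 years" or "5 months".
_PART = re.compile(r"(\d+)\s*(year|month)s?", re.IGNORECASE)


def sentence_to_months(phrase: str) -> int:
    """Return the total length, in months, described by a sentence phrase."""
    total = 0
    for amount, unit in _PART.findall(phrase):
        # Years are converted to months so that every phrase ends up in one unit.
        total += int(amount) * (12 if unit.lower() == "year" else 1)
    return total


print(sentence_to_months("11 years and 5 months’ imprisonment"))  # 137
```

Once both phrases are in months, comparing them is a simple subtraction, for example `sentence_to_months(new_phrase) - sentence_to_months(old_phrase)`, where `new_phrase` and `old_phrase` stand in for whichever two cells are being compared.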

FAQ: What are the essential computational skills that a journalist should develop?

Recognising patterns is a key skill in computational journalism (image by Stanley Zimny)

This latest group of frequently asked questions comes from an interview with Source, published here in full just in case it’s — you know — useful or something…

1. What are the essential computational skills that a journalist should develop?

Firstly, an ability to recognise patterns, or structured information. Spreadsheets are explicitly ‘data’ but some of the most interesting applications of computational journalism come when someone has seen data where others haven’t.

Here are 2 videos and slides from my MA/PGCert Data Journalism taster day

Earlier this month I held a special open taster class at Birmingham City University for anyone interested in my full-time MA and part-time PGCert courses in Data Journalism. As some people couldn’t get to the UK to attend the event, I put together two video screencasts recapping some of the material covered in the session.

I’ve embedded the two videos — and slides from the day — below.

And if you want to try out some of the hands-on activities from the class, you can find them here.

If we are using AI in journalism we need better guidelines on reporting uncertainty

Chart: women speak 27% of the time in Game of Thrones

The BBC’s chart mentions a margin of error

There’s a story out this week on the BBC website about dialogue and gender in Game of Thrones. It uses data generated by artificial intelligence (AI), specifically machine learning, and it’s a good example of some of the challenges that journalists are increasingly going to face as they come to deal with more and more algorithmically-generated data.

Information and decisions generated by AI are qualitatively different from the sort of data you might find in an official report, but journalists may fall back on treating data as inherently factual.

Here, then, are some of the ways the article dealt with that — and what else we can do as journalists to adapt.

Margins of error: journalism doesn’t like vagueness

The story draws on data from an external organisation, Ceretai, which “uses machine learning to analyse diversity in popular culture.” The organisation claims to have created an algorithm which “has learned to identify the difference between male and female voices in video and provides the speaking time lengths in seconds and percentages per gender.”

Crucially, the piece notes that:

“Like most automatic systems, it doesn’t make the right decision every time. The accuracy of this algorithm is about 85%, so figures could be slightly higher or lower than reported.”

And this is the first problem.
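To see why a single accuracy figure does not translate neatly into “slightly higher or lower”, here is a rough Python sketch. It rests on an assumption that the article does not state: that each second of speech is classified independently, and that the algorithm is right 85% of the time whichever gender is speaking. The function names are hypothetical, and the 27% input is the figure from the BBC chart.

```python
def reported_from_true(true_share: float, accuracy: float) -> float:
    """Share the algorithm would report under the assumed symmetric error model."""
    error = 1.0 - accuracy
    # Correctly labelled female speech, plus male speech mislabelled as female.
    return accuracy * true_share + error * (1.0 - true_share)


def implied_true_share(reported_share: float, accuracy: float) -> float:
    """Invert the model above: which true share would produce the reported one?"""
    if not 0.5 < accuracy <= 1.0:
        raise ValueError("this sketch only makes sense for accuracy above 0.5")
    error = 1.0 - accuracy
    return (reported_share - error) / (accuracy - error)


print(f"{implied_true_share(0.27, 0.85):.1%}")  # 17.1%
```

Under those assumptions the errors do not cancel out: they pull the reported figure towards 50%, so a reported 27% would correspond to something closer to 17%. A different assumption about how the 85% was measured would give a different answer, which is part of why a bare accuracy figure is hard for readers to interpret.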

Data Journalism Awards 2019 open for entries

The Data Journalism Awards is now accepting entries for its 2019 awards.

It’s the 8th year of the awards. This year the “Best data journalism team” category has been divided into two categories: small and large teams, with the “Small newsrooms (one or more winners)” category making way for the change.

The awards website has also been revamped to include a range of resources for data journalists, a “Community” section (in addition to the existing Slack group) and news on data journalism developments.

The deadline to enter is 7 April 2019. Winners get an all-expenses-covered trip to June’s Global Editors Network (GEN) Summit and Data Journalism Awards ceremony.

Data journalism in Hungary: how Átlátszó’s new datavis project seeks to be both investigative and educational

In Hungary, not-for-profit news site Átlátszó has launched a full-time data team to create a wide range of data visualisations and data-driven stories. Amanda Loviza spoke to data journalist Attila Bátorfy about his plans for the project, Átló, to raise the quality of data journalism in Hungary.

Átlátszó was created in 2011 as Hungary’s first crowd-funded independent investigative news site, with a stated goal of holding the powerful accountable.

Data journalist Attila Bátorfy joined the site two and a half years ago. It was not long before he told editor-in-chief Tamás Bodoky that the site needed a whole separate team to produce higher quality data visualisations.

FAQ: Do you think that an increase in algorithms is leading to a decline in human judgement?

This algorithm has been quality tested. Image by Phillip Stewart

The latest in my series of FAQ posts follows on from the last one. It responds to an MA student at City University who asked: “Do you think that an increase in algorithmic input is leading to a decline in human judgement?” Here’s my response.

Does an increase in computation lead to a decline in human input?

Firstly, it’s important to emphasise that the vast majority of data journalism involves no algorithms or automation at all: it’s journalists making calculations, which historically they would have done manually.

You mention the possibility that “an increase in computation leads to a decline in human input”. An analogy would be to ask whether an increase in pencils leads to a decline in human input in art.