FAQ: What are the professional boundaries in data journalism teams?

In this latest post in the FAQ series, I’ve been asked to help answer a question on “the distribution of work between journalist, data analyst and designer” in data journalism.

VIDEO: How automation played a central role in data journalism — and is now playing it again

Automation was key to the work of data journalism pioneers such as Adrian Holovaty — and it’s becoming increasingly central once again. This video, made for students on the MA in Data Journalism at Birmingham City University, explores the variety of roles that automation plays in data journalism; new concepts such as robot journalism, natural language generation (NLG) and structured journalism; and how data journalists’ editorial role becomes “delegated to the future” through the creation of algorithms.

You can find the video about Poligraft, and the FT on robot journalism at those links.

This video is shared as part of a series of video posts.

The third edition of the Online Journalism Handbook is now out!

The online journalism handbook: skills to survive and thrive in the digital age, by Paul Bradshaw

A new, third, edition of the Online Journalism Handbook is now out.

A comprehensive update to the 2017 second edition, it sees the addition of a new chapter on writing for email and chat.

There are new sections on formats from scrollytelling and charticles to threads, vertical Stories, social audio and audiograms, plus advice on how to use gifs, memes and emoji professionally as a journalist.

One notable development of the last few years reflected in the book is the improvement in accessibility provision — which is covered alongside techniques for better inclusivity and diversity in journalism practice.

Developments around harassment and online abuse, misinformation, news avoidance, and trust are all covered — and, of course, the impact of the pandemic on journalistic practices, including remote interviewing tips.

I’ll be publishing extracts and the material I had to leave out (it’s 20,000 words longer than the last edition) in the coming months.

Jornalismo de dados: um guia rápido

My e-book ‘Jornalismo de dados: um guia rápido’ is now available in Portuguese.

Translated by the Brazilian journalist Amanda Maia, the publication follows the same line of reasoning as my other book ‘Excel for journalists’, or ‘Excel para periodistas’ (the Spanish translation). It is available in PDF, Kindle and iPad formats.

This is how I’ll be teaching journalism students ChatGPT (and generative AI) next semester

Robot with books — Image by kjpargeter on Freepik

I’m speaking at the Broadcast Journalism Teaching Council’s summer conference this week about artificial intelligence, specifically generative AI. It’s a deceptively huge area that presents journalism educators with a lot to adapt to in their teaching, so I decided to put those challenges in order of priority.

Each of these priorities could form the basis for part of a class, or a whole module – and you may have a different ranking. But at least you know which one to do first…

Priority 1: Understand how generative AI works

The first challenge in teaching about generative AI is that most people misunderstand what it actually is — so the first priority is to tackle those misunderstandings.

Generative AI: Here are 6 principles for using it in journalism that address diversity and inclusion (it’s just good journalism)

AI (ChatGPT etc) is a massive threat to diversity in journalism, amplifying existing biases & entrenching racial (&other) inequalities.
The @LHC4MD has produced a 6 point guideline of how journos and newsrooms can use it responsibly and respect diversity 🧵https://t.co/qyFR9B2rlT pic.twitter.com/m3MLWk5RvF
— Marcus Ryder (@marcusryder) June 16, 2023

Artificial intelligence is known to suffer from deep-seated issues when it comes to diversity: machine learning algorithms are trained on historical data that can embed institutional discrimination; NLP and generative AI suffer from the same problems; and the industry itself has a diversity challenge.

It’s surprising, then, that the discussion emerging in the industry around generative AI has so far failed to engage with these issues: UK regulator Ofcom’s news release on it earlier this month doesn’t mention diversity; nor do the US Radio Television Digital News Association’s new guidelines. BuzzFeed’s lessons from, and discussion of, its use of the technology don’t touch on it; CNET, despite being burned by the tech, doesn’t mention bias in its policy.

The Inverted Pyramid of Data Journalism in Finnish (Datajournalismin käänteinen pyramidi)

I was recently invited to speak to students at Tampere University in Finland, and had the opportunity, with the help of Esa Sirkkunen, to translate the ‘Inverted pyramid of data journalism’ into Finnish. I’m sharing it here for anyone else who might find it useful.

Datajournalismin käänteinen pyramidi
Ideoi
Kokoa
Siisti
Taustoita
Yhdistä
Kysymys
Kommunikoi

What is dirty data and how do I clean it? A great big guide for data journalists

If you’re working with data as a journalist it won’t be long before you come across the phrases “dirty data” or “cleaning data”. The phrases cover a wide range of problems, and a variety of techniques for tackling them, so in this post I’m going to break down exactly what makes data “dirty”, and the different cleaning strategies that a journalist might adopt.

Four categories of dirty data problem

Look around for definitions of dirty data and the same three words will crop up: inaccurate, incomplete, or inconsistent.

Dirty data problems:
Inaccurate: data stored as the wrong type; misentered data; duplicate data; abbreviations and symbols.
Incomplete: uncategorised data; missing data.
Inconsistent: inconsistent naming of entities; mixed data.
Incompatible: wrong shape; ‘dirty’ characters (e.g. unescaped HTML).

Inaccurate data includes duplicate or misentered information, or data which is stored as the wrong data type.

Incomplete data might only cover particular periods of time, specific areas, or categories — or be lacking categorisation entirely.

Inconsistent data might name the same entities in different ways or mix different types of data together.
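As a rough illustration of those three categories, here is a minimal Python sketch that tackles one example of each: numbers stored as text, a duplicated row, and the same council named in different ways. The sample data, the `ALIASES` lookup and the function names are all invented for this example; real datasets will need their own rules, and this is only one of many possible approaches.

```python
import csv
import io

# Hypothetical raw data: numbers stored as text, inconsistent names, a duplicate row.
RAW = """council,spend
Birmingham City Council,"1,200"
birmingham city council ,"1,200"
Leeds City Council,950
Leeds CC,n/a
"""

# Hypothetical lookup mapping known name variants to one canonical name.
ALIASES = {"leeds cc": "Leeds City Council"}


def clean_name(name):
    """Collapse whitespace, normalise case, and apply known aliases."""
    key = " ".join(name.split()).lower()
    return ALIASES.get(key, key.title())


def clean_number(text):
    """Convert text such as '1,200' to 1200; return None if it is not a number."""
    try:
        return int(text.replace(",", "").strip())
    except ValueError:
        return None


def clean_rows(raw_csv):
    """Return cleaned (name, spend) pairs with exact duplicates removed."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        record = (clean_name(row["council"]), clean_number(row["spend"]))
        if record not in seen:  # drop rows that are identical once cleaned
            seen.add(record)
            cleaned.append(record)
    return cleaned


rows = clean_rows(RAW)
```

Note that the missing value (`n/a`) is kept as `None` rather than dropped or replaced with zero: incomplete data is usually something to report on or query, not silently fill in.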

To those three common terms I would also add a fourth: data that is simply incompatible with the questions we want to ask of it, or the visualisation we want to create with it. One of the most common cleaning tasks in data journalism, for example, is ‘reshaping’ data from long to wide, or vice versa, so that we can aggregate or filter along particular dimensions. (More on this later.)
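To make ‘reshaping’ concrete, the sketch below converts a small invented dataset from long (one row per area per year) to wide (one row per area, one column per year) and back again, using only Python’s standard library. The data, the column names and the two helper functions are assumptions made for illustration; in practice many journalists would use a spreadsheet pivot table or a data analysis library for the same job.

```python
from collections import defaultdict

# Hypothetical 'long' data: one row per area per year.
LONG = [
    {"area": "North", "year": 2021, "value": 10},
    {"area": "North", "year": 2022, "value": 12},
    {"area": "South", "year": 2021, "value": 7},
    {"area": "South", "year": 2022, "value": 9},
]


def long_to_wide(rows, index, column, value):
    """One output row per `index` entry, one key per distinct `column` entry."""
    wide = defaultdict(dict)
    for row in rows:
        # Column headings are stored as strings, e.g. "2021".
        wide[row[index]][str(row[column])] = row[value]
    return [{index: key, **cells} for key, cells in wide.items()]


def wide_to_long(rows, index, column, value):
    """The reverse: one output row per cell of the wide table."""
    return [
        {index: row[index], column: key, value: cell}
        for row in rows
        for key, cell in row.items()
        if key != index
    ]


wide = long_to_wide(LONG, index="area", column="year", value="value")
back_to_long = wide_to_long(wide, index="area", column="year", value="value")
```

The wide version makes it easy to compare years side by side for each area; the long version is generally easier to filter or aggregate by year.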

Angles for data stories — in Finnish (yleistä näkökulmaa datatarinoihin)

I recently had the opportunity, thanks to Esa Sirkkunen of Tampere University, to translate the diagram from ‘8 angles that journalists use most often to tell data stories’ into Finnish. I’m sharing it here for anyone else who might find it useful.

8 yleistä näkökulmaa datatarinoihin
Mittakaava
Muutos
Sijoitus
Variaatio
Tutkia
Suhteet
Puuttuva/huono
Johtaa

FOI, diversity, and imposter syndrome — an interview with Jenna Corderoy

Lyra McKee Memorial Lecture - Tuesday March 28 5.30-7pm at Birmingham City University Curzon Building. Jenna Corderoy on holding power to account and getting into journalism.

On Tuesday I will be hosting the award-winning investigative journalist and FOI campaigner Jenna Corderoy at the Lyra McKee Memorial Lecture. Ahead of the event, I asked Jenna about her tips on investigations, FOI, confidence, and the challenges facing the industry.

What’s the story you have learned the most from?

The story that I learned the most from was definitely our Clearing House investigation. Back in November 2020, we revealed the existence of a unit within the heart of government, which screened Freedom of Information (FOI) requests and instructed government departments on how to respond to requests. The unit circulated the names of requesters across Whitehall, notably the names of journalists.

Online Journalism Blog

Comment, analysis and links covering online journalism and online news, citizen journalism, blogging, vlogging, photoblogging, podcasts, vodcasts, interactive storytelling, publishing, Computer Assisted Reporting, User Generated Content, searching and all things internet.
