Tag Archives: ChatGPT

Investigative journalism and ChatGPT: using generative AI for story ideas

Applications of genAI in the journalism process
Pyramid with the bottom 'pre-production' layer highlighted: Idea generation and stimulation: identify and map systems and rules, apply brainstorming frameworks (iceberg model, 5 whys, 8 angles of data journalism). Planning.
Generative AI can be used at all points in the journalism process: this post focuses on pre-production

Last week I delivered a session at the Centre for Investigative Journalism Summer School about using generative AI tools such as ChatGPT and Google Gemini for investigations. In the first of a series of posts from the talk, here are my tips on using those tools for idea generation.

Generative AI tools may not be entirely reliable, but that doesn’t mean that they’re not useful. Journalism, after all, is about more than just gathering information: reporters also need to generate story ideas, identify and approach potential sources, plan ahead, write and edit stories and solve a range of technical challenges. All of these are areas where genAI can help.

This is how I’ll be teaching journalism students ChatGPT (and generative AI) next semester

Robot with books
Image by kjpargeter on Freepik

I’m speaking at the Broadcast Journalism Teaching Council’s summer conference this week about artificial intelligence, specifically generative AI. It’s a deceptively huge area that presents journalism educators with a lot to adapt to in their teaching, so I decided to put those challenges in order of priority.

Each of these priorities could form the basis for part of a class, or a whole module – and you may have a different ranking. But at least you know which one to do first…

Priority 1: Understand how generative AI works

The first challenge in teaching about generative AI is that most people misunderstand what it actually is — so the first priority is to tackle those misunderstandings.

What is dirty data and how do I clean it? A great big guide for data journalists

Image: George Hodan

If you’re working with data as a journalist it won’t be long before you come across the phrases “dirty data” or “cleaning data”. The phrases cover a wide range of problems, and a variety of techniques for tackling them, so in this post I’m going to break down exactly what makes data “dirty”, and the different cleaning strategies that a journalist might adopt.

Four categories of dirty data problem

Look around for definitions of dirty data and the same three words will crop up: inaccurate, incomplete, or inconsistent.

Dirty data problems:
Inaccurate: data stored as the wrong type; misentered data; duplicate data; abbreviations and symbols.
Incomplete: uncategorised data; missing data.
Inconsistent: inconsistent naming of entities; mixed data.
Incompatible: wrong shape; ‘dirty’ characters (e.g. unescaped HTML).

Inaccurate data includes duplicate or misentered information, or data which is stored as the wrong data type.
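
To make two of those problems concrete, here is a minimal Python sketch that drops exact duplicate rows and converts a number stored as text into an integer. The rows, the column names (council, spend) and the clean_rows helper are all invented for illustration; real data will usually need more careful handling.

```python
# Hypothetical rows, as they might arrive from a spreadsheet export.
rows = [
    {"council": "Leeds", "spend": "1,200"},
    {"council": "Leeds", "spend": "1,200"},  # exact duplicate of the row above
    {"council": "Bradford", "spend": "950"},
]


def clean_rows(rows):
    """Drop exact duplicate rows and convert 'spend' from text to an integer."""
    seen = set()
    cleaned = []
    for row in rows:
        # A sorted tuple of the row's items acts as a hashable fingerprint.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        # Remove thousands separators before converting the text to a number.
        cleaned.append({**row, "spend": int(row["spend"].replace(",", ""))})
    return cleaned


cleaned = clean_rows(rows)
print(cleaned)
```

Only rows that match exactly are treated as duplicates here; near-duplicates (a stray space, a different spelling) are a consistency problem and need a different approach.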

Incomplete data might only cover particular periods of time, specific areas, or categories — or be lacking categorisation entirely.

Inconsistent data might name the same entities in different ways or mix different types of data together.
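
As a rough illustration of inconsistent naming, the Python sketch below groups spellings of the same entity by comparing a normalised version of each name. The names and the normalise helper are hypothetical, and the rule used (trim, collapse spaces, lower-case) is only one simple assumption about what counts as "the same".

```python
import re

# Hypothetical variants of the same entity names.
names = [
    "Leeds City Council",
    "leeds city council ",
    "LEEDS  CITY COUNCIL",
    "Bradford Council",
]


def normalise(name):
    """Collapse repeated whitespace, trim the ends and lower-case a name."""
    return re.sub(r"\s+", " ", name).strip().lower()


# Map each normalised name to the original spellings that share it.
groups = {}
for name in names:
    groups.setdefault(normalise(name), []).append(name)

for key, variants in groups.items():
    print(key, "<-", variants)
```

This will not catch abbreviations or misspellings (for example "Leeds CC"); those usually need manual review or fuzzy matching.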

To those three common terms I would add a fourth: data that is simply incompatible with the questions we want to ask of it or the visualisation we want to create. One of the most common cleaning tasks in data journalism, for example, is ‘reshaping’ data from long to wide, or vice versa, so that we can aggregate or filter along particular dimensions. (More on this later.)
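
Reshaping from long to wide can be sketched in a few lines of plain Python. The data, the column names (area, year, value) and the long_to_wide helper are made up for illustration; this is one possible approach rather than a recommended tool.

```python
# Hypothetical 'long' data: one row per area per year.
long_rows = [
    {"area": "Leeds", "year": 2022, "value": 10},
    {"area": "Leeds", "year": 2023, "value": 12},
    {"area": "Bradford", "year": 2022, "value": 7},
    {"area": "Bradford", "year": 2023, "value": 9},
]


def long_to_wide(rows, index, column, value):
    """Return one row per `index`, with one key for each distinct `column` entry."""
    wide = {}
    for row in rows:
        # Create the wide row the first time an index value is seen.
        wide_row = wide.setdefault(row[index], {index: row[index]})
        wide_row[row[column]] = row[value]
    return list(wide.values())


wide_rows = long_to_wide(long_rows, "area", "year", "value")
for row in wide_rows:
    print(row)
```

Each area now has a single row with one entry per year, which makes year-on-year comparison easier. Spreadsheet pivot tables and the pivot method in the pandas library do the same job.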
