Translated by Brazilian journalist Amanda Maia, the publication follows the same line of reasoning as my other book, ‘Excel for journalists’ (‘Excel para periodistas’ in the Spanish translation). It is available in PDF and Kindle formats, and for iPad.
I’m speaking at the Broadcast Journalism Teaching Council’s summer conference this week about artificial intelligence, specifically generative AI. It’s a deceptively huge area that presents journalism educators with a lot to adapt to in their teaching, so I decided to rank those challenges in order of priority.
Each of these priorities could form the basis for part of a class, or a whole module – and you may have a different ranking. But at least you know which one to do first…
Priority 1: Understand how generative AI works
The first challenge in teaching about generative AI is that most people misunderstand what it actually is — so the first priority is to tackle those misunderstandings.
AI (ChatGPT etc) is a massive threat to diversity in journalism, amplifying existing biases & entrenching racial (& other) inequalities. The @LHC4MD has produced a 6 point guideline of how journos and newsrooms can use it responsibly and respect diversity 🧵 https://t.co/qyFR9B2rlT pic.twitter.com/m3MLWk5RvF
I was recently invited to speak to students at Tampere University in Finland, and had the opportunity, with the help of Esa Sirkkunen, to translate the ‘Inverted pyramid of data journalism’ into Finnish. I’m sharing it here for anyone else who might find it useful.
If you’re working with data as a journalist it won’t be long before you come across the phrases “dirty data” or “cleaning data”. The phrases cover a wide range of problems, and a variety of techniques for tackling them, so in this post I’m going to break down exactly what it is that makes data “dirty”, and the different cleaning strategies that a journalist might adopt in tackling them.
Four categories of dirty data problem
Look around for definitions of dirty data and the same three words will crop up: inaccurate, incomplete, or inconsistent.
Inaccurate data includes duplicate or misentered information, or data which is stored as the wrong data type.
Incomplete data might only cover particular periods of time, specific areas, or categories — or be lacking categorisation entirely.
Inconsistent data might name the same entities in different ways or mix different types of data together.
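To make the three categories more concrete, here is a minimal sketch in plain Python (standard library only) of one simple check for each. The sample rows, the field names and the ALIASES lookup are all hypothetical, invented for illustration; in practice the same checks are often done in a spreadsheet or with a library such as pandas.

```python
from collections import Counter

# Hypothetical sample rows: a duplicated row, amounts stored as text
# (one with a thousands separator), a missing region, and two spellings
# ("Birmingham" / "B'ham") of the same council
rows = [
    {"council": "Birmingham", "year": "2021", "amount": "1200", "region": "West Midlands"},
    {"council": "Birmingham", "year": "2021", "amount": "1200", "region": "West Midlands"},
    {"council": "B'ham", "year": "2022", "amount": "1,350", "region": ""},
    {"council": "Leeds", "year": "2022", "amount": "980", "region": "Yorkshire"},
]


def find_duplicates(rows):
    """Inaccurate: return each row that appears more than once."""
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    return [dict(key) for key, n in counts.items() if n > 1]


def to_number(text):
    """Inaccurate (wrong data type): turn text like '1,350' into an int."""
    return int(text.replace(",", ""))


def find_missing(rows, field):
    """Incomplete: return rows where the given field is blank."""
    return [r for r in rows if not r[field].strip()]


# Inconsistent: a hand-made lookup of variant spellings. This is an
# assumption; real cleaning would build it by inspecting the data.
ALIASES = {"B'ham": "Birmingham"}


def standardise(name):
    """Map a variant spelling to one agreed name (unknown names pass through)."""
    return ALIASES.get(name, name)


print(len(find_duplicates(rows)))                         # 1
print([to_number(r["amount"]) for r in rows])             # [1200, 1200, 1350, 980]
print(len(find_missing(rows, "region")))                  # 1
print(sorted({r["year"] for r in rows}))                  # ['2021', '2022']
print(sorted({standardise(r["council"]) for r in rows}))  # ['Birmingham', 'Leeds']
```

Listing the distinct years (or areas, or categories) is a quick way to see what period or scope the data actually covers before drawing conclusions from it.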
To those three common terms I would also add a fourth: data that is simply incompatible with the questions we want to ask of it, or the visualisations we want to create with it. One of the most common cleaning tasks in data journalism, for example, is ‘reshaping’ data from long to wide, or vice versa, so that we can aggregate or filter along particular dimensions. (More on this later.)
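A minimal sketch of reshaping in both directions, assuming the data has already been read into simple (council, year, amount) tuples. The figures are invented for illustration, and the long_to_wide and wide_to_long helpers are hypothetical names; for larger datasets, pandas provides equivalent functions such as pivot() and melt().

```python
from collections import defaultdict

# Hypothetical long-format data: one row per council per year
long_rows = [
    ("Birmingham", 2021, 1200),
    ("Birmingham", 2022, 1350),
    ("Leeds", 2021, 900),
    ("Leeds", 2022, 980),
]


def long_to_wide(rows):
    """Pivot (entity, column, value) rows into {entity: {column: value}}."""
    wide = defaultdict(dict)
    for entity, column, value in rows:
        wide[entity][column] = value
    return dict(wide)


def wide_to_long(wide):
    """Reverse the pivot: one (entity, column, value) row per cell."""
    return [
        (entity, column, value)
        for entity, columns in wide.items()
        for column, value in columns.items()
    ]


wide = long_to_wide(long_rows)
print(wide)
# {'Birmingham': {2021: 1200, 2022: 1350}, 'Leeds': {2021: 900, 2022: 980}}
```

The long format makes it easy to filter or total by year (for example, all 2022 rows); the wide format makes it easier to compare one council across years, such as wide["Birmingham"][2022] - wide["Birmingham"][2021], which gives 150.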
On Tuesday I will be hosting the award-winning investigative journalist and FOI campaigner Jenna Corderoy at the Lyra McKee Memorial Lecture. Ahead of the event, I asked Jenna about her tips on investigations, FOI, confidence, and the challenges facing the industry.
What’s the story you have learned the most from?
The story that I learned the most from was definitely our Clearing House investigation. Back in November 2020, we revealed the existence of a unit within the heart of government, which screened Freedom of Information (FOI) requests and instructed government departments on how to respond to requests. The unit circulated the names of requesters across Whitehall, notably the names of journalists.
Python is an extremely powerful language for journalists who want to scrape information from online sources. This series of videos, made for students on the MA in Data Journalism at Birmingham City University, explains some core concepts for getting started in Python and how to use Colab notebooks within Google Drive, and introduces some code for scraping.
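For a flavour of what scraping code looks like, here is a minimal sketch using only Python’s standard library (html.parser and urllib.request). It is not the code from the videos, which may well use other libraries; the LinkScraper class, the scrape_links and fetch helpers, and the sample HTML are all hypothetical. It should run in a Colab notebook as well as locally.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkScraper(HTMLParser):
    """Collect the href and visible text of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._current_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; links without href are ignored
            self._current_href = dict(attrs).get("href")
            self._current_text = []

    def handle_data(self, data):
        # Only keep text that sits inside a link
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._current_text).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


def scrape_links(html):
    """Return a list of (href, text) pairs found in an HTML string."""
    scraper = LinkScraper()
    scraper.feed(html)
    return scraper.links


def fetch(url):
    """Download a page as text (needs network access; not called below)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


sample = '<p>Reports: <a href="/report-2021.pdf">2021</a> and <a href="/report-2022.pdf">2022</a></p>'
print(scrape_links(sample))
# [('/report-2021.pdf', '2021'), ('/report-2022.pdf', '2022')]
```

To scrape a live page you would combine the two helpers, as in scrape_links(fetch(url)), checking first that the site’s terms allow it.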
A couple of years ago I mapped out eight common angles for identifying stories in data. It turns out that the same framework is useful for finding stories in company accounts, too — but not only that: the angles also map neatly onto three broad techniques.
In this post I’ll go through each of the three techniques — looking at cash flow statements; compiling data from multiple accounts; and tracing people and connections — and explain how they can be used to get stories, with examples of articles that have used those techniques successfully.