Category Archives: data journalism

Double counting: how to spot it and how to avoid it

Double counting — counting something more than once in data — can present particular risks for journalists, leading to an incorrect total or proportion. Here’s how to spot it — and what to do about it.

Look at the following chart showing the gender of teachers in UK schools, based on data on teacher headcounts. Notice anything wrong? (There are at least two problems)

Pie chart: Sex of teachers in UK schools
There are three visible slices: male, female and 'total', which takes up more than half of the pie.

The most obvious problem is that the chart appears to be ‘comparing apples with oranges’ (things that aren’t comparable). Specifically: “male”, “female”, and “unknown” are similar categories which can fairly be compared with each other, but “total” is a wider category that contains the other three.

I’ve used a pie chart here to make it easier to spot: we expect a pie chart to show parts of a whole, not the whole as well as its parts.

But the same problem should be obvious from the same data in a table before visualising it:

Table showing headcount of teachers in the categories: female, male, unknown, total, plus a grand total of all four at the bottom

The table shows us that we have both a “Total” and a “Grand Total”. This is a red flag. There can only be one total, so if there’s more than one that’s a strong sign of double counting.

Why is this happening? We need to take a look at the data.

Continue reading

7 técnicas de design de prompts para IA generativa que todo jornalista deveria conhecer

Ferramentas como o ChatGPT podem parecer falar a sua língua, mas na verdade falam uma linguagem de probabilidade e suposições fundamentadas. Você pode fazer-se entender melhor — e obter resultados mais profissionais — com algumas técnicas simples de prompting. Aqui estão as principais para adicionar ao seu kit de ferramentas (Este post foi traduzido do inglês original usando o Claude Sonnet 4.5 como parte de uma experiência. Por favor, avise-me se encontrar algum erro ou traduções incorretas).

Técnicas de design de prompts para IA generativa

Prompting de papel

Prompting de exemplo único

Prompting recursivo

Geração aumentada por recuperação

Cadeia de pensamento

Meta prompting

Prompting negativo

Prompting de papel

O prompting de papel envolve atribuir um papel específico à sua IA. Por exemplo, você pode dizer “Você é um correspondente experiente de educação” ou “Você é o editor de um jornal nacional britânico” antes de delinear o que está a pedir que façam. Quanto mais detalhes, melhor.

pesquisas contraditórias sobre a eficácia do prompting de papel, mas no nível mais básico, fornecer um papel é uma boa maneira de garantir que você fornece contexto, o que faz uma grande diferença na relevância das respostas.

Continue reading

Telling stories with data: more on the difference between ‘variation’ stories and ‘ranking’ angles

7 common angles for data storie: scale, change, ranking, variation, explore, relationships, bad data, leads
The 7 angles. Also available in Norwegian and Finnish.

One of the most common challenges I encounter when teaching people the 7 most common story angles in data journalism is confusion between variation and ranking stories. It all comes down to the difference between process and product.

That’s because both types of story involve ranking as a piece of data analysis.

We might rank the number of specialist teachers in the country’s schools, for example, in order to tell either of the following stories:

  • “There are more specialist science teachers than those in any other subject, new data reveals”
  • “New data reveals stark differences in the number of specialists teaching each subject in secondary schools

The first story reveals which subject has the most teachers — it is a ranking angle because it ranks teachers by subject.

The second story reveals the simple fact that variation exists, without focusing on any particular subject.

Continue reading

6 Wege, Datenjournalismus zu kommunizieren (Die umgekehrte Pyramide des Datenjournalismus Teil 2)

Datenjournalismus: Daten kommunizieren Visualisiern Erzählen Herunterbrechen Personalisieren Audiolisieren/materialisieren Nutzen bieten

Die umgekehrte Pyramide des Datenjournalismus bildet den Prozess der Datennutzung in der Berichterstattung ab, von der Ideenentwicklung über die Bereinigung, Kontextualisierung und Kombination bis hin zur Kommunikation. In dieser letzten Phase – der Kommunikation – sollten wir einen Schritt zurücktreten und unsere Optionen betrachten: von Visualisierung und Erzählung bis hin zu Personalisierung und Werkzeugen.

(Auch auf Englisch und Spanisch verfügbar.)

1. Visualisieren

Visualisierung kann ein schneller Weg sein, die Ergebnisse des Datenjournalismus zu vermitteln: Kostenlose Tools wie Datawrapper und Flourish erfordern oft nur, dass du deinen Daten hochlädst und aus verschiedenen Visualisierungsoptionen auswählst.

Continue reading

How to (not) write about numbers

Image by Andy Maguire | CC BY 2.0

If you’ve been working on a story involving data, the temptation can be to throw all the figures you’ve found into the resulting report — but the same rules of good writing apply to numbers too. Here are some tips to make sure you’re putting the story first.

Continue reading

How to ask AI to perform data analysis

Consider the model: Some models are better for analysis — check it has run code

Name specific columns and functions: Be explicit to avoid ‘guesses’ based on your most probably meaning

Design answers that include context: Ask for a top/bottom 10 instead of just one answer

'Ground' the analysis with other docs: Methodologies, data dictionaries, and other context

Map out a method using CoT: Outline the steps needed to be taken to reduce risk

Use prompt design techniques to avoid gullibility and other risks: N-shot prompting (examples), role prompting, negative prompting and meta prompting can all reduce risk

Anticipate conversation limits: Regularly ask for summaries you can carry into a new conversation

Export data to check: Download analysed data to check against the original

Ask to be challenged: Use adversarial prompting to identify potential blind spots or assumptions

In a previous post I explored how AI performed on data analysis tasks — and the importance of understanding the code that it used to do so. If you do understand code, here are some tips for using large language models (LLMs) for analysis — and addressing the risks of doing so.

Continue reading

I tested AI tools on data analysis — here’s how they did (and what to look out for)

Mug with 'Data or it didn't happen' on it
Photo: Jakub T. Jankiewicz | CC BY-SA 2.0

TL;DR: If you understand code, or would like to understand code, genAI tools can be a useful tool for data analysis — but results depend heavily on the context you provide, and the likelihood of flawed calculations mean code needs checking. If you don’t understand code (and don’t want to) — don’t do data analysis with AI.

ChatGPT used to be notoriously bad at maths. Then it got worse at maths. And the recent launch of its newest model, GPT-5, showed that it’s still bad at maths. So when it comes to using AI for data analysis, it’s going to mess up, right?

Well, it turns out that the answer isn’t that simple. And the reason why it’s not simple is important to explain up front.

Generative AI tools like ChatGPT are not calculators. They use language models to predict a sequence of words based on examples from its training data.

But over the last two years AI platforms have added the ability to generate and run code (mainly Python) in response to a question. This means that, for some questions, they will try to predict the code that a human would probably write to solve your question — and then run that code.

When it comes to data analysis, this has two major implications:

  1. Responses to data analysis questions are often (but not always) the result of calculations, rather than a predicted sequence of words. The algorithm generates code, runs that code to calculate a result, then incorporates that result into a sentence.
  2. Because we can see the code that performed the calculations, it is possible to check how those results were arrived at.
Continue reading

Tre flere vinkler som oftest brukes til å fortelle datahistorier: utforskere, sammenhenger og metadatahistorier

I et tidligere innlegg skrev jeg om fire av vinklene som oftest brukes til å fortelle historier om data. I denne andre delen ser jeg på de tre øvrige vinklene: historier som fokuserer på sammenhenger; ‘metadata’-vinkler som fokuserer på dataenes fravær, dårlige kvalitet eller innsamling — og utforskende artikler som blander flere vinkler eller gir en mulighet til å bli kjent med selve dataene.

7 vanlige vinkler for datahistorier

Omfang: 'Så stort er problemet'
Endring/stillstand: ‘Dette øker/synker/blir ikke bedre’
Rangering: ‘De beste/verste/hvor vi rangerer’
Variasjon: "Geografisk lotteri" 
Utforske: Reportasjer, interaktivitet og kunst
Relasjoner/avmystifisering: ‘Ting er forbundet’ — eller ikke; nettverk og strømmer av makt og penger
Metadata: ‘Bekymringer rundt data’; ‘Manglende data’, ‘Få tak i dataene’
Continue reading

Die umgekehrte Pyramide des Datenjournalismus: Vom Datensatz zur Story

Die umgekehrte Pyramide des Datenjournalismus
Ideen entwickeln
Daten sammeln
Reinigen
Kontextualisieren
Kombinieren
Fragen
Kommunizieren

Datenjournalistische Projekte lassen sich in einzelne Schritte aufteilen – jeder einzelne Schritt bringt eigene Herausforderungen. Um dir zu helfen, habe ich die “Umgekehrte Pyramide des Datenjournalismusentwickelt. Sie zeigt, wie du aus einer Idee eine fokussierte Datengeschichte machst. Ich erkläre dir Schritt für Schritt, worauf du achten solltest, und gebe dir Tipps, wie du typische Stolpersteine vermeiden kannst.

(Auch auf Englisch, Spanisch, Finnisch, Russisch and Ukrainisch verfügbar.)

Continue reading

9 takeaways from the Data Journalism UK conference

Attendees in a lecture theatre with 'data and investigative journalism conference 2025 BBC Shared Data Unit' on the screen.

Last month the BBC’s Shared Data Unit held its annual Data and Investigative Journalism UK conference at the home of my MA in Data Journalism, Birmingham City University. Here are some of the highlights…

Continue reading