How to ask AI to perform data analysis

Consider the model: Some models are better at analysis than others — check that it has actually run code

Name specific columns and functions: Be explicit to avoid ‘guesses’ based on your most probable meaning

Design answers that include context: Ask for a top/bottom 10 instead of just one answer

'Ground' the analysis with other docs: Methodologies, data dictionaries, and other context

Map out a method using chain-of-thought (CoT): Outline the steps that need to be taken, to reduce risk

Use prompt design techniques to avoid gullibility and other risks: N-shot prompting (examples), role prompting, negative prompting and meta prompting can all reduce risk

Anticipate conversation limits: Regularly ask for summaries you can carry into a new conversation

Export data to check: Download analysed data to check against the original

Ask to be challenged: Use adversarial prompting to identify potential blind spots or assumptions
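The ‘export data to check’ tip can be sketched in pandas. Everything here is invented for illustration: the data, the column names, and the idea that the tool’s analysed output was downloaded as a file (here it is hard-coded as a DataFrame instead).

```python
import pandas as pd

# Invented data standing in for the original spreadsheet you uploaded.
original = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "sales": [100, 200, 150, 50],
})

# The analysed data downloaded from the AI tool. In practice you would
# load its export with something like pd.read_csv("ai_export.csv").
ai_export = pd.DataFrame({
    "region": ["North", "South"],
    "total_sales": [250, 250],
})

# Recompute the same aggregation yourself from the original data...
check = (
    original.groupby("region", as_index=False)["sales"].sum()
    .rename(columns={"sales": "total_sales"})
)

# ...and compare: assert_frame_equal raises an error if any value differs.
pd.testing.assert_frame_equal(check, ai_export)
print("AI export matches a manual recalculation")
```

If the comparison fails, the error message shows which values differ, which is exactly the starting point for questioning the code the tool generated.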

In a previous post I explored how AI performed on data analysis tasks — and the importance of understanding the code that it used to do so. If you do understand code, here are some tips for using large language models (LLMs) for analysis — and addressing the risks of doing so.

Continue reading

I tested AI tools on data analysis — here’s how they did (and what to look out for)

Mug with 'Data or it didn't happen' on it
Photo: Jakub T. Jankiewicz | CC BY-SA 2.0

TL;DR: If you understand code, or would like to understand code, genAI tools can be useful for data analysis — but results depend heavily on the context you provide, and the likelihood of flawed calculations means code needs checking. If you don’t understand code (and don’t want to) — don’t do data analysis with AI.

ChatGPT used to be notoriously bad at maths. Then it got worse at maths. And the recent launch of its newest model, GPT-5, showed that it’s still bad at maths. So when it comes to using AI for data analysis, it’s going to mess up, right?

Well, it turns out that the answer isn’t that simple. And the reason why it’s not simple is important to explain up front.

Generative AI tools like ChatGPT are not calculators. They use language models to predict a sequence of words based on examples from their training data.

But over the last two years AI platforms have added the ability to generate and run code (mainly Python) in response to a question. This means that, for some questions, they will try to predict the code that a human would probably write to answer your question, and then run that code.

When it comes to data analysis, this has two major implications:

  1. Responses to data analysis questions are often (but not always) the result of calculations, rather than a predicted sequence of words. The algorithm generates code, runs that code to calculate a result, then incorporates that result into a sentence.
  2. Because we can see the code that performed the calculations, it is possible to check how those results were arrived at.
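That generate-run-incorporate pattern can be illustrated with a toy example. The question, the data and the column names below are all invented; real tools run similar pandas code in a sandbox before folding the result into a sentence.

```python
import pandas as pd

# Invented data standing in for an uploaded spreadsheet.
df = pd.DataFrame({
    "year": [2022, 2023, 2024],
    "complaints": [410, 560, 730],
})

# The kind of code an LLM might generate for the question
# "Which year had the most complaints?"
top = df.loc[df["complaints"].idxmax()]

# The calculated result is then incorporated into a generated sentence.
print(f"{int(top['year'])} had the most complaints, with {int(top['complaints'])}.")
```

Because the intermediate code is visible, you can check whether `idxmax` on the right column was really the calculation your question called for.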
Continue reading

Three more angles most often used to tell data stories: explorations, relationships and metadata stories

In a previous post I wrote about four of the angles most often used to tell stories about data. In this second part I look at the three remaining angles: stories that focus on relationships; ‘metadata’ angles that focus on the data’s absence, poor quality or collection; and exploratory pieces that mix several angles or offer a chance to get to know the data itself.

7 common angles for data stories

Scale: ‘This is how big the problem is’
Change/stasis: ‘This is rising/falling/not getting better’
Ranking: ‘The best/worst/where we rank’
Variation: ‘Postcode lottery’
Explore: Features, interactivity and art
Relationships/demystification: ‘Things are connected’ (or not); networks and flows of power and money
Metadata: ‘Concerns about data’; ‘Missing data’, ‘Getting hold of the data’
Continue reading

This is what happened when I asked journalism students to keep an ‘AI diary’

Last month I wrote about my decision to use an AI diary as part of assessment for a module I teach on the journalism degrees at Birmingham City University. The results are in — and they are revealing.

AI diary screenshots, including AI diary template which says:
Use this document to paste and annotate all your interactions with genAI tools. 

Interactions should include your initial prompt and response, as well as follow up prompts (“iterations”) and the responses to those. Include explanatory and reflective notes in the right hand column. Reflective notes might include observations about potential issues such as bias, accuracy, hallucinations, etc. You can also explain what you did outside of the genAI tool, in terms of other work. 

At least some of the notes should include links to literature (e.g. articles, videos, research) that you have used in creating the prompt or on reflecting on it. You do not need to use Harvard referencing - but the link must go directly to the material. See the examples on Moodle for guidance.

To add extra rows place your cursor in the last box and press the Tab key on your keyboard, or right-click in any row and select ‘add new row’.
Excerpts from AI diaries

What if we just asked students to keep a record of all their interactions with AI? That was the thinking behind the AI diary, a form of assessment that I introduced this year for two key reasons: to increase transparency about the use of AI, and to increase critical thinking.

Continue reading

How to reduce the environmental impact of using AI

Generative AI: reducing environmental impact
Disable AI or switch tool
Compare AI vs non-AI
Compare models
Prompt planning
Prompt design and templating
Measuring and reviewing
Run locally

One of the biggest concerns over the use of generative AI tools like ChatGPT is their environmental impact. But what is that impact — and what strategies are there for reducing it? Here is what we know so far — and some suggestions for good practice.

What exactly is the environmental impact of using generative AI? It’s not an easy question to answer, as the MIT Technology Review’s James O’Donnell and Casey Crownhart found when they set out to investigate.

“The common understanding of AI’s energy consumption,” they write, “is full of holes.”

Continue reading

The inverted pyramid of data journalism: from dataset to story

The inverted pyramid of data journalism
Develop ideas
Compile data
Clean
Contextualise
Combine
Question
Communicate

Data journalism projects can be broken down into individual steps, and each step brings its own challenges. To help you, I have developed the “inverted pyramid of data journalism”. It shows how to turn an idea into a focused data story. I explain, step by step, what to look out for, and give tips on how to avoid typical stumbling blocks.

(Also available in English, Spanish, Finnish, Russian and Ukrainian.)

Continue reading

9 takeaways from the Data Journalism UK conference

Attendees in a lecture theatre with 'data and investigative journalism conference 2025 BBC Shared Data Unit' on the screen.

Last month the BBC’s Shared Data Unit held its annual Data and Investigative Journalism UK conference at the home of my MA in Data Journalism, Birmingham City University. Here are some of the highlights…

Continue reading

Teaching journalism students generative AI: why I switched to an “AI diary” this semester

The Thinker statue
Image by Fredrik Rubensson CC BY-SA 2.0

As universities adapt to a post-ChatGPT era, many journalism assessments have tried to address the widespread use of AI by asking students to declare and reflect on their use of the technology in some form of critical reflection, evaluation or report accompanying their work. But having been there and done that, I didn’t think it worked.

So this year — my third time round teaching generative AI to journalism students — I made a big change: instead of asking students to reflect on their use of AI in a critical evaluation alongside a portfolio of journalism work, I ditched the evaluation entirely.

Continue reading

Why I’m no longer saying AI is “biased”

TL;DR: Saying “AI has biases” or “biased training data” is preferable to “AI is biased” because it reduces the risk of anthropomorphism and focuses on potential solutions, not problems.

Searches for "AI bias" peaked in 2025. In March 2025 twice as many searches were made for "AI bias" compared to 12 months before.
Click image to explore an interactive version

For the last two years I have been standing in front of classes and conferences saying the words “AI is biased” — but a couple of months ago, I stopped.

As journalists, we are trained to be careful with language — and “AI is biased” is a sloppy piece of writing. It is a cliché, often used without really considering what it means, or how it might mislead.

Because yes, AI is “biased” — but it’s not biased in the way most people might understand that word.

Continue reading