In a previous post I explored how AI performed on data analysis tasks — and the importance of understanding the code that it used to do so. If you do understand code, here are some tips for using large language models (LLMs) for analysis — and addressing the risks of doing so.
TL;DR: If you understand code, or would like to understand code, genAI tools can be useful for data analysis — but results depend heavily on the context you provide, and the likelihood of flawed calculations means the code needs checking. If you don’t understand code (and don’t want to) — don’t do data analysis with AI.
ChatGPT used to be notoriously bad at maths. Then it got worse at maths. And the recent launch of its newest model, GPT-5, showed that it’s still bad at maths. So when it comes to using AI for data analysis, it’s going to mess up, right?
Well, it turns out that the answer isn’t that simple. And the reason why it’s not simple is important to explain up front.
Large language models don’t really “do” maths: by default they predict the words most likely to follow your question, which is why they have such a poor record with numbers. But over the last two years AI platforms have added the ability to generate and run code (mainly Python) in response to a question. This means that, for some questions, they will try to predict the code that a human would probably write to answer your question — and then run that code.
When it comes to data analysis, this has two major implications:
Responses to data analysis questions are often (but not always) the result of calculations, rather than a predicted sequence of words. The algorithm generates code, runs that code to calculate a result, then incorporates that result into a sentence.
Because we can see the code that performed the calculations, it is possible to check how those results were arrived at, as in the sketch below.
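To make that concrete, here is a minimal, hypothetical sketch of the sort of pandas code such a tool might generate and run behind the scenes. The file name and column names are invented for illustration, and real generated code will vary — the point is that each step, and each silent assumption, is there to be checked.

```python
# Hypothetical sketch of code an AI tool might generate to answer:
# "Which region had the highest average spend?"
import pandas as pd

# Load the data (hypothetical file and columns)
df = pd.read_csv("spending.csv")

# Silent assumptions worth checking: mean() ignores missing values in
# "spend" by default, and no filtering or deduplication happens first.
averages = df.groupby("region")["spend"].mean().sort_values(ascending=False)

top_region = averages.index[0]
top_value = averages.iloc[0]

# The calculated result is then folded into a sentence in the reply.
print(f"{top_region} had the highest average spend: {top_value:.2f}")
```

Reading even a short block like this tells you what was (and wasn’t) done to the data before the number landed in a confident-sounding sentence.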
In a previous post I wrote about four of the angles most often used to tell stories about data. In this second part I look at the three remaining angles: stories that focus on relationships; ‘metadata’ angles that focus on the data’s absence, poor quality or collection; and exploratory pieces that combine several angles or offer a chance to get to know the data itself.
What if we just asked students to keep a record of all their interactions with AI? That was the thinking behind the AI diary, a form of assessment that I introduced this year for two key reasons: to increase transparency about the use of AI, and to increase critical thinking.
One of the biggest concerns over the use of generative AI tools like ChatGPT is their environmental impact. But what is that impact — and what strategies are there for reducing it? Here is what we know so far — and some suggestions for good practice.
What exactly is the environmental impact of using generative AI? It’s not an easy question to answer, as the MIT Technology Review’s James O’Donnell and Casey Crownhart found when they set out to find some answers.
“The common understanding of AI’s energy consumption,” they write, “is full of holes.”
Data journalism projects can be broken down into individual steps, and each step brings its own challenges. To help, I have developed the “Inverted Pyramid of Data Journalism”, which shows how to turn an idea into a focused data story. Step by step, I explain what to look out for and offer tips on how to avoid common stumbling blocks.
Last month the BBC’s Shared Data Unit held its annual Data and Investigative Journalism UK conference at the home of my MA in Data Journalism, Birmingham City University. Here are some of the highlights…
As universities adapt to a post-ChatGPT era, many journalism assessments have tried to address the widespread use of AI by asking students to declare and reflect on their use of the technology in some form of critical reflection, evaluation or report accompanying their work. But having been there and done that, I didn’t think it worked.
So this year — my third time round teaching generative AI to journalism students — I made a big change: instead of asking students to reflect on their use of AI in a critical evaluation alongside a portfolio of journalism work, I ditched the evaluation entirely.
TL;DR: Saying “AI has biases” or “biased training data” is preferable to “AI is biased” because it reduces the risk of anthropomorphism and focuses on potential solutions, not problems.
For the last two years I have been standing in front of classes and conferences saying the words “AI is biased” — but a couple of months ago, I stopped.
As journalists, we are trained to be careful with language — and “AI is biased” is a sloppy piece of writing. It is a thoughtless cliche, often used without really thinking about what it means, or how it might mislead.
Because yes, AI is “biased” — but it’s not biased in the way most people might understand that word.