In a previous post I explored how AI performed on data analysis tasks — and the importance of understanding the code that it used to do so. If you do understand code, here are some tips for using large language models (LLMs) for analysis — and addressing the risks of doing so.
TL;DR: If you understand code, or would like to understand code, genAI tools can be useful for data analysis — but results depend heavily on the context you provide, and the likelihood of flawed calculations means the code needs checking. If you don’t understand code (and don’t want to) — don’t do data analysis with AI.
ChatGPT used to be notoriously bad at maths. Then it got worse at maths. And the recent launch of its newest model, GPT-5, showed that it’s still bad at maths. So when it comes to using AI for data analysis, it’s going to mess up, right?
Well, it turns out that the answer isn’t that simple. And the reason why it’s not simple is important to explain up front.
At heart, LLMs predict sequences of words rather than performing calculations. But over the last two years AI platforms have added the ability to generate and run code (mainly Python) in response to a question. This means that, for some questions, they will try to predict the code that a human would probably write to answer your question — and then run that code.
When it comes to data analysis, this has two major implications:
Responses to data analysis questions are often (but not always) the result of calculations, rather than a predicted sequence of words. The algorithm generates code, runs that code to calculate a result, then incorporates that result into a sentence.
Because we can see the code that performed the calculations, it is possible to check how those results were arrived at.
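To make that concrete, below is the kind of Python an AI tool might generate and run behind the scenes when asked an averaging question. This is a hypothetical sketch: the data and column names are invented for illustration.

    # Hypothetical sketch: the sort of code an AI tool might generate
    # when asked "what is the average fine issued by each council?"
    import pandas as pd

    # Invented data standing in for an uploaded spreadsheet
    df = pd.DataFrame({
        "council": ["Birmingham", "Leeds", "Manchester"],
        "fines_issued": [120, 85, 97],
        "total_value": [14400.0, 9350.0, 11155.0],
    })

    # The calculation the tool's written answer would be based on
    df["average_fine"] = df["total_value"] / df["fines_issued"]
    print(df[["council", "average_fine"]])

Because the code is visible, you can check which column was averaged, and whether any rows were dropped or mishandled along the way.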
One of the biggest concerns over the use of generative AI tools like ChatGPT is their environmental impact. But what is that impact — and what strategies are there for reducing it? Here is what we know so far — and some suggestions for good practice.
What exactly is the environmental impact of using generative AI? It’s not an easy question to answer, as the MIT Technology Review’s James O’Donnell and Casey Crownhart found when they set out to find some answers.
“The common understanding of AI’s energy consumption,” they write, “is full of holes.”
A new AI function is being added to Google Sheets that could make most other functions redundant. But is it any good? And what can it be used for? Here’s what I’ve learned in the first week…
The AI function avoids the Clippy-like annoyances of Gemini in Sheets
AI has been built into Google Sheets for some time now in the Clippy-like form of Gemini in Sheets. But Google Sheets’ AI function is different.
Available to a limited number of users for now, it allows you to incorporate AI prompts directly into a formula rather than having to rely on Gemini to suggest a formula using existing functions.
At the most basic level that means the AI function can be used instead of functions like SUM, AVERAGE or COUNT by simply including a prompt like “Add the numbers in these cells” (or “calculate an average for” or “count”). But more interesting applications come in areas such as classification, translation, analysis and extraction, especially where a task requires a little more ‘intelligence’ than a more literal-minded function can offer.
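As a rough sketch, those prompts might be embedded in formulae like the ones below. The cell references are invented for illustration, and the exact syntax may change as the function rolls out to more users.

    =AI("Add the numbers in these cells", A2:A10)
    =AI("Classify this complaint as 'billing', 'delivery' or 'other'", B2)
    =AI("Extract the postcode from this address", C2)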
I put the AI function through its paces with a series of classification challenges to see how it performed. Here’s what happened — and some ways in which the risks of generative AI need to be identified and addressed.
Tools like ChatGPT might seem to speak your language, but they actually speak a language of probability and educated guesswork. You can make yourself better understood — and get more professional results — with a few simple prompting techniques. Here are the key ones to add to your toolkit. (also in Portuguese)
In the latest in a series of posts on using generative AI, I look at how tools such as ChatGPT and Claude.ai can help identify potential bias and check story drafts against relevant guidelines.
We are all biased — it’s human nature. It’s the reason stories are edited; it’s the reason that guidelines require journalists to stick to the facts, to be objective, and to seek a right of reply. But as the Columbia Journalism Review noted two decades ago: “Ask ten journalists what objectivity means and you’ll get ten different answers.”
Generative AI is notoriously biased itself — but it has also been trained on more material about bias than any human is likely to have read. So, unlike a biased human, when you explicitly ask it to identify bias in your own reporting, it can perform surprisingly well.
It can also be very effective in helping us consider how relevant guidelines might be applied to our work — a checkpoint in our reporting that should be just as baked-in as the right of reply.
In this post I’ll go through some template prompts and tips on each. First, a recap of the rules of thumb I introduced in the previous post.
In the fifth of a series of posts from a workshop at the Centre for Investigative Journalism Summer School, I look at using generative AI tools such as ChatGPT and Google Gemini to help with reviewing your work to identify ways it can be improved, from technical tweaks and tightening your writing to identifying jargon.
Having an editor makes you a better writer. At a basic level, an editor is able to look at your work with fresh eyes and without emotional attachment: they will not be reluctant to cut material just because it involved a lot of work, for example.
An editor should also be able to draw on more experience and knowledge — identifying mistakes and clarifying anything that isn’t clear.
But there are good editors, and there are bad editors. There are lazy editors who don’t care about what you’re trying to achieve, and there are editors with great empathy and attention to detail. There are editors who make you a better writer, and those who don’t.
Generative AI can be a bad editor. Making sure it isn’t requires careful prompting, and a focus on ensuring that it’s not just the content that improves, but you as a writer.
One of the most common reasons a journalist might need to learn to code is scraping: compiling information from across multiple webpages, or from one page across a period of time.
But scraping is tricky: it requires time learning some coding basics, and then further time learning how to tackle the particular problems that a specific scraping task involves. If the scraping challenge is anything but simple, you will need help to overcome trickier obstacles.
Large language models (LLMs) like ChatGPT are especially good at providing this help because writing code is a language challenge, and material about coding makes up a significant amount of the material that these models have been trained on.
This can make a big difference in learning to code: in the first year that I incorporated ChatGPT into my data journalism Masters at Birmingham City University I noticed that students were able to write more advanced scrapers earlier than previously — and also that students were less likely to abandon their attempts at coding.
You can also start scraping pretty quickly with the right prompts (Google Colab allows you to run Python code within Google Drive). Here are some tips on how to do so…
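For a sense of what that looks like, below is the sort of minimal scraper an LLM might produce for a simple page. It is a sketch only: the URL and the CSS selector are placeholders, and a real page will need its own.

    # A minimal scraping sketch of the kind an LLM might generate.
    # The URL and the selector are placeholders for illustration.
    import requests
    from bs4 import BeautifulSoup

    url = "https://example.com/news"  # placeholder: swap in your target page
    response = requests.get(url)
    response.raise_for_status()  # stop early if the page didn't load

    soup = BeautifulSoup(response.text, "html.parser")

    # Print the text and link of every headline-style element found
    for heading in soup.select("h2 a"):
        print(heading.get_text(strip=True), heading.get("href"))

Both requests and BeautifulSoup come preinstalled in Google Colab, so a sketch like this can be pasted in and run without any setup.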
Spreadsheet analysis is part of the research phase of a story
Generative AI tools like ChatGPT and Gemini can be a big help when dealing with data in spreadsheets. In this third of a series of posts from a workshop at the Centre for Investigative Journalism Summer School (the first part covered idea generation; the second research), I outline tips and techniques for using those tools to help with spreadsheet formulae and reshaping data.
Whether you come across data as part of story research, or compile data yourself, chances are that at some point you will need to write a formula to ask questions of that data, or to make it possible to ask questions (such as by creating a column which extracts data from another).
If you find yourself coming up against the limits of your spreadsheet knowledge, then genAI tools can be useful both in breaking through those limits and in expanding your knowledge of functions and formula writing.
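For example, if column A held dates and you wanted a new column containing just the year, a genAI tool might suggest formulae along these lines (the cell references are invented for illustration):

    =YEAR(A2)                   if A2 holds a value Sheets recognises as a date
    =LEFT(A2, 4)                if the date is stored as text such as "2024-06-01"
    =REGEXEXTRACT(A2, "\d{4}")  Google Sheets only: extracts the first four-digit run

Asking the tool to explain why it chose a particular function is a good habit: it is how one-off formula help becomes knowledge you keep.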
Writing spreadsheet formulae with ChatGPT or other genAI tools
Generative AI can be used at all points in the journalism process: this post focuses on the research stage
In the second of a series of posts from a workshop at the Centre for Investigative Journalism Summer School (read the first part on idea generation here), I look at using generative AI tools such as ChatGPT and Google Gemini to improve sourcing and story research.
Research is arguably the second-highest risk area (after content generation) for using generative AI within journalism. The most obvious reason for this is AI’s ability to make things up (“hallucinate”) — but there are other reasons too.