How to ask AI to perform data analysis

Consider the model: Some models are better for analysis — check that it can run code

Name specific columns and functions: Be explicit to avoid ‘guesses’ based on your most probable meaning

Design answers that include context: Ask for a top/bottom 10 instead of just one answer

'Ground' the analysis with other docs: Methodologies, data dictionaries, and other context

Map out a method using CoT: Outline the steps that need to be taken to reduce risk

Use prompt design techniques to avoid gullibility and other risks: N-shot prompting (examples), role prompting, negative prompting and meta prompting can all reduce risk

Anticipate conversation limits: Regularly ask for summaries you can carry into a new conversation

Export data to check: Download analysed data to check against the original

Ask to be challenged: Use adversarial prompting to identify potential blind spots or assumptions

In a previous post I explored how AI performed on data analysis tasks — and the importance of understanding the code that it used to do so. If you do understand code, here are some tips for using large language models (LLMs) for analysis — and addressing the risks of doing so.

Start by uploading a dataset — ideally in CSV format

In order to perform data analysis with genAI tools, you need to upload the dataset. It’s best if that dataset is in CSV format rather than other spreadsheet formats like XLSX or ODS. There are a few reasons for this: first, a CSV will be smaller, making it less likely that you hit the tool’s limits; and second, a CSV can only have one sheet of data, ensuring that you know which sheet the AI tool is looking at.

To convert an XLSX or ODS file to a CSV, open the file in spreadsheet software (Excel or Google Sheets), go to the sheet with the data you want, and use the File > Save As… menu to save that sheet in CSV format. It will probably warn you that you will lose the data in the other sheets – that’s fine, you only want the one sheet.
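If you'd rather script the conversion, pandas can do the same job in a couple of lines. A minimal sketch, assuming a hypothetical workbook pay_gap.xlsx with a sheet named Data:

```python
import pandas as pd

# Read one named sheet (requires openpyxl for .xlsx files, or odfpy for .ods)
df = pd.read_excel("pay_gap.xlsx", sheet_name="Data")

# Save just that sheet as a CSV, without the pandas row index
df.to_csv("pay_gap.csv", index=False)
```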

Consider the model

Each genAI platform has a default language model that it uses, but this may not be the best one for analysis.

Google’s Gemini, for example, defaults to 2.5 Flash at the moment, but 2.5 Pro is described as being for “Reasoning maths and code” (change the model by clicking on its name in the upper right corner). If you’re paying for a Pro account you’ll have other model options too.

Claude’s guide to choosing a model and OpenAI’s Cookbook can help explain the differences between models. (GPT-5 complicates things by choosing a model for you, making it vital that you design a prompt which steers it towards an appropriate one).

It’s not just about choosing a model for its power — a less powerful model can still generate working code, will often be faster, and will certainly have a lower environmental impact. Try different models to see which one is good enough for your purposes (Gemini 2.5 Flash is fine for most analysis, for example).

Name columns and functions in your prompts

What you write: “Which company has the biggest gender pay gap?”

What AI does:
  • Predicts what ‘biggest’ is most likely to mean.
  • Predicts which column ‘gender pay gap’ most likely refers to.
  • Writes code that sorts the data by that column and returns the name in the first row.

What you need to do:
  • Check which column the code sorted on.
  • Check how it calculated ‘biggest’.
  • Check whether it was the only company with that number.

To reduce the risk of AI ‘misunderstanding’ you, be specific about columns and functions.

A genAI language model works by identifying the most probable meaning of your words, so there is always a risk that it will get that wrong.

One simple practice to reduce this risk is to name the columns that you want it to use.

For example, instead of a prompt like “Count the total fires” you would write “Use the Incidents column to calculate the total number of fires“.

The same applies to calculations. Any request for a calculation will be translated into a (most probable) Python or JavaScript function. So when you ask for an “average” or a “total”, think what you actually mean in practical terms. Do you want it to use a median or a mean function? Do you want it to use a sum function, or a count?

A clearer prompt will say something like “calculate the median value for the column PatientTotal” or "use a mean function to calculate an average value for the column".
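In pandas, each of those words maps to a different function, so the difference is concrete. A minimal sketch, assuming a hypothetical file patients.csv with a PatientTotal column:

```python
import pandas as pd

df = pd.read_csv("patients.csv")

# 'Average' could mean either of these, and they can differ a lot
print(df["PatientTotal"].median())  # the middle value when sorted
print(df["PatientTotal"].mean())    # the sum divided by the number of values

# 'Total' could mean either of these too
print(df["PatientTotal"].sum())     # adds up the values
print(df["PatientTotal"].count())   # counts the non-empty rows
```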

For some calculations you may want to break it down into a series of steps. Here’s an example of a prompt which attempts to be as explicit as possible about what it wants AI to do when it generates code:

Here is data on the gender pay gap for over 10,000 companies. I want you to calculate how many companies there are in Birmingham. To do this you need to look at two columns: Address and Postcode.
In the Postcode column look for postcodes that start with B, followed by a digit (examples include B9 or B45). Exclude postcodes that start with a B, followed by a letter (examples include BL2 or BB22).
In the Address column only count addresses where Birmingham appears either at the very end of the address, or before a comma or a word like 'England' or 'UK'. If an address contains 'Birmingham Road' or 'Birmingham Business Park' this does not necessarily mean it is in Birmingham, unless the address also contains Birmingham towards the end of the address, as detailed. Adjust the code so that either a postcode match OR an address match is counted - it doesn't have to meet both criteria
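For reference, the code a model generates from a prompt like that might look something like the sketch below. This is an illustration only, assuming the Address and Postcode columns named in the prompt and a hypothetical file name; the address pattern is a simplification of the rules described:

```python
import pandas as pd

df = pd.read_csv("gender_pay_gap.csv")

# Postcode rule: 'B' followed by a digit (B9, B45) matches;
# 'B' followed by a letter (BL2, BB22) does not
postcode_match = df["Postcode"].str.match(r"B\d", na=False)

# Address rule (simplified): 'Birmingham' at or near the end of the address,
# optionally followed by a comma, 'England' or 'UK'
address_match = df["Address"].str.contains(
    r"Birmingham,?\s*(?:England|UK)?\s*$", case=False, na=False
)

# The prompt asks for OR, not AND: either match counts
birmingham = df[postcode_match | address_match]
print(len(birmingham), "companies counted as Birmingham")
```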

Include context by asking for more than one figure or row

When working with data directly, the figures surrounding your focus can provide useful clues to avoid mistakes. You can replicate this in your analysis by avoiding prompts that ask for a single figure or row. For example:

  • Instead of asking for the bodies or categories that are ‘biggest’ on a particular metric, ask for a ‘top 10’ and a ‘bottom 10’. Sometimes there is more than one organisation with the same figure, and sometimes the biggest is a meaningless outlier for statistical reasons. Sometimes the largest negative numbers are the ‘biggest’.
  • Instead of asking for a single average, ask for different types of average, e.g. mean, median and mode.
  • Ask for a statistical summary of the columns you are interested in. A summary for a numerical column typically shows the distribution of values (mean, median, quartiles, max and min, standard deviation). You can also ask for the data type(s) of the field(s) that you’re interested in, and the number of entries and empty cells. The sketch after this list shows how these requests map onto code.
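Each of those requests translates into a short pandas operation. A minimal sketch, assuming a hypothetical pay_gap.csv file with a numeric PayGap column:

```python
import pandas as pd

df = pd.read_csv("pay_gap.csv")

# A top 10 and bottom 10 instead of a single 'biggest' answer
print(df.nlargest(10, "PayGap"))
print(df.nsmallest(10, "PayGap"))

# More than one type of average
print(df["PayGap"].mean(), df["PayGap"].median(), df["PayGap"].mode().tolist())

# A statistical summary: count, mean, std, min, quartiles and max
print(df["PayGap"].describe())

# Data types, number of entries and empty cells
print(df.dtypes)
print(df["PayGap"].count(), "entries;", df["PayGap"].isna().sum(), "empty cells")
```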

Use prompt design techniques to avoid gullibility and other risks

Prompt design techniques for genAI: role prompting, one-shot prompting, recursive prompting, retrieval augmented generation, chain of thought, meta prompting, negative prompting

AI models are eager to please, so they will generally fail to challenge you when your question is flawed or lacking in detail. Instead they will do what they can with the information provided, increasing the risk of incorrect answers.

Here are some prompt design techniques to use when asking for data analysis and template prompts to adapt:

  • Meta-prompting: once you’ve designed your own prompt, see what the AI would suggest, and whether you can adapt yours based on its attempt. Try: I am a data journalist looking to perform analysis on this data. Suggest three advanced prompts which employ prompt design techniques and could be used to ask an LLM to answer this question, and explain why each might work well (and why they might not):
  • Role prompting: the ‘role’ you give the AI model can go a long way towards making it less sycophantic and more of a critical assistant. For example: You are an experienced, sceptical and cautious data analyst. You are always conscious of the blind spots and mistakes made by data journalists when analysing data sets. Use code to perform analysis on the attached dataset which answers the following question, but also highlight any potential mistakes or blind spots to consider:
  • N-shot prompting: this involves providing a certain number (“n”) of examples (“shots”). These could be examples of previous stories using similar data, or examples of methods used previously (see the sketch after this list). For example: Below I've pasted some examples of angles drawn from this dataset in the past. Identify what calculations or code might have been used to arrive at those numbers [PASTE EXCERPTS FROM PREVIOUS STORIES]:
  • Recursive prompting: this is simply following up on responses. As a follow-up to the analysis provided, you might prompt: Review the code you used to arrive at that answer. Identify any potential blind spots or problems, and list three alternative ways to answer the question.
  • Negative prompting: try this: Do not make any assumptions about the question that have not been explicitly stated, and do not proceed until you have clarified any ambiguity or assumptions embedded in the question.
  • Structured output prompting: ask it to provide its output in a particular data format. For example: Provide the code used as a downloadable .py file. Provide the results in [CSV/JSON/Markdown table] format
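Mechanically, n-shot prompting is just a matter of assembling your examples ahead of the question. Here's an illustrative sketch in Python; the examples and angles are invented, not drawn from any real dataset:

```python
# Build an n-shot prompt by prefixing worked examples to the actual question
examples = [
    "Angle: 'a quarter of firms pay women at least 20% less' "
    "- likely method: count rows where the median pay gap column >= 20",
    "Angle: 'the biggest gap was at a retail chain' "
    "- likely method: sort by the pay gap column and take the top row",
]

question = "What calculations or code might have produced these angles?"

prompt = "Below are examples of angles drawn from this dataset in the past:\n\n"
prompt += "\n".join(f"- {e}" for e in examples)
prompt += f"\n\n{question}"

print(prompt)
```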

Chain of Thought and RAG deserve special consideration…

Map out a method using Chain of Thought (CoT)

Inverted pyramid of data journalism: conceive, compile, clean, context, combine (with 'question' throughout). Communicate: vis, narrate, humanise, personalise, socialise, utilise

Chain of Thought (CoT) prompting involves setting out a series of steps to be followed, and/or asking the model to explain the steps that it took to arrive at a result. This can be very useful in analysis because a significant factor in the accuracy of any analysis is the method being used.

Here’s an example of a prompt using CoT to reduce the risks involved in data analysis:

First, identify any aspects of the question which are ambiguous, or could be better expressed, and seek clarification on those. Once the question is clear enough, identify which columns are relevant to the question. Then outline three potential strategies for answering the question through code. Review the strategies and pick the one which is most rigorous, provides the most context, and is least likely to contain blind spots or errors. Explain your thinking.

The advantage of CoT is that it pushes you to think about what steps are important in the analysis process, because CoT means you must communicate those steps.

In the case of data analysis, we might identify that the first step is the question itself — but we could go back even further to the selection or understanding of the dataset being used.

The Inverted Pyramid of Data Journalism provides a useful framework here, as it does exactly that: lay out the steps that a data journalism project often involves. Important to highlight here is that the ‘Question’ stage runs throughout all others. The post with the updated model outlines those questions in more detail, and these can be incorporated into a prompt.

In fact, you could include that post, or extracts from it, as extra context to your prompt — a technique called RAG.

‘Ground’ the analysis with other documents (RAG)

Retrieval Augmented Generation (RAG) is one of the most powerful ways to improve responses from AI models. It involves ‘augmenting’ your prompt with useful or important information, typically in the form of extracts or attachments.

Attaching the dataset itself is a form of RAG — but you can also attach other material that puts the dataset into context: data dictionaries explaining what each column means, methodologies describing how the data was compiled, and other background documents.

Here is an example of a template prompt which might use RAG. One advantage of a template prompt like this is that it reminds you to seek out the documents you need:

As well as the data itself I have attached a document explaining what each column means, and a methodology. Below is an extract on the different questions that need to be asked at every stage of the data analysis process.

Check assumptions built into the question and challenge them, and add context that is relevant to the questions being asked. Here is the extract: [PASTE EXTRACT AND ATTACH DOCUMENTS]

Another application of RAG is to contrast a dataset with the claims made about it. For example:

You are a sceptical data journalist who works for a factchecking organisation. You are used to powerful people misrepresenting data, putting a positive spin on it, or cherry-picking one facet of the data while ignoring less positive facets. You are checking the attached public statement made by a powerful person about a dataset. Compare this statement to the data and identify any claims that do not appear to be supported by the data, or any evidence of cherry-picking. Identify any aspects of the data or other documents attached that are not mentioned in the statement but which might be newsworthy because they highlight potential problems, low-ranking performers, change, missing data, or outliers.

Message limits and conversation limits can interrupt analysis

Claude’s diagram illustrating a “standard” context window (where the model does not use extended thinking), in which the context window holds the user message and text response for every turn in the conversation

Remember that genAI tools have a limit on the amount of memory — the ‘context window’ — they can hold in a conversation, and at some point you might have to start a new conversation in order to continue the analysis.

In my testing, Claude in particular tended to hit these limits earlier, because it also tended to employ extended thinking and provide more detailed responses to prompts, considering aspects that weren’t mentioned in the question.

There are a few strategies to consider if you hit these limits:

  • Reduce the length of responses through negative prompting. For example you might say “in no more than 300 words” or “do not do any more than is asked”. However, this makes it less likely that you will be alerted to potential blind spots or important context, so it should be done with care.
  • Ask it to summarise the conversation or code so far (and paste it at the start of any new conversation). Copying the summary will allow you to ‘export’ some memory from one conversation into another. You will need to do this before hitting any limits, so establish a routine of doing this after a certain number of interactions (for example after every five prompts in Claude, or ten in ChatGPT, depending on the complexity of the prompts and responses).
  • Plan ahead and break up the analysis into different parts. Instead of trying to complete the analysis in a single conversation, break it down into different tasks, and use a different conversation for each. This can create more natural break points and reduce the need for exporting responses between conversations.

You might ask for an estimate of the tokens used so far, but in my testing I found that re-running the same query at the same point in a conversation generated very different results, and none of them close to the reality.
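One alternative is to count tokens locally with a tokenizer library. Below is a sketch using OpenAI's tiktoken package; note that other providers tokenise differently, so treat any count as an approximation:

```python
import tiktoken  # pip install tiktoken

# cl100k_base is the encoding used by several OpenAI models;
# counts will only be approximate for other providers' models
encoding = tiktoken.get_encoding("cl100k_base")

conversation_so_far = "Paste the text of your conversation here"
token_count = len(encoding.encode(conversation_so_far))
print(token_count, "tokens (approximate)")
```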

Always export a version to check

Because it’s always useful to see data in context, ask for a download of the results of the data analysis. If it involved sorting, for example, ask it for a downloadable CSV of the sorted data so you can see it in full. If cleaning or filtering was involved, a downloaded version will allow you to compare it with the original.
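A downloaded CSV can then be checked against the original programmatically as well as by eye. A minimal sketch, assuming hypothetical file names and a shared Company column:

```python
import pandas as pd

original = pd.read_csv("original.csv")
analysed = pd.read_csv("analysed_download.csv")

# Did any rows or columns go missing during the analysis?
print("Original:", original.shape, "| Analysed:", analysed.shape)

# Which companies, if any, were dropped by cleaning or filtering?
dropped = set(original["Company"]) - set(analysed["Company"])
print(len(dropped), "companies in the original are missing from the analysed file")
```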

Ask it to challenge you

A final tip is to temper AI’s sycophancy bias using adversarial prompting to identify potential blind spots or assumptions in your approach to the analysis. For example:

Act as a sceptical editor and ask critical questions about the prompts and methods used throughout this interaction. Identify potential blind spots, assumptions, potential ambiguity, or other problems with the approach, and other criticisms that might be made.

Have you used AI for data analysis and have any tips? Post them in the comments below or comment on LinkedIn.

*The models used in the tests were as follows: ChatGPT GPT-4o, Claude Sonnet 4, Gemini 2.5 Flash, Copilot GPT-4-turbo.
