
It’s over a decade since I published the Inverted Pyramid of Data Journalism. The model has been translated into multiple languages, taught all over the world, and included in a number of books and research papers. But in that time the model has also developed and changed through discussion and teaching. So here’s a roundup of everything I’ve written or recommended on the different stages, along with a revised model in English (shown above; versions have been published before in German, Spanish, Finnish, Russian and Ukrainian).
The most basic change to the Inverted Pyramid of Data Journalism is the recognition of a stage that precedes all others — idea generation — labelled ‘Conceive’ in the diagram above.
This is often a major stumbling block to people starting out with data journalism, and I’ve written a lot about it in recent years (see below for a full list).
The second major change is to make questioning more explicit as a process that takes place (or should) at every stage. That means not just in data analysis, but in the way we question our sources, our ideas, and the reliability of the data itself.
A third change is to remove the ‘socialise’ option from the communication pyramid. In conversation with Alexandra Stark, I realised that this is sufficiently covered by the ‘utilise’ stage (i.e. making something socially useful).
Replacing it are two new communication options: audiolise and physicalise. These recognise two developments: the emergence of sonification as a method of communicating data, and physical methods of representing data, ranging from crochet to art installations.
Alongside the updated pyramid, which I’ve been using for the past few years, I also wanted to round up links to a number of resources that relate to each stage. Here they are…
Stage 1: Conceive
Data journalism ideas can range from the simplest angles for turning around stories quickly from new datasets, to in-depth investigations. The following links cover both situations, and map out the different pathways that journalists follow to get there.
- Here are the angles journalists use most often to tell the stories in data
- This is where data journalists get their ideas from
- How to brainstorm COVID-19 data story ideas (these techniques can be applied to any topic)
- How to use the ‘4 stages of curiosity’ as a framework for investigations
- Empathy as an investigative tool: how to map systems to come up with story ideas
Stage 2: Compile
Data for a story can come from a variety of sources. The links below cover a range of scenarios, from identifying regular sources of data and APIs, to compiling data yourself through data entry or scraping, to using FOI or company accounts, and treating text as data.
- VIDEO: Where data journalists get data from
- How to: create a data news diary
- How to: plan a journalism project that needs data entry
- Data scraping for stories
- What Data Journalists Need to Know About Application Programming Interfaces (APIs)
- How to: find the data behind an interactive chart or map using the inspector
- VIDEO PLAYLIST: Finding stories in company accounts
- How to search for information in data black holes: Barbara Maseda and the Inventario project
- 11 FOI tips and other highlights from ‘FOIA Without the Lawyer’
- What do journalists do with large amounts of text?
- Using satellite data for journalism — tips from the experts
- How do I get data if my country doesn’t publish any?
- See also the research paper (£): Scrape, Request, Collect, Repeat: How Data Journalists Around the World Transcend Obstacles to Public Data
Stage 3: Clean
Data cleaning can take up a disproportionate amount of time in a data project (although not the widely reported 80% factoid). And yet it is perhaps the least written-about stage. Hadley Wickham’s Tidy Data is the exception to the rule here. Below I’ve listed some posts and a video which cover this stage.
- What is dirty data and how do I clean it? A great big guide for data journalists
- Cleaning data using Google Refine: a quick guide
- VIDEO: Computational thinking in data journalism
- What are regular expressions — and how to use them in Google Sheets to get data from text
- How to: fix spreadsheet dates that are in both US and UK formats
- Jonathan Stray’s Curious Journalist’s Guide to Data is also a great introduction to the issues in this area
- See also the research paper (£): Dirty Data in the Newsroom: Comparing Data Preparation in Journalism and Data Science
Stage 4: Context
I should probably write more about putting data into context, but the two main places where I have done so are:
- CCTV spending by councils/how many police officers would that pay? – statistics in context
- Here are the angles journalists use most often to tell the stories in data (which explains the different contexts to consider in each)
- I’d also recommend the free ebook Data Feminism, which outlines some essential contextual factors that often shape data, or Caroline Criado Perez’s book Invisible Women.
Stage 5: Combine
A need for context is just one of the reasons why you might combine datasets. You might also need to combine datasets as part of cleaning (to fill gaps in the data), to expand the timescale of a story, or to explore relationships.
The most common way of doing so is a spreadsheet function called VLOOKUP (or, increasingly, XLOOKUP). Last month I published this extract on the process from my book Finding Stories In Spreadsheets. It walks through combining two datasets and includes an embedded video walkthrough.
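If you work in code rather than spreadsheets, the same idea can be expressed as an exact-match VLOOKUP in code. This is a minimal Python sketch. It builds an index from one table’s key column, then fetches a value for each row of the other table. The function name `lookup_join`, the area names and all the figures are hypothetical. The sketch uses plain Python only so that it has no dependencies; libraries such as pandas offer equivalent merge operations.

```python
# A sketch of a VLOOKUP-style exact match in plain Python: for each row in
# one table, look up a value from a second table using a shared key column.
# The function name, datasets, column names and figures are hypothetical.

def lookup_join(left, right, key, fetch, missing=None):
    """Add right[fetch] to each row of left, matched on the `key` column.

    As with an exact-match VLOOKUP, only the first matching row in `right`
    is used. Unmatched rows get `missing` (the spreadsheet would show #N/A).
    """
    index = {}
    for row in right:
        # setdefault keeps the first occurrence of each key, like VLOOKUP
        index.setdefault(row[key], row[fetch])
    # Copy each left-hand row and add the looked-up value to it
    return [{**row, fetch: index.get(row[key], missing)} for row in left]


incidents = [
    {"area": "Northtown", "incidents": 120},
    {"area": "Southville", "incidents": 45},
    {"area": "Eastbury", "incidents": 30},
]
populations = [
    {"area": "Northtown", "population": 60000},
    {"area": "Southville", "population": 15000},
]

combined = lookup_join(incidents, populations, key="area", fetch="population")
for row in combined:
    print(row)
```

Rows that come back with `None` are the equivalent of #N/A in a spreadsheet. Check them before going further. They are often caused by inconsistent spellings or stray spaces in the key column rather than by genuinely missing data.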
Questioning (at every stage)
Questioning involves analysing your data to get answers, but it also takes place at every stage:
- When conceiving ideas, question your biases and blind spots. Whose voices are missing from the idea development process? Are you only getting ideas from public datasets — could you also consider unpublished data or compiling data yourself? Are you focusing on the public sector but not the private sector? Do you tend to generate particular types of story angles more than others (such as change rather than scale)?
- When compiling data, question how authoritative the source is, and how reliable their methodology is. This will affect how you communicate the results, and what further compiling, cleaning and contextualising needs to be done.
- Ask questions about what context might be needed: populations are often needed to put the number of events into the context of how many events there were per person. Demographic context allows you to ask questions about potential relationships with deprivation, age, and other factors. Historical data allows you to put recent data into the context of previous years. Data about money will need the context of inflation: what would a certain figure five years ago be equivalent to now?
- Ask questions about what data you are combining: should you use the whole population or a particular age group or other demographic that relates to the story? Should you use overall inflation or focus on a particular type of goods or services?
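The population and inflation questions above usually lead to two calculations: a rate per 100,000 people, and an inflation adjustment using a price index. This minimal Python sketch makes both concrete. The function names and all the numbers are hypothetical. Real index values would come from an official series such as a consumer price index, or a narrower index for a particular type of goods or services.

```python
# Two common pieces of context from the questions above, as plain arithmetic.
# All figures are hypothetical, and the price index values are placeholders:
# in practice you would take them from an official series (e.g. a CPI).

def rate_per(events, population, per=100_000):
    """Events per `per` people, so areas of different sizes can be compared."""
    return events * per / population

def adjust_for_inflation(amount, index_then, index_now):
    """Express an old amount in today's prices using a price index."""
    return amount * index_now / index_then

# The first area has more events, but the second has a higher rate per person
print(rate_per(120, 60_000))   # 200.0
print(rate_per(45, 15_000))    # 300.0

# £1m spent five years ago, with a hypothetical index rising from 100 to 125
print(adjust_for_inflation(1_000_000, 100, 125))  # 1250000.0
```

The same two functions work whatever population or index you choose. So the journalistic decision is which population (everyone, or a relevant age group) and which index (overall inflation, or a specific category) best fit the story.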
Here are some posts that particularly relate to questioning:
- VIDEO: The 3 chords of data journalism
- A journalist’s introduction to network analysis
- The 7 habits of successful journalists: how do you develop scepticism?
- How to prevent confirmation bias affecting your journalism
- A journalist’s guide to cognitive bias (and how to avoid it)
Stage 6: Communicate
The ‘communicate’ stage of journalism can go in a number of directions, from data visualisation to TV, and from short news updates to longform narrative journalism. Here are posts where I’ve explored a particular dimension of the storytelling stage, and other useful resources.
Visualise
- Visualising data – charts and graphs (book chapter draft)
- VIDEO: Mapping for data journalists
- When to use maps in data visualisation: a great big guide (and part two)
- 8 easy tips to avoid bad visualisation (by Steve Carufel)
- How to: create a treemap in Tableau
Narrate
- Here are the angles journalists use most often to tell the stories in data
- Longform writing: how to write a beginning to hook the reader
- Longform writing: how to avoid the ‘saggy middle’ — and end strongly
- Data + Journalism: A Story-Driven Approach to Learning Data Reporting
- The Data Storytelling Workbook
Humanise
- Data journalism in broadcast news and video: 27+ examples to inspire and educate
- Data journalism on radio, audio and podcasts
- Tim Harford on telling data stories with audio: “You need to keep simplifying” (by Niels de Hoog)
- One ambassador’s embarrassment is a tragedy, 15,000 civilian deaths is a statistic
Personalise
- VIDEO: Genres of interactivity: from ergodic storytelling to games
- VIDEO: How concepts of interactivity can help you with storytelling ideas
Audiolise/physicalise
- Data journalism on radio, audio and podcasts
- Leveraging music to help people understand data (by Ion Mates)
- Data Sonification Toolkit for Journalists, Information Designers, and Communicators
- Loud Numbers
- Making with Data: Physical Design and Craft in a Data-Driven World
- Data Driven News Installations: A digital fabrication cookbook for journalists
- Data Physicalization wiki
- What a yarn! Journalists are turning to crochet to tell data stories
Utilise
- VIDEO: JavaScript Journalism and interactivity
- 3 more angles most often used to tell data stories: explorers, relationships and bad data stories
What resources do you recommend?
Those are just some of the resources I’ve written or come across — please let me know in the comments or on X/LinkedIn/etc. of any that you’d recommend.
