VIDEO PLAYLIST: An introduction to Python for data journalism and scraping

Python is an extremely powerful language for journalists who want to scrape information from online sources. This series of videos, made for students on the MA in Data Journalism at Birmingham City University, explains some core concepts to get started in Python, how to use Colab notebooks within Google Drive, and introduces some code to get started with scraping.
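
To give a flavour of what scraping code can look like, here is a minimal sketch using only Python's standard library. It is not the code from the videos: the HTML snippet, the `LinkCollector` class and the `scrape_links` function are all hypothetical, made up for illustration. A real scraper would normally download the page first.

```python
from html.parser import HTMLParser

# Hypothetical HTML, standing in for a page you would normally download
SAMPLE_HTML = """
<html><body>
  <h1>Council meetings</h1>
  <ul>
    <li><a href="/minutes/january.pdf">January minutes</a></li>
    <li><a href="/minutes/february.pdf">February minutes</a></li>
  </ul>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs from every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        # Start recording when we enter a link
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that sits inside a link
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        # Store the finished link when we leave it
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((text, self._current_href))
            self._current_href = None


def scrape_links(html):
    """Return a list of (text, href) tuples found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


links = scrape_links(SAMPLE_HTML)
for text, href in links:
    print(text, href)
```

The same idea scales up: once the links (or table cells, or headlines) are in a Python list, they can be saved, filtered or followed to further pages.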

Availability bias: a guide for journalists

Diagram showing a large circle labelled 'All the information about a subject' and a smaller circle within that labelled 'The information that is easiest to recall'

I’ve written previously about the role that cognitive biases play in journalism, how to avoid confirmation bias, and anticipate criticism based on fallacies — but one cognitive bias I haven’t written about yet is the availability heuristic — or availability bias.

Availability bias is the tendency to reach for the most available reason, event, or tool, when confronted with a problem or decision.

Here’s how the ‘8 data story angles’ can help you get stories from company accounts

8 common angles for accounts stories
Scale: of profit/loss, of bonuses, payoffs, cuts
Change/stasis: profit/loss/bonuses going up/down
Outliers/ranking: based on any metric
Variation: within a sector
Exploration: a company structure; a director; payments
Relationships: mapping a corporate network or director’s interests
Bad data: Undeclared interests
Leads: Background, conflicts of interest, factchecks

A couple of years ago I mapped out eight common angles for identifying stories in data. It turns out that the same framework is useful for finding stories in company accounts, too — but not only that: the angles also map neatly onto three broad techniques.

In this post I’ll go through each of the three techniques — looking at cash flow statements; compiling data from multiple accounts; and tracing people and connections — and explain how they can be used to get stories, with examples of articles that have used those techniques successfully.

We start, naturally, with the money…

9 ways to find a story in company financial reports

My article in Russian is here.

This is a masterclass in writing a story about company directors’ pay — so I reverse-engineered it

Owner of UK care home group paid himself £21m despite safety concerns

Company directors’ pay regularly provides material for stories — and this front page story by The Guardian’s Robert Booth was such a masterclass in the genre (as well as other open source intelligence techniques) that I decided to reverse-engineer it for a Twitter thread.

I’ve embedded the thread below, or you can read it on Threadreader here.

Using company accounts in journalism

You can find other posts about using company accounts at the following links:

How ‘triangulating’ can help you identify more sources

Triangulating journalistic sources: diagram showing that each of the three types of sources (people, data and documents) leads to the other two types of sources

In this edited extract from the forthcoming third edition of the Online Journalism Handbook I look at how a ‘triangulation’ approach to sourcing can help broaden story research and improve reporting.

Two centuries ago journalists were called reporters because they drew their information from official reports — documents.

Then in the late 19th century a new source became part of journalistic practice: people, as interviews and eyewitness accounts were added to news articles. 

The late 20th century saw reporting undergo another expansion in sourcing, as digital data was added to the journalist’s toolkit.

Although reports had included tables and other sources of data, the properties of digital data — filterable, sortable and searchable — have been significant, and make data a qualitatively different type of source.
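
A minimal sketch of those three properties in Python may make the difference concrete. The records below are invented purely for illustration, as are the field names.

```python
# Hypothetical records, invented purely for illustration
records = [
    {"council": "Northtown", "year": 2021, "spend": 120000, "note": "housing repairs"},
    {"council": "Southville", "year": 2021, "spend": 450000, "note": "road resurfacing"},
    {"council": "Eastham", "year": 2022, "spend": 310000, "note": "housing support"},
]

# Filterable: keep only the rows that match a condition
in_2021 = [r for r in records if r["year"] == 2021]

# Sortable: reorder the rows by any column, here largest spend first
by_spend = sorted(records, key=lambda r: r["spend"], reverse=True)

# Searchable: find every row whose text mentions a keyword
housing = [r for r in records if "housing" in r["note"].lower()]

print([r["council"] for r in in_2021])
print([r["council"] for r in by_spend])
print([r["council"] for r in housing])
```

Each of those operations takes one line on digital data; on a printed table in a report, each would mean reading and re-reading every row by hand.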

How documents, people and data all lead to each other

Considering sourcing along those three dimensions — people, documents, and data — can be particularly useful when planning sourcing.

Defending an investigation — and planning one: lessons from ProPublica’s Black Snow

Sugar Companies Said Our Investigation Is Flawed and Biased. Let’s Dive Into Why That’s Not the Case.

In the summer of last year ProPublica published a major investigation into air pollution in Florida, and its connection to the sugar industry. The story itself, Black Snow, is an inspiring example of scrollytelling — but equally instructive is the methodology article which accompanies it, responding to criticisms from the sugar industry.

Not only does it demonstrate how to respond when large organisations attack a piece of journalism — it also provides a great lesson on the tactics that are adopted by organisations when attacking data-driven stories.

In this post I want to break down the three most common attack tactics, how ProPublica deal with two of those, and how to use the same tactics during planning to ensure your project design isn’t flawed.

What Data Journalists Need to Know About Application Programming Interfaces (APIs)

A list of APIs on the Parliament website
The UK Parliament publishes a series of APIs for political data

I’ve written a post for the Global Investigative Journalism Network about how APIs can be useful sources of data for journalists. The article is based on an earlier video post.

The article explains what APIs are and how they differ from other data sources; the basic principles of how they work and how they can be used for stories; some of the jargon to expect — and where to find them. Read the article here.
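
As a rough illustration of the basic principle, the sketch below builds a request URL from some query parameters and then reads a structured (JSON) response. Everything in it is an assumption made for the example: the endpoint `https://api.example.org/members`, the parameter names and the sample response are hypothetical, and no real API is being described. The response is hard-coded so that the code runs without a network connection.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, not a real API
BASE_URL = "https://api.example.org/members"


def build_request_url(base_url, **params):
    """Combine a base URL with query parameters such as name=smith."""
    return f"{base_url}?{urlencode(params)}"


url = build_request_url(BASE_URL, name="smith", take=2)
print(url)

# Hypothetical response text, standing in for what an API might send back
SAMPLE_RESPONSE = """
{
  "items": [
    {"name": "A Smith", "party": "Example Party"},
    {"name": "B Smith", "party": "Another Party"}
  ],
  "total": 2
}
"""

# JSON text becomes ordinary Python dictionaries and lists
data = json.loads(SAMPLE_RESPONSE)
names = [item["name"] for item in data["items"]]
print(names, data["total"])
```

The pattern is the same with a real API: ask a question by changing the URL, then pick the fields you need out of the structured answer.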

Making video and audio interviews searchable: how Pinpoint helped with one investigation

Pinpoint creates a ranking of people, organisations, and locations, showing the number of times each is mentioned in your uploaded documents.

MA Data Journalism student Tony Jarne spent eight months investigating exempt accommodation, collecting hundreds of documents, audio and video recordings along the way. To manage all this information, he turned to Google’s free tool Pinpoint. In a special guest post for OJB, he explains how it should be an essential part of any journalist’s toolkit.

The use of exempt accommodation — a type of housing for vulnerable people — has rocketed in recent years.

At the end of December, a select committee was set up in Parliament to look into the issue. The committee set a deadline, and anyone who wished to do so could submit written evidence before it.

Organisations, local authorities and citizens submitted more than 125 pieces of written evidence to be taken into account by the committee. Some are only one page — others are 25 pages long.

In addition to the written evidence, I had various reports, news articles, Land Registry titles and company accounts downloaded from Companies House.

I needed a tool to organise all the documentation. I needed Pinpoint.

Here’s a framework to help fill the ‘human gap’ in your story

One of the most common challenges for student journalists is identifying the right human sources to turn a lead into a fleshed-out story. And one of the most common mistakes is not spending enough time on this vital step in the reporting process.

To help with this, here’s a framework for brainstorming potential sources.

Different types of source and potential roles in stories: matrix of 5 source categories (power; expert; representative; witness; case study) and 4 roles (action; context/colour; reaction; reply).

The five categories of source

There are five categories of source in the framework:
