Tag Archives: scraping

How do I get data if my country doesn’t publish any?

Spotlight photo by Paul Green on Unsplash

In many countries public data is limited: either access to it is restricted, or the information provided by the authorities is not credible. So how do you obtain data for a story? Here are some techniques used by reporters around the world.

Continue reading →

I’ve updated the Inverted Pyramid of Data Journalism — and brought together resources for every stage

Inverted pyramid of data journalism: conceive, compile, clean, context, combine (with 'question' throughout). Communicate: vis, narrate, humanise, personalise, audiolise/materialise, utilise

It’s over a decade since I published the Inverted Pyramid of Data Journalism. The model has been translated into multiple languages, taught all over the world, and included in a number of books and research papers. But in that time the model has also developed and changed through discussion and teaching, so here’s a round-up of everything I’ve written or recommended on the different stages, along with a revised model in English (shown above; versions have been published in German, Spanish, Finnish, Russian and Ukrainian).

The most basic change to the Inverted Pyramid of Data Journalism is the recognition of a stage that precedes all others — idea generation — labelled ‘Conceive’ in the diagram above.

This is often a major stumbling block for people starting out with data journalism, and I’ve written a lot about it in recent years (see below for a full list).

The second major change is to make questioning more explicit as a process that (should) take place through all stages — not just in data analysis but in the way we question our sources, our ideas, and the reliability of the data itself.

A third change is to remove the ‘socialise’ option from the communication pyramid: in conversation with Alexandra Stark I realised that this is covered sufficiently by the ‘utilise’ stage (i.e. making something useful socially).

Replacing that is a new communication option — in fact, two: audiolise and physicalise. This recognises the emergence of sonification as a method of communicating data, and physical methods of representing data from crochet to art installations.

Alongside the updated pyramid I’ve been using for the past few years I also wanted to round up links to a number of resources that relate to each stage. Here they are…

Continue reading →

VIDEO PLAYLIST: An introduction to Python for data journalism and scraping

Python is an extremely powerful language for journalists who want to scrape information from online sources. This series of videos, made for students on the MA in Data Journalism at Birmingham City University, explains some core concepts to get started in Python, how to use Colab notebooks within Google Drive, and introduces some code to get started with scraping.
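To give a flavour of what "getting started with scraping" can look like, here is a minimal sketch. It is not taken from the videos: the `TableScraper` class, the `scrape_table` function and the sample HTML are all invented for illustration, and it uses only Python's standard library so that it runs in a Colab notebook without installing anything. It reads the rows of an HTML table into lists.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # the row currently being read, if any
        self._cell = None   # text fragments of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample page, invented for this example
SAMPLE_HTML = """
<table>
  <tr><th>Place</th><th>Count</th></tr>
  <tr><td>Example Town A</td><td>120</td></tr>
  <tr><td>Example Town B</td><td>45</td></tr>
</table>
"""

rows = scrape_table(SAMPLE_HTML)
header, records = rows[0], rows[1:]
for record in records:
    # Pair each value with its column heading
    print(dict(zip(header, record)))
```

In practice the HTML would be fetched from a live web page rather than typed in, and many people use third-party libraries for the parsing step. The sketch only shows the underlying idea: find the repeating structure in a page and turn it into rows of data.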

Continue reading →

I’m delivering a 3-day workshop on scraping for journalists in January

Here’s the thinking behind my new MA in Data Journalism

Cogs image by Stuart Madeley

A few weeks ago I announced that I was launching a new MA in Data Journalism, and promised that I would write more about the thinking behind it. Here, then, are some of the key ideas underpinning the new course — from coding and storytelling to security and relationships with industry — and how they have informed its development. Continue reading →

Data journalism in broadcast news and video: 27+ examples to inspire and educate

This network diagram comes from a Channel 4 News story

The best-known examples of data journalism tend to be based around text and visuals — but it’s harder to find data journalism in video and audio. Ahead of the launch of my new MA in Data Journalism I thought I would share my list of the examples of video data journalism that I use with students in exploring data storytelling across multiple platforms. If you have others, I’d love to hear about them.

FOI stories in broadcast journalism

Freedom of Information stories are one of the most common situations in which broadcasters have to deal with more in-depth data. These are often brought to life through case studies and interviews with experts. Continue reading →

How the BBC England data unit scraped airport noise complaints

This news story used scraping to gather data on noise complaints

BBC England Data Unit’s Daniel Wainwright tried to explain basic web scraping at this year’s Data Journalism Conference but technical problems got in the way. This is what should have happened:

I’d wondered for a while why no-one who had talked about scraping at conferences had actually demonstrated the procedure. It seemed to me to be one of the most sought-after skills for any investigative journalist.

Then I tried to do so myself in an impromptu session at the first Data Journalism Conference in Birmingham (#DJUK16) and found out why: it’s not as easy as it’s supposed to look.

To anyone new to data journalism, a scraper is as close to magic as you get with a spreadsheet and no wand. Continue reading →

ScraperWiki has rediscovered its old free scraping tool – and is now calling it QuickCode

A screenshot from before the 2013 relaunch of Scraperwiki

Seven years ago ScraperWiki launched with a plan to make scraping accessible to a wider public. It did this by creating an online space where people could easily write and run scrapers, and by making it possible to read and adapt scrapers written by other users (the ‘wiki’ part).

I loved it. The platform inspired me to learn Python and write Scraping for Journalists, and it has been part of my journalism workflow ever since. Continue reading →

How one Mexican data team uncovered the story of 4,000 missing women

by Maria Crosas Batista

Mexican newspaper El Universal has put a face to the 4,534 women who have gone missing in Mexico City and the State of Mexico over the last decade: Ausencias Ignoradas (Ignored Absences) aims to put pressure on the government and eradicate this situation.

Daniela Guazo, from the data journalism team, explains how they gathered the data and presented the information not as numbers but as people: Continue reading →

Create your own Instagram/Facebook/Twitter API with Google Drive and IFTTT

My Birmingham City University colleague Nick Moreton has a neat little hack for connecting a JavaScript app to social media accounts by combining the automation tool IFTTT with Google Drive. As he explains:

“Most of the big web apps provide their API in JSON format (Facebook, Twitter, Instagram) however, as you may know if you’ve ever tried to use these, they often require an OAuth login in order to access the API.”

Continue reading →

Online Journalism Blog

Comment, analysis and links covering online journalism and online news, citizen journalism, blogging, vlogging, photoblogging, podcasts, vodcasts, interactive storytelling, publishing, Computer Assisted Reporting, User Generated Content, searching and all things internet.
