Tag Archives: scraping

How the BBC England data unit scraped airport noise complaints


This news story used scraping to gather data on noise complaints

BBC England Data Unit’s Daniel Wainwright tried to explain basic web scraping at this year’s Data Journalism Conference, but technical problems got in the way. This is what should have happened:

I’d wondered for a while why no-one who had talked about scraping at conferences had actually demonstrated the procedure. It seemed to me to be one of the most sought-after skills for any investigative journalist.

Then I tried to do so myself in an impromptu session at the first Data Journalism Conference in Birmingham (#DJUK16) and found out why: it’s not as easy as it’s supposed to look.

To anyone new to data journalism, a scraper is as close to magic as you get with a spreadsheet and no wand.
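For anyone who, like that conference audience, never got to see the demonstration, here is a minimal sketch of what a basic scraper does: it finds the structure in a page’s HTML (table rows and cells) and turns it into spreadsheet rows. It is not the BBC England Data Unit’s code. The sample table is made up, and the sketch uses only Python’s standard library (`html.parser` and `csv`); a real scraper would first download the page.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built
        self._cell = None   # text fragments of the cell being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Scrape every table row in the HTML and return it as CSV text."""
    scraper = TableScraper()
    scraper.feed(html)
    out = io.StringIO()
    csv.writer(out).writerows(scraper.rows)
    return out.getvalue()


# Hypothetical sample page. A real scraper would fetch the HTML first,
# e.g. urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE = """
<table>
  <tr><th>Item</th><th>Count</th></tr>
  <tr><td>alpha</td><td>3</td></tr>
  <tr><td>beta</td><td>5</td></tr>
</table>
"""

print(table_to_csv(SAMPLE))
```

The CSV output opens directly in a spreadsheet, which is where most of the journalism starts.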

ScraperWiki has rediscovered its old free scraping tool – and is now calling it QuickCode

A screenshot from before the 2013 relaunch of ScraperWiki


Seven years ago ScraperWiki launched with a plan to make scraping accessible to a wider public. It did this by creating an online space where people could easily write and run scrapers, and by making it possible to read and adapt scrapers written by other users (the ‘wiki’ part).

I loved it. The platform inspired me to learn Python and to write Scraping for Journalists, and it has been part of my journalism workflow ever since.

How one Mexican data team uncovered the story of 4,000 missing women


by Maria Crosas Batista

Mexican newspaper El Universal has put a face to the 4,534 women who have gone missing in Mexico City and the State of Mexico over the last decade. Ausencias Ignoradas (Ignored Absences) aims to put pressure on the government to put an end to these disappearances.

Daniela Guazo, from the data journalism team, explains how they gathered the data and presented it not as numbers but as people close to the reader:

Create your own Instagram/Facebook/Twitter API with Google Drive and IFTTT


My Birmingham City University colleague Nick Moreton has a neat little hack for connecting a JavaScript app to social media accounts by combining the automation tool IFTTT with Google Drive. As he explains:

“Most of the big web apps provide their API in JSON format (Facebook, Twitter, Instagram) however, as you may know if you’ve ever tried to use these, they often require an OAuth login in order to access the API.”
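The full method is in Nick’s post. The general idea of a home-made ‘API’ is that, assuming IFTTT is set to append each new social media post as a row in a Google Drive spreadsheet, that sheet can be read back as JSON without any OAuth login. The minimal sketch below shows only the conversion step, in Python rather than the JavaScript Nick uses: it assumes the sheet has been exported as headerless CSV, and the column names `posted_at`, `caption` and `url` and the sample rows are hypothetical.

```python
import csv
import io
import json

# Hypothetical column names: IFTTT lets you choose which fields it writes
# to each spreadsheet row, so yours may differ.
FIELDS = ["posted_at", "caption", "url"]


def sheet_csv_to_json(csv_text, fields=FIELDS):
    """Turn headerless CSV rows (one per post) into a JSON array of objects."""
    reader = csv.reader(io.StringIO(csv_text))
    # Pair each cell with its column name; skip any blank lines
    posts = [dict(zip(fields, row)) for row in reader if row]
    return json.dumps(posts, indent=2)


# Made-up sample rows standing in for an exported sheet
SAMPLE = (
    "2016-06-01,First post,https://example.com/p/1\n"
    '2016-06-02,"Second post, with a comma",https://example.com/p/2\n'
)

print(sheet_csv_to_json(SAMPLE))
```

Using the `csv` module rather than splitting on commas matters here: captions often contain commas, and the quoted field in the second sample row stays in one piece.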


Over 1,000 journalists are now exploring scraping techniques. Incredible.

Last week the number of people who have bought my ebook Scraping for Journalists passed the 1,000 mark. That is, to me, incredible. A thousand journalists interested enough in scraping to buy a book? What happened?

When I first began writing the book, I imagined there might be 100 people in the world interested in buying it. It was such a niche subject that I didn’t even consider pitching it to my usual publishers.

Now it’s so mainstream that the 1,000th ‘book’ was actually 12 copies, bought by a university that wanted them for its students to borrow. It is one of several institutions that have approached me about doing this.

FAQ: Big data and journalism

The latest in the series of Frequently Asked Questions comes from a UK student asking about big data.

How can data journalists make sense of such quantities of data and filter out what’s meaningful?

In the same way they always have. Journalists’ role has always been to make choices about which information to prioritise, what extra information they need, and what information to include in the story they communicate.

Training: scraping in the Netherlands

I’m delivering a course in scraping in Utrecht in the Netherlands on April 2. More details about the location and booking are on the course booking page; a broad breakdown is below:

  • Scraping for journalism: ideas and examples
  • Scraping basics: finding structure in HTML and URLs; what’s possible with programming
  • Simple scraping jobs: how to write a basic scraper in 5 minutes
  • Scraping tools: OutWit Hub and Import.io
  • How to scrape dozens of public webpages
  • Scraping databases with empty searches
  • How to understand scrapers on ScraperWiki: Scraping PDFs, lists of URLs, and databases with specific searches
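Two of the techniques in that list, scraping dozens of public webpages and scraping databases with empty searches, rest on the same observation: the addresses follow a pattern you can spot by clicking through a few pages and comparing the URLs. The minimal sketch below only builds the URLs; the example.org addresses and the search field names are hypothetical, and fetching each page (for example with `urllib.request.urlopen`) would be the next step.

```python
from urllib.parse import urlencode

# Hypothetical addresses: swap in the real site's pattern once you have
# spotted it by comparing the URLs of a few pages.
PAGE_PATTERN = "https://example.org/records?page={}"
SEARCH_BASE = "https://example.org/search"


def page_urls(first, last, pattern=PAGE_PATTERN):
    """Build one URL per page when only a number in the address changes."""
    return [pattern.format(n) for n in range(first, last + 1)]


def empty_search_url(fields, base=SEARCH_BASE):
    """Build a search URL with every form field left blank.

    Some database search forms return all records when nothing is entered;
    whether a particular site does this has to be checked by hand.
    """
    return base + "?" + urlencode({name: "" for name in fields})


print(page_urls(1, 3))
print(empty_search_url(["name", "town"]))
```

Once the list of URLs exists, the same scraping code can be run over each one in a loop, which is what turns one page into dozens.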