“Most of the big web apps provide their API in JSON format (Facebook, Twitter, Instagram). However, as you may know if you’ve ever tried to use these, they often require an OAuth login in order to access the API.”
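To make the point concrete, here is a minimal sketch of working with a JSON API response in Python. The payload structure and the token shown in the comments are hypothetical, not any real platform’s API; in practice you would fetch the body over HTTP with an OAuth access token.

```python
import json

# A sample JSON payload of the kind such APIs return
# (hypothetical structure -- real APIs differ).
response_body = '{"data": [{"id": "1", "message": "hello"}]}'

# In practice you would fetch this with urllib.request, sending an
# OAuth access token in the Authorization header, e.g.:
#   req = urllib.request.Request(url,
#       headers={"Authorization": "Bearer YOUR_TOKEN"})
# Here we parse the body directly so the sketch runs offline.
posts = json.loads(response_body)
for post in posts["data"]:
    print(post["id"], post["message"])
```

Once the JSON is parsed, it behaves like ordinary Python lists and dictionaries, which is exactly why these APIs are so convenient for journalists once the OAuth hurdle is cleared.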
Last week the number of people who have bought my ebook Scraping for Journalists passed the 1,000 mark. That is, to me, incredible. A thousand journalists interested enough in scraping to buy a book? What happened?
When I first began writing the book I imagined there might be perhaps 100 people in the world who would be interested in buying it. It was such a niche subject I didn’t even consider pitching it to my normal publishers.
Now it’s so mainstream that the 1,000th ‘book’ was actually 12: purchased by a university which wanted multiple copies for its students to borrow – one of a number of institutions to approach me about doing so.
The latest in the series of Frequently Asked Questions comes from a UK student, who has questions about big data.
How can data journalists make sense of such quantities of data and filter out what’s meaningful?
In the same way they always have. Journalists’ role has always been to make choices about which information to prioritise, what extra information they need, and what information to include in the story they communicate.
I’m delivering a course in scraping in Utrecht in the Netherlands on April 2. The booking page, with more details about location etc., is here; a broad breakdown is below:
- Scraping for journalism: ideas and examples
- Scraping basics: finding structure in HTML and URLs; what’s possible with programming
- Simple scraping jobs: how to write a basic scraper in 5 minutes
- Scraping tools: Outwit Hub and Import.io
- How to scrape dozens of public webpages
- Scraping databases with empty searches
- How to understand scrapers on ScraperWiki: scraping PDFs, lists of URLs, and databases with specific searches
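As a taste of the “basic scraper in 5 minutes” idea above, here is a minimal sketch using only Python’s standard library. The HTML fragment and the council figures are made up for illustration; in a real scrape you would download the page first.

```python
from html.parser import HTMLParser

# A hypothetical page fragment -- in a real scrape you would fetch
# this with urllib.request.urlopen(url).read().decode()
html = """
<table>
  <tr><td>Council A</td><td>120</td></tr>
  <tr><td>Council B</td><td>85</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""
    def __init__(self):
        super().__init__()
        self.rows, self.row, self.in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
        elif tag == "tr":
            self.row = []

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False
        elif tag == "tr" and self.row:
            self.rows.append(self.row)

    def handle_data(self, data):
        if self.in_cell:
            self.row.append(data.strip())

scraper = TableScraper()
scraper.feed(html)
print(scraper.rows)  # -> [['Council A', '120'], ['Council B', '85']]
```

This is the “finding structure in HTML” point in miniature: once you can describe where the data sits (here, cells inside table rows), a few lines of code can pull it out of any number of similarly structured pages.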
This is the third in a series of extracts from a draft book chapter on ethics in data journalism. The first looked at how ethics of accuracy play out in data journalism projects, and the second at culture clashes, privacy, user data and collaboration. This is a work in progress, so if you have examples of ethical dilemmas, best practice, or guidance, I’d be happy to include it with an acknowledgement.
Automated mapping of data – ChicagoCrime.org – image from Source
Mass data gathering – scraping, FOI, deception and harm
The data journalism practice of ‘scraping’ – getting a computer to capture information from online sources – raises some ethical issues around deception and minimisation of harm. Some scrapers, for example, ‘pretend’ to be a particular web browser, or pace their scraping activity more slowly to avoid detection. But the deception is practised on another computer, not a human – so is it deception at all? And if the ‘victim’ is a computer, is there harm?
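The two practices in question look like this in code. This is a sketch only: the URL and user-agent string are illustrative, and the request is built but not actually sent.

```python
import time
import urllib.request

# Illustrative values -- not a real endpoint or a specific browser claim.
URL = "http://example.com/data?page=1"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# 'Pretending' to be a browser: the scraper identifies itself with a
# browser-like User-Agent header instead of a default library string.
req = urllib.request.Request(URL, headers={"User-Agent": BROWSER_UA})
print(req.get_header("User-agent"))

# Pacing: waiting between requests rather than hammering the server --
# arguably a harm-minimisation measure in itself.
# time.sleep(2)  # uncomment inside a real scraping loop
```

Notably, the pacing that helps a scraper ‘avoid detection’ is the same technique that reduces load on the target server, which is partly why the deception/harm question is not clear-cut.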
This is the final part of a series of blog posts. The first explains how using feeds and social bookmarking can make for a quicker data journalism workflow. The second looks at how to anticipate and prevent problems; and how collaboration can improve data work.
Workflow tip 5. Think like a computer
The final workflow tip is all about efficiency. Computers deal with processes in a logical way, and good programming is often about completing processes in the simplest way possible.
If you have any tasks that are repetitive, break them down and work out what patterns might allow you to do them more quickly – or for a computer to do them.
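As a small illustration of that tip, consider a repetitive cleaning job; the figures below are invented. Spotting the pattern – strip the symbol and separators, then convert – turns many manual edits into one line:

```python
# Hypothetical repetitive task: a column of figures arrives as
# inconsistently usable strings. Once the pattern is identified,
# a loop handles every value the same way.
raw_figures = ["£1,200", "£950", "£12,000", "£3,400"]

cleaned = [int(f.replace("£", "").replace(",", "")) for f in raw_figures]
print(cleaned)  # -> [1200, 950, 12000, 3400]
```

The same thinking scales up: a pattern you can describe precisely is a pattern a computer can apply to thousands of rows.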
Last night I published the final chapter of my first ebook: Scraping for Journalists. Since I started publishing it in July, over 40 ‘versions’ of the book have been uploaded to Leanpub, a platform that allows users to receive updates as a book develops – but more importantly, to input into its development.
I’ve been amazed at the consistent interest in the book – last week it passed 500 readers: 400 more than I ever expected to download it. Their comments have directly shaped, and in some cases been reproduced in, the book – something I expect to continue as I keep updating it.
As a result I’ve become a huge fan of this form of ebook publishing, and plan to do a lot more with it (some hints here and here). The format combines the best qualities of traditional book publishing with those of blogging and social media (there’s a Facebook page too).
Meanwhile, there’s still more to do with Scraping for Journalists: publishing to other platforms and in other languages for starters… If you’re interested in translating the book into another language, please get in touch.