Data Journalism Heist is designed to be a relatively short introduction to data journalism skills, demonstrating basic techniques for finding data, spotting possible stories and turning them around to a deadline.
Based on a workshop, the emphasis is on building confidence through speed and brevity, rather than headline-grabbing spectacular investigations or difficult datasets (I’m hoping to write a separate ebook on the latter at some point).
I have just spent 10 months publishing an ebook. Not ‘writing’, or ‘producing’, but 10 months publishing. Just as the internet helped flatten the news industry – making reporters into publishers and distributors – it has done the same to the book industry. The question I wanted to ask was: how does that change the book?
Having written books for traditional publishers before, my plunge into self-publishing was prompted when I decided I wanted to write a book for journalists about scraping: the technique of grabbing and combining information from online documents.
Last night I published the final chapter of my first ebook: Scraping for Journalists. Since I started publishing it in July, over 40 ‘versions’ of the book have been uploaded to Leanpub, a platform that allows users to receive updates as a book develops – but more importantly, to input into its development.
I’ve been amazed at the consistent interest in the book – last week it passed 500 readers: 400 more than I ever expected to download it. Their comments have directly shaped, and in some cases been reproduced in, the book – something I expect to continue, as I plan to keep updating it.
As a result I’ve become a huge fan of this form of ebook publishing, and plan to do a lot more with it (some hints here and here). The format combines the best qualities of traditional book publishing with those of blogging and social media (there’s a Facebook page too).
Meanwhile, there’s still more to do with Scraping for Journalists: publishing to other platforms and in other languages for starters… If you’re interested in translating the book into another language, please get in touch.
The following was written for three:d, the newsletter of MeCCSA, the Media Communications and Cultural Studies Association (PDF, page 9).
Something has happened to self-publishing over the past few years. No longer the last resort for local historians and wannabe poets, it is now a sign of entrepreneurial spirit, an alternative to the limitations of attention-starved journalism, and a way of kicking against the pricks of mainstream publishing. Self-published books have almost tripled in number over the last five years, with a number of authors making the bestseller lists. More than one in ten ebooks bought by UK readers is now self-published.
This year I finally joined that group, as I made a long-planned move away from writing for traditional publishers towards publishing my own ebooks. In fact, I published three. So what’s the appeal?
In the second part of this extract from Chapter 10 of Scraping for Journalists, I recap the basics before discussing techniques for finding patterns in data, and how regex can deal with non-textual characters such as spaces and carriage returns, special characters such as backslashes, and ‘negative matches’. You can find the first part here.
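As a rough illustration outside OutWit Hub, here is a minimal sketch of those three ideas using Python’s standard `re` module. The sample text is made up for the example, and the exact syntax a given scraping tool accepts may differ slightly from Python’s.

```python
import re

# Hypothetical sample text: a tab, spaces, Windows-style line endings (\r\n) and a file path
sample = "Name:\tJohn Smith\r\nPath: C:\\data\\file.csv\r\nCode: AB12"

# \s matches any whitespace character: spaces, tabs, carriage returns and newlines
whitespace = re.findall(r"\s", sample)

# A literal backslash must itself be escaped as \\ in the pattern;
# \S+ then grabs the run of non-whitespace characters that follows it
path = re.search(r"[A-Z]:\\\S+", sample).group()

# A negated character class [^...] matches anything NOT listed: here, runs of non-digits
non_digits = re.findall(r"[^0-9]+", "AB12CD")

print(len(whitespace), path, non_digits)
```

Here `whitespace` picks up the tab, the spaces and every `\r` and `\n`, `path` comes back as `C:\data\file.csv`, and `non_digits` is `['AB', 'CD']`.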
The following is the first part of an extract from Chapter 10 of Scraping for Journalists. It introduces a particularly useful tool in scraping, regex (short for ‘regular expressions’), which describes patterns to look for, such as specific words, prefixes or particular types of code. I hope you find it useful.
To do this, you’ll need to install the free scraping tool OutWit Hub. Regex can be used in other tools and programming as well, but this tool is a good way to learn it without knowing any other programming.
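For readers who would rather see the same basics in code, here is a minimal sketch using Python’s built-in `re` module instead of OutWit Hub. The text and HTML snippet are hypothetical, invented only to show matching a specific word, a prefix followed by a pattern, and content inside a particular piece of code.

```python
import re

# Hypothetical scraped text, for illustration only
text = "Ref: FOI-2012-041 and FOI-2012-107 were both refused."

# A specific word: every occurrence of 'FOI'
foi_mentions = re.findall(r"FOI", text)

# A prefix followed by a pattern: 'FOI-', four digits, a hyphen, three digits
foi_refs = re.findall(r"FOI-\d{4}-\d{3}", text)

# A particular type of code: the contents of each HTML table cell.
# The lazy .*? stops at the first closing tag instead of the last one.
html = "<tr><td>Birmingham</td><td>42</td></tr>"
cells = re.findall(r"<td>(.*?)</td>", html)

print(foi_mentions, foi_refs, cells)
```

Because the last pattern contains a group in brackets, `re.findall` returns only the captured text (`['Birmingham', '42']`) rather than the whole `<td>` tags.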
My ebook Scraping for Journalists: How to grab data from hundreds of sources, put it in a form you can interrogate – and still hit deadlines is now live.
You can buy it from Leanpub here. Leanpub allows you to publish in instalments, so readers get an alert every time new content is added and can update their version. This means I can adapt and improve the book based on feedback from the people who use it. In other words, it’s agile publishing, which makes for a better book. (Also, I can publish at a Codecademy-like weekly pace, which suits learning particularly well.)