The following was written for three:d, the newsletter of MeCCSA, the Media Communications and Cultural Studies Association (PDF, page 9).
Something has happened to self-publishing over the past few years. No longer the last resort for local historians and wannabe poets, it is now a sign of entrepreneurial spirit, an alternative to the limitations of attention-starved journalism, and a way of kicking against the pricks of mainstream publishing. Self-published books have almost tripled in number over the last five years, with a number of authors making the bestseller lists. More than one in ten ebooks bought by UK readers is now self-published.
This year I finally joined that group, as I made a long-planned move away from writing for traditional publishers towards publishing my own ebooks. In fact, I published three. So what's the appeal?
In the second part of this extract from Chapter 10 of Scraping for Journalists, I recap the basics before discussing techniques for finding patterns in data, and how regex can handle non-textual characters such as spaces and carriage returns, special characters such as backslashes, and 'negative matches'. You can find the first part here.
The following is the first part of an extract from Chapter 10 of Scraping for Journalists. It introduces a particularly useful tool in scraping: regex (short for 'regular expressions'), which lets you search for patterns such as specific words, prefixes or particular types of code. I hope you find it useful.
To do this, you'll need to install the free scraping tool OutWit Hub. Regex can be used in other tools and programming languages as well, but OutWit Hub is a good way to learn it without knowing any programming.
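For readers who want to try the same ideas outside OutWit Hub, here is a minimal sketch in Python's built-in `re` module. It is not taken from the book: the HTML snippet is invented for illustration, and it assumes that a 'negative match' means a negated character class such as `[^<]`, which is one common use of the term. Regex flavours differ slightly between tools, so OutWit Hub may not behave identically in every case.

```python
import re

# Hypothetical scraped snippet (invented, not from the book): a table cell
# whose text is broken by a carriage return and newline, plus a Windows path.
html = '<td>Total\r\n  spending</td><td>C:\\data\\council.csv</td>'

# 1. Non-textual characters: \s matches spaces, tabs, \r and \n alike.
phrase = re.search(r'Total\s+spending', html)

# 2. Special characters: a literal backslash must be escaped as \\
#    (the r'' raw string stops Python interpreting it first).
folders = re.findall(r'\\(\w+)', html)

# 3. A negated character class: [^<]+ means "one or more characters that
#    are NOT <", so each capture stops where the next tag begins.
cells = re.findall(r'<td>([^<]+)</td>', html)

# Collapse each run of whitespace (including the line break) to one space.
clean = [re.sub(r'\s+', ' ', cell) for cell in cells]

print(phrase is not None)  # True
print(folders)             # ['data', 'council']
print(clean)               # ['Total spending', 'C:\\data\\council.csv']
```

Note that `[^<]+` also matches line breaks, which is why the first cell is captured whole even though its text spans two lines.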
My ebook Scraping for Journalists: How to grab data from hundreds of sources, put it in a form you can interrogate – and still hit deadlines is now live.
You can buy it from Leanpub here. Leanpub allows authors to publish in installments, so readers get an alert every time new content is added and can update their copy. This means I can adapt and improve the book based on feedback from the people who use it. In other words, it's agile publishing, which makes for a better book. (Also, I can publish at a Codecademy-like weekly pace, which suits learning particularly well.)