Read Tony Hirst’s story of taking his dog for a walk – and then tell me your readers can’t help you do better journalism.
Category Archives: data journalism
7 laws journalists now need to know – from database rights to hate speech
When you start publishing online you move from the well-thumbed areas of defamation and libel, contempt of court and privilege and privacy to a whole new world of laws and licences.
This is a place where laws you never knew existed can be applied to your work – while other ones can come in surprisingly useful. Here are the key ones:
Scraping using regular expressions in OutWit Hub – part 2: special characters, negative matches and more
In the second part of this extract from Chapter 10 of Scraping for Journalists I recap the basics before discussing techniques to use in looking for patterns in data, and how regex can deal with non-textual characters such as spaces and carriage returns, special characters such as backslashes, and ‘negative matches’. You can find the first part here.
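To make those three ideas concrete, here is a minimal sketch using Python's `re` module as a stand-in. OutWit Hub's own regex syntax may differ in places, and the sample text is invented purely for illustration; it does not come from the book.

```python
import re

# Hypothetical sample text, invented for illustration only.
sample = "Name:\tJo Smith\r\nPath: C:\\data\\donors.csv\r\nRef: A-123 B_456"

# Non-textual characters: \s matches any whitespace character,
# including spaces, tabs, carriage returns and newlines.
whitespace = re.findall(r"\s", sample)

# Special characters: a backslash has a special meaning in regex,
# so a literal backslash is written as an escaped pair, \\.
backslashes = re.findall(r"\\", sample)

# Negative matches: [^...] matches any character NOT listed in the brackets.
# Here we grab everything after "Ref: " up to the next whitespace character.
ref = re.search(r"Ref: ([^\s]+)", sample)

print(len(whitespace), len(backslashes), ref.group(1))
```

The negative match is often the most useful of the three when scraping: rather than describing what you want, you describe the character that marks where it ends.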
The US election was a wake-up call for data-illiterate journalists
So Nate Silver won in 50 states; big data was the winner; and Nate Silver and data won the election. And somewhere along the line some guy called Obama won something, too.
Elections set the pace for much of journalism’s development: predictable enough to allow for advance planning; big enough to justify the budgets to match, they are the stage on which news organisations do their growing up in public.
For most of the past decade, those elections have been about social media: the YouTube election; the Facebook election; the Twitter election. This time, it wasn’t about the campaigning (yet) so much as it was about the reporting. And how stupid some reporters ended up looking.
How-to: Scraping ugly HTML using ‘regular expressions’ in an OutWit Hub scraper
The following is the first part of an extract from Chapter 10 of Scraping for Journalists. It introduces a particularly useful tool in scraping – regex – which is designed to look for ‘regular expressions’ such as specific words, prefixes or particular types of code. I hope you find it useful.
This tutorial will show you how to scrape a particularly badly formatted piece of data. In this case, the UK Labour Party’s publication of meetings and dinners with donors and trade union general secretaries.
To do this, you’ll need to install the free scraping tool OutWit Hub. Regex can be used in other tools and programming languages as well, but this tool is a good way to learn it without knowing any programming.
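For readers who want a feel for regex before opening OutWit Hub, here is a minimal sketch in Python's `re` module of the two simplest cases mentioned above: matching a specific word and matching a prefix. The text and names are invented placeholders, not the Labour Party data, and OutWit Hub's syntax may not match Python's exactly.

```python
import re

# Hypothetical, invented text standing in for a badly formatted page.
text = (
    "Dinner with Mr A. Example; "
    "Meeting with Ms B. Sample; "
    "Dinner with Dr C. Placeholder"
)

# A specific word: find every occurrence of "Dinner".
dinners = re.findall(r"Dinner", text)

# A prefix: a title (Mr, Ms or Dr), then a space, an initial and a surname.
names = re.findall(r"\b(?:Mr|Ms|Dr) [A-Z]\. [A-Z][a-z]+", text)

print(len(dinners), names)
```

The second pattern shows why regex suits ugly HTML: it describes the shape of a name rather than relying on the page's markup being consistent.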
Data alone isn’t enough – Tim Davies on “complexity and complementarity”
If people aren’t using data it isn’t just a problem for web developers – it’s a problem for journalists too. If not enough people are looking at information on crime, politics, health, education, or welfare then it makes our work harder.
On that subject, Tim Davies writes about the challenges of ‘getting data used’ and the inclination to focus on data-centric solutions. “Data quality, poor meta-data, inaccessible language, and the difficulty of finding wheat amongst the chaff of data were all diagnosed [at one hack day] as part of the problem,” he reports. “Yet these diagnosis and solutions are still based on linear thinking: when a dataset is truly accessible, then it will be used, and economic benefits will flow. Continue reading
Data visualisation training
If you’re interested in data visualisation I’m delivering a training course on November 7 with the excellent Caroline Beavon. Here’s what we’re covering:
- Pick the right chart for your story – against a deadline
- Mapping tricks and techniques: using Fusion Tables and other tools to map Olympic torchbearers
- Picking the right data to visualise
- Visualisation tips for free chart tools
- Avoiding common visualisation mistakes
- Create an infographic with Tableau and Illustrator
- Making data interactive
More details here. Places can be booked here.
5 principles of data management – for both analytics and data journalism
Whether you’re working with analytics data on your site or data for a story, it strikes me that certain principles apply to both.
At the PPA’s Digital Publishing Conference recently I talked about 5 of those. Here’s the rundown: Continue reading
Data Journalism Handbook now in Russian
The Data Journalism Handbook, a free ebook with contributions from dozens of data journalism folk (including me), is now available in Russian.
The translation was “produced and published by the Russian International News & Information Agency (RIA Novosti) with support from the European Journalism Centre”, according to an EJC press release.
Q&A: 5 questions about the pros and cons of data journalism (Cross-post)
The following Q&A is cross-posted from a post on the Media And Digital Enterprise project of the School of Journalism, Media and Communication at the University of Central Lancashire.
Why do journalists need to learn data skills?
For two key reasons: firstly because information is more widely available, and data skills are one of the few remaining ways for journalists to establish their value in that environment.
And secondly, because data is becoming a very important source of both news and the business case for media organisations.