Tag Archives: extract

There’s more than one way to make an impact with data journalism (book extract)

In an extended extract from the forthcoming second edition of the Data Journalism Handbook, I look at the different types of impact that data journalism can have, and how we can better think about it.

If you’ve not seen Spotlight, the film about the Boston Globe’s investigation into institutional silence over child abuse, then you should watch it right now. More to the point, you should watch right through to the title cards at the very end.

In an epilogue to the film — this is a story about old-school-style data journalism, by the way — a list scrolls down the screen. It details the dozens and dozens of places where abuse scandals have been uncovered since the events of the film, from Akute, Nigeria, to Wollongong, Australia.

But the title cards also cause us to pause in our celebrations: one of the key figures involved in the scandal, they say, was reassigned to “one of the highest ranking Roman Catholic churches in the world.”

This is the challenge of impact in data journalism: is raising awareness of a problem “impact”? A mass audience, a feature film? Does the story have to result in penalties for those responsible for bad things? Or visible policy change? Is all impact good impact?

The next wave of data journalism?


Image CC BY-ND 2.0 by Salvatore Vastano.

In the first of three expanded extracts from a forthcoming book chapter on ‘The next wave of data journalism’ I outline some of the ways that data journalism is reinventing itself, and adapting to a world which is rapidly changing again. Where networked communications and processing power were key in the 2000s, automation and AI are becoming key in the decade to come. And just as data journalism raised the bar for journalism as a whole, the bar is about to be raised for data journalism itself.

Data journalism isn’t doing enough. It is now into its second decade, and the noughties-era technologies that it was built on – networked access to information and vastly improving visualisation capabilities – are taken for granted, just as the ‘computer assisted’ part of its antecedent Computer Assisted Reporting was.

In just ten years data journalism has settled down into familiar practices and genres, from the interactive map and giant infographic to the quick-turnaround “Who comes bottom in the latest dataset” write-up. It’s a sure sign of maturity when press officers are sending you data journalism-based media releases.

Now we need to move forward. And the good news is: there are plenty of places to go.

Scraping using regular expressions in OutWit Hub – part 2: special characters, negative matches and more


Image by Lasse Havelund

In the second part of this extract from Chapter 10 of Scraping for Journalists I recap the basics before discussing techniques to use in looking for patterns in data, and how regex can deal with non-textual characters such as spaces and carriage returns, special characters such as backslashes, and ‘negative matches’. You can find the first part here.
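To give a flavour of those three ideas outside OutWit Hub, here is a minimal sketch in Python. It is not taken from the book: the sample text, the HTML fragment and the variable names are all made up for illustration, and a negated character class is assumed here as one common form of ‘negative match’. OutWit Hub’s own regex syntax may differ in the details.

```python
import re

# Made-up sample text for illustration only (not taken from the book).
sample = "Name:\tJane  Smith\r\nFile: C:\\data\\donors.csv"
html = "<td>Dinner</td><td>General secretary</td>"

# \s matches any whitespace: spaces, tabs, carriage returns and newlines.
# Here each run of whitespace is collapsed into a single space.
collapsed = re.sub(r"\s+", " ", sample)

# A backslash is regex's escape character, so a literal one is written \\.
# This captures the word that follows each backslash.
after_backslash = re.findall(r"\\(\w+)", sample)

# Negated character class: [^<] means "any character that is NOT <".
# This grabs the text of each table cell, stopping at the next tag.
cells = re.findall(r"<td>([^<]+)</td>", html)

print(collapsed)
print(after_backslash)
print(cells)
```

The negated class is often the most useful of the three when scraping ugly HTML, because it lets you say “everything up to the next tag” without knowing what that text will be.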


How-to: Scraping ugly HTML using ‘regular expressions’ in an OutWit Hub scraper


Regular Expressions cartoon from xkcd

The following is the first part of an extract from Chapter 10 of Scraping for Journalists. It introduces a particularly useful tool in scraping – regex – which uses ‘regular expressions’ to look for patterns such as specific words, prefixes or particular types of code. I hope you find it useful.

This tutorial will show you how to scrape a particularly badly formatted piece of data: in this case, the UK Labour Party’s publication of meetings and dinners with donors and trade union general secretaries.

To do this, you’ll need to install the free scraping tool OutWit Hub. Regex can be used in other tools and programming as well, but this tool is a good way to learn it without knowing any other programming.

Online Journalism Blog

Comment, analysis and links covering online journalism and online news, citizen journalism, blogging, vlogging, photoblogging, podcasts, vodcasts, interactive storytelling, publishing, Computer Assisted Reporting, User Generated Content, searching and all things internet.