Tag Archives: extract

The next wave of data journalism?

In the first of three expanded extracts from a forthcoming book chapter on ‘The next wave of data journalism’, I outline some of the ways that data journalism is reinventing itself and adapting to a world which is rapidly changing again. Where networked communications and processing power were key in the 2000s, automation and AI are set to be key in the decade to come. And just as data journalism raised the bar for journalism as a whole, the bar is about to be raised for data journalism itself.

Data journalism isn’t doing enough. It is now into its second decade, and the noughties-era technologies that it was built on – networked access to information and vastly improved visualisation capabilities – are taken for granted, just as the ‘computer assisted’ part of its antecedent, Computer Assisted Reporting, was.

In just ten years data journalism has settled down into familiar practices and genres, from the interactive map and giant infographics to the quick-turnaround “Who comes bottom in the latest dataset” write-up. It’s a sure sign of maturity when press officers are sending you data journalism-based media releases.

Now we need to move forward. And the good news is: there are plenty of places to go.

Continue reading

Scraping using regular expressions in OutWit Hub – part 2: special characters, negative matches and more

Regular Expressions slogan t-shirt

Image by Lasse Havelund

In the second part of this extract from Chapter 10 of Scraping for Journalists I recap the basics before discussing techniques for finding patterns in data, and how regex can deal with whitespace characters such as spaces and carriage returns, special characters such as backslashes, and ‘negative matches’. You can find the first part here.
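For readers who want to try these ideas outside OutWit Hub, here is a minimal sketch in Python's built-in `re` module. The sample text is invented for illustration, ‘negative match’ is taken here to mean a negated character class such as `[^\r\n]`, and OutWit Hub's own regex syntax may differ in its details.

```python
import re

# Hypothetical scraped text (invented for this sketch): entries are
# separated by a carriage return plus newline, and one contains backslashes.
raw = "Name:  Jane Doe\r\nPath: C:\\data\\file.csv\r\nName: John Roe"

# \s+ matches one or more whitespace characters (spaces, tabs, \r, \n).
# [^\r\n]+ is a negated class: one or more characters that are NOT a
# carriage return or newline, so the capture stops at the end of the line.
names = re.findall(r"Name:\s+([^\r\n]+)", raw)

# A backslash is a special character in regex, so a literal backslash
# has to be escaped with another backslash.
backslashes = re.findall(r"\\", raw)

# Carriage return + newline can be matched explicitly to split entries.
lines = re.split(r"\r\n", raw)

print(names)
print(len(backslashes))
print(len(lines))
```

The negated class is what keeps each captured name from running on into the next line of the text.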

Continue reading

How-to: Scraping ugly HTML using ‘regular expressions’ in an OutWit Hub scraper

Regular Expressions cartoon from xkcd

The following is the first part of an extract from Chapter 10 of Scraping for Journalists. It introduces a particularly useful tool in scraping – regex – which uses ‘regular expressions’ to look for patterns such as specific words, prefixes or particular types of code. I hope you find it useful.
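As a rough illustration of what looking for a specific word or a prefix means, here is a minimal sketch using Python's built-in `re` module rather than OutWit Hub. The sample sentence is invented for this example and is not taken from the Labour Party data.

```python
import re

# Hypothetical sample text, invented for illustration only.
text = "Meeting with donors. Dinner with union general secretaries. Meetings resumed."

# A specific word: \b marks a word boundary, so "Meetings" is not matched.
exact_word = re.findall(r"\bMeeting\b", text)

# A prefix: "Meet" followed by any number of further word characters.
with_prefix = re.findall(r"\bMeet\w*", text)

print(exact_word)
print(with_prefix)
```

The word-boundary marker is the difference between the two patterns: without it, a search for a word also picks up longer words that contain it.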

This tutorial will show you how to scrape a particularly badly formatted piece of data: in this case, the UK Labour Party’s publication of meetings and dinners with donors and trade union general secretaries.

To do this, you’ll need to install the free scraping tool OutWit Hub. Regex can be used in other tools and in programming languages as well, but OutWit Hub is a good way to learn it without knowing any programming.

Continue reading