Monthly Archives: November 2012

Yet Another Leveson Pundit conversation

Here’s my attempt to capture some of the more interesting exchanges about the Leveson Report on Twitter yesterday.

http://storify.com/paulbradshaw/yet-another-leveson-pundit-conversation

7 laws journalists now need to know – from database rights to hate speech

Image by Mr T in DC

When you start publishing online, you move beyond the well-thumbed areas of defamation and libel, contempt of court, privilege and privacy into a whole new world of laws and licences.

This is a place where laws you never knew existed can be applied to your work – while others can prove surprisingly useful. Here are the key ones:

Live Blogs outperform other online news formats by up to 300%

Time Spent on Live Blogs

Comparison of time spent on a selection of Live Blogs, articles, and picture galleries at Guardian.co.uk, March to May 2011

In a guest post for OJB, Neil Thurman highlights a new research report that suggests that Live Blogs outperform other online news formats by up to 300% and are seen by readers as more transparent, trusted, and ‘factual’ than conventional online news stories.

Schofield’s list, the mob and a very modern moral panic

Someone, somewhere right now will be writing a thesis, dissertation or journal paper about the very modern moral panic playing out across the UK media.

What began as a story about allegations of sexual abuse by TV and radio celebrity Jimmy Savile turned into a story about that story being covered up, then into how the abuse could have taken place (at the BBC too, in the 1970s, but also in hospitals and schools), and then into wider allegations of a paedophile ring involving politicians.

Scraping using regular expressions in OutWit Hub – part 2: special characters, negative matches and more

Regular Expressions slogan t-shirt

Image by Lasse Havelund

In the second part of this extract from Chapter 10 of Scraping for Journalists, I recap the basics before discussing techniques for finding patterns in data, and how regex can deal with whitespace characters such as spaces and carriage returns, special characters such as backslashes, and ‘negative matches’. You can find the first part here.
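The three ideas named above can be sketched outside OutWit Hub using Python's `re` module. This is an illustration under assumptions, not the book's method: the sample text is invented, and while most regex flavours (probably including the one OutWit Hub uses) share this core syntax, details can differ between tools.

```python
import re

# Hypothetical messy text, invented for illustration: it contains a tab,
# Windows-style line endings (\r\n), a price with a literal full stop,
# and a file path containing backslashes.
text = "Dinner:\t12 March\r\nCost: £1.50\r\nFile: C:\\data\\meetings.csv"

# \s matches any whitespace character: space, tab (\t),
# carriage return (\r) or newline (\n).
date = re.search(r"Dinner:\s+(\d+ \w+)", text).group(1)

# Special characters must be escaped with a backslash to be matched
# literally: \. is a real full stop, \\ is a real backslash.
price = re.search(r"£(\d+\.\d+)", text).group(1)
path_parts = re.findall(r"\\(\w+)", text)

# A negated character class is a 'negative match': [^\r\n]+ means
# "one or more characters that are NOT a carriage return or newline",
# i.e. everything up to the end of the current line.
cost_line = re.search(r"Cost: ([^\r\n]+)", text).group(1)

print(date)        # 12 March
print(price)       # 1.50
print(path_parts)  # ['data', 'meetings']
print(cost_line)   # £1.50
```

Without the escaping backslash, `.` matches any character, so a pattern like `£(\d+.\d+)` would also accept text such as `£1x50`.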


The US election was a wake up call for data illiterate journalists

So Nate Silver won in 50 states; big data was the winner; and Nate Silver and data won the election. And somewhere along the line some guy called Obama won something, too.

Elections set the pace for much of journalism’s development: predictable enough to allow for advance planning; big enough to justify the budgets to match, they are the stage on which news organisations do their growing up in public.

For most of the past decade, those elections have been about social media: the YouTube election; the Facebook election; the Twitter election. This time, it wasn’t about the campaigning (yet) so much as it was about the reporting. And how stupid some reporters ended up looking.

How-to: Scraping ugly HTML using ‘regular expressions’ in an OutWit Hub scraper

Regular Expressions cartoon from xkcd

The following is the first part of an extract from Chapter 10 of Scraping for Journalists. It introduces a particularly useful tool in scraping – regex, short for ‘regular expressions’ – which lets you describe patterns to look for, such as specific words, prefixes or particular types of code. I hope you find it useful.

This tutorial will show you how to scrape a particularly badly formatted piece of data: the UK Labour Party’s publication of meetings and dinners with donors and trade union general secretaries.

To do this, you’ll need to install the free scraping tool OutWit Hub. Regex can also be used in other tools and in programming languages, but OutWit Hub is a good way to learn it without knowing how to program.
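The tutorial itself works inside OutWit Hub, but the core idea (one pattern flexible enough to absorb inconsistent tags, spacing and line breaks) can be sketched in Python's `re` module. The HTML below is invented for illustration and is not the Labour Party's actual page; the names are placeholders, and any real scraper would need its pattern adapted to the real markup.

```python
import re

# Hypothetical, deliberately messy HTML invented for illustration.
# Note the mixed tag case, uneven spacing and a record split over two lines.
html = """
<p><b>Date:</b> 14 May 2012<br>Donor dinner with <strong>A Donor</strong></p>
<P><B>Date:</B>2 June 2012 <BR/>Meeting with  <strong>B Union</strong></P>
<p><b>Date: </b>  19 July 2012<br>
Dinner with <strong>C Donor</strong></p>
"""

# One pattern that tolerates the inconsistencies:
#   \s*                   optional whitespace wherever spacing varies
#   (\d{1,2} \w+ \d{4})   capture a date such as "2 June 2012"
#   .*?                   non-greedy "anything", stopping at the first <strong>
#   ([^<]+)               capture everything up to the next tag: the name
pattern = re.compile(
    r"<b>\s*Date:\s*</b>\s*(\d{1,2} \w+ \d{4}).*?<strong>([^<]+)</strong>",
    flags=re.IGNORECASE  # <b> and <B> both match
    | re.DOTALL,         # . also matches newlines, as one record spans two lines
)

rows = pattern.findall(html)
for date, name in rows:
    print(date, "|", name)
```

The design choice worth noticing is tolerance: rather than matching one exact layout, `\s*`, case-insensitive matching and the non-greedy `.*?` let a single pattern survive the small inconsistencies that make hand-typed HTML ‘ugly’.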

Data alone isn’t enough – Tim Davies on “complexity and complementarity”

If people aren’t using data it isn’t just a problem for web developers – it’s a problem for journalists too. If not enough people are looking at information on crime, politics, health, education, or welfare then it makes our work harder.

On that subject, Tim Davies writes about the challenges of ‘getting data used’ and the inclination to focus on data-centric solutions. “Data quality, poor meta-data, inaccessible language, and the difficulty of finding wheat amongst the chaff of data were all diagnosed [at one hack day] as part of the problem,” he reports. “Yet these diagnosis and solutions are still based on linear thinking: when a dataset is truly accessible, then it will be used, and economic benefits will flow.”