Tag Archives: data journalism

How the BBC England data unit scraped airport noise complaints


This news story used scraping to gather data on noise complaints

BBC England Data Unit’s Daniel Wainwright tried to explain basic web scraping at this year’s Data Journalism Conference but technical problems got in the way. This is what should have happened:

I’d wondered for a while why no-one who had talked about scraping at conferences had actually demonstrated the procedure. It seemed to me to be one of the most sought-after skills for any investigative journalist.

Then I tried to do so myself in an impromptu session at the first Data Journalism Conference in Birmingham (#DJUK16) and found out why: it’s not as easy as it’s supposed to look.

To anyone new to data journalism, a scraper is as close to magic as you get with a spreadsheet and no wand. Continue reading

Those Android Trump tweets: David Robinson on using text data to get an election scoop

Washington Post story tweet

Data scientist David Robinson was behind one of the most striking data stories of this US election season, when his analysis of Donald Trump tweets appeared to confirm that Trump was posting the angriest comments on that account (jointly managed by his campaign staff). Barbara Maseda spoke to Robinson about the story behind that text analysis and what comes next. 

It was August 9 when David Robinson published his analysis of Trump tweets on his blog. Robinson had used a series of libraries in the programming language R to collect, clean, process and visualise the data. The work took just 12 hours, spread between Saturday night and Tuesday morning.

In the following days, the piece would be re-posted and cited by multiple websites, including The Washington Post and Mashable. The original piece alone had hundreds of thousands of views in just a few days.

The result wasn’t just one election story, but one of the biggest indications yet of the potential of text analysis for journalists, with three takeaways in particular: Continue reading

I’m organising a data journalism conference – you should come

In just over four weeks I’ll be holding a day of workshops and industry panels for aspiring and working data journalists across the UK. Want to come?

Data Journalism UK 2016, in Birmingham on November 22, will focus on the latest wave of regional data journalism projects, from the data journalists at Trinity Mirror and BBC Scotland to startups like Northern Ireland’s The Detail and winners of Google Digital News Initiative funding: Talk About Local’s News Engine and the Bureau of Investigative Journalism.

I’m particularly pleased to have one of the most experienced data journalists in the country, Claire Miller, speaking too.


Claire Miller, author of the book Getting Started with Data Journalism

The event will mix industry speakers and experts with practical sessions: there’ll be drop-in sessions on getting started with data journalism, an information security ‘surgery’, and some speakers have been asked to focus on practical skills too.

On top of all that, attendees will have the opportunity to nominate skills they want to learn – we’ll put on workshops for the most popular topics!

You can sign up for the event here, and tell me which sessions you want covered on Twitter: @paulbradshaw.

The event is being jointly sponsored by the University of Stirling and Birmingham City University.

Crowdsourcing investigative journalism at Convoca: “Our aim is to create a community network not just in Peru, but global”


After winning two prestigious data journalism awards since launching in 2015, the Peruvian medium Convoca has launched its first crowdsourcing campaign to build a global community around its investigations. Nuria Riquelme spoke to founder Aramis Castro about the project.

Convoca has become a reference point for data journalism in South America. With a team of around ten people including system engineers, computer technicians and journalists, led by Milagros Salazar, a professional with over 15 years of journalistic experience, they have pioneered data journalism in Peru. Continue reading

Data journalism’s commissioning problem

Square peg in a round hole

Data journalism is still a square peg in a round hole when it comes to commissioning. Image by Yoel Ben-Avraham

Peter Yeung has a good point: why is it so difficult to get editors to pay for data journalism?

In a series of tweets we tried to find some answers.

Firstly, commissioning isn’t set up for data journalism. Editors instead try to fit it into established structures for commissioning text-based news and features, with the result that:

a) The pricing doesn’t reflect the work involved; and

b) Any interactivity and visuals become incidental to the process instead of integral.

And yet the value of data journalism has been repeatedly proven, and organisations are spending money on it: just not on commissioning. As Yeung added:

“I find it strange publications invest in data editors and journalists, but not data budgets”

The FT’s Martin Stabe suspected it wasn’t just a data journalism problem:

“This probably extends to lots of digital-only content, not just data journalism.”

A related problem is the lack of standardisation in data journalism: there is no equivalent to the payment by word count which print journalists have worked to for so long.

Instead, organisations ‘insource’ data journalism work to internal teams, either data teams or ad hoc teams formed from existing personnel (think the MPs’ expenses or Wikileaks investigations)…

…Or they ‘outsource’ data journalism work to external agencies etc.

This is a problem also highlighted by Alfred Hermida in his research into Canadian data journalism, ‘Finding the Data Unicorn’: only one job title showed up four times “and that was the general reporter/journalist category.”

That’s our take. What about yours? Why isn’t data journalism properly commissioned? And how do freelance data journalists get work?


My latest data journalism ebook is now finished

My third data journalism ebook, Finding Stories With Spreadsheets, is now finished. The book covers a wide range of spreadsheet techniques, from basic calculations like proportions through to techniques for merging datasets, looking for errors and working with dates.

I’ve tried to cover all the functions used most commonly within data journalism, including some specific to Google Sheets, but if you know of any that aren’t mentioned, or have a problem which isn’t solved by the book, I’d love to know.

Likewise, many chapters have sample datasets to try the techniques out, but I’m always on the lookout for particularly illustrative datasets or examples.

I’ll continue to add to and update the book (one of the reasons I publish with Leanpub) as I come across new techniques and examples. Let me know if you want me to add anything.

What I learned at Jan Willem Tulp’s workshop at Tutki! 2016/NODA16


Jan Willem Tulp’s workshop

In a guest post first published on her blog, Maria Crosas Batista sums up the key takeaways from a session at the Nordic investigative journalism conference Tutki! 2016 by Jan Willem Tulp, the data experience designer behind Tulp Interactive.

Continue reading