Category Archives: data journalism

Dutch regional newspapers launch data journalism project RegioHack

In a guest post for OJB, Jerry Vermanen explains the background to RegioHack

The internet is bursting with information, but journalists – at least in The Netherlands – aren’t getting the full potential out of it. Basic questions about what data driven journalism is, and how to practise it, still have to be answered. Two Dutch regional newspapers (de Stentor and TC Tubantia) have launched RegioHack, an experiment in data driven journalism around local issues and open data.

Both newspapers circulate in the eastern and central parts of the Netherlands. In November, journalists will collaborate with local students, programmers and open data experts in a 30-hour coding event. In preparation for this hackathon, the forum on our website (www.regiohack.nl) has been opened for discussion. Anyone can start a thread on a specific problem. For example: what’s the average age in each town in our region? Will we have enough facilities in 10 years to accommodate the future population? And if not, what do we need?

The newspapers provide the participants with hot pizza, energy drinks and 30 hours to find, clean up and present the data on these subjects.

After the hackathon, the projects will be presented and participants credited in both newspapers. That’s what RegioHack is all about: making unique stories with data, helping each other develop new skills, and finding out how to practise data driven journalism.

If you happen to be in The Netherlands on November 10th and 11th, contact me on jerry@regiohack.nl or Twitter (@JerryVermanen) for an invite to the final presentation.

We’re also searching for guest bloggers – and yes, that can be in English.

The New Online Journalists #11: Jack Dearlove

Jack Dearlove

Reviving an ongoing series of profiles of young journalists, I interviewed University of Leeds journalism student Jack Dearlove about his work in data journalism. Jack works as a BA on BBC Radio York’s Breakfast show and is a third-year Broadcast Journalism student at the University of Leeds, where he is News Editor for Leeds Student Radio.

How did you get into data journalism?

I started exploring data journalism when I saw how the Guardian was publishing stories attached to the raw spreadsheets on their guardian.co.uk/data blog. I liked the way they could bring a little extra to a story by digging up a big old spreadsheet and letting people play around with it.

I’m really a spreadsheet guy, doing the classic autofilter and then ordering things by the biggest and smallest values and slowly going down each line in the spreadsheet. This can take a while but it’s the only way you can be sure you’ve seen the whole picture.
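The filter-and-sort routine Jack describes translates directly out of the spreadsheet too. Here is a minimal Python sketch of the same idea, with made-up figures purely for illustration:

```python
# Hypothetical data: council spending by department (invented values).
rows = [
    {"dept": "Housing", "spend": 120_000},
    {"dept": "Transport", "spend": 45_000},
    {"dept": "Leisure", "spend": 310_000},
    {"dept": "Health", "spend": 9_000},
]

# "Autofilter": keep only the rows that pass a condition.
filtered = [r for r in rows if r["spend"] > 10_000]

# Then order by the biggest values, as a spreadsheet sort would.
ranked = sorted(filtered, key=lambda r: r["spend"], reverse=True)

for r in ranked:
    print(r["dept"], r["spend"])
```

The same two steps – filter, then sort – are what the autofilter-and-order workflow boils down to, whatever the tool.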

I’d like to get into ‘scraping’ but haven’t really had the time to play around with it. But any technique that turns up data I might not naturally have come across is something I’d love to get the hang of.

How do you use it in your work for the BBC?

I’ve worked for the BBC for nearly four years, and it’s something I’ve built into my role as my job has changed. It will certainly be something that I use in future job interviews, because hopefully it sets me apart from your standard journalist.

I think my colleagues were quite sceptical at first, but I have a very supportive and data-savvy Assistant Editor who’s just as keen to use the techniques as I am. So there’s an air of curiosity, as there is in many newsrooms.

How to use the CableSearch API to quickly reference names against Wikileaks cables (SFTW)

CableSearch logo

CableSearch is a neat project by the European Centre for Computer Assisted Research and VVOJ (the Dutch-Flemish association for investigative journalists) which aims to make it easier for journalists to interrogate the Wikileaks cables. Although it’s been around for some time, I’ve only just noticed the site’s API, so I thought I’d show how an API like this can be used to draw on external data sources to complement data of your own.
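As a general pattern, checking each name in your own dataset against a search API looks something like the sketch below. Note the endpoint path and the JSON field names here are assumptions for illustration only, not the documented CableSearch API – check the site for the real ones:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# ASSUMPTION: this base URL and the "total" response field are placeholders
# for illustration; consult the CableSearch API docs for the real interface.
BASE = "http://api.cablesearch.org/api/search/"

def cable_query_url(name):
    """Build a search URL for a name, URL-encoding the query string."""
    return BASE + "?q=" + quote(name)

def count_mentions(names, fetch=lambda url: urlopen(url).read()):
    """For each name in your own data, ask the API how many cables match.

    The `fetch` function is injectable so the JSON handling can be tried
    out without a live network connection."""
    results = {}
    for name in names:
        payload = json.loads(fetch(cable_query_url(name)))
        results[name] = payload.get("total", 0)  # assumed field name
    return results
```

The point of the pattern is the loop: a column of names from your own spreadsheet goes in, a count of matching cables per name comes out, ready to paste back alongside your data.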

Gathering data: a flow chart for data journalists

Gathering data - a flow chart

Above is a flow chart that I sketched out during a long car journey to the Balkan Investigative Reporters Network Summer School in Croatia (don’t worry: I wasn’t driving).

It aims to help data journalists work out how best to obtain and handle data: it asks a series of questions about the information you want to compile, then suggests both ways to get hold of it and tools to get it into a state where it is easier to interrogate.

It also illustrates at a glance how the process of ‘getting hold of the data’ can vary widely, and how different projects can often involve completely different tools and skillsets from previous ones.

I will have missed obvious things, so please help me improve this. And if you find it useful, let me know.

Click on the image for other sizes.

Has investigative journalism found its feet online? (part 1)

Earlier this year I was asked to write a chapter for a book on the future of investigative journalism – ‘Investigative Journalism: Dead or Alive?’. I’m reproducing it here. The chapter was originally published on my Facebook page. An open event around the book’s launch, with a panel discussion, is being held at the Frontline Club next month.

We may finally be moving past the troubled youth of the internet as a medium for investigative journalism. For more than a decade observers looked at this ungainly form stumbling its way around journalism, and said: “It will never be able to do this properly.”

They had short memories, of course. Television was an equally awkward child: the first news broadcast was simply a radio bulletin on a black screen, and for decades print journalists sneered at the idea that this fleeting, image-obsessed medium could ever do justice to investigative journalism. But it did. And it did it superbly, finding a new way to engage people with the dry, the political and the complex.

Why we need open courts data – and newspapers need to improve too


Justice photo by mira66

Few things sum up the division of the UK over the riots like the sentencing of those involved. Some think the courts are too lenient, while others gape at six-month sentences for people who stole a bottle of water.

These judgments are often made on the basis of a single case rather than any overall view. And you might think, in such a situation, that a journalist’s role would be to find out just how harsh or lenient sentencing has been – not just across the more than 1,600 people arrested during the riots, but also in comparison with previous civil disturbances, or indeed with similar crimes committed outside a riot situation.

As Martin Belam argues:

“Really good data journalism will help us untangle the truth from those prejudiced assumptions. But this is data journalism that needs to stay the course, and seems like an ideal opportunity to do “long-form data journalism”. How long will these looters serve? What is the ethnic make-up and age range of those convicted? How many other criminals will get an early release because our jails are newly full of looters? How many people convicted this week will go on to re-offend?”

And yet, amazingly, we cannot reliably answer these questions – because it is still not possible to get raw data on sentencing in UK courts, not even through FOI.

INFOGRAPHIC: UK riots – Gauging the Columnists’ Blame Game

Here’s a quick experiment in data visualisation, providing an instant insight into how the blame game over the riots is being played by columnists.

The data is taken from a Liberal Conspiracy blog post – I’ve transferred it into a spreadsheet with a limited set of categories and used the Gauges gadget to visualise the totals.

A screengrab is below – but there is also an embed code that provides a gauge that will be updated whenever a new columnist is added. See the spreadsheet for both the gauge and the raw data.
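The data step behind a visualisation like this is just a tally of categories. A minimal Python sketch – with invented category labels standing in for the real ones in the Liberal Conspiracy post and the linked spreadsheet:

```python
from collections import Counter

# Hypothetical categorisations of who each columnist blamed (invented
# labels for illustration; the real data is in the linked spreadsheet).
blamed = ["parents", "government cuts", "criminality",
          "criminality", "parents", "consumerism", "criminality"]

# Tally each category's total, exactly what the spreadsheet does before
# the Gauges gadget turns the totals into dials.
totals = Counter(blamed)
for category, count in totals.most_common():
    print(category, count)
```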

Columnist Blame Game Gauge - UK Riots


How to: convert easting/northing into lat/long for an interactive map


A map generated in Google Fusion Tables from a dataset cleaned using these methods

Google Fusion Tables is great for creating interactive maps from a spreadsheet – but it isn’t too keen on easting and northing. That can be a problem as many government and local authority datasets use easting and northing to describe the geographical position of things – for example, speed cameras.

So you’ll need a way to convert easting and northing into something that Fusion Tables does like – such as latitude and longitude.

Here’s how I did it – quickly.
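The conversion can also be scripted directly. Below is a sketch in Python of the Ordnance Survey’s published formulae for turning National Grid eastings/northings into latitude and longitude – one way to do it in code, not necessarily the method used in this post. Note the caveat in the docstring: the result is on the OSGB36 datum, while mapping tools like Fusion Tables expect WGS84, which needs a further Helmert transformation (a library such as pyproj handles both steps in one go).

```python
from math import sin, cos, tan, sqrt, pi

def osgb_to_latlon(E, N):
    """Convert OS National Grid easting/northing to OSGB36 lat/long in
    degrees, using the Ordnance Survey's published formulae.

    CAVEAT: OSGB36 differs from WGS84 (used by Google Maps) by roughly
    100m; a further Helmert transformation, or a library like pyproj,
    is needed for exact WGS84 positions."""
    a, b = 6377563.396, 6356256.909            # Airy 1830 ellipsoid axes
    F0 = 0.9996012717                           # central meridian scale
    lat0, lon0 = 49 * pi / 180, -2 * pi / 180   # true origin (49N, 2W)
    N0, E0 = -100000, 400000                    # grid offsets of origin
    e2 = 1 - (b * b) / (a * a)                  # eccentricity squared
    n = (a - b) / (a + b)

    # Iterate to find the latitude whose meridional arc matches northing.
    lat, M = lat0, 0.0
    while abs(N - N0 - M) >= 0.00001:
        lat += (N - N0 - M) / (a * F0)
        dlat, slat = lat - lat0, lat + lat0
        M = b * F0 * (
            (1 + n + 1.25 * n**2 + 1.25 * n**3) * dlat
            - (3 * n + 3 * n**2 + 2.625 * n**3) * sin(dlat) * cos(slat)
            + (1.875 * n**2 + 1.875 * n**3) * sin(2 * dlat) * cos(2 * slat)
            - (35 / 24) * n**3 * sin(3 * dlat) * cos(3 * slat))

    nu = a * F0 / sqrt(1 - e2 * sin(lat)**2)           # transverse radius
    rho = a * F0 * (1 - e2) * (1 - e2 * sin(lat)**2) ** -1.5
    eta2 = nu / rho - 1
    t, sec = tan(lat), 1 / cos(lat)

    VII = t / (2 * rho * nu)
    VIII = t / (24 * rho * nu**3) * (5 + 3 * t**2 + eta2 - 9 * t**2 * eta2)
    IX = t / (720 * rho * nu**5) * (61 + 90 * t**2 + 45 * t**4)
    X = sec / nu
    XI = sec / (6 * nu**3) * (nu / rho + 2 * t**2)
    XII = sec / (120 * nu**5) * (5 + 28 * t**2 + 24 * t**4)
    XIIA = sec / (5040 * nu**7) * (61 + 662 * t**2 + 1320 * t**4 + 720 * t**6)

    dE = E - E0
    lat = lat - VII * dE**2 + VIII * dE**4 - IX * dE**6
    lon = lon0 + X * dE - XI * dE**3 + XII * dE**5 - XIIA * dE**7
    return lat * 180 / pi, lon * 180 / pi
```

Run against the Ordnance Survey’s own worked example (easting 651409.903, northing 313177.270) this returns roughly 52.66°N, 1.72°E.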

SFTW: Asking questions of a webpage – and finding out when those answers change

Previously I wrote on how to use the =importXML formula in Google Docs to pull information from an XML page into a conventional spreadsheet. In this Something For The Weekend post I’ll show how to take that formula further to grab information from webpages – and get updates when that information changes.


Animation from Digital Inspiration

Asking questions of a webpage – or finding out when the answer changes

Despite its name, the =importXML formula can be used to grab information from HTML pages as well. This post on SEO Gadget, for example, gives a series of examples ranging from grabbing information on Twitter users to price information and web analytics (it also has some further guidance on using these techniques, and is well worth a read for that).

Asking questions of webpages typically requires more advanced use of XPath than I outlined previously – and more trial and error.

This is because, while XML is a language designed to provide structure around data, HTML – used as it is for a much wider range of purposes – isn’t quite so tidy.
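One way to get a feel for how XPath expressions behave before putting them into a spreadsheet formula is to prototype them in Python. Here is a minimal sketch using the standard library’s ElementTree on a made-up, well-formed snippet – real pages are rarely this tidy, which is why =importXML’s own HTML tidying, or an HTML-aware parser like lxml, matters in practice:

```python
import xml.etree.ElementTree as ET

# A tidy, well-formed snippet standing in for a real page. The structure
# and class names here are invented for illustration.
page = """
<html><body>
  <div class="jobs">
    <p class="job"><a href="/job/1">Data reporter</a></p>
    <p class="job"><a href="/job/2">News developer</a></p>
  </div>
</body></html>
"""

root = ET.fromstring(page)

# ElementTree supports a useful subset of XPath - descendant search plus
# attribute predicates - much like the expressions used with =importXML.
titles = [a.text for a in root.findall(".//p[@class='job']/a")]
print(titles)
```

The expression `.//p[@class='job']/a` is the same shape of query you would pass as the second argument to =importXML; the trial and error lies in finding which tags and attributes on the real page reliably wrap the data you want.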

Finding the structure

To illustrate how you can use =importXML to grab data from a webpage, I’m going to grab data from Gorkana, a job ads site.
