Monthly Archives: September 2011

Dutch regional newspapers launch data journalism project RegioHack

In a guest post for OJB, Jerry Vermanen explains the background to RegioHack

The internet is bursting with information, but journalists – at least in The Netherlands – don’t get the full potential out of it. Basic questions on what data driven journalism is, and how to practise it, still have to be answered. Two Dutch regional newspapers (de Stentor and TC Tubantia) have launched RegioHack, an experiment with data driven journalism around local issues and open data.

Both newspapers circulate in the eastern and middle part of the Netherlands. In November, journalists will collaborate with local students, programmers and open data experts in a 30 hour coding event. In preparation for this hackathon, the forum on our website (www.regiohack.nl) is open for discussion. Anyone can start a thread for a specific problem. For example, what’s the average age of each town in our region? And in 10 years, will we have enough facilities to accommodate the future population? And if not, what do we need?

The newspapers provide the participants with hot pizza, energy drinks and 30 hours to find, clean up and present the data on these subjects.

After the hackathon, the projects are presented and participants will be named in the publications. That’s what RegioHack is all about: making unique stories with data, helping each other to develop new skills and finding out how to practise data driven journalism.

If you happen to be in The Netherlands on November 10th and 11th, contact me on jerry@regiohack.nl or Twitter (@JerryVermanen) for an invite to the final presentation.

We’re also searching for guest bloggers – and yes, that can be in English.

20 recent hyperlocal developments (June-August 2011)

Ofcom’s Damian Radcliffe produces a regular round-up of developments in hyperlocal publishing. In this guest post he cross-publishes his latest presentation for this summer, as well as the background to the reports.

Ofcom’s 2009 report on Local and Regional Media in the UK identified the increasing role that online hyperlocal media is playing in the local and regional media ecology.

New research in the report identified that

“One in five consumers claimed to use community websites at least monthly, and a third of these said they had increased their use of such websites over the past two years.”

That was two years ago, and since then, this nascent sector has continued to evolve, with the web continuing to offer a space and platform for community expression, engagement and empowerment.

The diversity of these offerings is manifest in the Hyperlocal Voices series found on this website, as well as Talk About Local’s Ten Questions feature, both of which speak to hyperlocal practitioners about their work.

For a wider view of developments in this sector, you may want to look at the series of slides I publish on SlideShare every two months.

Each set of slides typically outlines 20 recent hyperlocal developments; usually 10 from the UK and 10 from the US.

Topics in the current edition include Local TV, hyperlocal coverage of the recent England riots, the rise of location based deals and marketing, as well as the FCC’s report on The Information Needs of Communities.

Feedback and suggestions for future editions – including omissions from current slides – are actively welcomed.

Data Journalists Engaging in Co-Innovation…

You may or may not have noticed that the Boundary Commission released their take on proposed parliamentary constituency boundaries today.

They could have released the data – as data – in the form of shape files that can be rendered at the click of a button in things like Google Maps… but they didn’t… [The one thing the Boundary Commission quango forgot to produce: a map] (There are issues with publishing the actual shapefiles, of course. For one thing, the boundaries may yet change – and if the original shapefiles are left hanging around, people may start to draw on these now incorrect sources of data once the boundaries are fixed. But that’s a minor issue…)

Instead, you have to download a series of hefty PDFs, one per region, to get a flavour of the boundary changes. Drawing a direct comparison with the current boundaries is not possible.

The make-up of the actual constituencies appears to be based on their member wards, data which is provided in a series of spreadsheets, one per region, each containing several sheets describing the ward makeup of each new constituency for the counties in the corresponding region.
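As a rough illustration of what you might do with that sort of ward list, here is a minimal Python sketch that groups ward rows by proposed constituency. The column names (`constituency`, `ward`) and the sample rows are invented for the example; the Boundary Commission spreadsheets are laid out differently and would need exporting to CSV and tidying first.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV export of one sheet; all names are made up for illustration.
SAMPLE_CSV = """constituency,ward
Example North,Ward A
Example North,Ward B
Example South,Ward C
"""


def wards_by_constituency(csv_text):
    """Map each constituency name to the list of wards it contains."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["constituency"]].append(row["ward"])
    return dict(grouped)


grouped = wards_by_constituency(SAMPLE_CSV)
# Number of member wards per proposed constituency.
ward_counts = {name: len(wards) for name, wards in grouped.items()}
```

With one such lookup per region, comparing the ward membership of current and proposed constituencies becomes a matter of comparing lists, rather than reading PDFs side by side.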

It didn’t take long for the data junkies to get on the case though. From my perspective, the first map I saw was on the Guardian Datastore, reusing work by University of Sheffield academic Alasdair Rae, apparently created using Google Fusion Tables (though I haven’t seen a recipe published anywhere? Or a link to the KML file that I saw Guardian Datablog editor Simon Rogers/@smfrogers tweet about?)

[I knew I should have grabbed a screen shot of the original map…:-(]

It appears that Conrad Quilty-Harper (@coneee) over at the Telegraph then got on the case, and came up with a comparative map drawing on Rae’s work as published on the Datablog, showing the current boundaries compared to the proposed changes, and which ties the maps together so the zoom level and focus are matched across the maps (MPs’ constituencies: boundary changes mapped):

Telegraph side by side map comparison

Interestingly, I was alerted to this map by Simon tweeting that he liked the Telegraph map so much, they’d reused the idea (and maybe even the code?) on the Guardian site. Here’s a snapshot of the conversation between these two data journalists over the course of the day (reverse chronological order):

Datajournalists in co-operative bootstrapping mode

Here’s the handshake…

Collaborative co-evolution

I absolutely love this… and what’s more, it happened over the course of four or five hours, with a couple of technology/knowledge transfers along the way, as well as evolution in the way both news agencies communicated the information compared to the way the Boundary Commission released it. (If I was evil, I’d try to FOI the Boundary Commission to see how much time, effort and expense went into their communication effort around the proposed changes, and would then try to guesstimate how much the Guardian and Telegraph teams put into it as a comparison…)

At the time of writing (15.30), the BBC have no data driven take on this story…

And out of interest, I also wondered whether Sheffield U had a take…

Sheffield U media site

Maybe not…

PS By the by, the DataDrivenJournalism.net website relaunched today. I’m honoured to be on the editorial board, along with @paulbradshaw @nicolaskb @mirkolorenz @smfrogers and @stiles, and looking forward to seeing how we can start to drive interest, engagement and skills development in, as well as analysis and (re)use of, and commentary on, public open data through the data journalism route…

PPS if you’re into data journalism, you may also be interested in GetTheData.org, a question and answer site in the model of Stack Overflow, with an emphasis on Q&A around how to find, access, and make use of open and public datasets.

The New Online Journalists #11: Jack Dearlove

Jack Dearlove

Reviving an ongoing series of profiles of young journalists, I interviewed Leeds University journalism student Jack Dearlove about his work in data journalism. Jack works as a BA on BBC Radio York’s Breakfast show and is also a third year Broadcast Journalism student at the University of Leeds, where he is News Editor for Leeds Student Radio.

How did you get into data journalism?

I started exploring data journalism when I saw how the Guardian was publishing stories attached to the raw spreadsheets on their guardian.co.uk/data blog. I liked the way they could bring a little extra to a story by digging up a big old spreadsheet and letting people play around with it.

I’m really a spreadsheet guy, doing the classic autofilter and then ordering things by the biggest and smallest values and slowly going down each line in the spreadsheet. This can take a while but it’s the only way you can be sure you’ve seen the whole picture.

I’d like to get into ‘scraping’ but haven’t really had the time to play around with it. But any technique that means data that I might not have naturally come across is something I’d love to get the hang of.

How do you use it in your work for the BBC?

I’ve worked for the BBC for nearly 4 years and it’s something I’ve built into my role as my job has changed. It will certainly be something that I use when it comes to future job interviews though, because hopefully it sets me apart from your standard journalist.

I think my colleagues were quite sceptical at first, but I have a very supportive and data savvy Assistant Editor who’s just as keen to use the techniques as I am. So there’s an air of curiosity, as there is in many newsrooms.

Creating Thematic Maps Based on UK Constituency Boundaries in Google Fusion Tables

I don’t have time to chase this just now, but it could be handy… Over the last few months, several of Alasdair Rae’s (University of Sheffield) maps generated with Google Fusion Tables have been appearing on the Guardian Datablog, including one today showing the UK’s new Parliamentary constituency boundaries.

Looking at Alasdair’s fusion table for English Indices of Deprivation 2010, we can see how it contains various output area codes as well as KML geometry shape files that can be used to draw the boundaries on a map.

Google fusion table - UK boundaries

On the to do list, then, is to create a set of fusion tables that we can use to generate maps from datatables containing particular sorts of output area code. Because it’s easy to join two fusion tables by a common column, we’d then have a simple Google Fusion Tables recipe for thematic maps:

1) get data containing output area or constituency codes;
2) join with the appropriate mapping fusion table to annotate original data with appropriate shape files;
3) generate map…
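For anyone who wants to see what the join in step 2 amounts to, here is a minimal sketch in plain Python rather than in Fusion Tables itself. The column names (`code`, `value`, `geometry`), the area codes and the placeholder geometry strings are all made up for illustration; a real boundary table would hold KML geometry in that column.

```python
def join_on_code(data_rows, boundary_rows, key="code"):
    """Attach boundary geometry to each data row that shares the same code.

    Returns (joined_rows, unmatched_codes) so that missing areas are visible
    rather than silently dropped.
    """
    geometry_by_code = {row[key]: row["geometry"] for row in boundary_rows}
    joined, unmatched = [], []
    for row in data_rows:
        geometry = geometry_by_code.get(row[key])
        if geometry is None:
            unmatched.append(row[key])
        else:
            joined.append({**row, "geometry": geometry})
    return joined, unmatched


# Hypothetical data table: one value per area code.
data = [
    {"code": "X01", "value": 12.5},
    {"code": "X02", "value": 7.0},
    {"code": "X99", "value": 3.1},  # no boundary available for this code
]

# Hypothetical mapping table: area code -> geometry (placeholder strings).
boundaries = [
    {"code": "X01", "geometry": "geometry-for-X01"},
    {"code": "X02", "geometry": "geometry-for-X02"},
]

joined, unmatched = join_on_code(data, boundaries)
```

Keeping a list of unmatched codes is a deliberate choice: mismatched or outdated area codes are an easy way to end up with holes in a thematic map, so it helps to check for them before step 3.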

I wonder – have Alasdair or anyone from the Guardian Datablog/Datastore team already published such a tutorial?

PS Ah, here’s one example tutorial: Peter Aldhous: Thematic Maps with Google Fusion Tables [PDF]

PPS for constituency boundary shapefiles as KML see http://www.google.com/fusiontables/DataSource?dsrcid=1574396 or the Guardian Datastore’s http://www.google.com/fusiontables/exporttable?query=select+col0%3E%3E1+from+1474106+&o=kmllink&g=col0%3E%3E1

How to use the CableSearch API to quickly reference names against Wikileaks cables (SFTW)


CableSearch is a neat project by the European Centre for Computer Assisted Research and VVOJ (the Dutch-Flemish association for investigative journalists) which aims to make it easier for journalists to interrogate the Wikileaks cables. Although it’s been around for some time, I’ve only just noticed the site’s API, so I thought I’d show how such an API can be useful as a way to draw on such data sources to complement data of your own.

Gathering data: a flow chart for data journalists


Gathering data - a flow chart

Above is a flow chart that I sketched out during a long car journey to the Balkan Investigative Reporters Network Summer School in Croatia (don’t worry: I wasn’t driving).

It aims to help those doing data journalism identify how best to get hold of and deal with data, by asking a series of questions about the information you want to compile and suggesting both ways to get hold of it and tools to get it into a state that makes it easier to ask questions.

It also illustrates at a glance how the process of ‘getting hold of the data’ can vary widely, and how different projects can often involve completely different tools and skillsets from previous ones.

I will have missed obvious things, so please help me improve this. And if you find it useful, let me know.

Click on the image for other sizes.