Last week I spent a thoroughly fascinating day at a hackday for journalists and web developers organised by Scraperwiki. It’s an experience that every journalist should have, for reasons I’ll explore below but which can be summed up thus: it will challenge the way you approach information as a journalist.
Disappointingly, the mainstream press and broadcast media were represented by only one professional journalist. This may be due to the time of year, but then that didn’t prevent journalists attending last week’s Liverpool event in droves. Senior buy-in is clearly key here – and I fear the Birmingham media are going to be left behind if this continues.
On the more positive side, there was a strong attendance from local bloggers such as Michael Grimes, Andy Brightwell (Podnosh), Clare White (Talk About Local) and Nicola Hughes (Your Local Scientist) – plus Martin Moore from the Media Standards Trust and some journalism students.
How it worked
After some brief scene-setting presentations, and individual expressions of areas of interest, the attendees split into 5 topic-based groups. They were:
- The data behind the cancellation of Building Schools for the Future projects
- Leisure facilities in the Midlands
- Issues around cervical smear testing
- Political donations
- And our group, which decided to take on the broad topic of ‘health’, within the particular context of plans to give spending power to GP consortia.
By the end of the day all groups had results – which I’ll outline at the end of the post – but more interesting to me was the process of developers and journalists working together, and how it changed both camps’ working practices.
The work process
This is a genuinely collaborative process – not the linear editorial-and-production divide that so many journalists are used to.
Developers and journalists are continually asking each other for direction as the project develops: while the developers are shaping data into a format suitable for interpretation, the journalist might be gathering related data to layer on top of it or information that would illuminate or contextualise it.
This made for a lot of hard journalistic work – finding datasets, understanding them, and thinking of the stories within them, particularly with regard to how they connected with other sets of data and how they might be useful for users to interrogate themselves.
It struck me as a different skill to that normally practised by journalists – we were looking not for stories but for ‘nodes’: links between information such as local authority or area codes, school identifiers, and so on. Finding a story in data is relatively easy when compared to a project like this, and it did remind me more of the investigative process than the way a traditional newsroom works.
Afterwards I wondered what an effective hackday team might consist of, organisationally. Typically hackdays split people into journalists and coders, but in practice the skillsets are more subtle than that. Being a journalist, for example, doesn’t guarantee that you can find datasets; likewise the range of data we came across made it necessary for someone to keep track of it all (social bookmarking helps) and ensure we avoided distraction – but also were able to adapt and change if a more interesting discovery became possible.
So here is an initial attempt to outline the sort of roles a hackday team might work best with:
- Coders, obviously – some knowledge of datasets is an advantage; having pre-cleaned a dataset is even better. For this reason it may be worth using a wiki in the lead-up to the hackday to identify questions and datasets of interest so that work might be done in advance.
- A project manager of sorts. Someone to keep track of the focus of the project and the datasets involved or needed (and the progress with those). Again, a wiki might suit this well – or a Posterous blog so that those outside the hackday can more easily comment.
- Someone with computer assisted reporting (CAR) skills to find datasets.
- A journalist, blogger or expert who understands the data, e.g. codes, context, etc. – and/or who to speak to to get more data or context
- Someone to do visualisation. If no one is able, then an alternative might be to assign someone the role of creating the visualisation and, as part of that, researching visualisation tools and examples.
I’m probably missing roles – particularly within the subtleties of coding – so please post a comment if you can think of others.
What we did
Broadly we had gathered because of a shared interest in health. In discussion we decided that the key topical element was the plan to hand spending power to GP consortia. I’m not sure whether this was too broad a topic or whether that actually made for a longer-term project that we will return to. The initial thought was that we would create a resource that would become increasingly useful as the handover took place.
The first thing we needed was a clean list of all GP practices in GB. By midday Rob Styles had compiled such a list in RDF format, including location, their relationship to the Primary Care Trust (PCT), and their LSOA (Lower Layer Super Output Area) Code. In addition this had a unique URI for each practice code and a URI for each PCT code. These things are important in linking this data to other datasets.
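To make the point about unique URIs concrete, here is a minimal Python sketch of the idea. It is not Rob's code: the `http://example.org` namespace, the property names and the sample codes are all invented for illustration. The only thing taken from the day is the principle that each practice and each PCT gets its own URI, built from its code.

```python
# Hypothetical sketch: mint one URI per practice and per PCT, then emit
# N-Triples-style lines linking them. Every name and code here is invented.
BASE = "http://example.org"  # assumption: placeholder namespace


def practice_uri(code):
    """Build the URI for a GP practice from its practice code."""
    return f"{BASE}/practice/{code}"


def pct_uri(code):
    """Build the URI for a Primary Care Trust from its PCT code."""
    return f"{BASE}/pct/{code}"


def to_triples(practice):
    """Return N-Triples-style lines describing one practice record (a dict)."""
    subject = f"<{practice_uri(practice['code'])}>"
    return [
        f'{subject} <{BASE}/def/name> "{practice["name"]}" .',
        f"{subject} <{BASE}/def/pct> <{pct_uri(practice['pct'])}> .",
        f'{subject} <{BASE}/def/lsoa> "{practice["lsoa"]}" .',
    ]


# Placeholder record, not a real practice.
sample = {"code": "P00001", "name": "Example Surgery", "pct": "PCT01", "lsoa": "LSOA0001"}

for line in to_triples(sample):
    print(line)
```

Because the PCT appears as a URI rather than a plain string, any other dataset that uses the same PCT code can point at the same address, which is what makes the later linking possible.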
Clare White, meanwhile, was identifying indicators & data to add on, such as life expectancy, income, etc.
Keith Alexander was turning life expectancy data into another RDF dataset with area codes as unique URIs, with a view to matching them to codes from other areas using geocoding.
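The matching step can be sketched as a simple join on a shared area code. This is an illustration under assumptions rather than what Keith built: the field names, codes and figures below are placeholders, not real data.

```python
def join_on_area(practices, life_expectancy):
    """Attach a life-expectancy figure to each practice via its area code.

    Practices whose area code has no match are returned separately, so that
    gaps in the data stay visible instead of silently disappearing.
    """
    matched, unmatched = [], []
    for practice in practices:
        value = life_expectancy.get(practice["area_code"])
        if value is None:
            unmatched.append(practice)
        else:
            matched.append({**practice, "life_expectancy": value})
    return matched, unmatched


# Placeholder records, invented for illustration only.
practices = [
    {"code": "P00001", "area_code": "AREA_A"},
    {"code": "P00002", "area_code": "AREA_B"},
    {"code": "P00003", "area_code": "AREA_Z"},
]
life_expectancy = {"AREA_A": 80.0, "AREA_B": 78.5}

matched, unmatched = join_on_area(practices, life_expectancy)
print(len(matched), "matched;", len(unmatched), "unmatched")
```

Keeping the unmatched records is a deliberate choice: on a hackday, the codes that fail to line up are often the first sign that two datasets use different area schemes.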
Martin Moore was also tracking down data; Mark Bentley tracked down PCT and SHA populations; and I used my Twitter network to find other sources, while using advanced search techniques to track down a range of NHS data websites (there are, it appears, a lot), including earnings data and translations of industry jargon. Andrew Mackenzie actually used a telephone, calling people at West Midlands Observatory, getting contextual information and leads for NHS data, and working out ONS subgroup area codes & ONS group areas.
If that paragraph has not completely geeked you out, you’re doing well.
I bookmarked most of the data sources I found at http://delicious.com/paulb/data+health. Here is just a selection:
- Online GP practice results database
- Health Poverty Index allows you to compare poverty indicators in 2 different areas (click on ‘Tabled data’ on the left to get it in more scrapable form).
- NHS Comparators allows you to look at differences in referral and access rates at various levels including GP practice but requires you to register – Andrew Mackenzie found that it would be a week or so before that was processed.
- Hospital Episode Statistics (HES)
- NHS Choices API
- West Midlands Public Health Observatory
By the end of the day we had a national map of all 8,000 GP practices (the first 1,000 shown in the image above) – quite an achievement in itself, but not headline-grabbing yet. To give us an editorial angle for the end-of-day presentations, Rob searched for a spreadsheet of mental health prevalence and came up with the headlines that Manchester was the most ‘phobic’ place in Britain, and the Isle of Wight the least.
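Finding a "most and least" headline in a spreadsheet comes down to sorting on one column. The sketch below shows that step in Python, assuming the spreadsheet has been saved as CSV. The column names and the three rows are placeholders made up for the example and are not the mental health figures Rob used.

```python
import csv
import io

# Placeholder CSV standing in for a prevalence spreadsheet; values invented.
SAMPLE = """area,prevalence_per_1000
Area A,12.0
Area B,30.5
Area C,7.25
"""


def extremes(csv_text, area_col="area", value_col="prevalence_per_1000"):
    """Return the names of the areas with the highest and lowest value."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    highest = max(rows, key=lambda row: float(row[value_col]))
    lowest = min(rows, key=lambda row: float(row[value_col]))
    return highest[area_col], lowest[area_col]


highest, lowest = extremes(SAMPLE)
print("Highest:", highest, "| Lowest:", lowest)
```

A ranking like this is only a starting point for a story: it says nothing about population size or how the figures were collected, which is where the journalists in the room earn their keep.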
The next step would be to add GP population information (taken from an FOI response on What Do They Know) and Quality and Outcomes Framework data (scraped from the GP practice results database). There may be ways to match results from the GMC database of medical practitioners to practices too, while FOI requests could add more specific information, and I’ve since been sent links to other potentially interesting datasets.
The Scraperwiki blog gives more detail on what the other projects produced – all of them were hugely impressive. Andy Brightwell blogged here about their leisure data project, which “hoped to find out more about the relationship between population density, deprivation and the provision of leisure facilities”. And Michael Grimes has created a page using HTML5 to visualise how the majority of the cancelled Building Schools for the Future projects were in Labour-controlled areas.
Both links provide further insight into the possibilities of data journalism in just one day.
Thankfully, it’s not going to be just one day. Podnosh have already organised a Speed Data event, and there are further hack days planned for Birmingham – specifically around health – by Talis and NHSlocal.