Manchester Police tweets and the MEN – local data journalism part 2

Manchester Evening News visualisation of Police incident tweets

A week ago I blogged about how the Manchester Evening News were using data visualisation to provide a deeper analysis of the local police force’s experiment in tweeting incidents for 24 hours. In that post Head of Online Content Paul Gallagher said he thought the real benefit would “come afterwards when we can also plot the data over time”.

Now that data has been plotted, and you can see the results here.

In addition, you can filter the results by area, type (crime or ‘social work’) and category (specific sort of crime or social issue). To give the technical background: Carl Johnstone put the data into a MySQL database, wrote some code in Perl for the filters and used a Flash applet for the graphs.
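For anyone curious how filters like these can sit on top of a database, here is a minimal sketch in Python. It is not Carl's code (his was Perl on MySQL): the table layout, column names and sample rows are all made up for illustration, and an in-memory SQLite database stands in for MySQL so the example runs on its own.

```python
import sqlite3

# Made-up sample rows: (hour of day, area, type, category).
# These are illustrative only, not taken from the real dataset.
SAMPLE_ROWS = [
    (21, "Bolton", "crime", "anti-social behaviour"),
    (21, "Bury", "social work", "missing person"),
    (9, "Bolton", "crime", "burglary"),
]


def build_db(rows):
    """Create an in-memory database with a hypothetical 'incidents' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE incidents "
        "(hour INTEGER, area TEXT, kind TEXT, category TEXT)"
    )
    conn.executemany("INSERT INTO incidents VALUES (?, ?, ?, ?)", rows)
    return conn


def filter_incidents(conn, area=None, kind=None, category=None):
    """Return incidents matching whichever filters were supplied."""
    clauses, params = [], []
    # Column names are fixed here; only the values come from the caller,
    # and those are passed as query parameters rather than pasted into SQL.
    for column, value in (("area", area), ("kind", kind), ("category", category)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT hour, area, kind, category FROM incidents"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY hour"
    return conn.execute(sql, params).fetchall()
```

Calling `filter_incidents(build_db(SAMPLE_ROWS), area="Bolton", kind="crime")` would return only the Bolton crime rows; leaving every filter out returns everything.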

It’s a good follow-up, although at the moment somewhat short of illuminating findings. The page introducing the interactive chart links to just one time-based story from the data: that between 9pm and 10pm a quarter of all calls relate to anti-social behaviour. There’s no indication that journalists will be digging for others. (UPDATE: Paul has since told me they used the data to produce “separate stories for each of our district weekly print titles”.)

The text also fails to invite users to contribute their own insights, instead presenting the tools as a way to find a personalised ‘story’ rather than the start of any collaborative process.

The visualisation tool could also be improved. While allowing you to look at any particular category and area in isolation, it doesn’t allow you to visually compare them to see, for example, whether Bolton or Bury is quieter at night, or whether burglary peaks in the morning in one area, but in the evening in another.
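To show the sort of comparison I mean, here is a rough Python sketch that lines up hourly counts for two areas side by side. It assumes each incident has already been reduced to an (hour, area, category) tuple, which is my own simplification rather than the MEN's format, and the function names are invented for this example.

```python
from collections import Counter


def hourly_counts(incidents, area, category=None):
    """Count incidents per hour (0-23) for one area, optionally one category."""
    counts = Counter(
        hour
        for hour, inc_area, inc_category in incidents
        if inc_area == area and (category is None or inc_category == category)
    )
    return [counts.get(h, 0) for h in range(24)]


def compare_areas(incidents, area_a, area_b, category=None):
    """Return (hour, count in area_a, count in area_b) for every hour."""
    a = hourly_counts(incidents, area_a, category)
    b = hourly_counts(incidents, area_b, category)
    return [(h, a[h], b[h]) for h in range(24)]


def peak_hour(series):
    """Hour with the highest count (the earliest one if there is a tie)."""
    return max(range(24), key=lambda h: series[h])
```

With something like this, `peak_hour(hourly_counts(incidents, "Bolton", "burglary"))` and the same call for Bury would answer the morning-versus-evening question directly, and the rows from `compare_areas` could be fed to any charting tool as two series on one axis.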

And of course, they’ve not linked to the original data to allow a helpful developer to do that for them. (Going into greater depth: the URL for each set of results is ‘hackable’ – i.e. easy to construct if you know what you’re looking for – which makes it easier to scrape the resulting tweets. However, the chart itself, with the numbers in it, is Flash-based, which creates a problem.) UPDATE: Paul tells me they are planning to make the data public and invite developers to do their own work, but the eruption of other major news in the city means they “just have not had time yet”.
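For readers unfamiliar with the idea of a ‘hackable’ URL, the sketch below shows the general principle in Python: if the filters appear as predictable query parameters, every combination of results page can be generated in a loop and then fetched. The base address and parameter names here are placeholders I have invented, not the MEN's real ones.

```python
from urllib.parse import urlencode

# Placeholder address: NOT the real MEN results page.
BASE_URL = "https://example.com/gmp24/results"


def results_url(area=None, kind=None, category=None):
    """Build a results-page URL from whichever filters are set.

    The parameter names 'area', 'type' and 'category' are assumptions.
    """
    pairs = (("area", area), ("type", kind), ("category", category))
    params = {name: value for name, value in pairs if value}
    return BASE_URL + ("?" + urlencode(params) if params else "")


def all_urls(areas, categories):
    """Every area/category combination, ready to hand to a scraper."""
    return [results_url(area=a, category=c) for a in areas for c in categories]
```

A scraper would then request each address from `all_urls(...)` in turn and pull the tweets out of the returned page; the numbers inside the Flash chart would still be out of reach this way.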

On the positive side, it’s good to see a clear basic visualisation with a base starting at 0.

If you do want the raw data, it’s been put together by The Guardian’s Michael Brunton-Spall and is available here.

This formed the basis for a day of activity at a Hacks & Hackers Day last week, which the Manchester Evening News took part in. The results of that can be read on the Scraperwiki blog and on Andy Dickinson’s blog. These included:

“David Kendal produced his own project mapping 999 calls in the area. He took the tweet data and put it through the Yahoo placemaker tool, plotting information on a Google map, to see which areas got calls over certain periods of time.

“Yuwei Lin and Enrico Zini [produced] a GMP tweet database, and showed a very neat search tool that allowed analysis of certain aspects of the police data (3257 items).”

And unrelated to the police tweets, but of enormous use to journalists, was the creation of a website of United Kingdom case judgment data.

“At the moment this is only available via BAILII and the team wanted to make something more usable and searchable (BAILII’s data cannot be scraped or indexed by Google).

“It is still a work in progress, but could eventually provide a very useful tool for journalists. Although the data is not updated past a certain point, journalists would be able to analyse the information for different factors: which judges made which judgments? What is the level of activity in different courts? Which times of year are busier? It could be scrutinised to determine different aspects of the cases.”

I’m immensely pleased to see this come about as a result in part (I’m told) of an investigation on Help Me Investigate last year.

5 thoughts on “Manchester Police tweets and the MEN – local data journalism part 2”

  1. Carl Johnstone

    Hi, as the developer who did the web work for this I’d like to address a few points. Note that these are my personal views, and not necessarily those of my employer MEN Media.

    Currently the (tiny) web development team here are all working on a major project that needs to be finished next week – so the results are from me spending a reasonable amount of personal time on it as it “scratched an itch”. There are loads of things that I would’ve liked to have done with the data, but did what I could with the limited time. I could do more, but realistically in these fast-moving news days, the story is already dead. (Of course given sufficient up-front time we could’ve been doing this live on the day.)

    As initially there were no plans to do anything at all like this, I’m really pleased that *something* has gone up. Add to that that as far as I’m aware we’re the only mainstream media organisation that has done anything other than report on the event.

    As far as allowing further analysis, most people will have a quick browse – probably looking at the division that equates to where they live – but almost certainly aren’t interested in further analysing the data. That said, following a discussion internally we’ll have a download link up sometime this afternoon.

    Finally I’m hoping that this is going to serve as the first baby steps and we’ll be able to do more stuff like this (and better) in the future.

  2. Paul Gallagher

    Thanks for your interest in this and for your observations. We have had more spinoffs from the data than you mention here as we have been able to create separate articles for each of our district weekly print titles by narrowing down the dataset to their area – which was one of the main reasons for doing this in the first place.
    I think a lot of people elsewhere found it difficult to extract much in the way of useful information from the raw text of the tweets from GMP24 because the police operators did not follow a standard template with regards to geography and category of incident.
    To turn the data into information we could use we had to go through all 3,025 tweets and categorise each one according to its district and type.
    This required a certain amount of local knowledge and editorial judgement on the reporting of crime – as well as half a dozen willing volunteers – and fortunately we were able to draw on all of these among the staff in the MEN newsroom.
    The final spreadsheet is not really ‘raw data’ because it is coloured by our collective judgement on how each tweet should be categorised. There will be errors and differences of opinion in how various people here have interpreted similar tweets. However, with those health warnings in place, I believe we were sufficiently robust in our methods to produce a reliable dataset and we are happy to make it openly available to developers. The spreadsheet can now be downloaded from our site.

  3. Paul Bradshaw

    Thanks Carl – I think it’s to MEN’s credit that any resources at all were allocated, and I hope my previous blog post gives enough credit for using that to dig beyond the novelty angle of the story. I also think the MEN deserve credit for getting involved in the hackday.

    Like you, I’m really pleased that *something* went up, and was followed up too. With this post I’m trying to think where the MEN and others might go next, or next time. More broadly, I’m trying to move the data journalism discussion beyond the ‘ooh’ stage!

    Paul – the point about ‘raw’ data is a good one that I should have explored more. It also emphasises the role of journalists in cleaning up and contextualising raw data.
