Category Archives: online journalism

Manchester Police tweets and the MEN – local data journalism part 2

Manchester Evening News visualisation of Police incident tweets

A week ago I blogged about how the Manchester Evening News were using data visualisation to provide a deeper analysis of the local police force’s experiment in tweeting incidents for 24 hours. In that post Head of Online Content Paul Gallagher said he thought the real benefit would “come afterwards when we can also plot the data over time”.

Now that data has been plotted, and you can see the results here.

In addition, you can filter the results by area, type (crime or ‘social work’) and category (the specific sort of crime or social issue). To give the technical background: Carl Johnstone put the data into a MySQL database, wrote some Perl code for the filters, and used a Flash applet for the graphs. Continue reading
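The original filters were written in Perl against MySQL, but the idea translates to any stack. As a rough sketch of the kind of filter query involved – using Python and SQLite here, with a table schema and sample rows that are my invention, not the MEN's actual data – it might look like this:

```python
import sqlite3

# Hypothetical schema: one row per police incident tweet.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE incidents (
    area TEXT, type TEXT, category TEXT, tweeted_at TEXT)""")
conn.executemany(
    "INSERT INTO incidents VALUES (?, ?, ?, ?)",
    [("Salford", "crime", "burglary", "2010-10-14 09:12"),
     ("Bolton", "social work", "missing person", "2010-10-14 09:30"),
     ("Salford", "social work", "noise complaint", "2010-10-14 10:05")])

def filter_incidents(area=None, type_=None, category=None):
    """Build a WHERE clause from whichever filters the user has set."""
    clauses, params = [], []
    for col, val in (("area", area), ("type", type_), ("category", category)):
        if val is not None:
            clauses.append(f"{col} = ?")
            params.append(val)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return conn.execute(
        "SELECT area, type, category FROM incidents" + where, params).fetchall()

print(filter_incidents(area="Salford"))       # the two Salford rows
print(filter_incidents(type_="social work"))  # the two social-work rows
```

The same parameterised-query pattern works whether the front end is a Flash applet, as here, or a plain HTML form.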

Creating an emergency notification system in 15 hours

I’ve written a post on the Scraperwiki blog about a hackathon I attended where a small group of developers and people with experience of crowdsourcing in emergencies created a fantastic tool to inform populations in an emergency.

The primary application is non-journalistic, but the subject matter has obvious journalistic potential for any event that requires exchanges of information. Here are just some that spring to mind:

  • A protest, where protestors and local residents can find out where it is at any given moment and which streets are closed.
  • A football match with potential for violence (e.g. a local derby), where supporters can be alerted to any trouble and told which routes to use to avoid it.
  • A music festival, where you could text the names of the bands you want to see and receive alerts of scheduled appearances and any delays.
  • A conference, where you could receive all of the above – as well as text updates on presentations you’re missing (taken from hashtagged tweets, even).

There are obvious commercial applications for some of the above too – you might have to register your mobile ahead of the event and pay a fee to ensure you receive the texts.

Not bad for 15 hours’ work.

You can read the blog post in full here.

A template for '100 percent reporting'

progress bar for 100 percent reporting

Last night Jay Rosen blogged about a wonderful framework for networked journalism – what he calls the ‘100 percent solution’:

“First, you set a goal to cover 100 percent of… well, of something. In trying to reach the goal you immediately run into problems. To solve those problems you often have to improvise or innovate. And that’s the payoff, even if you don’t meet your goal”

In the first example, he mentions a spreadsheet. So I thought I’d create a template for that spreadsheet that tells you just how far you are in achieving your 100% goal, makes it easier to organise newsgathering across a network of actors, and introduces game mechanics to make the process more pleasurable. Continue reading
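The core of such a template is simple arithmetic over the rows. A minimal sketch of the progress-bar and game-mechanic ideas – the field names and data here are invented, not taken from the actual template:

```python
from collections import Counter

# Each row is one item in the 100% goal, e.g. one council ward to cover.
rows = [
    {"item": "Ward A", "done": True,  "claimed_by": "alice"},
    {"item": "Ward B", "done": False, "claimed_by": "bob"},
    {"item": "Ward C", "done": False, "claimed_by": None},
    {"item": "Ward D", "done": True,  "claimed_by": "carol"},
]

# The progress bar: completed items as a share of the whole goal.
done = sum(r["done"] for r in rows)
progress = 100 * done / len(rows)
print(f"{progress:.0f}% complete")  # 50% complete

# A simple game mechanic: a leaderboard of completed claims.
leaderboard = Counter(r["claimed_by"] for r in rows if r["done"])
print(leaderboard.most_common())
```

In a shared spreadsheet the same two calculations are a COUNTIF and a pivot table, which is what makes the format so easy to run across a network of contributors.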

Review: Yahoo! Pipes tutorial ebook

Pipes Tutorial ebook

I’ve been writing about Yahoo! Pipes for some time, and am consistently surprised that there aren’t more books on the tool. Pipes Tutorial – an ebook currently priced at $14.95 – is clearly aiming to address that gap.

The book has a simple structure: it is, in a nutshell, a tour around the various ‘modules’ that you combine to make a pipe.

Some of these will pull information from elsewhere – RSS feeds, CSV spreadsheets, Flickr, Google Base, Yahoo! Local and Yahoo! Search, or entire webpages.

Some allow the user to input something themselves – for example, a search phrase, or a number to limit the type of results given.

And others do things with all the above – combining them, splitting them, filtering, converting, translating, counting, truncating, and so on.
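To give a feel for how those modules chain together, here is a sketch of a filter–sort–truncate pipeline – written in Python rather than Pipes itself, with dummy feed items standing in for an RSS source:

```python
# Dummy feed items standing in for an RSS source module's output.
items = [
    {"title": "Council cuts library hours",  "date": "2010-11-02"},
    {"title": "Local derby ends 2-2",        "date": "2010-11-01"},
    {"title": "Council approves new budget", "date": "2010-11-03"},
]

def filter_items(items, keyword):
    """Like Pipes' Filter module: keep items whose title contains the keyword."""
    return [i for i in items if keyword.lower() in i["title"].lower()]

def sort_items(items):
    """Like Pipes' Sort module: newest first."""
    return sorted(items, key=lambda i: i["date"], reverse=True)

def truncate(items, n):
    """Like Pipes' Truncate module: keep only the first n items."""
    return items[:n]

# Wiring the modules together: source -> filter -> sort -> truncate.
result = truncate(sort_items(filter_items(items, "council")), 1)
print(result)  # the most recent 'council' story
```

Each function maps onto one box in the Pipes editor; the power comes from the wiring, which is exactly the part a module-by-module tour struggles to convey.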

When combined, these make for some powerful possibilities – unfortunately, the book’s one-dimensional structure means it doesn’t show enough of them.

Modules in isolation

While the book offers a good introduction to the functionality of the various parts of Yahoo! Pipes, it rarely demonstrates how those parts can be combined. Tutorial books typically take you through a project that exploits the power of the tools covered, but Pipes Tutorial lacks this vital element. Sometimes modules are combined in the book, but mainly because that is the only way to show how a single module works, rather than for any broader pedagogical objective.

At other times a module is explained in isolation, without explaining how the results might actually be used. The Fetch Page module, for example – which is extremely useful for scraping content from a webpage – is covered without reference to how to publish the results, beyond a passing mention that the reader will have to use ‘other modules’ to assign data to types, and that Regex will be needed to clean it up.
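That Fetch Page–plus–Regex combination is worth unpacking, since it is the part most likely to trip up a beginner. A sketch of the idea in Python – using a hard-coded, invented snippet of HTML rather than a live fetch, so the mechanics are visible:

```python
import re

# Stand-in for the raw HTML that a Fetch Page step would return.
page = """
<ul id="incidents">
  <li><span class="time">09:12</span> Burglary reported in Salford</li>
  <li><span class="time">09:30</span> Missing person, Bolton</li>
</ul>
"""

# A regex that cuts each list item into a (time, description) pair,
# followed by a clean-up pass that strips leftover whitespace.
pattern = re.compile(r'<li><span class="time">([\d:]+)</span>(.*?)</li>')
records = [(time, desc.strip()) for time, desc in pattern.findall(page)]
print(records)
```

In Pipes the equivalent steps are the Fetch Page module's "cut content" settings plus a Regex module; the point either way is that scraped text has to be split into typed fields before it can be republished as a feed.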

Continue reading

Practical steps for improving visualisation

Here’s a useful resource for anyone involved in data journalism and visualising the results. ‘Dataviz’ – a site for “improving data visualisation in the public sector” – features a step by step guide to good visualisation, as well as case studies and articles.

Although it’s aimed at public sector workers, the themes in the step-by-step guide provide a good starting point for journalists: “What do we need to do?”; “How do we do it?” and “How did we do?” Each provides a potential story angle. Clicking through those themes takes you through some of the questions to ask of the data, leading to a gallery of visualisation possibilities. Even if you never get that far, it’s a good way to narrow the question you’re asking – or to find other questions that might result in interesting stories and insights.

Lessons in community from community managers #12: Lorna Mitchell

It’s been a while since the last in the community management series. In this latest post Lorna Mitchell gives her three tips. Lorna is co-project lead for http://joind.in – an open source development project for gathering event feedback. She says: “The other project lead is Chris Cornutt, a guy I’ve met three times over three years, who lives in a timezone 6 hours out from mine.”

Lorna worked as a telecommuter for a number of years and did community relations in that role, and was involved in running PHPWomen, a global user group bringing together women programming PHP, “with all the cultural and linguistic variations that brings.”

Lorna’s tips are:

Keep communicating

A running commentary of what you are doing and thinking is essential when you are working with people who can’t see you and may not have met you.

Communicate appropriately

Don’t hold a discussion over Twitter that would be better conducted at length over email. Make a phone call rather than having days of comment and response on a bug tracker.

Be inclusive

Nothing turns newcomers off faster than lots of in-jokes or references to people they don’t know or places they didn’t go.

Hyperlocal voices: Bart Brouwers, Telegraaf hyperlocal project, Netherlands

Bart Brouwers has been overseeing the establishment of a whole group of hyperlocal sites in the Netherlands with the Telegraaf Media Group. As part of the Hyperlocal Voices series, he explains the background to the project and what they’ve learned so far. Two presentations on the project can be seen above.

Who were the people behind the blog, and what were their backgrounds?

About a year ago, I came up with the plan for a hyperlocal, hyperpersonal news and data network covering all of the Netherlands. My dream was to give every single Dutchman (we have 16 million & counting…) his own platform for local relevance.

I wanted to roll it out myself and, in order to get it financed, I made contact with the board of directors of the Telegraaf Media Groep. I was already working for them (as editor-in-chief of the national free newspaper Sp!ts and before that of the regional newspaper Dagblad De Limburger), so it felt natural to tell and ask them before I pitched my idea somewhere else.

What I didn’t know was that TMG was already working on a hyperlocal platform, so after a few talks we decided to combine both plans. So instead of quitting TMG and starting my own company, I’m still an employee.

What made you decide to set up the blogs?

I was convinced local relevance would/will be a strong force in media. The combination of local business and local information (news, data) could easily become the trigger for a fine enterprise. Continue reading

Manchester police tweets – live data visualisation by the MEN

Manchester police tweets - live data visualisation

Greater Manchester Police (GMP) have been experimenting today with tweeting every incident they deal with. The novelty value of the initiative has been widely reported – but local newspaper the Manchester Evening News has taken the opportunity to ask some deeper questions of the data generated by experimenting with data visualisation.

A series of bar charts – generated from Google spreadsheets and updated throughout the day – provide a valuable – and instant – insight into the sort of work that police are having to deal with.

In particular, the newspaper is testing the police’s claim that they spend a great deal of time on “social work” as well as crime. At the time of writing, it certainly does take up a significant proportion – although not the “two-thirds” mentioned by GMP chief Peter Fahy. (Statistical disclaimer: the data does not yet cover even 24 hours, so it is not yet a useful guide; Fahy’s statistics may be more reliable.)
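Once each incident is tagged as crime or “social work”, the arithmetic behind that comparison is trivial. A sketch with invented tallies – not the day’s real figures:

```python
# Invented tallies, for illustration only - not the MEN's real figures.
counts = {"social work": 380, "crime": 240}

total = sum(counts.values())
social_share = counts["social work"] / total

print(f"social work: {social_share:.0%} of {total} incidents")  # 61% of 620
print(f"GMP's claimed share: {2/3:.0%}")                        # 67%
```

The real work, of course, is in the classification: deciding which tweeted incidents count as “social work” is an editorial judgment the percentages quietly inherit.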

Also visualised are the areas responsible for the most calls, the social-crime breakdown of incidents by area, and breakdowns of social incidents and serious crime incidents by type.

I’m not sure how much time they had to prepare for this, but it’s a good quick hack.

That said, the visualisation could be improved: 3D bars are never a good idea, for instance, and the divisional breakdown showing serious crime versus “social work” is difficult to interpret visually (percentages of the whole would be easier to compare directly). The breakdowns of serious crimes and “social work”, meanwhile, should be ranked from most frequent downwards, with labels used rather than colour alone.

Head of Online Content Paul Gallagher says that it’s currently a manual exercise that requires a page refresh to see updated visuals. But he thinks “the real benefit of this will come afterwards when we can also plot the data over time”. Impressively, the newspaper plans to publish the raw data and will be bringing it to tomorrow’s Hacks and Hackers Hackday in Manchester.

More broadly, the MEN is to be commended for spotting this more substantial angle to what could easily be dismissed as a gimmick by the GMP. Although that doesn’t stop me enjoying the headlines in coverage elsewhere (shown below).

UPDATE: The data is also visualised as a word cloud and line chart at Data Driven.

Manchester police twitter headlines

Statistical analysis as journalism – Benford’s law

drug-related murder map

I’m always on the lookout for practical applications of statistical analysis for doing journalism, so this piece of work by Diego Valle-Jones, on drug-related murders, made me very happy.

I’ve heard of the first-digit law (also known as Benford’s law) before – it’s a way of spotting dodgy data.
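Benford’s law says that in many naturally occurring datasets the leading digit d appears with probability log10(1 + 1/d) – so 1 leads about 30% of the time and 9 under 5%. A quick sketch of the comparison (the murder counts here are invented, purely to show the mechanics):

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_shares(values):
    """Observed share of each leading digit in a list of positive counts."""
    digits = [int(str(v)[0]) for v in values if v > 0]
    tally = Counter(digits)
    return {d: tally.get(d, 0) / len(digits) for d in range(1, 10)}

# Invented monthly murder counts, purely illustrative.
counts = [12, 19, 104, 31, 8, 145, 22, 17, 90, 11, 260, 14]

expected = benford_expected()
observed = first_digit_shares(counts)
for d in range(1, 10):
    print(f"{d}: expected {expected[d]:.1%}, observed {observed[d]:.1%}")
```

A real analysis, like Diego’s, would run this over far more data points and apply a goodness-of-fit test rather than eyeballing the two columns; large, systematic gaps between expected and observed are the red flag.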

What Diego Valle-Jones has done is use the method to highlight discrepancies in information on drug-related murders in Mexico. Or, as Pete Warden explains:

“With the help of just Benford’s law and data sets to compare he’s able to demonstrate how the police are systematically hiding over a thousand murders a year in a single state, and that’s just in one small part of the article.”

Diego takes up the story:

“The police records and the vital statistics records are collected using different methodologies: vital statistics from the INEGI [the statistical agency of the Mexican government] are collected from death certificates and the police records from the SNSP are the number of police reports (“averiguaciones previas”) for the crime of murder—not the number of victims. For example, if there happened to occur a particular heinous crime in which 15 teens were massacred, but only one police report were filed, all the murders would be recorded in the database as one. But even taking this into account, the difference is too high.

“You could also argue that the data are provisional—at least for 2008—but missing over a thousand murders in Chihuahua makes the data useless at the state level. I could understand it if it was an undercount by 10%–15%, or if they had added a disclaimer saying the data for Chihuahua was from July, but none of that happened and it just looks like a clumsy way to lie. It’s a pity several media outlets and the UN homicide statistics used this data to report the homicide rate in Mexico is lower than it really is.”

But what brings the data alive is Diego’s knowledge of the issue. In one passage he checks against large massacres since 1994 to see if they were recorded in the database. One of them – the Acteal Massacre (“45 dead, December 22, 1997”) – is not there. This, he says, was “committed by paramilitary units with government backing against 45 Tzotzil Indians … According to the INEGI there were only 2 deaths during December 1997 in the municipality of Chenalho, where the massacre occurred. What a silly way to avoid recording homicides! Now it is just a question of which data is less corrupt.”

The post as a whole is well worth reading in full, both as a fascinating piece of journalism, and a fascinating use of a range of statistical methods. As Pete says, it is a wonder this guy doesn’t get more publicity for his work.
