Monthly Archives: October 2010

Stories hidden in the data, stories in the comments

the tax gap

My attention was drawn this week by David Hayward to a visualisation by David McCandless of the tax gap (click on image for larger version). McCandless does some beautiful stuff, but what was particularly interesting in this graphic was how it highlighted areas that rarely make the news agenda.

Tax avoidance and evasion, for example, account for £7.4bn each, while benefit fraud and benefit system error account for £1.5bn and £1.6bn respectively.

Yet while the latter dominate the news agenda, and benefit cheats are subject to regular exposure, tax avoidance and evasion are rare guests on the pages of newspapers.

In other words, the data is identifying a news hole of sorts. There are many reasons for this – Galtung & Ruge would have plenty of ideas, for example – but still: there it is.

The comments

But that’s only part of what makes this so interesting. By publishing the data and having built the healthy community that exists around the data blog, McCandless and The Guardian benefit from some very useful comments (aside from the odd political one) on how to improve both the data and the visualisation.

This is a great example of how the newspaper is stealing an enormous march on its rivals in working beyond its newsroom in collaboration with users – benefiting from what Clay Shirky would call cognitive surplus. Data is not just an informational object, but a social one too.

Statistical analysis as journalism – Benford’s law

drug-related murder map

I’m always on the lookout for practical applications of statistical analysis for doing journalism, so this piece of work by Diego Valle-Jones, on drug-related murders, made me very happy.

I’ve heard of the first-digit law (also known as Benford’s law) before – it’s a way of spotting dodgy data.
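For anyone who hasn't met it: the law says that in many naturally occurring sets of numbers, the leading digit d turns up with probability log10(1 + 1/d), so roughly 30% of values start with a 1 and fewer than 5% with a 9. The sketch below is my own illustration, not Diego's code: it compares the expected shares with those observed in a stand-in data set (powers of two, which are known to follow the law closely). The function names and the sample data are assumptions made for the example.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n):
    """Leading decimal digit of a positive integer."""
    while n >= 10:
        n //= 10
    return n


def observed_shares(values):
    """Share of each leading digit among the positive integers in values."""
    digits = [first_digit(v) for v in values if v > 0]
    tally = Counter(digits)
    return {d: tally[d] / len(digits) for d in range(1, 10)}


# Illustrative data only (an assumption): the first 100 powers of two.
# Real use would substitute counts such as monthly totals per municipality.
sample = [2 ** k for k in range(100)]
expected = benford_expected()
observed = observed_shares(sample)

for d in range(1, 10):
    print(f"{d}: expected {expected[d]:.3f}, observed {observed[d]:.3f}")
```

A large gap between the two columns is a prompt to ask questions, not proof of manipulation: plenty of honest data sets (anything with a narrow range, for instance) do not follow the law.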

What Diego Valle-Jones has done is use the method to highlight discrepancies in information on drug-related murders in Mexico. Or, as Pete Warden explains:

“With the help of just Benford’s law and data sets to compare he’s able to demonstrate how the police are systematically hiding over a thousand murders a year in a single state, and that’s just in one small part of the article.”

Diego takes up the story:

“The police records and the vital statistics records are collected using different methodologies: vital statistics from the INEGI [the statistical agency of the Mexican government] are collected from death certificates and the police records from the SNSP are the number of police reports (“averiguaciones previas”) for the crime of murder—not the number of victims. For example, if there happened to occur a particular heinous crime in which 15 teens were massacred, but only one police report were filed, all the murders would be recorded in the database as one. But even taking this into account, the difference is too high.

“You could also argue that the data are provisional—at least for 2008—but missing over a thousand murders in Chihuahua makes the data useless at the state level. I could understand it if it was an undercount by 10%–15%, or if they had added a disclaimer saying the data for Chihuahua was from July, but none of that happened and it just looks like a clumsy way to lie. It’s a pity several media outlets and the UN homicide statistics used this data to report the homicide rate in Mexico is lower than it really is.”

But what brings the data alive is Diego’s knowledge of the issue. In one passage he checks against large massacres since 1994 to see if they were recorded in the database. One of them – the Acteal Massacre (“45 dead, December 22, 1997”) – is not there. This, he says, was “committed by paramilitary units with government backing against 45 Tzotzil Indians … According to the INEGI there were only 2 deaths during December 1997 in the municipality of Chenalho, where the massacre occurred. What a silly way to avoid recording homicides! Now it is just a question of which data is less corrupt.”

The post as a whole is well worth reading in full, both as a fascinating piece of journalism, and a fascinating use of a range of statistical methods. As Pete says, it is a wonder this guy doesn’t get more publicity for his work.

Andrew Marr fails to learn from his own history

“It is frightful that someone who is no one… can set any error into circulation with no thought of responsibility & with the aid of this dreadful disproportioned means of communication”

That’s not a quote from Andrew Marr, but Soren Kierkegaard writing about newspapers in the 19th century. Here’s another:

“I do not mean to be the slightest bit critical of TV newspeople, who do a superb job, considering that they operate under severe time constraints and have the intellectual depth of hamsters. But TV news can only present the “bare bones” of a story; it takes a newspaper, with its capability to present vast amounts of information, to render the story truly boring”

Strange that the author of one of the best histories of British journalism can fail to remember how each new platform for journalism has been greeted, and how fuzzy the concept of journalism is.

“Journalism includes drunks and dyslexics and some of the least trustworthy, wickedest people in the land … The reader doesn’t know who pretends to make the necessary phone calls, but never bothers; or that this one hates Tories and always writes them down.”

That’s a quote from Andrew Marr’s book. Here’s another:

“In a complicated, developed society, much of the most important finding out can only be done by people with narrower, sharper skills – microbiologists, meteorologists, opinion pollsters and market analysts, whose discoveries journalism simply passes on in a more popular (and generally distorted) form.”

Sounds like bloggers to me.

Marr doesn’t even need to look very far back. This fake-debate was laid to rest years ago (is anyone really claiming that citizen journalism will entirely replace professional journalism? Or still trying to compare blogging – a technical process – with journalism – a cultural construct?). As I tweeted yesterday: the year 2005 called, Andrew. They want their prejudices back.

Meanwhile, Channel 4 journalist Krishnan Guru-Murthy has written eloquently in defence of bloggers and the need to engage through social media.

Revisiting Rodolfo Walsh, father of Argentinian non-fiction

For Argentinians like me, it was Rodolfo Walsh – and not Truman Capote, who published In Cold Blood almost a decade later – who invented non-fiction journalism with his famous 1957 book Operación Masacre, a masterpiece of investigative journalism.

Twenty years later, on the first anniversary of Jorge Rafael Videla’s dictatorship, he was intercepted by soldiers, murdered, and his remains vanished: he became a “desaparecido”, just after delivering his Open Letter from a Writer to the Military Junta (Carta Abierta de un Escritor a la Junta Militar) to Argentine newspapers and correspondents at foreign media organizations.

To commemorate his work, Alvaro Liuzzi is starting a “journalistic experiment” called Proyecto Walsh, searching for an answer to an interesting question: “What would have happened if, for the research of Operacion Masacre, Rodolfo Walsh had had access to the digital tools we have today?”

The Twitter user @rodolfowalsh is the first step of Proyecto Walsh, which will try to create a digital ecosystem in order to gather all of the research that Rodolfo accomplished 54 years ago, and remix it using the journalistic tools of today.

Local newspaper data journalism – school admissions in Birmingham

data journalism at the Birmingham Mail - school admissions data

The Birmingham Mail has been trying its hand at data journalism with school admissions data. It’s a good place to start – the topic attracts a lot of interest (and so justifies the investment of time) while people tend to be interested in more than just who finishes top and bottom of the tables (justifying the choice of medium).

The results are impressive. Applications data is plotted on a Google map on the main page, while an “interactive chart” page allows you to compare schools across various criteria, and also narrow the sample by selecting from two drop down menus (town and school).

The charts have been made in Tableau, which includes a download link at the bottom. However, you need Tableau itself (free, but PC only) to open it.

A further page features links to tables for each area. Sadly, the pages containing tables do not contain any link to the raw data. This presents an extra hurdle to users – although you can scrape the table into a Google spreadsheet using the =import formula. If you want to see how, here’s a spreadsheet I created from the data by doing just that. Click on the first cell to see the formula that generates it.
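If you would rather do the same thing in a script than in a spreadsheet, the idea carries over: read the page's HTML and pull out the rows of the table. The sketch below is only an illustration under assumptions of my own. It parses a made-up HTML snippet (SAMPLE_HTML, with invented school names and numbers) using nothing but Python's standard library. It is not the Mail's markup, and it is not how the spreadsheet formula works internally.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical markup standing in for one of the published tables.
SAMPLE_HTML = """<table>
<tr><th>School</th><th>Places</th></tr>
<tr><td>Example School A</td><td>120</td></tr>
<tr><td>Example School B</td><td>90</td></tr>
</table>"""


def scrape_table(html):
    """Return the table in html as a list of rows (lists of cell strings)."""
    parser = TableParser()
    parser.feed(html)
    return parser.rows


rows = scrape_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

In practice you would fetch the real page first and pass its HTML to scrape_table; pages with several tables or nested tables would need a little more care than this sketch takes.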

I asked David Higgerson, Trinity Mirror’s Head of Multimedia and the man whose name appears on the Tableau data, to explain the process behind the project. It seems the information was a combination of freely available data and that acquired via FOI.

“The Mail took the data available – number of places available, number of first choice applicants and number of total applicants – and worked out a ratio of first choice applicants per place. This is relevant to parents because councils try to allocate places to children based on preference once they’ve decided which schools a child is eligible for. Eligibility varies depending on type of school.

“The figures showed how popular faith schools were, and also how fierce competition was for places at grammar schools. That’s the story which generated most interest.

“As you’ve said on your blog, the hardest part was making the data uniform, and then making it relevant to readers.

“In print, it ran across three days. Day one was grammar schools, day two was all schools and day three revealed how catchment areas for oversubscribed schools which use distance from school to fill their last few places.

“Online, Google Fusion was used to create maps, Tableau for the interactive chart which lets people choose based on town or school, and Tableizer for the quick tables which appear in the section too. We also had a play with Scribble Maps, which we think has real potential for print/online newsrooms.”
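The calculation David describes at the start of that quote (first choice applicants per place) is simple enough to sketch. The code below is my own illustration, not the Mail's working: the school names and all the figures are invented, and the School class and its field names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class School:
    name: str
    places: int            # number of places available
    first_choice: int      # number of first choice applicants
    total_applicants: int  # number of total applicants

    @property
    def first_choice_per_place(self):
        """Ratio of first choice applicants to available places."""
        return self.first_choice / self.places


# Invented figures for illustration only.
schools = [
    School("Example Grammar", places=120, first_choice=480, total_applicants=900),
    School("Example Faith School", places=60, first_choice=90, total_applicants=200),
    School("Example Community School", places=200, first_choice=150, total_applicants=400),
]

# Most competitive first: a ratio above 1 means more first choices than places.
ranked = sorted(schools, key=lambda s: s.first_choice_per_place, reverse=True)
for s in ranked:
    print(f"{s.name}: {s.first_choice_per_place:.2f} first choice applicants per place")
```

Sorting on the ratio rather than on raw applicant numbers is what lets a small, heavily oversubscribed school show up above a large one.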

It seems education reporter Kat Keogh deserves the credit for spotting the stories in the data, “with the usual support you’d expect in the newsroom – newsdesk etc.”

David and Anna Jeys experimented with the online presentation and others laid out the data for print.

BBC new linking guidelines issued – science journals mentioned

The BBC have just emailed new linking guidelines to their staff. They stipulate that linking is “essential” to online journalism and in one slide (it’s a PowerPoint document) titled ‘If you remember nothing else’ highlight how linking will change:

What we used to do…

  • Lists of archive news stories
  • Homepages only on external websites
  • No inline linking in news stories

What we do now – think adding value…

  • Avoid news stories and link to useful stuff – analysis, explainers, Q&As, pic galleries etc
  • On external websites look beyond homepage to pages of specific relevance
  • Inline linking in news stories is OK when it’s to a primary source

Other points of note in the document include the repeated emphasis on useful deep linking, and the importance of the newstracker module (which links to coverage on other news sites). Curiously, when referring to inline links it does say that “different rules can apply” to BBC blogs – “speak to blogs team if in doubt”.

Something I did look for – and find – was a reference to linking to scientific journals. And here it is: “In news stories inline links must go to primary sources only – eg scientific journal article or policy report (1 or 2 per story; avoid intro)”

This is significant given the previous campaigning on this issue.

On the whole it’s a good set of guidance – I’ll refrain from publishing it in hope that the BBC will…

UPDATE: It seems The Guardian followed up the story and embedded the document, so here it is:

BBC guidelines for linking – Sept 2010

‘Making it findable’ – the creed of the hyperlocal blogger

I’ve written a post over at Podnosh.com (full disclosure: where I do some training and consultancy) on ‘Making it findable’ – the creed of the hyperlocal blogger, reporting on a discussion between hyperlocal bloggers and local government officials at Hyperlocal Govcamp West Midlands. The meat of what I’m saying is in the middle:

“I noticed a recurring theme from the bloggers’ perspective on their role – something unique to online journalism, and which I’ve commented on before: the duty to make things findable.

“Bloggers repeatedly referred to information about the local democratic process that was hidden away on council websites – and which they worked hard to make available and interesting to their community. Council meeting times; minutes; planning meetings.

“At one point someone said that the bloggers were there to “hold power to account”. Not always in the active sense of posing difficult questions – but also in making the invisible visible; the obscure findable.

“By doing so they are not only shedding a light on the workings of local government, but transferring power. “This is your responsibility”, it says – not “This is my story”.”

There’s a nice comment below saying it “is the closest anyone, including me – has ever got to stating what my blog is about.” Full post here.

Online journalism student RSS reader starter pack: 50 RSS feeds

Teaching has begun in the new academic year and once again I’m handing out a list of recommended RSS feeds. Last year this came in the form of an OPML file, but this year I’m using Google Reader bundles (instructions on how to create one of your own are here). There are 50 feeds in all – 5 feeds in each of 10 categories. Like any list, this is reliant on my own circles of knowledge and arbitrary in various respects. But it’s a start. I’d welcome other suggestions.

Here is the list with links to the bundles. Each list is in alphabetical order – there is no ranking:

5 of the best: Community

A link to the bundle allowing you to add it to your Google Reader is here.

  1. Blaise Grimes-Viort
  2. Community Building & Community Management
  3. FeverBee
  4. ManagingCommunities.com
  5. Online Community Strategist

5 of the best: Data

This was a particularly difficult list to draw up – I went for a mix of visualisation (FlowingData), statistics (The Numbers Guy), local and national data (CountCulture and Datablog) and practical help on mashups (OUseful). I cheated a little by moving computer assisted reporting blog Slewfootsnoop into the 5 UK feeds and 10,000 Words into Multimedia. Bundle link here.

Interview: Ton Zijlstra on open data in the EU (audio)

A couple of weeks ago I spoke at the PICNIC festival in Amsterdam. While I was there I grabbed an interview with Ton Zijlstra, who has been following open data developments across EU governments very closely. You can find the interview embedded below:

[audio:http://audioboo.fm/boos/186944-ton-zijlstra-on-open-data-in-the-eu.mp3]