I’m always on the lookout for practical applications of statistical analysis in journalism, so this piece of work by Diego Valle-Jones, on drug-related murders, made me very happy.
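I’ve come across the first-digit law (also known as Benford’s law) before – it’s a way of spotting dodgy data. In many naturally occurring sets of numbers the leading digit is 1 about 30% of the time and 9 less than 5% of the time, so figures that stray far from that pattern deserve a closer look.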
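To make the idea concrete, here is a minimal Python sketch of a first-digit check. It is not Diego’s code: the function names, the chi-square comparison and the sample numbers are my own assumptions, chosen purely for illustration.

```python
import math
from collections import Counter

# Critical value of the chi-square distribution with 8 degrees of freedom
# at the 5% level; a statistic above it suggests a poor fit to Benford's law.
CHI2_CRITICAL_8DF_5PCT = 15.507


def benford_expected():
    """Expected share of each leading digit (1-9) under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value):
    """Return the first digit of a non-zero integer count."""
    return int(str(abs(int(value)))[0])


def digit_counts(values):
    """Count leading digits, skipping zeros (they have no leading digit)."""
    return Counter(leading_digit(v) for v in values if int(v) != 0)


def chi_square_vs_benford(values):
    """Pearson chi-square statistic: observed leading digits vs Benford."""
    counts = digit_counts(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one non-zero value")
    expected = benford_expected()
    return sum(
        (counts.get(d, 0) - total * expected[d]) ** 2 / (total * expected[d])
        for d in range(1, 10)
    )


if __name__ == "__main__":
    # Made-up inputs for illustration only, NOT the Mexican homicide data.
    samples = {
        "powers of two": [2 ** k for k in range(1, 200)],
        "every number from 100 to 999": list(range(100, 1000)),
    }
    for name, data in samples.items():
        stat = chi_square_vs_benford(data)
        verdict = "worth a closer look" if stat > CHI2_CRITICAL_8DF_5PCT else "plausible"
        print(f"{name}: chi-square = {stat:.1f} ({verdict})")
```

A poor fit is not proof of anything on its own. It is a prompt to go and check the numbers against another source, which is exactly what Diego does next.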
What Diego Valle-Jones has done is use the method to highlight discrepancies in information on drug-related murders in Mexico. Or, as Pete Warden explains:
“With the help of just Benford’s law and data sets to compare he’s able to demonstrate how the police are systematically hiding over a thousand murders a year in a single state, and that’s just in one small part of the article.”
Diego takes up the story:
“The police records and the vital statistics records are collected using different methodologies: vital statistics from the INEGI [the statistical agency of the Mexican government] are collected from death certificates and the police records from the SNSP are the number of police reports (“averiguaciones previas”) for the crime of murder—not the number of victims. For example, if there happened to occur a particular heinous crime in which 15 teens were massacred, but only one police report were filed, all the murders would be recorded in the database as one. But even taking this into account, the difference is too high.
“You could also argue that the data are provisional—at least for 2008—but missing over a thousand murders in Chihuahua makes the data useless at the state level. I could understand it if it was an undercount by 10%–15%, or if they had added a disclaimer saying the data for Chihuahua was from July, but none of that happened and it just looks like a clumsy way to lie. It’s a pity several media outlets and the UN homicide statistics used this data to report the homicide rate in Mexico is lower than it really is.”
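The comparison Diego describes, one source checked against another with some tolerance for undercounting, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are hypothetical, the 15% tolerance is borrowed from the upper end of the range he mentions, and the figures are invented placeholders, not INEGI or SNSP numbers.

```python
def undercount_share(reference_count, reported_count):
    """Share of the reference count that is missing from the reported count."""
    if reference_count <= 0:
        raise ValueError("reference_count must be positive")
    return (reference_count - reported_count) / reference_count


def flag_discrepancies(rows, tolerance=0.15):
    """Return labels whose reported count falls short by more than `tolerance`.

    Each row is (label, reference_count, reported_count), for example
    (state, deaths from death certificates, police reports of murder).
    """
    return [
        label
        for label, reference, reported in rows
        if undercount_share(reference, reported) > tolerance
    ]


if __name__ == "__main__":
    # Invented placeholder figures for illustration only.
    rows = [
        ("State A", 1000, 900),   # 10% short: within tolerance
        ("State B", 2000, 800),   # 60% short: flagged
    ]
    print(flag_discrepancies(rows))
```

Because police reports count cases rather than victims, some shortfall is expected; the tolerance is there so that only gaps too large to explain that way get flagged.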
But what brings the data alive is Diego’s knowledge of the issue. In one passage he checks against large massacres since 1994 to see if they were recorded in the database. One of them – the Acteal Massacre (“45 dead, December 22, 1997”) – is not there. This, he says, was “committed by paramilitary units with government backing against 45 Tzotzil Indians … According to the INEGI there were only 2 deaths during December 1997 in the municipality of Chenalho, where the massacre occurred. What a silly way to avoid recording homicides! Now it is just a question of which data is less corrupt.”
The post is well worth reading in full, both as a fascinating piece of journalism and as a demonstration of a range of statistical methods. As Pete says, it is a wonder this guy doesn’t get more publicity for his work.