A new data journalism tool – and a new way of reporting uncertainty

[Image: a Guesstimate model of how long it takes to get ready for preschool]

On the last day of last year, web developer Ozzie Gooen launched his new project Guesstimate, a spreadsheet ‘for things that aren’t certain’.

It is an inspired idea: software plays a key role in shaping what we do, and we take spreadsheets’ certainty about numbers for granted. Why should we?

Throw in journalism’s default dislike of ambiguity and a political tendency to play to that… well, it can all make for some flawed reporting.

I was so impressed with Guesstimate and the opportunities it presents for a new style of data reporting that I sought out Gooen to find out more about the project and how he came to launch it.

‘Respecting the unknowns’

“Many important things are not known,” Gooen writes in his post about the launch of Guesstimate.

“No one really knows what the US GDP will be if Donald Trump gets elected, or if the US can ‘win’ if we step up our fight in Syria. But we can make estimates, and we can use tools to become as accurate as possible.”

Perhaps one of the best examples of this is sex trafficking figures, which have ranged from 71 to 25,000 in various reports from charities, politicians, and the media – a process unpicked by The Guardian’s Nick Davies in a 2009 article.

[Image: Inflated sex trafficking statistics]

Gooen has his own examples:

“One that comes to mind was the estimate that ‘every time one clicks onto a new web page the electricity used is equivalent to boiling a kettle’. This got a lot of attention before being thoroughly debunked by a few other models.

“If Guesstimate had been used, it may have been more obvious how uncertain that final number was. Perhaps the model could also have been published, then copied and altered by other people immediately.”

But that doesn’t mean we shouldn’t report on those stories at all. “I hope journalists, and all other people, become more comfortable making models of their own,” says Gooen.

“It’s interesting to me how often I see very rough estimates by academics cited in the media, but how writers and other people seem intimidated to make numeric models themselves. It reminds me of coding, where a bit of knowledge can be incredibly useful, but many people get too intimidated to begin.

“Basically, I think a lot of information in the media is treated either as ‘completely true’ or ‘not worth touching’. I hope this changes to more of a gradient, where the ‘completely true’ things have some amount of uncertainty, and the really uncertain things are still discussed.”

“The question isn’t if we can know something with complete confidence, but if there are gains to be made by better understanding it”

Gooen says that he’s spent a lot of time with the futurism and Effective Altruism communities, neither of which offers a great deal of certainty.

Futurism, as he puts it, is about “Trying to predict the distant future while respecting the unknowns”, while Effective Altruism is about “Trying to predict which charity interventions or good deeds would have the most expected benefit.”

“Some people dismiss them both because they aren’t nearly as certain as other scientific fields. I think this misses the point.

“The question isn’t if we can know something with complete confidence, but if there are gains to be made by better understanding it. The solution to uncertainty isn’t to give up or ignore it, it’s to understand and minimize it.”

Engineering solutions to uncertainty

[Image: a Guesstimate carbon balance model]

Margins of error are the classic problem when it comes to reporting uncertainty: these are the ranges within which researchers or pollsters can be reasonably confident (typically 95%) that the true figure lies, given that they are relying on a sample to represent a larger population.
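For readers who want to see where such a range comes from, here is a minimal sketch of the textbook calculation for a survey percentage. It assumes a simple random sample and a 95% confidence level (z = 1.96); real polls often use weighting that widens the margin, and the function name and example figures are illustrative rather than taken from any survey mentioned here.

```python
import math

def margin_of_error(proportion, sample_size, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Hypothetical poll: 1,000 respondents, 50% answering "yes".
moe = margin_of_error(0.5, 1000)
print(f"±{moe * 100:.1f} percentage points")  # ±3.1 percentage points
```

Quadrupling the sample only halves the margin, which is one reason small surveys deserve extra caution.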

In 2011 the BBC wrote about a “worrying” rise in jobless figures which was actually smaller than the margin of error. As Ben Goldacre explained:

“The changes reported are clearly not statistically significant: the estimated change over the past quarter is 38,000 but the 95% confidence interval is ±87,000, running from -49,000 to 125,000.”
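The check Goldacre describes is simple enough to sketch in a few lines. The figures are the ones he quotes; the function names are my own, and "excludes zero" is the usual rule of thumb rather than a full significance test.

```python
def interval(estimate, margin):
    """Return the (low, high) bounds of estimate ± margin."""
    return estimate - margin, estimate + margin

def excludes_zero(estimate, margin):
    """True only if the whole interval sits on one side of zero."""
    low, high = interval(estimate, margin)
    return low > 0 or high < 0

low, high = interval(38_000, 87_000)
print(low, high)                       # -49000 125000
print(excludes_zero(38_000, 87_000))   # False: the "rise" could be a fall
```

If the interval straddles zero, the data cannot tell us whether the figure went up or down, let alone by how much.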

And just last year I wrote about the NSPCC’s use of a dodgy survey which failed to include any margin of error.

But it’s not statistics that’s been the driving force behind Guesstimate. Instead it’s engineering.

“Engineering of course involves statistics and scientific research, but keeps a strong focus on application.

“One of the main bottlenecks to optimizing business, political, and personal decisions in the same way that we optimize engineering systems is that the amount of uncertainty is far greater. Making uncertainty easy to understand and work with seemed like the first step towards more general and powerful decision systems.”

From events to parenting

As the initial version of Guesstimate is taken up by a group of early adopters, however, it’s not just in science and polling that it is proving useful. Gooen says he has “definitely” been surprised by the range of things people have tried estimating.

“First off, there’s been a huge variety of cost projections and project time estimates for all different industries: software, marketing, travel, event coordination, blogging and others. There are many more scientific or technical uses as well. A few engineers have been experimenting with modeling their cloud architectures to understand expected response times and server loads. There have been a few models of CO2 emissions.

“Parenting has been another surprising topic. One person estimated their ‘Number of Hours of Free Time Per Week as a Parent’ and another ‘How Long it Takes to Get Ready for Preschool’. Both of those were quite sophisticated.”
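To give a flavour of what a model like the preschool one involves, here is a rough sketch of one common way to combine uncertain inputs: Monte Carlo simulation, where each input is a range rather than a single number and the total is recalculated thousands of times. This is not a description of how Guesstimate itself works, and every step and range below is invented for illustration.

```python
import random

# Hypothetical steps, each a (low, high) range in minutes. All figures are made up.
STEPS = {
    "wake up": (5, 15),
    "get dressed": (5, 20),
    "breakfast": (10, 30),
    "shoes and coat": (2, 10),
}

def simulate(steps, runs=10_000, seed=1):
    """Draw each step from its range, add them up, and repeat many times."""
    rng = random.Random(seed)
    totals = [
        sum(rng.uniform(low, high) for low, high in steps.values())
        for _ in range(runs)
    ]
    return sorted(totals)

def percentile(sorted_values, p):
    """Value below which roughly p% of the sorted results fall."""
    index = round(p / 100 * (len(sorted_values) - 1))
    return sorted_values[index]

totals = simulate(STEPS)
low, mid, high = (percentile(totals, p) for p in (5, 50, 95))
print(f"About {mid:.0f} minutes (90% of runs between {low:.0f} and {high:.0f})")
```

The output is a range with a most likely middle rather than a single figure, which is exactly the kind of answer a conventional spreadsheet hides.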

Now the challenge is to see if we can find a use in journalism. Suggestions welcome…
