Tag Archives: simon rogers

Sigma Awards: new data journalism competition launched

Data journalists are being invited to enter a new data journalism award, launched to “celebrate the best data journalism around the world [and] to empower, elevate and enlighten the global community of data journalists.”

The Sigma Awards were created by Aron Pilhofer and Reginald Chua, with support from Marianne Bouchart and Google’s Simon Rogers. Bouchart managed the Data Journalism Awards organised by the Global Editors Network, which closed last year.

There are nine awards across six categories:

  • Best data-driven reporting (small and large newsrooms)
  • Best visualisation (small and large newsrooms)
  • Innovation (small and large newsrooms)
  • Young journalist
  • Open data
  • Best news application

Aside from a trophy, up to two people from each winning project will receive an all-expenses-covered trip to the International Journalism Festival in Perugia on 1–5 April 2020 where the awards will be celebrated.

The organisers hope that winners will “participate in and lead data journalism panels, discussions and workshops” at the festival.

Entries to the competition are open until 3 February 2020 at 11:59 pm ET via an online form.

Teaching data journalism in developing countries: lessons from ODECA

Eva Constantaras

Eva Constantaras is a data journalist and trainer who recently wrote the Data Journalism Manual for the UN Development Program. In a special guest post she talks about the background to the manual, her experiences in working with journalists and professors who want to introduce data journalism techniques in developing nations, and why the biggest challenges are not technological, but cultural.

Over the last few years, there has been a significant shift in global experiments in data journalism education, away from short-term activities like boot camps and hackathons and towards more sustained and sustainable interventions including fellowships and institutes.

There is a growing awareness that the challenge of teaching data journalism in many countries is split straight down the middle between teaching data and teaching journalism – where neither data science nor public interest journalism is particularly common. Open data can be a boon to democracy – but only if there are professionals able and motivated to transform that data into information for the public.

That massive open online course on data journalism now has a start date

In case you haven’t seen the tweets and blog posts, that MOOC on data journalism I’m involved in has a start date: May 19.

The launch was delayed a little due to the number of people who signed up – which I think was a sensible decision.

You can watch the introduction video above, or ‘meet the instructors’ below. Looking forward to this…

That free online data journalism course I’m involved in

I’m happy to announce that I’ll be part of the delivery team for a free online data journalism course, hosted by the European Journalism Centre, early next year.

Let’s explode the myth that data journalism is ‘resource intensive’

"Data Journalism is very time consuming, needs experts, is hard to do with shrinking news rooms" Eva Linsinger, Profil

Is data journalism ‘time consuming’ or ‘resource intensive’? The excuse – and I think it is an excuse – seems to come up at an increasing number of events whenever data journalism is discussed. “It’s OK for the New York Times/Guardian/BBC,” goes the argument. “But how can our small team justify the resources – especially in a time of cutbacks?”

The idea that data journalism inherently requires extra resources is flawed – but understandable. Spectacular interactives, large-scale datasets and investigative projects are the headliners of data journalism’s recent history. We have oohed and aahed over what has been achieved by programmer-journalists and data sleuths…

But that’s not all there is.

When data goes bad

Bad data on sex trafficking: flow chart

Image by Lauren York on the Data Journalism Blog

Data is so central to the decision-making that shapes our countries, jobs and even personal lives that an increasing amount of data journalism involves scrutinising the problems with the very data itself. Here’s an illustrative list of cases where bad data becomes the story – and the lessons those cases can teach data journalists:

Deaths in police custody unrecorded

This investigation by the Bureau of Investigative Journalism demonstrates an important question to ask about data: who decides what gets recorded?

In this case, the BIJ identified “a number of cases not included in the official tally of 16 ‘restraint-related’ deaths in the decade to 2009 … Some cases were not included because the person has not been officially arrested or detained.”

As they explain:

“It turns out the IPCC has a very tight definition of ‘in custody’ – defined only as when someone has been formally arrested or detained under the mental health act. This does not include people who have died after being in contact with the police.

“There are in fact two lists. The one which includes the widely quoted list of sixteen deaths in custody only records the cases where the person has been arrested or detained under the mental health act. So, an individual who comes into contact with the police – is never arrested or detained – but nonetheless dies after being restrained, is not included in the figures.

“… But even using the IPCC’s tightly drawn definition, the Bureau has identified cases that are still missing.”

Cross-checking the official statistics against wider reports was a key technique, as was using the Freedom of Information Act to request the details behind them and the details of those “who died in circumstances where restraint was used but was not necessarily a direct cause of death”.

Cooking the books on drug-related murders

Drug related murders in Mexico

Cross-checking statistics against reports was also used in this investigation by Diego Valle-Jones into Mexican drug deaths:

“The Acteal massacre committed by paramilitary units with government backing against 45 Tzotzil Indians is missing from the vital statistics database. According to the INEGI there were only 2 deaths during December 1997 in the municipality of Chenalho, where the massacre occurred. What a silly way to avoid recording homicides! Now it is just a question of which data is less corrupt.”

Diego also used Benford’s Law to identify potentially fraudulent data. The same technique has been used to highlight relationships between dodgy company data and real-world events such as the dotcom bubble and deregulation.
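For readers unfamiliar with Benford’s Law: in many naturally occurring sets of numbers, the leading digit d appears with probability log10(1 + 1/d), so about 30% of values start with a 1. The sketch below is a minimal illustration of that idea in Python, not the method used in the investigation above. The function names are made up for this example, and the chi-square statistic is just one common way to measure how far a dataset strays from the expected pattern.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit (1-9) under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value):
    """First significant digit of a non-zero number."""
    # Scientific notation puts the first significant digit at position 0,
    # e.g. 0.0032 -> '3.200000000000000e-03'.
    return int(f"{abs(value):.15e}"[0])


def benford_chi_square(values):
    """Chi-square distance between observed leading digits and Benford's Law.

    Larger values mean a worse fit. Zeros are skipped because they have
    no leading digit.
    """
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one non-zero value")
    observed = Counter(digits)
    return sum(
        (observed.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )


if __name__ == "__main__":
    # Hypothetical example data: powers of 2 are known to fit Benford closely.
    sample = [2 ** k for k in range(1, 200)]
    print(round(benford_chi_square(sample), 2))
```

With nine digits there are eight degrees of freedom, so a statistic above roughly 15.5 would be unusual at the 5% level. A poor fit is a prompt for further questions, not proof of fraud: plenty of legitimate datasets do not follow Benford’s Law.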

Poor records mean no checks

Detective Inspector Philip Shakesheff exposed a “gap between [local authority] records and police data”, reported The Sunday Times in a story headlined ‘Care home loses child 130 times’:

“The true scale of the problem was revealed after a check of records on police computers. For every child officially recorded by local authorities as missing in 2010, another seven were unaccounted for without their absence being noted.”

Why is it important?

“The number who go missing is one of the indicators on which Ofsted judges how well children’s homes are performing and the homes have a legal duty to keep accurate records.

“However, there is evidence some homes are failing to do so. In one case, Ofsted gave a good report to a private children’s home in Worcestershire when police records showed 1,630 missing person reports in five years. Police stationed an officer at the home and pressed Ofsted to look closer. The home was downgraded to inadequate and it later closed.

“The risks of being missing from care are demonstrated by Zoe Thomsett, 17, who was Westminster council’s responsibility. It sent her to a care home in Herefordshire, where she went missing several times, the final time for three days. She had earlier been found at an address in Hereford, but because no record was kept, nobody checked the address. She died there of a drugs overdose.

“The troubled life of Dane Edgar, 14, ended with a drugs overdose at a friend’s house after he repeatedly went missing from a children’s home in Northumberland. Another 14-year-old, James Jordan, was killed when he absconded from care and was the passenger in a stolen car.”

Interests not registered

When there are no formal checks on declarations of interest, how can we rely on them? In Chile, the Ciudadano Inteligente Fundacion decided to check the Chilean MPs’ register of assets and interests by building a database:

“No-one was analysing this data, so it was incomplete,” explained Felipe Heusser, executive president of the Fundacion. “We used technology to build a database, using a wide range of open data and mapped all the MPs’ interests. From that, we found that nearly 40% of MPs were not disclosing their assets fully.”

The organisation has now launched a database that “enables members of the public to find potential conflicts of interest by analysing the data disclosed through the members’ register of assets.”

Data laundering

Tony Hirst’s post about how dodgy data was “laundered” by Facebook in a consultant’s report is a good illustration of the need to ‘follow the data’:

“We have some dodgy evidence, about which we’re biased, so we give it to an “independent” consultant who re-reports it, albeit with caveats, that we can then report, minus the caveats. Lovely, clean evidence. Our lobbyists can then go to a lazy policy researcher and take this scrubbed evidence, referencing it as finding in the Deloitte report, so that it can make its way into a policy briefing.”

“Things just don’t add up”

Ellen Miller of the Sunlight Foundation has taken the US government to task over the inconsistencies in its transparency agenda and the flawed data published on its USAspending.gov website – so flawed that the foundation launched the Clearspending website to automate and highlight the discrepancies between two sources of the same data.
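Comparing two sources that should describe the same thing is a technique any data journalist can apply on a small scale. The following Python sketch is an illustration only and does not reflect how Clearspending actually works. It assumes each source has already been reduced to a mapping from a record identifier to an amount; the function name, identifiers and figures are all hypothetical.

```python
def compare_sources(source_a, source_b, tolerance=0.0):
    """Compare two {record_id: amount} mappings.

    Returns three things:
      - ids found only in source_a
      - ids found only in source_b
      - {id: (amount_a, amount_b)} for shared ids whose amounts differ
        by more than `tolerance`
    """
    ids_a, ids_b = set(source_a), set(source_b)
    only_a = sorted(ids_a - ids_b)
    only_b = sorted(ids_b - ids_a)
    mismatched = {
        key: (source_a[key], source_b[key])
        for key in sorted(ids_a & ids_b)
        if abs(source_a[key] - source_b[key]) > tolerance
    }
    return only_a, only_b, mismatched


if __name__ == "__main__":
    # Hypothetical figures: what an agency reports vs. what a central site lists.
    agency = {"GRANT-001": 100_000.0, "GRANT-002": 250_000.0, "GRANT-003": 75_000.0}
    central = {"GRANT-001": 100_000.0, "GRANT-002": 200_000.0, "GRANT-004": 10_000.0}

    missing_from_central, missing_from_agency, differing = compare_sources(agency, central)
    print("Only in agency data:", missing_from_central)
    print("Only in central data:", missing_from_agency)
    print("Amounts that disagree:", differing)
```

Each of the three outputs is a list of questions to put to the data’s publisher: why is this record missing, and which of the two figures is right?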

Key budget decisions made on useless data

Sometimes data might appear to tell an astonishing story, but this turns out to be a mistake – and that mistake itself leads you to something much more newsworthy, as Channel 4’s FactCheck found when it started trying to find out if councils had been cutting spending on Sure Start children’s centres:

“That ought to be fairly straightforward, as all councils by law have to fill in something called a Section 251 workbook detailing how much they are spending on various services for young people.

“… Brent Council in north London appeared to have slashed its funding by nearly 90 per cent, something that seemed strange, as we hadn’t heard an outcry from local parents.

“The council swiftly admitted making an accounting error – to the tune of a staggering £6m.”

And they weren’t the only ones. In fact, the Department for Education admitted the numbers were “not very accurate”:

“So to recap, these spending figures don’t actually reflect the real amount of money spent; figures from different councils are not comparable with each other; spending in one year can’t be compared usefully with other years; and the government doesn’t propose to audit the figures or correct them when they’re wrong.”

This was particularly important because the S251 form “is the document the government uses to reallocate funding from council-run schools to its flagship academies”:

“The Local Government Association (LGA) says less than £250m should be swiped from council budgets and given to academies, while the government wants to cut more than £1bn, prompting accusations that it is overfunding its favoured schools to the detriment of thousands of other children.

“Many councils’ complaints, made plain in responses to an ongoing government consultation, hinge on DfE’s use of S251, a document it has variously described as “unaudited”, “flawed” and “not fit for purpose”.”

No data is still a story

Sticking with education, the TES reports on the outcome of an FOI request on the experience of Ofsted inspectors:

“[Stephen] Ball submitted a Freedom of Information request, asking how many HMIs had experience of being a secondary head, and how many of those had led an outstanding school. The answer? Ofsted “does not hold the details”.

““Secondary heads and academy principals need to be reassured that their work is judged by people who understand its complexity,” Mr Ball said. “Training as a good head of department or a primary school leader on the framework is no longer adequate. Secondary heads don’t fear judgement, but they expect to be judged by people who have experience as well as a theoretical training. After all, a working knowledge of the highway code doesn’t qualify you to become a driving examiner.”

“… Sir Michael Wilshaw, Ofsted’s new chief inspector, has already argued publicly that raw data are a key factor in assessing a school’s performance. By not providing the facts to back up its boasts about the expertise of its inspectors, many heads will remain sceptical of the watchdog’s claims.”

Men aren’t as tall as they say they are

To round off, here’s a quirky piece of data journalism by dating site OkCupid, which looked at the height of its members and found an interesting pattern:

Male height distribution on OkCupid

“The male heights on OkCupid very nearly follow the expected normal distribution—except the whole thing is shifted to the right of where it should be.

“Almost universally guys like to add a couple inches. You can also see a more subtle vanity at work: starting at roughly 5′ 8″, the top of the dotted curve tilts even further rightward. This means that guys as they get closer to six feet round up a bit more than usual, stretching for that coveted psychological benchmark.”
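The two patterns described in that quote, a whole distribution shifted to the right and extra rounding up near a benchmark, can both be checked with very simple arithmetic. The sketch below is a rough Python illustration and not OkCupid’s own analysis. The function names, the sample heights and the assumed expected average are all invented for the example, and heights are given in whole inches.

```python
from statistics import mean


def mean_shift(reported, expected_mean):
    """How far the reported average sits above (+) or below (-) the expected one."""
    return mean(reported) - expected_mean


def heaping_ratio(reported, benchmark):
    """Ratio of values exactly at the benchmark to values one unit below it.

    A ratio well above what neighbouring values show suggests people just
    short of the benchmark are rounding themselves up to it.
    """
    at_benchmark = sum(1 for v in reported if v == benchmark)
    just_below = sum(1 for v in reported if v == benchmark - 1)
    return at_benchmark / just_below if just_below else float("inf")


if __name__ == "__main__":
    # Hypothetical self-reported heights in inches; 72 inches is six feet.
    heights = [68, 69, 70, 71, 72, 72, 72]
    assumed_true_average = 69  # an assumption for illustration, not a real statistic

    print("Shift in inches:", round(mean_shift(heights, assumed_true_average), 2))
    print("Six-foot heaping ratio:", heaping_ratio(heights, 72))
```

In practice the expected average would need to come from a measured source, such as a national health survey, and the comparison only makes sense if both datasets cover a similar population.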

Do you know of any other examples of bad data forming the basis of a story? Please post a comment – I’m collecting examples.

UPDATE (April 20 2012): A useful addition from Simon Rogers: Named and shamed: the worst government annual reports explains why government department spending reports fail to support the Government’s claimed desire for an “army of armchair auditors”, with a list of the worst offenders at the end.

Comparing apples and oranges in data journalism: a case study

A must-read for any data journalist, aspiring or otherwise, is Simon Rogers’ post on The Guardian Datablog where he compares public and private sector pay.

This is a classic apples-and-oranges situation where politicians and government bodies are comparing two things that, really, are very different. Is a private school teacher really comparable to someone teaching in an unpopular school? What is the private sector equivalent of a director of public health or a social worker?

But if these issues are being discussed, journalists must try to shed some light, and Simon Rogers does a great job in unpicking the comparisons. From pay and hours worked, to qualifications and age (big differences in both), and gender and pay inequality (more women in the public sector, more lower- and higher-paid workers in the private sector), Rogers crunches all the numbers.

FAQ: How can broadcasters benefit from online communities?

Here’s another set of questions I’m answering in public in case anyone wants to ask the same:

How can broadcasters benefit from online communities?

Online communities contain many individuals who will be able to contribute different kinds of value to news production. Most obviously, expertise, opinion, and eyewitness testimony. In addition, they will be able to more effectively distribute parts of a story to ensure that it reaches the right experts, opinion-formers and eyewitnesses. The difference from an audience is that a community tends to be specialised, and connected to each other.

If you rephrase the question as ‘How can broadcasters benefit from people?’ it may be clearer.

How does a broadcaster begin to develop an engaged online community, any tips?

Over time. Rather than asking about how you develop an online community ask yourself instead: how do you begin to develop relationships? Waiting until a major news event happens is a bad strategy: it’s like waiting until someone has won the lottery to decide that you’re suddenly their friend.

Journalists who do this well do a little bit every so often – following people in their field, replying to questions on social networks, contributing to forums and commenting on blogs, and publishing blog posts which are helpful to members of that community rather than simply being about ‘the story’ (for instance, ‘Why’ and ‘How’ questions behind the news).

If you are aware of networks in the Middle East, do you think they are tapping into online communities and social media adequately?

I don’t know the networks well enough to comment – but I do think it’s hard for corporations to tap into communities; it works much better at an individual reporter level.

Can you mention any models, whether news channels or entertainment television, which have developed successful online communities? Why do they work?

The most successful examples tend to be newspapers: I think Paul Lewis at The Guardian has done this extremely successfully, and I think Simon Rogers’ Datablog has also developed a healthy community around data and visualisation. Both of these are probably due in part to the work of Meg Pickard there around community in general.

The BBC’s UGC unit is a good example from broadcasting – although that is less about developing a community than about providing platforms for others to contribute, and a way for journalists to quickly find expertise in those communities. More specifically, Robert Peston and Rory Cellan-Jones use their blogs and Twitter accounts well to connect with people in their fields.

Then of course there’s Andy Carvin at NPR, who is an exemplar of how to do it in radio. There’s so much written about what he does that I won’t repeat it here.

What are the reasons that certain broadcasters cannot connect successfully with online communities?

I expect a significant factor is regulation which requires objectivity from broadcasters but not from newspapers. If you can’t express an opinion then it is difficult to build relationships, and if you are more firmly regulated (which broadcasting is) then you take fewer risks.

Also, there are more intermediaries in broadcasting and fewer reporters who are public-facing, which for some journalists in broadcasting makes the prospect of speaking directly to the former audience that much more intimidating.

FAQ: Data journalism, laziness, information overload & localism

I seem to have lost the habit of publishing interview responses here under the FAQ category for the past year, but the following questions from a journalist, and my answers, were worth publishing in case anyone has the same questions:

Simon Rogers, Editor of the Datablog, said that he thinks in the future simply publishing the raw data will become acceptable journalism. Do you not think that an approach like this to raw data is lazy journalism? And equally, do you think that would be a type of journalism that the public will really be able to engage with?

It’s not lazy at all, and to think otherwise is pure journalistic egoism. We have a tendency to undervalue things because we haven’t invested our own effort into them, but their value lies in their usefulness, not in the effort. Increasingly I think being a journalist will be as much about making journalism possible for other people as it will be about creating that journalism yourself. You have to ask yourself: do I just want to write pretty stories, or allow people to hold power to account?

In a world where we can access information directly I think it’s a central function of journalists to make important information findable. The first level of that is to publish raw data.

It’s interesting to see that this seems to be a key principle for hyperlocal bloggers – making civic information findable.

The second level – if you have the time and resources – is then to analyse that raw data and pull stories out of it. But ultimately there will always be other ‘stories’ in the information that people want to find for themselves, which may be too specific to be of interest to the journalist or publisher.

The third level – which really requires a lot of investment – is to create tools that make it easier for the user to find what they want, to make it easier to understand (e.g. through visualisation), and to share it with others.

Do you think that a lot of the information can be quite overwhelming and sometimes not go anywhere?

Of course, but that isn’t a reason for not publishing the information. It’s natural that when the information is released some of it will attract more attention than other parts – but also, if other questions come up in future there is a dataset that people can go back and interrogate even if they didn’t at the time.

At the moment we have a lot of data but very few tools to interrogate it. That’s going to change – just in the last 6 months we’ve seen some fantastic new tools for filtering data, and the momentum is building in this area. It’s notable how many of the bids for the Knight News Challenge were data-related.

Additionally, do you think The Guardian will continue to pursue stories from the masses of data as consistently as they have done in previous years?

Yes, I think the Guardian has now built a reputation in this field and will want to maintain that, not to mention the fact that its reputation means it will attract more and more data-related stories, and benefit from the work of people outside the organisation who are interrogating data. They’ll also get better and better as they learn from experience.

And why do you think that smaller news resources struggle to use this sort of information as a source for news?

Partly because data has historically been more national than local. Even now I get frustrated when I find a dataset but then discover it’s only broken down into England, Wales, Scotland and Northern Ireland. But we are now finally getting more and more local data.

Also, at a local level journalists tend to be less specialised. On a national newspaper you might have a health or environment or financial reporter who is more used to dealing with figures and data. On a local newspaper that’s less likely – and there’s a high turnover of staff because of the low wages.