
Wikileaks – a documentary

Here’s a well-produced (even in rough-cut form) documentary on Wikileaks by Swedish network SVT, published on YouTube in 4 parts. It covers quite a bit of the history of the organisation, the lessons it learned and the partnerships it made along the way – all of which provide valuable insights for any student of journalism as a practice or a cultural form, not to mention a more complex understanding than most coverage of the current situation provides. It really is essential viewing.

The photographer’s role in the age of citizen journalism: grab the guy filming on his mobile

The Guardian reports on the AP photographer whose image dominated the front pages today. The following passage – on how he returned to his office with a member of the public who had filmed the incident on his mobile phone – passes by without remark:

“The adrenaline was running by now. So I turned [the flash] on and took five pictures. I realised they were important and I saw another guy shooting video on his phone.

“So I got him into a taxi and we went back to AP’s offices in Camden.”

Worth noting.

One ambassador’s embarrassment is a tragedy, 15,000 civilian deaths is a statistic

Few things illustrate the challenges facing journalism in the age of ‘Big Data’ better than Cable Gate – and specifically, how you engage people with stories that involve large sets of data.

The Cable Gate leaks have been of a different order to the Afghanistan and Iraq war logs. Not in number (there were 90,000 documents in the Afghanistan war logs and over 390,000 in the Iraq logs; the Cable Gate documents number around 250,000) – but in subject matter.

Why is it that the 15,000 extra civilian deaths estimated to have been revealed by the Iraq war logs did not move the US authorities to shut down Wikileaks’ hosting and PayPal accounts? Why did it not dominate the news agenda in quite the same way?

Tragedy or statistic?

I once heard a journalist trying to put the number ‘£13 billion’ into context by saying: “imagine 13 million people paying £1,000 more per year” – as if imagining 13 million people was somehow easier than imagining £13bn. Comparing numbers to the size of Wales or the prime minister’s salary is hardly any better.

Generally misattributed to Stalin, the quote “The death of one man is a tragedy, the death of millions is a statistic” illustrates the problem particularly well: when you move beyond scales we can deal with on a human level, you struggle to engage people in the issue you are covering.

Research suggests this is a problem that not only affects journalism, but justice as well. In October Ben Goldacre wrote about a study that suggested “People who harm larger numbers of people get significantly lower punitive damages than people who harm a smaller number. Courts punish people less harshly when they harm more people.”

“Out of a maximum sentence of 10 years, people who read the three-victim story recommended an average prison term one year longer than the 30-victim readers. Another study, in which a food processing company knowingly poisoned customers to avoid bankruptcy, gave similar results.”

In the US, “scoreboard reporting” on gun crime – “represented by numbing headlines like ‘82 shot, 14 fatally’” – has been criticised for similar reasons:

“”As long as we have reporting that gives the impression to everyone that poor, black folks in these communities don’t value life, it just adds to their sense of isolation,” says Stephen Franklin, the community media project director at the McCormick Foundation-funded Community Media Workshop, where he led the “We Are Not Alone” campaign to promote stories about solution-based anti-violence efforts.

“Natalie Moore, the South Side Bureau reporter for Chicago Public Radio: “What do we want people to know? Are we just trying to tell them to avoid the neighborhoods with many homicides?” Moore asks. “I’m personally struggling with it. I don’t know what the purpose is.””

Salience

This is where journalists play a particularly important role. Kevin Marsh, writing about Wikileaks on Sunday, argues that

“Whistleblowing that lacks salience does nothing to serve the public interest – if we mean capturing the public’s attention to nurture its discourse in a way that has the potential to change something material.”

He is right. But Charlie Beckett, in the comments to that post, points out that Wikileaks is not operating in isolation:

“Wikileaks is now part of a networked journalism where they are in effect, a kind of news-wire for traditional newsrooms like the New York Times, Guardian and El Pais. I think that delivers a high degree of what you call salience.”

This is because last year Wikileaks realised that they would have much more impact working in partnership with news organisations than releasing leaked documents to the world en masse. It was a massive move for Wikileaks, because it meant re-assessing a core principle of openness to all, and taking on a more editorial role. But it was an intelligent move – and undoubtedly effective. The Guardian, Der Spiegel, New York Times and now El Pais and Le Monde have all added salience to the leaks. But could they have done more?

Visualisation through personalisation and humanisation

In my series of posts on data journalism I identified visualisation as one of four interrelated stages in its production. I think that this concept needs to be broadened to include visualisation through case studies: or humanisation, to put it more succinctly.

There are dangers here, of course. Firstly, that humanising a story makes it appear to be an exception (one person’s tragedy) rather than the rule (thousands suffering) – or simply emotive rather than also informative; and secondly, that your selection of case studies does not reflect the more complex reality.

Ben Goldacre – again – explores this issue particularly well:

“Avastin extends survival from 19.9 months to 21.3 months, which is about 6 weeks. Some people might benefit more, some less. For some, Avastin might even shorten their life, and they would have been better off without it (and without its additional side effects, on top of their other chemotherapy). But overall, on average, when added to all the other treatments, Avastin extends survival from 19.9 months to 21.3 months.

“The Daily Mail, the Express, Sky News, the Press Association and the Guardian all described these figures, and then illustrated their stories about Avastin with an anecdote: the case of Barbara Moss. She was diagnosed with bowel cancer in 2006, had all the normal treatment, but also paid out of her own pocket to have Avastin on top of that. She is alive today, four years later.

“Barbara Moss is very lucky indeed, but her anecdote is in no sense whatsoever representative of what happens when you take Avastin, nor is it informative. She is useful journalistically, in the sense that people help to tell stories, but her anecdotal experience is actively misleading, because it doesn’t tell the story of what happens to people on Avastin: instead, it tells a completely different story, and arguably a more memorable one – now embedded in the minds of millions of people – that Roche’s £21,000 product Avastin makes you survive for half a decade.”

Broadcast journalism – with its regulatory requirement for impartiality, often interpreted in practical terms as ‘balance’ – is particularly vulnerable to this. One example is the way the homeopathy debate is often given over to one person’s experience for the sake of balance.

Journalism on an industrial scale

The Wikileaks stories are journalism on an industrial scale. The closest equivalent I can think of was the MPs’ expenses story which dominated the news agenda for 6 weeks. Cable Gate is already on Day 9 and the wealth of stories has even justified a live blog.

With this scale comes a further problem: cynicism and passivity; Cable Gate fatigue. In this context online journalism has a unique role to play which was barely possible previously: empowerment.

3 years ago I wrote about 5 Ws and a H that should come after every news story. The ‘How’ and ‘Why’ of that are possibilities that many news organisations have still barely explored. ‘Why should I care?’ is about a further dimension of visualisation: personalisation – relating information directly to me. The Guardian moves closer to this with its searchable database, but I wonder at what point processing power, tools, and user data will allow us to do this sort of thing more effectively.

‘How can I make a difference?’ is about pointing users to tools – or creating them ourselves – where they can move the story on by communicating with others, campaigning, voting, and so on. This is a role many journalists may be uncomfortable with because it raises advocacy issues, but then choosing to report on these stories, and how to report them, raises the same issues; linking to a range of online tools need not be any different. These are issues we should be exploring, ethically.

All the above in one sentence

Somehow I’ve ended up writing over a thousand words on this issue, so it’s worth summing it all up in a sentence.

Industrial scale journalism using ‘big data’ in a networked age raises new problems and new opportunities: we need to humanise and personalise big datasets in a way that does not detract from the complexity or scale of the issues being addressed; and we need to think about what happens after someone reads a story online and whether online publishers have a role in that.

Facebook, cartoon avatars, “paedos” and SEO as a public service

A few days ago status updates like this were doing the rounds on Facebook:

“Change your facebook profile picture to a cartoon from your childhood and invite your friends to do the same. Until Monday (December 6), there should be no human faces on facebook, but a stash of memories. This is for eliminating violence against children.”

Of course it is. Or maybe not. Today, the rumour changed poles:

“This cartoon thing has been set up by paedos using A registered charities name to entice kids. apparently on the 6th dec you will be kicked off fb if u have cartoon pics. The more folk that… put up cartoon pics the harder it is fo…r the police to catch these sickos!!”

There doesn’t appear to be any truth in the latter rumour. Internet hoax library Snopes has a similar hoax listed, and this seems to be a variant of it. ThatsNonsense.com also covers the hoax.

SEO as a public service

Hoax updates do the rounds on social networks and text messages on a semi-regular basis. Remember the one about children being kidnapped in supermarket toilets? Or how about police banning English flags in pubs for fear of offending people?

In both cases the mainstream media was slow to react to the rumours. A Google search – which would be a typical reaction of anyone receiving such a message – would bring up nothing to counter those rumours. (Notably, perhaps because of its public and real-time nature, Twitter seems better at quashing hoaxes).

Search engine optimisation (SEO) is much derided for a perception that it leads news organisations to write for machines, or to aim for the lowest common denominator. But SEO has a very valuable role in serving the public: if searches on a particular rumour shoot up, or mentions of it increase on social networks, it’s worth verifying the claim and publishing the facts quickly.
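There’s no single tool for this, but the monitoring logic itself is simple. Here’s a minimal sketch in Python – the daily mention counts are made up, and in practice they would come from whichever search or social media source you monitor – that flags when today’s chatter about a phrase is well above its recent baseline:

    from statistics import mean, stdev

    def is_spike(history, today, threshold=3.0):
        """Flag a spike if today's count is well above the recent baseline.

        history: daily mention counts for the phrase (e.g. the last fortnight)
        today:   today's count so far
        """
        if len(history) < 7:
            return False  # not enough data for a baseline
        baseline = mean(history)
        spread = stdev(history) or 1.0  # avoid dividing by zero on a flat history
        return (today - baseline) / spread > threshold

    # Made-up counts standing in for whatever monitoring source you use
    mentions_last_fortnight = [3, 5, 2, 4, 6, 3, 5, 4, 2, 3, 5, 4, 3, 6]
    if is_spike(mentions_last_fortnight, today=48):
        print("Rumour is spiking - worth verifying and publishing the facts")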

This is another reason why journalists should be on social networks, and why publishers should be monitoring them more broadly. Whether your motivations are civic or commercial, it makes sense either way.

Of course, on the other hand you could always recycle urban myths about councils banning Christmas.

PS: If you need any tips on methods and tools, see my Delicious bookmarks for verification.

(h/t to Conrad Quilty-Harper)

FAQ: Data journalism, laziness, information overload & localism

I seem to have lost the habit of publishing interview responses here under the FAQ category over the past year, but the following questions from a journalist – and my answers – are worth publishing in case anyone has the same questions:

Simon Rogers, Editor of the Datablog, said that he thinks in the future simply publishing the raw data will become acceptable journalism. Do you not think that an approach like this to raw data is lazy journalism? And equally, do you think that would be a type of journalism that the public will really be able to engage with?

It’s not lazy at all, and to think otherwise is pure journalistic egoism. We have a tendency to undervalue things because we haven’t invested our own effort in them, but the value lies in their usefulness, not in the effort. Increasingly I think being a journalist will be as much about making journalism possible for other people as it will be about creating that journalism yourself. You have to ask yourself: do I just want to write pretty stories, or allow people to hold power to account?

In a world where we can access information directly I think it’s a central function of journalists to make important information findable. The first level of that is to publish raw data.

It’s interesting to see that this seems to be a key principle for hyperlocal bloggers – making civic information findable.

The second level – if you have the time and resources – is then to analyse that raw data and pull stories out of it. But ultimately there will always be other ‘stories’ in the information that people want to find for themselves, which may be too specific to be of interest to the journalist or publisher.

The third level – which really requires a lot of investment – is to create tools that make it easier for the user to find what they want, to make it easier to understand (e.g. through visualisation), and to share it with others.

Do you think that a lot of the information can be quite overwhelming and sometimes not go anywhere?

Of course, but that isn’t a reason for not publishing the information. It’s natural that when the information is released some of it will attract more attention than other parts – but also, if other questions come up in future there is a dataset that people can go back and interrogate even if they didn’t at the time.

At the moment we have a lot of data but very few tools to interrogate it. That’s going to change – just in the last 6 months we’ve seen some fantastic new tools for filtering data, and the momentum is building in this area. It’s notable how many of the bids for the Knight News Challenge were data-related.

Additionally, do you think The Guardian will continue to pursue stories from the masses of data as consistently as they have done in previous years?

Yes, I think the Guardian has now built a reputation in this field and will want to maintain that, not to mention the fact that its reputation means it will attract more and more data-related stories, and benefit from the work of people outside the organisation who are interrogating data. They’ll also get better and better as they learn from experience.

And why do you think that smaller news resources struggle to use this sort of information as a source for news?

Partly because data has historically been more national than local. Even now I get frustrated when I find a dataset but then discover it’s only broken down into England, Wales, Scotland and Northern Ireland. But we are now finally getting more and more local data.

Also, at a local level journalists tend to be less specialised. On a national newspaper you might have a health, environment or financial reporter who is more used to dealing with figures and data. On a local newspaper that’s less likely – and there’s a high turnover of staff because of the low wages.

Online journalists left out in the cold by local government

Hedy Korbee is a journalist with 29 years’ experience in broadcasting. She has worked for the Canadian Broadcasting Corporation, Global TV, and CTV, among others. In September she moved to Birmingham to study the MA in Online Journalism that I teach, and decided to launch a website covering the biggest story of the year: the budget cuts.

Her experiences of local government here – and of local journalism – have left her incredulous. Since arriving Hedy has attended every council meeting – she notes that reporters from the BBC and ITV regional news do not attend. Her attempts to get responses to stories from elected officials have been met with stonewalling and silence.

This week – after 7 weeks of frustration – she discovered that the council had called a news briefing about their business plan for consultation with the public on how to cut £300 million in spending – and failed to tell her about it, despite the fact that she had repeatedly requested to be kept informed, and was even stood outside the council offices while it was taking place (and asked directly why TV crews were being waved in):

“At first, [the head of news] told me that it wasn’t a news conference but “a small briefing of regional journalists that we know”. [She] described them as five people, “local, traditional journalists” who were on her “automatic invite list”.  She said they were journalists that the press office has been talking to about all aspects of the budget cuts and have “an understanding of the threads of these stories”.

“She also said they were journalists who have talked to Stephen Hughes before and “know where he is coming from”.”

Hedy’s experience isn’t an isolated case. Hyperlocal bloggers frequently complain of being discriminated against by local government officers: being ignored, refused information, or left to catch up on stories after leads have been leaked to council-friendly local newspapers. The most striking example of this was when Ventnor Blog’s Simon Perry was refused access to Newport coroner’s court as either a member of the press or a member of the public. (UPDATE: A further example is provided by this ‘investigation’ into one blogger’s right to film council committee meetings.)

On the other side are press offices like Walsall’s, which appear to recognise that the way blogs use social media allows the council to communicate with larger, more distributed, and different audiences than their print counterparts.

The issues for balanced reporting and public accountability are well illustrated by Hedy’s experience of calling the press office seeking a quote for a story:

“[I] was told that Birmingham councillors are “important people”  (I don’t know what that implies about “the public’s right to know”) and was told to simply write no comment.  The refusal by the press office to deal with us has made it exceedingly difficult to cover all sides of the story on our website.”

In contrast Hedy details her experiences in Canada:

“City Council meetings are considered a valuable source of news and attended by most of the local media and not just two print reporters, as they are in Birmingham.  Interested citizens show up in the gallery to watch.  Council meetings are broadcast live and journalists who can’t attend can watch the proceedings on television along with the general public.

“It is acceptable behaviour to walk up to a politician with your camera rolling and start asking questions which the politician will then answer.  If politicians are reluctant to answer questions they are often “scrummed” and wind up answering anyway.

“When major budget announcements are made by the federal government, politicians at every other level of government, as well as interest groups, hold news conferences to provide reaction.  Quite often, they go to the legislative chamber where the announcement is being made to make themselves more readily available to journalists (and, of course, to spin).”

Have you experienced similar problems as a journalist? Which local authorities deal well with the online media? I’d welcome your comments.

UPDATE: A response from Birmingham City Council comes via email:

“A Birmingham City Council spokesperson said: “We have proven that Birmingham City Council takes blogging and citizen journalism seriously through the launch of the award-winning http://www.birminghamnewsroom.com online press office.””

UPDATE 2 (Dec 16 2010): Sarah Hartley writes on the same problem, quoting some of the above incidents and others, and suggesting press offices confuse size with reach:

“Let the recently published London Online Neighbourhood Networks study enter the debate. It asked users of the citizen-run websites to identify what they regarded as their main source of local news. The result: 63% of respondents identified their local site as their main source.”

UPDATE 3 (Feb 23 2011): Guidance from the Local Government Secretary says that councils should give bloggers the same access as traditional media.

On Google, all publicity is no longer good publicity

TechCrunch reports on Google’s decision to tweak its algorithm in response to an online shop which found its Google ranking was boosted when dozens of people complained about it.

The owner of the shop – DecorMyEyes.com – had boasted on the consumer complaint forum GetSatisfaction:

“The more replies you people post, the more business and the more hits and sales I get. My goal is NEGATIVE advertisement.”

A lengthy New York Times piece on the issue continues:

“It’s all part of a sales strategy, he said. Online chatter about DecorMyEyes, even furious online chatter, pushed the site higher in Google search results, which led to greater sales.

“… Not only has this heap of grievances failed to deter DecorMyEyes, but as [one consumer’s] all-too-cursory Google search demonstrated, the company can show up in the most coveted place on the Internet’s most powerful site.”

The NYT spoke to the owner, Vitaly Borker, who openly admits “I’ve exploited this opportunity because it works.”

“No matter where they post their negative comments, it helps my return on investment. So I decided, why not use that negativity to my advantage?”

Later in the article, after the reporter has doorstepped Borker, he says he ‘stumbled upon the upside of rudeness by accident’:

“”I stopped caring,” he says, and for that he blames customers. They lied and changed their minds in ways that cost him money, he says, and at some point he started telling them off in the bluntest of terms. To his amazement, this seemed to better his standing in certain Google searches, which brought in more sales.

“Before this discovery, he’d hired a search optimization company to burnish his site’s reputation by writing positive things about DecorMyEyes online. Odious behavior, he realized, worked much better, and it didn’t cost him a penny.”

In its blog post on the change, Google says:

“We developed an algorithmic solution which detects the merchant … along with hundreds of other merchants that, in our opinion, provide an extremely poor user experience.”

For obvious reasons they don’t give details of the solution.

Visualising data with the Datapress WordPress plugin


Here’s a useful plugin for bloggers working with data: Datapress allows you to quickly visualise a dataset as a table, timeline, scatter plot, bar chart, ‘intelligent list’ (allowing you to sort by more than one value at once – see this example) or map.

Once installed, the plugin adds a new button to the ‘Upload/Insert’ row in the post edit view which you can click to link to a dataset in the same way as you would embed an image or video.

The plugin is in beta at the moment and takes a bit of getting used to. There’s a convention you have to follow in naming Google spreadsheet columns, for example – this Glasgow Vegan Guide spreadsheet has quite a few of them – but this could add some new visualisation possibilities. It seems particularly nice for lists and maps (if you have lat-long values), although Google spreadsheet’s built-in charts options will obviously be quicker for simple graphs and charts.

UPDATE: I’ve also just learned that the large empty space below the table can be fixed under the ‘Configure Display’ tab in the editing view.

The plugin has a demo site with some impressive examples and the developers are happy to help with any problems. It’s also up for the Knight News Challenge if you want to support it.

Data journalism training – some reflections

[Image: OpenHeatMap – percentage increase in fraud crimes in London since 2006/07]

I recently spent 2 days teaching the basics of data journalism to trainee journalists on a broadsheet newspaper. It’s a pretty intensive course that follows a path I’ve explored here previously – from finding data and interrogating it to visualising it and mashing it up – and I wanted to record the results.

My approach was both practical and conceptual. Conceptually, the trainees need to be able to understand and communicate with people from other disciplines, such as designers putting together an infographic, or programmers, statisticians and researchers.

They need to know what semantic data is, what APIs are, the difference between a database and open data, and what is possible with all of the above.

They need to know what design techniques make a visualisation clear, and the statistical quirks that need to be considered – or looked for.

But they also need to be able to do it.

The importance of editorial drive

The first thing I ask them to do (after a broad introduction) is come up with a journalistic hypothesis they want to test (a process taken from Mark E Hunter’s excellent ebook Story-Based Inquiry). My experience is that you learn more about data journalism by tackling a specific problem or question – and that applies not just to the trainees but, in trying to tackle other people’s problems, to me as well.

So one trainee wants to look at the differences between supporters of David and Ed Miliband in that week’s Labour leadership contest. Another wants to look at authorization of armed operations by a police force (the result of an FOI request following up on the Raoul Moat story). A third wants to look at whether ethnic minorities are being laid off more quickly, while others investigate identity fraud, ASBOs and suicides.

Taking those as a starting point, then, I introduce them to some basic computer assisted reporting skills and sources of data. They quickly assemble some relevant datasets – and the context they need to make sense of them.

For the first time I have to use OpenOffice’s spreadsheet software, which turns out to be not too bad. The DataPilot tool is a worthy free alternative to Excel’s pivot tables, allowing journalists to quickly aggregate and interrogate a large dataset.
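For those who prefer code to spreadsheets, the same aggregation step can be sketched in Python with pandas – the dataset below is invented purely to illustrate what a pivot table (or DataPilot) does:

    import pandas as pd

    # Invented records standing in for a real dataset (e.g. crime or ASBO figures)
    records = pd.DataFrame({
        "region": ["London", "London", "Wales", "Scotland", "Wales"],
        "year":   [2008, 2009, 2008, 2009, 2009],
        "count":  [120, 150, 30, 25, 40],
    })

    # Roughly what a pivot table / DataPilot does: aggregate counts by region and year
    summary = records.pivot_table(index="region", columns="year",
                                  values="count", aggfunc="sum", fill_value=0)
    print(summary)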

Formulae like CONCATENATE and ISNA turn out to be particularly useful in cleaning up data or making it compatible with similar datasets.

The ‘Text to columns’ function comes in handy for breaking up full names into title, forename and surname (or addresses into their constituent parts), while find and replace helps in removing redundant information.
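The equivalent cleanup can also be sketched in pandas – the names below are invented, just to illustrate the splitting, find-and-replace and CONCATENATE steps described above:

    import pandas as pd

    # Invented names, purely to illustrate the cleanup steps
    people = pd.DataFrame({"full_name": ["Mr John Smith", "Ms Jane Doe", "Dr Anna N/A Jones"]})

    # 'Text to columns': split full names into title, forename and surname
    people[["title", "forename", "surname"]] = people["full_name"].str.split(" ", n=2, expand=True)

    # Find and replace: strip a redundant placeholder from the surname
    people["surname"] = people["surname"].str.replace("N/A ", "", regex=False)

    # CONCATENATE: rebuild a display name from the cleaned parts
    people["display_name"] = people["forename"] + " " + people["surname"]
    print(people)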

It’s not long before the journalists raise statistical issues – which is reassuring. The trainee looking into ethnic minority unemployment, for example, finds some large increases – but the numbers in those ethnicities are so small as to undermine the significance.

Scraping the surface of statistics

Still, I put them through an afternoon of statistical training. Notably, not one of them has studied for a maths or science-related degree. History, English and Law dominate – and their educational history is pretty uniform. At a time when newsrooms need diversity to adapt to change, this is a little worrying.

But they can tell a mean from a mode, and deal well with percentages, which means we can move on quickly to standard deviations, distribution, statistical significance and regression analysis.

Even so, I feel like we’ve barely scraped the surface – and that there should be ways to make this more relevant to actively finding stories. (Indeed, a fortnight later I come across a great example of using Benford’s law to highlight problems with police reporting of drug-related murder.)
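For anyone unfamiliar with it: Benford’s law says that in many naturally occurring datasets the leading digit 1 turns up about 30% of the time, with each higher digit progressively rarer, so figures that depart sharply from that pattern can be a prompt to dig further. A minimal sketch of the check (the figures fed in are placeholders, not real police data):

    import math
    from collections import Counter

    def leading_digit_shares(numbers):
        """Share of each leading digit (1-9) in a list of numbers."""
        digits = [int(str(abs(n)).lstrip("0.")[0]) for n in numbers if n]
        counts = Counter(digits)
        total = sum(counts.values())
        return {d: counts.get(d, 0) / total for d in range(1, 10)}

    def benford_expected(d):
        """Expected share of leading digit d under Benford's law."""
        return math.log10(1 + 1 / d)

    # Placeholder figures standing in for, say, reported murder counts by area
    reported = [112, 98, 134, 19, 23, 187, 41, 150, 17, 26, 310, 129]
    observed = leading_digit_shares(reported)
    for d in range(1, 10):
        print(d, round(observed[d], 2), "expected", round(benford_expected(d), 2))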

One thing I do is ask one trainee to toss a coin 30 times and the others to place bets on the largest number of heads to fall in a row. Most plump for around 4 – but the longest run is 8 heads in a row.

The point I’m making is about small sample sizes and clusters. (By eerie coincidence, one of them has a map of Bridgend on her screen – a town which made the news after a cluster of suicides.)
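The intuition behind that exercise is easy to test in code. Here is a quick simulation sketch (not part of the course materials) of how long the longest run of heads tends to be in 30 fair tosses:

    import random

    def longest_heads_run(tosses=30):
        """Longest consecutive run of heads in a sequence of fair coin tosses."""
        longest = current = 0
        for _ in range(tosses):
            if random.random() < 0.5:  # heads
                current += 1
                longest = max(longest, current)
            else:
                current = 0
        return longest

    # Repeat the classroom exercise many times to see how often long runs appear
    trials = [longest_heads_run() for _ in range(10000)]
    print("average longest run:", sum(trials) / len(trials))
    print("share of trials with 5+ heads in a row:", sum(t >= 5 for t in trials) / len(trials))

The output tends to show that runs longer than most people bet on are entirely normal in a sample this small.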

That’s about as engaging as this section got – so if you’ve any ideas for bringing statistical subjects to life and making them relevant to journalists, particularly as a practical tool for spotting stories, I’m all ears.

Visualisation – bringing data to life, quickly

Day 2 is rather more satisfying, as – after an overview of various chart types and their strengths and limitations – the trainees turn their hands to visualization tools – Many Eyes, Wordle, Tableau Public, Open Heat Map, and Mapalist.

Suddenly the data from the previous day comes to life. Fraud crime in London boroughs is shown on a handy heat map. A pie chart, and then bar chart, shows the breakdown of Labour leadership voters; and line graphs bring out new possible leads in suicide data (female suicide rates barely change in 5 years, while male rates fluctuate more).

It turns out that Mapalist – normally used for plotting points on Google Maps from a Google spreadsheet – now also does heat maps based on the density of occurrences. ManyEyes has also added mapping visualizations to its toolkit.

Looking through my Delicious bookmarks I rediscover a postcodes API with a hackable URL to generate CSV or XML files with the lat/long, ward and other data from any postcode (also useful on this front is Matthew Somerville’s project MaPit).
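As an illustration of the kind of lookup such services make possible, here is a short sketch against MaPit; the endpoint and field names are my assumptions about its postcode lookup, so check the current documentation (and usage terms) before relying on them:

    import json
    import urllib.request

    def postcode_lookup(postcode):
        """Fetch lat/long for a UK postcode from mySociety's MaPit service.

        The URL pattern and field names are assumptions - verify against
        MaPit's documentation before using in anger.
        """
        url = "https://mapit.mysociety.org/postcode/" + postcode.replace(" ", "")
        with urllib.request.urlopen(url) as response:
            data = json.load(response)
        return data.get("wgs84_lat"), data.get("wgs84_lon")

    print(postcode_lookup("SW1A 1AA"))  # lat/long for a sample postcode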

Still a print culture

Notably, the trainees bring up the dominance of print culture. “I can see how this works well online,” says one, “but our newsroom will want to see a print story.”

One of the effects of convergence on news production is that a tool traditionally left to designers after the journalist has finished their role in the production line is now used by the journalist as part of their newsgathering role – visualizing data to see the story within it, and possibly publishing that online to involve users in that process too.

A print news story – in this instance – may result from the visualization process, rather than the other way around.

More broadly, it’s another symptom of how news production is moving from a linear process involving division of labour to a flatter, more overlapping organization of processes and roles – which involves people outside of the organization as well as those within.

Mashups

The final session covers mashups. This is an opportunity to explore the broader possibilities of the technology, how APIs and semantic data fit in, and some basic tools and tutorials.

Clearly, a well-produced mashup requires more than half a day and a broader skillset than exists in journalists alone. But by using tools like Mapalist the trainees have actually already created a mashup. Again, like visualization, there is a sliding scale between quick and rough approaches to find stories and communicate them – and larger efforts that require a bigger investment of time and skill.

As the trainees are already engrossed in their own projects, I don’t distract them too much from that course.

You can see what some of the trainees produced at the links below:

  • Matt Holehouse: Many Eyes – Rate of deaths in industrial accidents in the EU (per 100k)
  • Raf Sanchez
  • Rosie Ensor: Places with the highest rates for ASBOs
  • Sarah Rainey