The database query language SQL pops up in all sorts of places when you’re working with data, especially big data, and can be a very useful way to query data in spreadsheets, in APIs and in code. This video, made for students on the MA in Data Journalism at Birmingham City University, explains what SQL is, the different places you will come across it, and how to get started with SQL queries.
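To give a flavour of what a first SQL query looks like, here is a minimal sketch using Python’s built-in sqlite3 module. The `spending` table and its figures are invented purely for illustration, and the video itself may use different tools; the point is the basic SELECT … FROM … GROUP BY pattern, which is common to most SQL databases.

```python
import sqlite3

# In-memory database; the 'spending' table and its rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spending (department TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO spending (department, amount) VALUES (?, ?)",
    [("Health", 500), ("Education", 300), ("Health", 200), ("Transport", 100)],
)

# A typical first query: total spending per department, largest first.
query = """
    SELECT department, SUM(amount) AS total
    FROM spending
    GROUP BY department
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()

for department, total in rows:
    print(department, total)
```

Reading the query aloud is a good way in: select each department and the sum of its amounts, from the spending table, grouped by department, ordered by that total in descending order.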
Three key terms you might hear used in data journalism circles are “open data”, “linked data” and “big data”. This video, made for students on the MA in Data Journalism at Birmingham City University, explores definitions of the three terms, explains some of the jargon used in relation to them, and outlines the critical and ethical issues to consider, particularly in relation to open and big data.
Three other video clips are mentioned in the video, and these are embedded below. First of all, Tim Berners-Lee’s 2009 call for “raw data now”, where he outlined the potential of open and linked data…
How can data journalists make sense of such quantities of data and filter out what’s meaningful?
In the same way they always have. Journalists’ role has always been to make choices about which information to prioritise, what extra information they need, and what information to include in the story they communicate.
Few things illustrate the challenges facing journalism in the age of ‘Big Data’ better than Cable Gate – and specifically, the question of how to engage people with stories that involve large sets of data.
The Cable Gate leaks have been of a different order to the Afghanistan and Iraq war logs. Not in number (there were 90,000 documents in the Afghanistan war logs and over 390,000 in the Iraq logs; the Cable Gate documents number around 250,000) – but in subject matter.
I once heard a journalist trying to put the number ‘£13 billion’ into context by saying: “imagine 13 million people paying £1,000 more per year” – as if imagining 13 million people was somehow easier than imagining £13bn. Comparing numbers to the size of Wales or the prime minister’s salary is hardly any better.
Generally misattributed to Stalin, the quote “The death of one man is a tragedy, the death of millions is a statistic” illustrates the problem particularly well: once you move beyond scales we can relate to on a human level, it becomes hard to engage people in the issue you are covering.
Research suggests this is a problem that not only affects journalism, but justice as well. In October Ben Goldacre wrote about a study that suggested “People who harm larger numbers of people get significantly lower punitive damages than people who harm a smaller number. Courts punish people less harshly when they harm more people.”
“Out of a maximum sentence of 10 years, people who read the three-victim story recommended an average prison term one year longer than the 30-victim readers. Another study, in which a food processing company knowingly poisoned customers to avoid bankruptcy, gave similar results.”
“‘As long as we have reporting that gives the impression to everyone that poor, black folks in these communities don’t value life, it just adds to their sense of isolation,’ says Stephen Franklin, the community media project director at the McCormick Foundation-funded Community Media Workshop, where he led the ‘We Are Not Alone’ campaign to promote stories about solution-based anti-violence efforts.
“Natalie Moore, the South Side Bureau reporter for the Chicago Public Radio, asks: ‘What do we want people to know? Are we just trying to tell them to avoid the neighborhoods with many homicides?’ Moore asks. ‘I’m personally struggling with it. I don’t know what the purpose is.’”
“Whistleblowing that lacks salience does nothing to serve the public interest – if we mean capturing the public’s attention to nurture its discourse in a way that has the potential to change something material.”
He is right. But Charlie Beckett, in the comments to that post, points out that Wikileaks is not operating in isolation:
“Wikileaks is now part of a networked journalism where they are in effect, a kind of news-wire for traditional newsrooms like the New York Times, Guardian and El Pais. I think that delivers a high degree of what you call salience.”
This is because last year Wikileaks realised that they would have much more impact working in partnership with news organisations than releasing leaked documents to the world en masse. It was a massive move for Wikileaks, because it meant re-assessing a core principle of openness to all, and taking on a more editorial role. But it was an intelligent move – and undoubtedly effective. The Guardian, Der Spiegel, New York Times and now El Pais and Le Monde have all added salience to the leaks. But could they have done more?
Visualisation through personalisation and humanisation
In my series of posts on data journalism I identified visualisation as one of four interrelated stages in its production. I think this concept needs to be broadened to include visualisation through case studies or, to put it more succinctly, humanisation.
There are dangers here, of course. The first is that humanising a story can make it appear to be the exception (one person’s tragedy) rather than the rule (thousands suffering), or simply emotive rather than also informative; the second is that your selection of case studies may not reflect the more complex reality.
“Avastin extends survival from 19.9 months to 21.3 months, which is about 6 weeks. Some people might benefit more, some less. For some, Avastin might even shorten their life, and they would have been better off without it (and without its additional side effects, on top of their other chemotherapy). But overall, on average, when added to all the other treatments, Avastin extends survival from 19.9 months to 21.3 months.
“The Daily Mail, the Express, Sky News, the Press Association and the Guardian all described these figures, and then illustrated their stories about Avastin with an anecdote: the case of Barbara Moss. She was diagnosed with bowel cancer in 2006, had all the normal treatment, but also paid out of her own pocket to have Avastin on top of that. She is alive today, four years later.
“Barbara Moss is very lucky indeed, but her anecdote is in no sense whatsoever representative of what happens when you take Avastin, nor is it informative. She is useful journalistically, in the sense that people help to tell stories, but her anecdotal experience is actively misleading, because it doesn’t tell the story of what happens to people on Avastin: instead, it tells a completely different story, and arguably a more memorable one – now embedded in the minds of millions of people – that Roche’s £21,000 product Avastin makes you survive for half a decade.”
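The “about 6 weeks” in the quoted passage is simple arithmetic on the two averages. A quick check, assuming an average month of roughly 30.4 days:

$$
21.3 - 19.9 = 1.4 \text{ months} \approx 1.4 \times \frac{30.4}{7} \text{ weeks} \approx 6.1 \text{ weeks}
$$

By the same measure, Barbara Moss’s four years (48 months) is more than twice the average survival of 21.3 months on Avastin, which is exactly why her case makes a memorable but unrepresentative illustration.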
Broadcast journalism – with its regulatory requirement for impartiality, often interpreted in practical terms as ‘balance’ – is particularly vulnerable to this. Here’s one example of how the homeopathy debate is given over to one person’s experience for the sake of balance:
Journalism on an industrial scale
The Wikileaks stories are journalism on an industrial scale. The closest equivalent I can think of was the MPs’ expenses story, which dominated the news agenda for 6 weeks. Cable Gate is already on day 9, and the wealth of stories has even justified a live blog.
With this scale comes a further problem: cynicism and passivity, or Cable Gate fatigue. In this context online journalism has a unique role to play, one that was barely possible before: empowerment.
Three years ago I wrote about 5 Ws and a H that should come after every news story. The ‘How’ and ‘Why’ of those are possibilities that many news organisations have still barely explored. ‘Why should I care?’ is about a further dimension of visualisation: personalisation – relating information directly to me. The Guardian moves closer to this with its searchable database, but I wonder at what point processing power, tools, and user data will allow us to do this sort of thing more effectively.
‘How can I make a difference?’ is about pointing users to tools, or creating them ourselves, so that they can move the story on by communicating with others, campaigning, voting, and so on. Many journalists may be uncomfortable with this role because it raises questions of advocacy; but choosing which stories to report, and how to report them, raises the same questions, and linking to a range of online tools need not be any different. These are ethical questions we should be exploring.
All the above in one sentence
Somehow I’ve ended up writing over a thousand words on this issue, so it’s worth summing it all up in a sentence.
Industrial-scale journalism using ‘big data’ in a networked age raises new problems and new opportunities: we need to humanise and personalise big datasets in a way that does not detract from the complexity or scale of the issues being addressed; and we need to think about what happens after someone reads a story online, and whether online publishers have a role in that.