Category Archives: databases

FAQ: Do you need new ethics for computational journalism?

This latest post in the FAQ series answers questions posed by a student in Belgium regarding ethics and data journalism.

Q: Do ethical issues in the practice of computational journalism differ from those of “traditional” journalism?

No, I don’t think they do particularly – any more than ethics in journalism differ from ethics in life in general. However, as in journalism versus life, there are areas which attract more attention because they are the places we find the most conflict between different ethical demands.

For example, the tension between the public interest and an individual’s right to privacy is a general ethical issue in journalism, but one which has particular salience in data journalism, when you’re dealing with data that names individuals.

I wrote about this in a book chapter which I’ve published in parts on the blog.

That massive open online course on data journalism now has a start date

In case you haven’t seen the tweets and blog posts, that MOOC on data journalism I’m involved in has a start date: May 19.

The launch was delayed a little due to the number of people who signed up – which I think was a sensible decision.

You can watch the introduction video above, or ‘meet the instructors’ below. Looking forward to this…

Why open data matters – a (very bad) example from Universal Jobmatch

Open Data stickers image by Jonathan Gray

I come upon examples of bad practice in publishing government data on a regular basis, but the Universal Jobmatch tool is an example so bad I just had to write about it. In fact, it’s worse than the old-fashioned data service that preceded it.

That older service was the Office for National Statistics’ labour market service NOMIS, which published data on Jobcentre vacancies and claimants until late 2012, when Jobcentre Plus was given responsibility for publishing the data using their Universal Jobmatch tool.

Despite a number of concerns, more than a year on, Universal Jobmatch’s reports section has ignored at least half of the public data principles first drafted by the Government’s Public Sector Transparency Board in 2010, and published in 2012.

Saving the evidence in Ukraine: collaborate first – or you won’t be able to ask questions later

YanukovychLeaks screengrab

“The reporters then did something remarkable. They made a decision to cooperate among all the news organizations and to save first and report later.

“It wasn’t an easy decision. But it was clear that if they didn’t act, critical records of their own country’s history could be lost. The scene was already filling with other reporters eager to grab what stories they could and leave. In contrast, the group was joined by a handful of other like-minded journalists: Anna Babinets of Slidstvo/TV Hromadske; Oleksandr Akymenko, formerly of Forbes; Katya Gorchinska and Vlad Lavrov of the Kyiv Post. Radio Free Europe reporter Natalie Sedletska returned from Prague so she could help, and others came, too.

“… In the tense situation that characterizes Ukraine, conspiracies form quickly. To demonstrate their transparency, the organizers quickly moved to get documents up. By early Tuesday, nearly 400 documents, a fraction of the estimated 20,000 to 50,000 documents, had been posted. Dozens more are being added by the hour.”

Drew Sullivan writes about Yanukovych Leaks.

FAQ: Big data and journalism

The latest in the series of Frequently Asked Questions comes from a UK student, who has questions about big data.

How can data journalists make sense of such quantities of data and filter out what’s meaningful?

In the same way they always have. Journalists’ role has always been to make choices about which information to prioritise, what extra information they need, and what information to include in the story they communicate.

New ebook now ready! Learn basic spreadsheet skills with Data Journalism Heist

Data journalism book Data Journalism Heist

I’ve written a short ebook for people who are looking to get started with data journalism but need some help.

Data Journalism Heist covers two simple techniques for finding story leads in spreadsheets: pivot tables and advanced filters.

Neither technique requires any formulae, and there are dozens of local datasets (and one international one) to use them on.
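For readers who work in code rather than spreadsheets, the same two techniques have direct equivalents in Python with pandas. This is a minimal sketch – the dataset, column names and thresholds here are invented for illustration, not taken from the book:

```python
import pandas as pd

# Invented sample data: one row per reported incident
df = pd.DataFrame({
    "area":     ["North", "North", "South", "South", "South"],
    "category": ["theft", "fraud", "theft", "theft", "fraud"],
    "value":    [100, 250, 80, 120, 300],
})

# Pivot table: summarise values by area and category,
# the classic first step in spotting a story lead
pivot = df.pivot_table(index="area", columns="category",
                       values="value", aggfunc="sum", fill_value=0)

# Advanced filter: keep only rows matching several criteria at once
filtered = df[(df["category"] == "theft") & (df["value"] > 90)]
```

As in a spreadsheet, the pivot table answers "how much, broken down by what?" while the filter answers "which rows match these conditions?" – no formulae required in either case.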

In addition, the book covers how to follow leads from data and tell the resulting story, with tips on visualisation and plenty of recommendations for next steps.

You can buy it from Leanpub here. Comments welcome as always.

My next ebook: the Data Journalism Heist

Data Journalism Heist data journalism ebook

In the next couple of months I will begin publishing my next ebook: Data Journalism Heist.

Data Journalism Heist is designed to be a relatively short introduction to data journalism skills, demonstrating basic techniques for finding data, spotting possible stories and turning them around to a deadline.

Based on a workshop, the emphasis is on building confidence through speed and brevity, rather than on spectacular, headline-grabbing investigations or difficult datasets (I’m hoping to write a separate ebook on the latter at some point).

If you’re interested in finding out about the book, please sign up on the book’s Leanpub page.

Meanwhile, I’m looking for translators for Scraping for Journalists – get in touch if you’re interested.

I am a coding denier

There is an exchange that sometimes takes place, perfectly described by Beth Ashton, between those who use technology, and those who don’t. It goes like this:

Prospective data journalist: ‘I’d really like to learn how to do data journalism but I can’t do statistics!’

Data journalist: ‘Don’t let that put you off, I don’t know anything about numbers either, I’m a journalist, not a mathematician!’

Prospective data journalist: ‘But I can’t code, and it all looks so codey and complicated’

Data journalist: That’s fine, NONE OF US can code. None of us. Open angle bracket back slash End close angle bracket.

“These people are coding deniers,” argues Beth.

I think she’s on to something. Flash back to a week before Beth published that post: I was talking to Caroline Beavon about the realisation of just how hard-baked ‘coding’ was into my workflow:

  • A basic understanding of RSS lies behind my ability to get regular updates from hundreds of sources
  • I look at repetitiveness in my work and seek to automate it where I can
  • I look at structure in information and use that to save time in accessing it
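To make the RSS point concrete: because a feed is structured, pulling every headline and link out of it takes a few lines of code rather than repeated manual checking. A minimal sketch using only Python’s standard library – the feed content here is an invented fragment standing in for what you would fetch from a real feed URL with urllib:

```python
import xml.etree.ElementTree as ET

# An invented, minimal RSS 2.0 fragment; in practice you would
# fetch this from a feed URL with urllib.request
rss = """<rss version="2.0"><channel>
  <title>Example blog</title>
  <item><title>First post</title><link>http://example.com/1</link></item>
  <item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""

# Because the information is structured, extracting every
# headline and link is trivial to automate
root = ET.fromstring(rss)
items = [(item.findtext("title"), item.findtext("link"))
         for item in root.iter("item")]
```

Run on a schedule against hundreds of feeds, a script like this is exactly the kind of "logical response" described above: it automates the repetitive work of checking sources by exploiting the structure of the information.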

These are all logical responses to an environment with more information than a journalist can reasonably deal with, and I have developed many of them almost without realising.

They are responses as logical as deciding to use a pen to record information when human memory cannot store it reliably alone. Or deciding to learn shorthand when longhand writing cannot keep pace with speech. Or deciding to use an audio recorder when that technology became available.

One of the things that makes us uniquely human is that we reach for technological supports – tools – to do our jobs better. The alphabet, of course, is a technology too.

But we do not argue that shorthand comes easily, or that audio recorders are too time-consuming, or that learning to use a pen takes too long.

So: ‘coding’ – whether you call it RSS, or automation, or pattern recognition – needs to be learned. It might seem invisible to those of us who’ve built our work patterns around it – just as the alphabet seems invisible once you’ve learned it. But, like the alphabet, it is a technology all the same.

But secondly – and more importantly – for this to happen as a profession we need to acknowledge that ‘coding’ is a skill that has become as central to working effectively in journalism as using shorthand, the pen, or the alphabet.

I don’t say ‘will be central’ but ‘has become’. There is too much information, moving too fast, to continue to work with the old tools alone. From social networks to the quantified self; from RSS-enabled blogs to the open data movement; from facial recognition to verification, our old tools won’t do.

So I’m not going to be a coding denier. Coding is to digital information what shorthand was to spoken information. There, I’ve said it. Now, how can we do it better?

It’s finished! Scraping for Journalists now complete (for now)

Scraping for Journalists book

Last night I published the final chapter of my first ebook: Scraping for Journalists. Since I started publishing it in July, over 40 ‘versions’ of the book have been uploaded to Leanpub, a platform that allows users to receive updates as a book develops – but more importantly, to input into its development.

I’ve been amazed at the consistent interest in the book – last week it passed 500 readers: 400 more than I ever expected to download it. Their comments have directly shaped, and in some cases been reproduced in, the book – something I expect to continue (I plan to continue to update it).

As a result I’ve become a huge fan of this form of ebook publishing, and plan to do a lot more with it (some hints here and here). The format combines the best qualities of traditional book publishing with those of blogging and social media (there’s a Facebook page too).

Meanwhile, there’s still more to do with Scraping for Journalists: publishing to other platforms and in other languages for starters… If you’re interested in translating the book into another language, please get in touch.