Tag Archives: caroline beavon

I am a coding denier

There is an exchange that sometimes takes place, perfectly described by Beth Ashton, between those who use technology, and those who don’t. It goes like this:

Prospective data journalist: ‘I’d really like to learn how to do data journalism but I can’t do statistics!’

Data journalist: ‘Don’t let that put you off, I don’t know anything about numbers either, I’m a journalist, not a mathematician!’

Prospective data journalist: ‘But I can’t code, and it all looks so codey and complicated’

Data journalist: ‘That’s fine, NONE OF US can code. None of us. Open angle bracket back slash End close angle bracket.’

“These people are coding deniers,” argues Beth.

I think she’s on to something. Flash back to a week before Beth published that post: I was talking to Caroline Beavon about the realisation of just how hard-baked into my workflow ‘coding’ was:

  • A basic understanding of RSS lies behind my ability to get regular updates from hundreds of sources
  • I look at repetitiveness in my work and seek to automate it where I can
  • I look at structure in information and use that to save time in accessing it

These are all logical responses to an environment with more information than a journalist can reasonably deal with, and I have developed many of them almost without realising.

They are responses as logical as deciding to use a pen to record information when human memory cannot store it reliably alone. Or deciding to learn shorthand when longhand writing cannot record reliably alone. Or deciding to use an audio recorder when that technology became available.

One of the things that makes us uniquely human is that we reach for technological supports – tools – to do our jobs better. The alphabet, of course, is a technology too.

But we do not pretend that shorthand comes easily, that audio recorders cannot be time-consuming, or that learning to use a pen takes no time.

So, firstly: ‘coding’ – whether you call it RSS, or automation, or pattern recognition – needs to be learned. It might seem invisible to those of us who’ve built our work patterns around it – just as the alphabet seems invisible once you’ve learned it. But, like the alphabet, it is a technology all the same.

But secondly – and more importantly – for this to happen, we as a profession need to acknowledge that ‘coding’ is a skill that has become as central to working effectively in journalism as using shorthand, the pen, or the alphabet.

I don’t say ‘will be central’ but ‘has become’. There is too much information, moving too fast, to continue to work with the old tools alone. From social networks to the quantified self; from RSS-enabled blogs to the open data movement; from facial recognition to verification, our old tools won’t do.

So I’m not going to be a coding denier. Coding is to digital information what shorthand was to spoken information. There, I’ve said it. Now, how can we do it better?

Data visualisation training

If you’re interested in data visualisation I’m delivering a training course on November 7 with the excellent Caroline Beavon. Here’s what we’re covering:

  • Pick the right chart for your story – against a deadline
  • Mapping tricks and techniques: using Fusion Tables and other tools to map Olympic torchbearers
  • Picking the right data to visualise
  • Visualisation tips for free chart tools
  • Avoiding common visualisation mistakes
  • Create an infographic with Tableau and Illustrator
  • Making data interactive

More details here. Places can be booked here.

A case study in online journalism: investigating the Olympic torch relay

Infographic: Where did the Olympic torch relay places go? What we know so far

image by @CarolineBeavon

For the last two months I’ve been involved in an investigation which has used almost every technique in the online journalism toolbox. From its beginnings in data journalism, through collaboration, community management and SEO to ‘passive-aggressive’ newsgathering, verification and ebook publishing, it’s been a fascinating case study in such a range of ways I’m going to struggle to get them all down.

But I’m going to try.

Data journalism: scraping the Olympic torch relay

The investigation began with the scraping of the official torchbearer website. It’s important to emphasise that this piece of data journalism didn’t take place in isolation – in fact, it was while working with Help Me Investigate the Olympics’s Jennifer Jones (coordinator for #media2012, the first citizen media network for the Olympic Games) and others that I stumbled across the torchbearer data. So networks and community are important here (more later).

Indeed, it turned out that the site couldn’t be scraped through a ‘normal’ scraper, and it was the community of the Scraperwiki site – specifically Zarino Zappia – who helped solve the problem and get a scraper working. Without both of those sets of relationships – with the citizen media network and with the developer community on Scraperwiki – this might never have got off the ground.

But it was also important to see the potential newsworthiness in that particular part of the site. Human stories were at the heart of the torch relay – not numbers. Local pride and curiosity were here – a key ingredient of any local newspaper. There were the promises made by its organisers – had they been kept?

The hunch proved correct – this dataset would just keep on giving stories.

The scraper grabbed details on around 6,000 torchbearers. I was curious why more weren’t listed – yes, there were supposed to be around 800 invitations to high profile torchbearers including celebrities, who might reasonably be expected to be omitted at least until they carried the torch – but that still left over 1,000.

I’ve written a bit more about the scraping and data analysis process for The Guardian and the Telegraph data blog. In a nutshell, here are some of the processes used:

  • Overview (pivot table): where do most come from? What’s the age distribution?
  • Focus on details in the overview: what’s the most surprising hometown in the top 5 or 10? Who’s oldest and youngest? What about the biggest source outside the UK?
  • Start asking questions of the data based on what we know it should look like – and hunches
  • Don’t get distracted – pick a focus and build around it.
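The overview questions above can be sketched in a few lines of Python rather than a spreadsheet pivot table. This is a minimal illustration, not the process used in the investigation: the records, the field names (`hometown`, `age`) and the `age_band` helper are all invented for the example.

```python
from collections import Counter

# Invented sample records; the real scrape held around 6,000 rows.
torchbearers = [
    {"name": "Person A", "hometown": "Birmingham", "age": 17},
    {"name": "Person B", "hometown": "Birmingham", "age": 45},
    {"name": "Person C", "hometown": "Munich", "age": 52},
    {"name": "Person D", "hometown": "London", "age": 12},
    {"name": "Person E", "hometown": "London", "age": 81},
    {"name": "Person F", "hometown": "London", "age": 33},
]


def age_band(age, width=10):
    """Return a label such as '10-19' for the band containing this age."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Overview: where do most come from, and what is the age distribution?
hometown_counts = Counter(t["hometown"] for t in torchbearers)
age_counts = Counter(age_band(t["age"]) for t in torchbearers)

# Details: who is oldest and youngest?
oldest = max(torchbearers, key=lambda t: t["age"])
youngest = min(torchbearers, key=lambda t: t["age"])

for town, count in hometown_counts.most_common(5):
    print(town, count)
print("Oldest:", oldest["name"], "Youngest:", youngest["name"])
```

The counts only give the overview; the surprising entries in the top of each list are what point to where to dig next.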

This last point is notable. As I looked for mentions of Olympic sponsors in nomination stories, I started to build up subsets of the data: a dozen people who mentioned BP, two who mentioned ArcelorMittal (the CEO and his son), and so on. Each was interesting in its own way – but where should you invest your efforts?

One story had already caught my eye: it was written in the first person and talked about having been “engaged in the business of sport”. It was hardly inspirational. As it mentioned adidas, I focused on the adidas subset, and found that the same story was used by a further six people – a third of all of those who mentioned the company.

Clearly, all seven people hadn’t written the same story individually, so something was odd here. And that made this more than a ‘rotten apple’ story, but something potentially systemic.
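Spotting a story shared by several people can be automated once the subset is in hand. The sketch below is an assumption about how one might do it, not the method used at the time: the sample records, the placeholder "Sponsor X", and the helper names `mentioning`, `normalise` and `find_shared_stories` are all hypothetical.

```python
from collections import defaultdict

# Invented sample data: names and stories are placeholders, not real nominations.
records = [
    {"name": "Person 1", "story": "Template story about sport. Nominated via Sponsor X."},
    {"name": "Person 2", "story": "Template story about  sport. nominated via sponsor x. "},
    {"name": "Person 3", "story": "Raised money for a local hospice. Nominated via Sponsor X."},
    {"name": "Person 4", "story": "Coaches a youth football team."},
    {"name": "Person 5", "story": ""},
]


def mentioning(rows, keyword):
    """Return the rows whose story mentions the keyword, ignoring case."""
    return [r for r in rows if keyword.lower() in r["story"].lower()]


def normalise(story):
    """Lower-case the text and collapse whitespace so near-identical copies match."""
    return " ".join(story.lower().split())


def find_shared_stories(rows, min_count=2):
    """Group names by normalised story; keep stories used by min_count or more people."""
    groups = defaultdict(list)
    for r in rows:
        key = normalise(r["story"])
        if key:  # skip empty stories
            groups[key].append(r["name"])
    return {story: names for story, names in groups.items() if len(names) >= min_count}


subset = mentioning(records, "Sponsor X")
shared = find_shared_stories(subset)
for story, names in shared.items():
    print(f"{len(names)} of {len(subset)} in the subset share: {story!r}")
```

Exact matching after normalisation only catches copied text; lightly reworded stories would need a fuzzier comparison.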

Signals

While the data was interesting in itself, it was important to treat it as a set of signals to potentially more interesting exploration. Seven torchbearers having the same story was one of those signals. Mentions of corporate sponsors was another.

But there were many others too.

That initial scouring of the data had identified a number of people carrying the torch who held executive positions at sponsors and their commercial partners. The Guardian, The Independent and The Daily Mail were among the first to report on the story.

I wondered if the details of any of those corporate torchbearers might have been taken off the site afterwards. And indeed they had: seven disappeared entirely (many still had a profile if you typed in the URL directly – but could not be found through search or browsing), and a further two had had their stories removed.

Now, every time I scraped details from the site I looked for those who had disappeared since the last scrape, and those that had been added late.
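Comparing one scrape with the next comes down to set differences. Here is a minimal sketch, assuming each scrape is stored as a dictionary keyed by a stable identifier such as the profile URL; the keys, field names and the `compare_scrapes` function are hypothetical and the data is invented.

```python
def compare_scrapes(previous, current):
    """Return (disappeared, added, story_removed) identifiers between two scrapes."""
    prev_ids = set(previous)
    curr_ids = set(current)
    disappeared = sorted(prev_ids - curr_ids)  # listed before, gone now
    added = sorted(curr_ids - prev_ids)        # not listed before, present now
    # Still listed, but a story that existed before is now blank.
    story_removed = sorted(
        i for i in prev_ids & curr_ids
        if previous[i]["story"].strip() and not current[i]["story"].strip()
    )
    return disappeared, added, story_removed


# Invented example scrapes.
previous = {
    "/torchbearer/1": {"name": "Person 1", "story": "A story"},
    "/torchbearer/2": {"name": "Person 2", "story": "Another story"},
    "/torchbearer/3": {"name": "Person 3", "story": "A third story"},
}
current = {
    "/torchbearer/2": {"name": "Person 2", "story": ""},
    "/torchbearer/3": {"name": "Person 3", "story": "A third story"},
    "/torchbearer/4": {"name": "Person 4", "story": "A late addition"},
}

disappeared, added, story_removed = compare_scrapes(previous, current)
print("Disappeared:", disappeared)
print("Added:", added)
print("Story removed:", story_removed)
```

Keeping every scrape, rather than overwriting the last one, is what makes this comparison possible later.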

One, for example – who shared a name with a very senior figure at one of the sponsors – appeared just once before disappearing four days later. I wouldn’t have spotted them if they – or someone else – hadn’t been so keen on removing their name.

Another time, I noticed that a new torchbearer had been added to the list with the same story as the 7 adidas torchbearers. He turned out to be the Group Chief Executive of the country’s largest catalogue retailer, providing “continuing evidence that adidas ignored LOCOG guidance not to nominate executives.”

Meanwhile, the number of torchbearers running without any nomination story went from just 2.7% in the first scrape of 6,056 torchbearers, to 7.2% of 6,891 torchbearers in the last week, and 8.1% of all torchbearers – including those who had appeared and then disappeared – who had appeared between the two dates.
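To put those percentages in terms of people, the arithmetic can be sketched as below. The counts are approximations recovered from the rounded percentages quoted above, so they could be off by a handful either way; the `share_without_story` helper and its sample rows are hypothetical.

```python
def share_without_story(rows):
    """Percentage of rows whose story field is empty or only whitespace."""
    if not rows:
        return 0.0
    missing = sum(1 for r in rows if not r["story"].strip())
    return 100 * missing / len(rows)


# Approximate head counts implied by the published percentages.
first_scrape_missing = round(0.027 * 6056)  # 2.7% of 6,056
last_scrape_missing = round(0.072 * 6891)   # 7.2% of 6,891
print(first_scrape_missing, last_scrape_missing)

# The same measure computed directly from (invented) records.
sample = [{"story": "x"}, {"story": ""}, {"story": "  "}, {"story": "y"}]
print(share_without_story(sample))
```

On these figures the group without a story roughly tripled, from about 164 to about 496, while the total list grew by under 14%.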

Many were celebrities or sportspeople where perhaps someone had taken the decision that they ‘needed no introduction’. But many also turned out to be corporate torchbearers.

By early July the numbers of these ‘mystery torchbearers’ had reached 500 and, having only identified a fifth, we published them through The Guardian datablog.

There were other signals, too, where knowing the way the torch relay operated helped.

For example, logistics meant that overseas torchbearers often carried the torch in the same location. This led to a cluster of Chinese torchbearers in Stansted, Hungarians in Dorset, Germans in Brighton, Americans in Oxford and Russians in North Wales.

As many corporate torchbearers were also based overseas, this helped narrow the search, with Germany’s corporate torchbearers in particular leading to an article in Der Tagesspiegel.

I also had the idea to total up how many torchbearers appeared each day, to identify days when details on unusually high numbers of torchbearers were missing – thanks to Adrian Short – but it became apparent that variation due to other factors such as weekends and the Jubilee made this worthless.

However, the percentage per day missing stories did help (visualised below by Caroline Beavon), as this also helped identify days when large numbers of overseas torchbearers were carrying the torch. I cross-referenced this with the ‘mystery torchbearer’ spreadsheet to see how many had already been checked, and which days still needed attention.
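The per-day measure can be sketched as a simple grouping. This is an illustration under assumptions, not the original spreadsheet work: the `date` and `story` field names, the sample dates, the 50% threshold and the `missing_share_by_day` function are all invented for the example.

```python
from collections import defaultdict


def missing_share_by_day(rows):
    """Map each relay day to the percentage of its torchbearers with no story."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for r in rows:
        totals[r["date"]] += 1
        if not r["story"].strip():
            missing[r["date"]] += 1
    return {day: 100 * missing[day] / totals[day] for day in sorted(totals)}


# Invented sample rows; ISO-format dates sort correctly as plain strings.
rows = [
    {"date": "2012-06-01", "story": "A story"},
    {"date": "2012-06-01", "story": "Another story"},
    {"date": "2012-06-01", "story": ""},
    {"date": "2012-06-01", "story": "A third story"},
    {"date": "2012-06-02", "story": ""},
    {"date": "2012-06-02", "story": " "},
]

shares = missing_share_by_day(rows)
days_to_check = [day for day, pct in shares.items() if pct >= 50]
print(shares)
print("Days needing attention:", days_to_check)
```

Using a percentage rather than a raw daily total sidesteps the variation in how many people ran on a given day.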

But the data was just the beginning. In the second part of this case study, I talk about the verification process, SEO and collaboration.

The first Birmingham Hacks/Hackers meetup – Monday Sept 20

Those helpful people at Hacks/Hackers have let me set up a Hacks/Hackers group for Birmingham. This is basically a group of people interested in the journalistic (and, by extension, the civic) possibilities of data. If you’re at all interested in this and think you might want to meet up in the Midlands sometime, please join up.

I’ve also organised the first Hacks/Hackers meetup for Birmingham on Monday September 20, in Coffee Lounge from 1pm into the evening.

Our speaker will be Caroline Beavon, an experienced journalist who caught the data bug on my MA in Online Journalism (and whose experiences I felt would be accessible to most). In addition, NHS Local’s Carl Plant will be talking briefly about health data and Walsall Council’s Dan Slee about council data.

All are welcome and no technical or journalistic knowledge is required. I’m hoping we can pair techies with non-techies for some ad hoc learning.

If you want to come, RSVP at the link.

PS: There’s also a Hacks/Hackers in London, and one being planned for Manchester, I’m told.

Music journalism and data (MA Online Journalism multimedia projects pt1)

I’ve just finished looking at the work from the Diploma stage of my MA in Online Journalism, and – if you’ll forgive the effusiveness – boy is it good.

The work includes data visualisation, Flash, video, mapping and game journalism – in short, everything you’d want from a group of people who are not merely learning how to do journalism but exploring what journalism can become in a networked age.

But before I get to the detail, a bit of background…

Experiments in online journalism

Last month the first submissions by students on the MA in Online Journalism landed on my desk. I had set two assignments. The first was a standard portfolio of online journalism work as part of an ongoing, live news project. But the second was explicitly branded ‘Experimental Portfolio’ – you can see the brief here. I wanted students to have a space to fail. I had no idea how brave they would be, or how successful. The results, thankfully, surpassed any expectations I had. They included:

There is a range of things that I found positive about the results. Firstly, the sheer variety – students seemed to either instinctively or explicitly choose areas distinct from each other. The resulting reservoir of knowledge and experience, then, has huge promise for moving into the second and final parts of the MA, providing a foundation to learn from each other.