Category: data journalism

I am a coding denier

There is an exchange that sometimes takes place, perfectly described by Beth Ashton, between those who use technology, and those who don’t. It goes like this:

Prospective data journalist: ‘I’d really like to learn how to do data journalism but I can’t do statistics!’

Data journalist: ‘Don’t let that put you off, I don’t know anything about numbers either, I’m a journalist, not a mathematician!’

Prospective data journalist: ‘But I can’t code, and it all looks so codey and complicated’

Data journalist: ‘That’s fine, NONE OF US can code. None of us. Open angle bracket back slash End close angle bracket.’

“These people are coding deniers,” argues Beth.

I think she’s on to something. Flash back to a week before Beth published that post: I was talking to Caroline Beavon about the realisation of just how hard-baked ‘coding’ was into my workflow:

  • A basic understanding of RSS lies behind my ability to get regular updates from hundreds of sources
  • I look at repetitiveness in my work and seek to automate it where I can
  • I look at structure in information and use that to save time in accessing it

These are all logical responses to an environment with more information than a journalist can reasonably deal with, and I have developed many of them almost without realising.

They are responses as logical as deciding to use a pen to record information when human memory cannot store it reliably alone. Or deciding to learn shorthand when longhand writing cannot record reliably alone. Or deciding to use an audio recorder when that technology became available.

One of the things that makes us uniquely human is that we reach for technological supports – tools – to do our jobs better. The alphabet, of course, is a technology too.

But we do not pretend that shorthand comes easy, or that audio recorders are never time consuming, or that learning to use a pen takes no time.

So, first: ‘coding’ – whether you call it RSS, or automation, or pattern recognition – needs to be learned. It might seem invisible to those of us who’ve built our work patterns around it – just as the alphabet seems invisible once you’ve learned it. But, like the alphabet, it is a technology all the same.

But secondly – and more importantly – for this to happen, we as a profession need to acknowledge that ‘coding’ is a skill that has become as central to working effectively in journalism as using shorthand, the pen, or the alphabet.

I don’t say ‘will be central’ but ‘has become’. There is too much information, moving too fast, to continue to work with the old tools alone. From social networks to the quantified self; from RSS-enabled blogs to the open data movement; from facial recognition to verification, our old tools won’t do.

So I’m not going to be a coding denier. Coding is to digital information what shorthand was to spoken information. There, I’ve said it. Now, how can we do it better?

It’s finished! Scraping for Journalists now complete (for now)

Scraping for Journalists book

Last night I published the final chapter of my first ebook: Scraping for Journalists. Since I started publishing it in July, over 40 ‘versions’ of the book have been uploaded to Leanpub, a platform that allows users to receive updates as a book develops – but more importantly, to contribute to its development.

I’ve been amazed at the consistent interest in the book – last week it passed 500 readers: 400 more than I ever expected to download it. Their comments have directly shaped, and in some cases been reproduced in, the book – something I expect to continue as I keep updating it.

As a result I’ve become a huge fan of this form of ebook publishing, and plan to do a lot more with it (some hints here and here). The format combines the best qualities of traditional book publishing with those of blogging and social media (there’s a Facebook page too).

Meanwhile, there’s still more to do with Scraping for Journalists: publishing to other platforms and in other languages for starters… If you’re interested in translating the book into another language, please get in touch.

Livesheets creator wants to “make all kids into rocket scientists”

Abacus image by Anssi Koskinen on Flickr

“Imagine if you could search for any calculations and then just use them directly without ever having to work it out yourself from scratch.”

This is the vision of developer Daniel Maxwell, the creator of livesheets.com, whose dream is that no one in the world should ever have to perform the same calculation twice.

A sample dirty dataset for trying out Google Refine

I’ve created this spreadsheet of ‘dirty data’ to demonstrate some typical problems that data cleaning tools and techniques can be used for:

  • Subheadings that are only used once (and you need them in each row where they apply)
  • Odd characters that stand for something else (e.g. a space or ampersand)
  • Different entries that mean the same thing, either because they are lacking pieces of information, or have been mistyped, or inconsistently formatted

It’s best used alongside this post introducing basic features of Google Refine. But you can also use it to explore simpler techniques in spreadsheets like Find and replace; the TRIM function (and alternative solutions); and the functions UPPER, LOWER, and PROPER (which convert text into all upper case, lower case, and title case respectively).
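If you would rather see the same three clean-ups outside a spreadsheet, here is a minimal Python sketch. It is not taken from the sample spreadsheet: the rows, the function names, and the choice of "%20" and "&amp;" as the odd characters standing in for a space and an ampersand are all assumptions made for illustration. The `trim` helper only roughly mirrors TRIM, and `str.title()` only roughly mirrors PROPER.

```python
def trim(text):
    """Roughly like spreadsheet TRIM: drop leading and trailing whitespace
    and collapse runs of whitespace inside the text to a single space."""
    return " ".join(text.split())


def clean_cell(text):
    """Swap stand-in characters for what they represent, then tidy the text.
    The two stand-ins below are assumed examples, not a complete list."""
    text = text.replace("%20", " ").replace("&amp;", "&")
    return trim(text).title()  # .title() is the rough equivalent of PROPER


def fill_down(rows):
    """Copy each subheading into every row beneath it.
    Assumption: a row whose second column is empty is a subheading."""
    current = ""
    filled = []
    for name, value in rows:
        if value == "":
            current = name  # remember the subheading, do not keep it as a row
        else:
            filled.append((current, name, value))
    return filled


# Hypothetical dirty rows: subheadings appear once, names are inconsistent.
rows = [
    ("North", ""),
    ("  birmingham%20city ", "10"),
    ("COVENTRY", "7"),
    ("South", ""),
    ("bath &amp; wells", "3"),
]

cleaned = [(region, clean_cell(name), value)
           for region, name, value in fill_down(rows)]

for row in cleaned:
    print(row)
```

Entries that mean the same thing but are mistyped (rather than just differently cased or spaced) need something more than this, which is where Google Refine’s clustering comes in.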

Thanks to Eva Constantaras for suggesting the idea.

2 how-tos: researching people and mapping planning applications

Mapping planning applications
Sid Ryan’s planning applications map

Sid Ryan wanted to see if planning applications near planning committee members were more or less likely to be accepted. In two guest posts on Help Me Investigate he shows how to research people online (in this case the councillors), and how to map planning applications to identify potential relationships.

The posts take in a range of techniques including:

  • Scraping using Scraperwiki and the Google Drive spreadsheet function importXML
  • Mapping in Google Fusion Tables
  • Registers of interests
  • Using advanced search techniques
  • Using Land Registry enquiries
  • Using Companies House and Duedil
  • Other ways to find information on individuals, such as Hansard, LinkedIn, 192.com, Lexis Nexis, whois and FriendsReunited

If you find it useful, please let me know – and if you can add anything… please do.

Motion graphic video workflow – a video tutorial

Motion graphics has become an increasingly popular way to present data in a compelling visual form. In a series of videos, guest contributor Sihlangu Tshuma outlines his workflow for managing a motion graphics video project, the results of which are shown at the end. All 13 videos are also available in this playlist.

1: Motion graphics introduction

2: Researching the project

3: Motion graphics treatments

Notes on setting up a regional newspaper datablog

Behind the Numbers - Birmingham's regional datablog

I’ve been working recently with the Birmingham Mail to launch Behind The Numbers, a new datablog project with Birmingham City University supported by Help Me Investigate. I’m told that it is probably the UK’s first regional newspaper datablog, although whether that’s a meaningful claim is debatable*.

The first story generated by the project - what is the worst time to be seen at A&E - was published in the newspaper a week ago. But it’s what happens next that’s going to be interesting.

Is this an Excel killer? QueryTree app lowers the bar on data journalism

QueryTree

Sometimes the most impressive tools solve a problem you never knew you had. In the case of QueryTree, a new data analysis tool, that problem is something most people never question: spreadsheets.

For all the shiny-shiny copy-and-paste-click-and-drag-ness in new journalism tools, most data digging comes back to at least some simple spreadsheet work, and that represents a significant hurdle for many journalists used to working with simpler tools.

While interface design has undergone generations of improvement on the web, spreadsheet software interfaces have remained largely unchanged for decades.

So why did no one think to do this before?

QueryTree - how the drag and drop interface works

You only need 10 choices
