Where should an aspiring data journalist start?

In writing last week’s Guardian Data Blog piece on How to be a data journalist I asked various people involved in data journalism where they would recommend starting. The answers are so useful that I thought I’d publish them in full here.

The Telegraph’s Conrad Quilty-Harper:

Start reading:

http://www.google.com/reader/bundle/user%2F06076274130681848419%2Fbundle%2Fdatavizfeeds

Keep adding to your knowledge and follow other data journalists/people who work with data on Twitter.

Look for sources of data:

The ONS stats release calendar (http://www.statistics.gov.uk/hub/release-calendar/index.html) is a good start. Also look at the government data stores (Data.gov, Data.gov.uk, Data.london.gov.uk etc).

Check out WhatDoTheyKnow, Freebase, WikiLeaks, ManyEyes, Google Fusion Tables.

Find out where hidden data is and try to get hold of it: private companies looking for publicity, underappreciated research departments, and public bodies that release data but not in a granular form (e.g. the Met Office).

Test out cleaning/visualisation tools:

You want to be able to collect data, clean it, visualise it and map it.

Obviously you need basic Excel skills (pivot tables are how journalists efficiently get headline numbers from big spreadsheets).
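To make the pivot table idea concrete, here is a minimal sketch of what one does under the hood: group rows by two fields and sum a third. It uses only Python's standard library; the spending figures, the column names and the `pivot_sum` helper are all invented for illustration and are not taken from any real dataset or tool.

```python
import csv
import io
from collections import defaultdict

# Hypothetical spending data, invented purely for illustration.
RAW_CSV = """department,year,amount
Health,2009,120
Health,2010,150
Education,2009,90
Education,2010,80
Health,2010,30
"""


def pivot_sum(rows, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) pair, like a pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += float(row[value_key])
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {r: dict(cols) for r, cols in table.items()}


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
totals = pivot_sum(rows, "department", "year", "amount")

# Print one headline number per department and year.
for department in sorted(totals):
    for year in sorted(totals[department]):
        print(department, year, totals[department][year])
```

In a spreadsheet the same result comes from dragging "department" to rows, "year" to columns and "amount" to values; the code only shows that there is nothing mysterious behind it.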

For publishing just use Google Spreadsheets graphs, or ManyEyes or Timetric. Google MyMaps coupled with http://batchgeo.com is a great beginner mapping combo.

Further on from that you want to try out Google Spreadsheets importURL service, Yahoo Pipes for cleaning data, Freebase Gridworks and Dabble DB.

For more advanced work you want to learn a query language and be able to work with relational databases, Google BigQuery, the Google Visualisation API (http://code.google.com/apis/charttools/), Google code playgrounds (http://code.google.com/apis/ajax/playground/?type=visualization#org_chart) and other JavaScript tools. The advanced mapping equivalents are ArcGIS or GeoConcept, which allow you to query geographical data and find stories.
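As a rough illustration of what "a query language and relational databases" means in practice, the sketch below builds a tiny in-memory SQLite database with Python's built-in `sqlite3` module and asks it one question in SQL. The table names, council names and amounts are all made up for the example.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Two related tables (hypothetical schema): councils, and what they spent.
conn.execute("CREATE TABLE councils (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE spending (council_id INTEGER, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO councils VALUES (?, ?)",
    [(1, "Northtown"), (2, "Southville")],
)
conn.executemany(
    "INSERT INTO spending VALUES (?, ?, ?)",
    [(1, 2009, 100.0), (1, 2010, 140.0), (2, 2009, 200.0), (2, 2010, 150.0)],
)

# Join the tables and total the spending per council, biggest first.
query = """
SELECT c.name, SUM(s.amount) AS total
FROM spending AS s
JOIN councils AS c ON c.id = s.council_id
GROUP BY c.name
ORDER BY total DESC
"""
results = conn.execute(query).fetchall()

for name, total in results:
    print(name, total)
```

The same `SELECT ... JOIN ... GROUP BY` pattern carries over to MySQL and similar databases, although details of the syntax vary between systems.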

You could also learn some Ruby for building your own scrapers, or Python for ScraperWiki.
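For a sense of what a scraper actually does, here is a small sketch in Python using only the standard library's `html.parser`. It pulls the rows out of an HTML table. To keep it runnable offline, the HTML is an inline sample with invented pool names and visit counts; a real scraper would first download the page (for example with `urllib.request`), and the `TableScraper` class is a hypothetical helper, not part of ScraperWiki or any other tool mentioned here.

```python
from html.parser import HTMLParser

# Invented sample page; a real scraper would fetch this from the web.
SAMPLE_HTML = """
<table>
  <tr><th>Pool</th><th>Visits</th></tr>
  <tr><td>Central Baths</td><td>1200</td></tr>
  <tr><td>Riverside</td><td>850</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


scraper = TableScraper()
scraper.feed(SAMPLE_HTML)

# First row is the header; the rest are data records.
header, *records = scraper.rows
print(header)
for record in records:
    print(record)
```

Once the rows are in a list like this they can be written to a CSV file and opened in a spreadsheet, which is usually the point of scraping in the first place.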

Get inspired:

Get the data behind some big data stories you admire, try and find a story, visualise it and blog about it. You’ll find that the whole process starts with the data, and your interpretation of it. That needs to be newsworthy/valuable.

Look to the past!

Edward Tufte’s work is very inspiring: http://www.edwardtufte.com/tufte/ His favourite data visualisation is from 1869! Or what about John Snow’s Cholera map? http://www.york.ac.uk/depts/maths/histstat/snow_map.htm

And for good luck here’s an assorted list of visualisation tutorials.

The Times’ Jonathan Richards:

I’d say a couple of blogs.

Hacks/Hackers: http://hackshackers.com/
and more importantly their Q&A site: http://help.hackshackers.com/ – which answers a ton of questions about which computer language/book/manual/online tutorial to get started with.

Others that spring to mind are:

Michelle Minkoff: http://michelleminkoff.com/ (a Hacks/Hacker who’s now at the LA Times. Provides a nice ‘not too geeky’ take on the area) and
10,000 words: http://10000words.net/

If people want more specific advice, tell them to come to the next London Hacks/Hackers and track me down!

The Guardian’s Charles Arthur:

Obvious thing: find a story that will be best told through numbers. (I’m thinking about quizzing my local council about the effects of stopping free swimming for children. Obvious way forward: get numbers for number of children swimming before, during and after free swimming offer.)

If someone already has the skills for data journalism (which I’d put at (1) understanding statistics and relevance (2) understanding how to manipulate data (3) understanding how to make the data visual) the key, I’d say, is always being able to spot a story that can be told through data – and only makes sense that way, and where being able to manipulate the data is key to extracting the story. It’s like interviewing the data. Good interviewers know how to get what they want out from the conversation. Ditto good data journalists and their data.

The New York Times’ Aron Pilhofer:

I would start small, and start with something you already know and already do. And always, always, always remember that the goal here is journalism. There is a tendency to focus too much on the skills for the sake of skills, and not enough on how those skills help enable you to do better journalism. Be pragmatic about it, and resist the tendency to think you need to know everything about the techy stuff before you do anything — nothing could be further from the truth.

Less abstractly, I would start out by learning some basic computer-assisted reporting skills and then move on from there as your interests/needs dictate. A lot of people see the programmer/journalism thing as distinct from computer-assisted reporting, but I don’t. I see it as a continuum. I see CAR as a “gateway drug” of sorts: Once you start working with small data sets using tools like Excel, Access, MySQL, etc., you’ll eventually hit the limits of what you can do with macros and SQL.

Soon enough, you’ll want to be able to script certain things. You’ll want to get data from the web. You’ll want to do things you can only do using some kind of scripting language, and so it begins.

But again, the place to start isn’t thinking about all these technologies. The place to start is thinking about how these technologies can enable you to tell stories you would otherwise never be able to tell. And you should start small. Look for little things to start, and go from there.

15 thoughts on “Where should an aspiring data journalist start?”

Matt October 4, 2010 at 10:38 am

Thanks for the post – very useful for us newbies trying to get a handle on what data journalism is and how to extract stories and present them using these techniques.

A key issue for me is humanising the data – trying to match experiences with the information, putting faces to the stories which emerge from the data. Case studies, that sort of thing. That way it becomes easier for TV and radio editors to give the nod to this sort of material – traditional broadcast media, in my experience, often doesn’t see itself as a natural home for it.

Having said that, it depends on the story! If, through your data mining, you come across something striking which affects lots of people, which tells us meaningful and vital things about the places where we live, which generates a compelling top line, then it becomes an easier sell. But there’s still a hard job engaging people who decide run orders.

I hear this a lot from broadcast news editors regarding data-driven stories: “good story but it’s hard to leave the page/spreadsheet”, “it’s a great newspaper story, not sure about telly” or “works well on the web”.

Is that old fashioned or do they have a point?

Paul Bradshaw October 4, 2010 at 12:09 pm

It’s a depressing fact of broadcast news values, for me, sadly. It also strikes me as a bit lazy – until the journalist gets stuck into the data they can’t assume there’s not a compelling human/entertaining/etc. story there. Spending two days with Telegraph trainees, it struck me how good they were at finding ways to make the data come alive (will try to blog about this soon) – when I do the same with broadcast journalists I imagine we’ll spend some time on how to find case studies, etc.

Your comment also suggests that we’re not quite in the ‘converged’ news operations that news orgs like to shout about?

Matt October 4, 2010 at 12:47 pm

Newsrooms where I work tend to be pretty joined up when it comes to TV/Radio/Online talking to each other about what they’re doing and co-ordinating their newsgathering. The news orgs are an important cog in that machine and generally do a good job.

I think within TV, the problem is cultural – is it still a story if there aren’t any obvious pictures or things to film? The need to populate stories with people and for their experiences to be at the forefront of the piece has been a long standing theme where I work, and rightly so – but I sometimes feel that there are stories which need telling which don’t necessarily need a case study to make them ‘real’ or legitimate. If one of the functions of journalism is to further our understanding of how the world really works – especially behind the scenes or underneath the surface – then a lot of TV news has to move beyond its self-imposed limitations, otherwise it’s going to be rendered a bit obsolete by the techniques you’re blogging about, Paul. Having said that, some TV newsrooms are more open to these ideas, more progressive, than others.
