In writing last week’s Guardian Data Blog piece on How to be a data journalist, I asked various people involved in data journalism where they would recommend starting. The answers are so useful that I thought I’d publish them in full here.
The Telegraph’s Conrad Quilty-Harper:
Keep adding to your knowledge and follow other data journalists/people who work with data on Twitter.
Look for sources of data:
The ONS stats release calendar is a good start: http://www.statistics.gov.uk/hub/release-calendar/index.html
Look at the government data stores (Data.gov, Data.gov.uk, Data.london.gov.uk etc).
Check out WhatDoTheyKnow, Freebase, WikiLeaks, ManyEyes and Google Fusion Tables.
Find out where hidden data is and try to get hold of it: private companies looking for publicity, underappreciated research departments, public bodies that release data but not in a granular form (e.g. the Met Office).
Test out cleaning/visualisation tools:
You want to be able to collect data, clean it, visualise it and map it.
Obviously you need basic Excel skills (pivot tables are how journalists efficiently get headline numbers from big spreadsheets).
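For readers who have not met a pivot table, here is a minimal sketch of the same idea in Python, using only the standard library. It is an editorial illustration rather than anything Conrad supplied: the column names (`region`, `year`, `spend`) and every figure are invented, and the helper `pivot_sum` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up rows standing in for a large spreadsheet export (illustrative only).
SAMPLE_CSV = """region,year,spend
North,2009,120
North,2010,150
South,2009,80
South,2010,95
North,2010,30
"""

def pivot_sum(rows, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) pair, like a pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += float(row[value_key])
    # Convert the nested defaultdicts to plain dicts for easier printing.
    return {r: dict(cols) for r, cols in table.items()}

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
pivot = pivot_sum(rows, "region", "year", "spend")

# One line per region, with the total for each year.
for region in sorted(pivot):
    print(region, pivot[region])
```

Each row of the result corresponds to one row label in a spreadsheet pivot table, with the summed value for each column label alongside it.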
For publishing just use Google Spreadsheets graphs, or ManyEyes or Timetric. Google MyMaps coupled with http://batchgeo.com is a great beginner mapping combo.
Further on from that you want to try out Google Spreadsheets’ import functions (such as importHTML and importXML), Yahoo Pipes for cleaning data, Freebase Gridworks and Dabble DB.
You could also learn some Ruby for building your own scrapers, or Python for ScraperWiki.
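To give a flavour of what a scraper does, the sketch below pulls the rows out of an HTML table using Python's built-in `html.parser`. It is an assumption-laden illustration, not ScraperWiki's own approach: the HTML fragment, the council names and the `TableScraper` class are all made up, and a real scraper would first download the page.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would download the HTML first.
SAMPLE_HTML = """
<table>
  <tr><th>Council</th><th>Visits</th></tr>
  <tr><td>Anytown</td><td>1200</td></tr>
  <tr><td>Otherville</td><td>950</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collect the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._in_cell:
            self._row[-1] += data.strip()

scraper = TableScraper()
scraper.feed(SAMPLE_HTML)
header, *records = scraper.rows
print(header)
print(records)
```

Once the rows are in a list like this, they can be written to a CSV file or loaded into a spreadsheet for cleaning.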
Get the data behind some big data stories you admire, try and find a story, visualise it and blog about it. You’ll find that the whole process starts with the data, and your interpretation of it. That needs to be newsworthy/valuable.
Look to the past!
Edward Tufte’s work is very inspiring: http://www.edwardtufte.com/tufte/ His favourite data visualisation is from 1869! Or what about John Snow’s Cholera map? http://www.york.ac.uk/depts/maths/histstat/snow_map.htm
And for good luck here’s an assorted list of visualisation tutorials.
- Visualisation of types of visualisations
- How to make a heatmap
- How to create successful infographics
- How to create a timeline using Simile
- 10 useful Google Spreadsheet tricks
- 10 tips for designing infographics
- How to create outstanding modern infographics in Illustrator
The Times’ Jonathan Richards:
I’d say a couple of blogs.
- Hacks/Hackers: http://hackshackers.com/
- and more importantly their Q&A site: http://help.hackshackers.com/ – which answers a ton of questions about which computer language/book/manual/online tutorial to get started with.
Others that spring to mind are:
- Michelle Minkoff: http://michelleminkoff.com/ (Hack/hacker who’s now at the LA Times. Provides a nice ‘not too geeky’ take on the area) and
- 10,000 Words: http://10000words.net/
If people want more specific advice, tell them to come to the next London Hacks/Hackers and track me down!
The Guardian’s Charles Arthur:
Obvious thing: find a story that will be best told through numbers. (I’m thinking about quizzing my local council about the effects of stopping free swimming for children. Obvious way forward: get figures for the number of children swimming before, during and after the free swimming offer.)
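As a rough sketch of the arithmetic such a story would rest on, the snippet below compares three periods by percentage change. The counts are placeholders invented purely for illustration, not real council figures, and `percent_change` is a hypothetical helper.

```python
# Placeholder counts, invented purely for illustration (not real council data).
swims = {"before": 1000, "during": 1500, "after": 900}

def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) * 100 / old

uptake = percent_change(swims["before"], swims["during"])
drop_off = percent_change(swims["during"], swims["after"])
net = percent_change(swims["before"], swims["after"])

print(f"Change while the offer ran: {uptake:+.1f}%")
print(f"Change after the offer ended: {drop_off:+.1f}%")
print(f"Net change, before versus after: {net:+.1f}%")
```

Note that each percentage is measured against a different baseline, which is worth spelling out to readers when the figures are reported.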
If someone already has the skills for data journalism, which I’d put at (1) understanding statistics and relevance, (2) understanding how to manipulate data and (3) understanding how to make the data visual, the key, I’d say, is always being able to spot a story that can be told through data, and only makes sense that way, and where being able to manipulate the data is key to extracting the story. It’s like interviewing the data. Good interviewers know how to get what they want out of the conversation. Ditto good data journalists and their data.
The New York Times’ Aron Pilhofer:
I would start small, and start with something you already know and already do. And always, always, always remember that the goal here is journalism. There is a tendency to focus too much on the skills for the sake of skills, and not enough on how those skills help enable you to do better journalism. Be pragmatic about it, and resist the tendency to think you need to know everything about the techy stuff before you do anything — nothing could be further from the truth.
Less abstractly, I would start out by learning some basic computer-assisted reporting skills and then move on from there as your interests/needs dictate. A lot of people see the programmer/journalism thing as distinct from computer-assisted reporting, but I don’t. I see it as a continuum. I see CAR as a “gateway drug” of sorts: Once you start working with small data sets using tools like Excel, Access, MySQL, etc., you’ll eventually hit the limits of what you can do with macros and SQL.
Soon enough, you’ll want to be able to script certain things. You’ll want to get data from the web. You’ll want to do things you can only do using some kind of scripting language, and so it begins.
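One way to picture that step from SQL to scripting is the sketch below, which tidies inconsistent names in Python before handing the rows to SQL for totals. It is an editorial illustration under stated assumptions: it uses an in-memory SQLite database rather than Access or MySQL, the rows are invented, and `clean_name` is a hypothetical helper.

```python
import sqlite3

# Invented rows for illustration; note the inconsistent spellings of one name.
raw_rows = [
    ("Anytown Council", 120),
    ("anytown council ", 80),
    ("Otherville Council", 95),
]

def clean_name(name):
    """Normalise whitespace and capitalisation so variants group together."""
    return " ".join(name.split()).title()

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend (body TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO spend VALUES (?, ?)",
    [(clean_name(body), amount) for body, amount in raw_rows],
)

# Plain SQL gives the totals once the names have been tidied by the script.
totals = conn.execute(
    "SELECT body, SUM(amount) FROM spend GROUP BY body ORDER BY body"
).fetchall()
conn.close()
print(totals)
```

The cleaning step is the part that is awkward to do in SQL alone, and the kind of small job that tends to push people towards a scripting language.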
But again, the place to start isn’t thinking about all these technologies. The place to start is thinking about how these technologies can enable you to tell stories you would otherwise never be able to tell. And you should start small. Look for little things to start, and go from there.