Cross-posted from Data Driven Journalism.
Earlier this year I set out to tackle a problem that was bothering me: journalists who had started to learn programming were giving up.
They were hitting a wall. In trying to learn the more advanced programming techniques – particularly those involved in scraping – they seemed to fall into one of two camps:
- People who learned programming, but were taking far too long to apply it, and so losing momentum – the generalists
- People who learned how to write one scraper, but could not extend it to others, and so were becoming frustrated – the specialists
In setting out to figure out what was going wrong, I set myself a task which I have found helpful in taking a fresh perspective on an issue: I started writing a book chapter.
The nice thing about writing books is that they force you to put together a coherent and complete narrative about an entire process. You identify gaps that you weren’t otherwise aware of, and you have to put yourself in the place of someone with no knowledge at all. You take nothing for granted.
So my starting point was this: what is a good way to learn how to write scrapers?
That’s a different question to ‘How do I write a scraper?’ and also to ‘How do I learn programming?’ And that’s important, because most of the resources available answered one of those two questions rather than mine.
The people trying to learn programming were hitting a common problem in learning: lack of feedback. They might be able to change a variable in Ruby, but how would that help in journalism? It was like learning the structure of the entire French language just so they could go to the corner shop and ask for a loaf of bread.
The people learning how to write one scraper were hitting another common problem: learning how to do one task well, rather than the underlying principles. This was like someone learning how to ask for a loaf of bread in French, but not being able to extend that knowledge into asking for directions home.
I tackled both by beginning the chapter with probably the simplest scraper you can write: a spreadsheet formula in Google Docs. This provided the instant feedback that the generalists lacked, but the formula was also used to introduce some key concepts in programming: functions, strings, indexes, and parameters. These would provide key principles that the specialists lacked, and which future chapters could build on.
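To make those four concepts concrete outside a spreadsheet, here is a minimal Python sketch. It is not the formula from the chapter: the `import_table` function, the `TableCollector` class and the sample page are all hypothetical, made up for illustration, and loosely modelled on the way a spreadsheet import formula takes a page, a type of element and a position. It calls a function, passes it a string and an index as parameters, and gets one table back.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collects every <table> in a page as a list of rows of cell text.

    Hypothetical helper for illustration; it does not handle nested tables.
    """

    def __init__(self):
        super().__init__()
        self.tables = []       # one entry per <table> found, in page order
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])           # start a new table
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])       # start a new row
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self.tables[-1][-1].append("")   # start a new, empty cell
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.tables[-1][-1][-1] += data.strip()


def import_table(html, index):
    """Return the table at position `index`, counting from 1.

    `html` is a string parameter (the page); `index` is a number
    parameter (which table on that page you want).
    """
    collector = TableCollector()
    collector.feed(html)
    return collector.tables[index - 1]


# A made-up page containing two tables, held in a string.
page = """
<table><tr><th>Name</th></tr><tr><td>Ann</td></tr></table>
<table><tr><th>Item</th><th>Price</th></tr><tr><td>bread</td><td>1.20</td></tr></table>
"""

second = import_table(page, 2)
print(second)  # [['Item', 'Price'], ['bread', '1.20']]
```

Changing the index from 2 to 1 and running it again is the kind of trial and error that gives instant feedback: the same function, with one parameter changed, returns a different table.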
I also looked at how journalists tried to learn programming, and how programmers developed, and realised something else: journalists and programmers learned differently.
I’m generalising wildly, of course, but journalists – particularly student journalists – often try to learn programming from books. That may sound like common sense, but it’s not how you learn an art or a science – and programming is both.
Programmers – if I’m to generalise wildly again – typically combine books (which they don’t read cover to cover) with documentation, adapting other code, trial and error, and each other. When they teach journalists, they often don’t realise that journalists don’t always share that culture.
And journalists – coming traditionally from a background in the humanities – are used to learning from books: static knowledge. Teaching programming to journalists then, I realised, would also mean teaching how programmers learn.
So my chapter introducing that first scraper introduced some other key concepts as well. It would direct readers to the documentation on the function being used, and invite them to engage in some trial and error to work out a solution to a problem. As more scraper tutorials were added, they introduced more key concepts in programming – importantly, without having to learn an entirely new language, and with documentation and trial and error running throughout, along with the principle of adapting other code.
I tested the approach at the News:Rewired conference. Can you teach scraping in 20 minutes? At a basic level, yes: it seemed you could.
After 20,000 words I realised that my book chapter was turning into a book. Meanwhile, a colleague had told me about Leanpub: a website that allowed people to publish books as they were being written, with readers able to download new updates as they came.
The platform suited the book perfectly: it meant I could stagger the publication of the book, Codecademy-style, with readers trying at least one scraper per week, but also having the time to experiment with trial and error before the next chapter was published. It meant that I could respond to feedback on the earlier chapters and adapt the rest of the book before it was published (in one case a Brazilian reader pointed out after the first chapter was published that the Portuguese-language Google Docs uses semicolons instead of commas). If examples used in the book changed then I could replace them. And it meant that if new tools or techniques emerged, I could incorporate them.
It is a programming-style approach to publishing – trial and error – which very much suits the spirit of the book. It’s extra work, but it makes for a much better writing experience. I hope the readers think so too.
Scraping for Journalists is available at Leanpub.com/ScrapingForJournalists
Why do journalists hit a wall learning programming? Well, I think learning to program is much harder than learning some html-tags. It also involves a continued effort. In my own experience, having to stop the learning process for a few weeks because of a heavy workload (not related to coding but to the many other aspects of journalism) made it hard to start again – I had the impression I had forgotten most of what I thought I learned. Maybe this is different for people who have data journalism in their job description of course, but for other journalists it could very well be the stop-go-stop experience which finally makes them conclude it’s just not for them. Just wondering: do you think programming skills are important for all journalists, or only for those who are full-time journalists/coders?
I think you’re right. What I’ve tried to do is make sure that that stop-start process at least has enough feedback before the ‘stop’ that you’re willing to ‘start’ again.
I’m a self-taught programmer journalist. A boss once called me “appropriately pig-headed” and I think that’s what got me through the above.
I think what would really help remove the walls above is a tandem effort on debugging. You can’t get anywhere without knowing why something borked. And if you don’t have experience in programming, you can’t really know where to look unless you have the feedback you get from a good debugging set-up.
Yes, I spend a lot of the early chapters talking about debugging/problem solving with documentation, trial and error and online communities. Very important and easily overlooked when you take it for granted.
This looks really interesting and I’ll be excited to give it a go. I’m currently trying to work my way through the NCTJ distance diploma and do Code Year at the same time – a kind of DIY course in basic programming and journalism if you will.
I taught myself HTML as a kid by looking at the source code of websites, and so I find the process of Code Year a bit sticky at times because it doesn’t have the same trial and error experience required to learn the process of programming.