Monthly Archives: September 2013

Can you help map local data blogs?

This week Matt Burgess launched the Northampton Data Blog, “exploring the data behind the headlines in Northamptonshire”. The site is at least the fourth local data blog launched this year, following the Coventry Data Blog in May and the Behind The Numbers sections on Wales Online and the Birmingham Mail.

But are we missing any? Please let me know in the comments.

UPDATE: Philip Nye points out his data posts are largely about Hackney.

Disclosure: Matt Burgess was previously the editor of Help Me Investigate Education and Coventry Data Blog creator Ian Silvera has contributed to Help Me Investigate. I am regularly involved in the Birmingham Data Blog.

How to think like a computer: 5 tips for a data journalism workflow part 3

This is the final part of a series of blog posts. The first explains how using feeds and social bookmarking can make for a quicker data journalism workflow. The second looks at how to anticipate and prevent problems, and at how collaboration can improve data work.

Workflow tip 5. Think like a computer

The final workflow tip is all about efficiency. Computers deal with processes in a logical way, and good programming is often about completing processes in the simplest way possible.

If you have any tasks that are repetitive, break them down and work out what patterns might allow you to do them more quickly – or for a computer to do them. Continue reading
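To make the idea concrete, here is a minimal sketch (the data and function name are invented for illustration) of how a repetitive manual chore – cleaning currency figures copied from a spreadsheet – becomes a one-line loop once you have worked out the pattern:

```python
# A repetitive manual task: cleaning currency strings copied from a
# spreadsheet. The pattern is always the same - strip the pound sign
# and commas, then convert to a number - so a computer can apply it
# to every value in one pass.
def clean_amount(raw):
    """Turn a string like '£1,200.50' into a float."""
    return float(raw.replace("£", "").replace(",", "").strip())

raw_amounts = ["£1,200.50", "£950", " £12,000 "]
cleaned = [clean_amount(a) for a in raw_amounts]
print(cleaned)  # [1200.5, 950.0, 12000.0]
```

The point is not this particular function but the habit: once a task is broken into explicit steps, those steps can be applied to three values or three thousand at the same cost.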

5 tips for a data journalism workflow: part 2 – anticipating problems and collaboration

In my last post I wrote about how using feeds and social bookmarking can make for a quicker data journalism workflow. In this second part I look at how to anticipate and prevent problems, and at how collaboration can improve data work.

Workflow tip 3. Anticipate problems

A particularly useful habit of successful data journalists is to think ahead when requesting data. For example, you might request now the basic datasets you think you’ll need in future, such as demographic details for local patches.

You might also want to request the ‘data dictionary’ for key datasets. This lists all the fields used in a particular database. For example, did you know that the police have a database for storing descriptions of suspects? And that one of the fields is shoe size? That could make for quite a quirky story. Continue reading

5 tips for a data journalism workflow: part 1 – data newswires and archiving

Earlier this year I spoke at the BBC’s Data Fusion Day (you can find a liveblog of the event on Help Me Investigate) about data journalism workflows. The presentation slides are embedded below (the title is firmly tongue-in-cheek), but I thought I’d explain a bit more in a series of posts – beginning here.

Data journalism workflow 1: Set up data newswires

Most newsrooms take a newswire of some sort – national and international news from organisations like the Press Association, Reuters, and Associated Press.

Data journalism is no exception. If you want to find stories in data, it helps to know what data is coming out – and when.

Continue reading
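One way to build such a ‘data newswire’ is to watch the RSS feeds that statistics bodies publish for their release calendars. The sketch below is illustrative only – the feed content is a made-up sample, and in practice you would fetch a live feed URL on a schedule rather than parse an inline string:

```python
import xml.etree.ElementTree as ET

# Illustrative only: a snippet of the kind of RSS feed a statistics
# agency's release calendar might publish. A real workflow would
# fetch the live feed (e.g. with urllib) and check it regularly.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Data releases</title>
  <item><title>Crime statistics, Q2</title>
        <pubDate>Thu, 12 Sep 2013 09:30:00 GMT</pubDate></item>
  <item><title>School spending data</title>
        <pubDate>Fri, 13 Sep 2013 09:30:00 GMT</pubDate></item>
</channel></rss>"""

root = ET.fromstring(SAMPLE_RSS)
# Pull out each upcoming release and its publication date
releases = [(item.findtext("title"), item.findtext("pubDate"))
            for item in root.iter("item")]
for title, date in releases:
    print(f"{date}: {title}")
```

Pointed at a real release calendar, a few lines like these give you a running list of what data is due and when – the newswire habit, applied to datasets instead of press releases.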

FAQ: Does data journalism subvert the norm of objectivity?

Here is another set of questions from a student for the FAQ section – the last one is a goodie.

Question 1: Do you think data journalism can reduce the cost of investigative journalism?

Yes. It certainly reduces the cost of collecting information: scrapers, for example, can automate the collection and combination of hundreds of documents, while other tools can automate cleaning, combining, comparing or checking information. But it also offers opportunities to reduce the cost of distribution (for example through automation and personalisation), collaboration, and even visual treatments. Continue reading
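The ‘combination’ step is often the simplest part to automate. Here is a minimal sketch – the three CSV ‘documents’ are invented stand-ins for files a scraper might download – showing how separate downloads merge into one dataset:

```python
import csv
import io

# Hypothetical example: three small CSV 'documents' of the kind a
# scraper might fetch from the web, combined into a single dataset.
documents = [
    "council,amount\nNorthampton,1200\n",
    "council,amount\nCoventry,950\n",
    "council,amount\nBirmingham,3400\n",
]

combined = []
for doc in documents:
    # Each document is parsed into dict rows and appended to one list
    combined.extend(csv.DictReader(io.StringIO(doc)))

total = sum(int(row["amount"]) for row in combined)
print(len(combined), total)  # prints: 3 5550
```

Scale the list of documents up to hundreds and the saving over manual copy-and-paste – the cost reduction referred to above – becomes obvious.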

Web security for journalists – takeaway tips and review


Early in Alan Pearce’s book on web security, Deep Web for Journalists, a series of statistics tells a striking story about the spread of surveillance in just one country.

199 is the first: the number of data mining programs in the US in 2004, when 16 Federal agencies were “on the look-out for suspicious activity”.

Just six years later there were 1,200 government agencies and 1,900 private companies working on domestic intelligence programs.

As a result of this spread there are, notes Pearce, 4.8m people with security clearance “that allows them to access all kinds of personal information”. 1.4m have Top Secret clearance.

But the most sobering figure comes at the end: 1,600 – the number of names added to the FBI’s terrorism watchlist each day.

Predictive policing

This is the world of predictive policing that a modern journalist must operate in: where browsing protesters’ websites, making particular searches, or mentioning certain keywords in your emails or tweets can put you on a watchlist, or even a no-fly list. An environment where it is increasingly difficult to protect your sources – or indeed for sources to trust you.

Alan Pearce’s book attempts to map this world – and outline the myriad techniques to avoid compromising your sources. Continue reading