Where should an aspiring data journalist start?

In writing last week’s Guardian Data Blog piece on How to be a data journalist I asked various people involved in data journalism where they would recommend starting. The answers are so useful that I thought I’d publish them in full here.

The Telegraph’s Conrad Quilty-Harper:

Start reading:

http://www.google.com/reader/bundle/user%2F06076274130681848419%2Fbundle%2Fdatavizfeeds

Keep adding to your knowledge and follow other data journalists/people who work with data on Twitter.

Look for sources of data:

ONS stats release calendar is a good start (http://www.statistics.gov.uk/hub/release-calendar/index.html). Look at the Government data stores (Data.gov, Data.gov.uk, Data.london.gov.uk etc).

Check out What do they know, Freebase, Wikileaks, Manyeyes, Google Fusion charts.

Time to talk about legal

As a lone blogger how much legal protection do you have? No more than anyone else, when it comes to libel, contempt of court law and so on, except that people are more likely to pay attention to large media organisations.

But there are many instances where bloggers have lost a lot of time and money over legal disputes. Last week, for example, journalist and blogger Dave Osler finally saw an end to a legal battle that consumed three years of his life, after he was sued for libel by the political activist Johanna Kaschke. Despite being refused the right to appeal the strike-out of the Osler case, she is still planning to appeal another High Court decision that ended her libel claim against Alex Hilton and John Gray.

If all individual bloggers worried about getting into trouble too much, we’d write much less than we do. Even big scary cases aren’t a deterrent: Dave Osler is still blogging. I was personally surprised by the results of my survey of 71 small online publishers this summer. Not that only 27 per cent had been involved in legal disputes (that was about what I expected) but that over half were satisfied with the number of legal resources available.

Personally, the grey areas of law trouble me and I don’t think there could be enough support: I’d like to see more organised structures for legal help, a sort of Citizens Advice Bureau for bloggers, if you like. Informal advice is already spreading via social networks, as lawyers increasingly use Twitter and blogs to join the conversation.

As I reported on my site Meeja Law, one hyperlocal blogger who was accused of breach of copyright asked for legal advice via Twitter: “Two separate media lawyers confirmed (for free) that I’d done nothing wrong. I also contacted [hyperlocal organisation] Talk About Local for advice, and they told me the same.”

Talk About Local has published several media law guides online (e.g. this one on defamation) and the organisation’s founder William Perrin offered some frank legal advice ahead of a legal session at last weekend’s London Local Neighbourhoods Online Unconference:

…just about the best legal advice, which very few follow is to set up a limited company and keep the website inside that. Then you don’t lose your house to a nutter under defamation law….

Another concern of mine is the lack of transparency of courts data, something I’ve discussed at length here. I think bloggers should be able to access more information about cases; at the very least, the Ministry of Justice needs to consider its outmoded contempt of court law that is ill-equipped to deal with the online age.

In the coming months, I’d like to build up the conversation in this area and think about how we might approach some of these issues. If you’d like to be part of this informal online ‘working group’ please consider joining the Help Me Investigate challenge at this link (request membership here), or discussing via the OJB Facebook group.

UPDATE [Paul Bradshaw]: I’ve created a LinkedIn group as a place for people to more openly discuss how to take this forward.

Judith Townend (@jtownend on Twitter) is a PhD research student at City University London and freelance journalist.

Hyperlocal Voices: Julia Larden (Acocks Green Focus Group)

Today’s Hyperlocal Voices interview is with Julia Larden, chair of the Acocks Green Focus Group blog, which campaigns to make Acocks Green a “better place to live, work and shop”. The group was established in 2004 and the blog followed in 2007. “We are less likely to get confused or get our facts slightly muddled” than professional journalists, says Julia. Here’s the full interview:

Who were the people behind the blog, and what were their backgrounds before setting it up?

That’s a bit complicated. Originally the blog was set up, more as a straight website, by a member who has long since left the area. It was not working very well at that time, and the ex-member was also asking for quite a lot of money to carry it on. I don’t think the member had any particular background in IT – he was in education, although he has set up a few small websites of his own. I had done some work for it, written some materials and supplied some photographs. My son, who runs a small software company, agreed to take the whole thing into his care for a bit.

Things lay dormant and then, when my son had time he simply picked the content up and plonked the whole thing into a WordPress blog – one of the slightly posher ones that you have to pay a bit for, but he has some sort of contract and can get quite a few of these blogs, so the group just pays him a very nominal sum each year.

It then sat there for a bit longer with not very much happening except the occasional comment, and then several members pointed out that it was a valuable resource which we were not using properly.

One of the members had web experience (running her own online teaching company) and started to make it into a far more interesting blog, asking for more materials, creating new pages and adding in bits and pieces and an opinion survey of the area – as a launch gimmick. (We have kept that – it still gets a lot of interest – more since I shifted it to another page, for some reason.)

"The mass market was a hack": Data and the future of journalism

The following is an unedited version of an article written for the International Press Institute report ‘Brave News Worlds’ (PDF)

For the past two centuries journalists have dealt in the currency of information: we transmuted base metals into narrative gold. But information is changing.

At first, the base metals were eyewitness accounts and interviews. Later we learned to melt down official reports, research papers, and balance sheets. And most recently our alloys have been diluted by statements and press releases.

But now journalists are having to get to grips with a new type of information: data. And this is a very rich seam indeed.

Data: what, how and why

Data is a broad term, so I should define it: I am not talking about statistics or numbers in general, because those are nothing new to journalists. When I talk about data I mean information that can be processed by computers.

This is a crucial distinction: it is one thing for a journalist to look at a balance sheet on paper; it is quite another to be able to dig through those figures on a spreadsheet, or to write a programming script to analyse that data, and match it to other sources of information. We can also more easily analyse new types of data, such as live data, large amounts of text, user behaviour patterns, and network connections.

And that, for me, is hugely important. Indeed, it is potentially transformational. Adding computer processing power to our journalistic arsenal allows us to do more, faster, more accurately, and with others. All of which opens up new opportunities – and new dangers. Things are going to change.
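To make the distinction above concrete, here is a minimal sketch of a script that matches one source of figures to another. Everything in it is an assumption for illustration: the council names, the spending and population numbers, and the function names `load_by_key` and `match_sources` are all invented, and a real investigation would read files rather than inline text.

```python
import csv
import io

# Hypothetical sample data standing in for two separate sources.
SPENDING_CSV = """council,spend_gbp
Alpha,1200000
Beta,800000
Gamma,450000
"""

POPULATION_CSV = """council,population
Alpha,240000
Beta,100000
"""


def load_by_key(text, key):
    """Read CSV text into a dict of rows, keyed on one column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def match_sources(spending_text, population_text):
    """Join the two sources on council name and compute spend per head."""
    spending = load_by_key(spending_text, "council")
    population = load_by_key(population_text, "council")
    matched, unmatched = [], []
    for council, row in spending.items():
        if council in population:
            per_head = int(row["spend_gbp"]) / int(population[council]["population"])
            matched.append((council, round(per_head, 2)))
        else:
            # Keep track of rows with no partner: gaps are often a story too.
            unmatched.append(council)
    return matched, unmatched


matched, unmatched = match_sources(SPENDING_CSV, POPULATION_CSV)
print(matched)
print(unmatched)
```

The point of the sketch is the join itself: once both sources share a key, the comparison that would take hours on paper becomes a loop, and the rows that fail to match are surfaced rather than silently lost.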

Why did you get into data journalism?

In researching my book chapter (UPDATE: now published) I asked a group of journalists who worked with data what led them to do so. Here are their answers:

Jonathon Richards, The Times:

The flood of information online presents an amazing opportunity for journalists, but also a challenge: how on earth does one keep up with it and make sense of it? You could go about it in the traditional way, fossicking in individual sites, but much of the journalistic value in this outpouring, it seems, comes in aggregation: in processing large amounts of data, distilling them, and exploring them for patterns. To do that – unless you’re superhuman, or have a small army of volunteers – you need the help of a computer.

I ‘got into’ data journalism because I find this mix exciting. It appeals to the traditional journalistic instinct, but also calls for a new skill which, once harnessed, dramatically expands the realm of ‘stories I could possibly investigate…’

Hyperlocal Voices: Robin Byles (Sheffieldblog.com and Crosspool.info)

Here’s another hyperlocal voice: Robin Byles set up Sheffieldblog in 2008 when he returned to the city after working for the BBC. The site focuses on “The kind of stuff that may get featured as an aside in the local papers, but actually people are quite interested in and in the context of online, works really well.” More recently he’s also been involved in Crosspool.info. Here’s the full interview:

Who were the people behind the blog, and what were their backgrounds before setting it up?

I set the blog up on my own. I studied Media and Communications at UCE [now Birmingham City University], moved to London where I worked at the BBC for 8 years as a web editor and have now moved back north where I’m a digital editor for the University of Sheffield.

What made you decide to set up the blog?

A mixture of things really. I had seen one or two local blogs and knew that there wasn’t a major one covering my home town of Sheffield, so quite fancied setting something up.

I think living away from the area had given me a yearning for local news but not just the traditional stuff that I could read in the local paper or local news website.

I was also interested in the stories that people were talking about that didn’t always make the normal news outlets. This interesting stuff was out there on the internet and I liked the idea of being able to collate all this content and promote it from one place – a non-automated aggregator, I suppose.

I’m very fond of my home city and the pending move back home seemed like a good excuse to get something up and running.

I was also on the lookout for jobs at the time and knew that the more varied stuff that my CV had on it – in particular a place where I could do a bit of writing – the more it would help me find work. So part of the motivation was also a professional one.

Hyperlocal Voices: Philip John (The Lichfield Blog)

In another Hyperlocal Voices post, Philip John talks about how The Lichfield Blog was launched to address a gap in local news reporting. In less than 2 years it has taken on a less opinionated tone and more “proper reporting”, picking up national recognition and covering its costs along the way.

Who were the people behind Lichfield Blog, and what were their backgrounds before setting it up?

Ross Hawkes founded the blog in January. Ross is a senior lecturer in journalism at Staffs Uni and previously worked at BPM. He started his journalistic career at the now defunct Lichfield Post. There’s also Nick, a semi-professional photographer who helps out with the creative side of things and I look after the techy side of the web site as looking after WordPress is where I specialise. We also have a good group of contributors and a couple of advisors, many of whom are either current or former journalists at local newspapers.

What made you decide to set up the blog?

Ross’ wife heard sirens going past their house one day and was curious as to where they were going. Ross realised no-one was reporting those kind of low-level goings on and that with the beat reporter disappearing there was a gap for community-focused news.

My Henry Stewart talk about 'Blogging, Twitter and Journalism'

I’ve recorded a 48-minute presentation covering ‘Blogging, Twitter and Journalism’ for the Henry Stewart series of talks. It’s designed for journalism students and covers:

  • How blogging differs from other journalism platforms;
  • Key developments in journalism blogging history;
  • What makes a successful blog;
  • What Twitter is and how it is useful for journalists and publishers; and
  • Why RSS is central to blogging and Twitter, and how it works.

The BBC and missed data journalism opportunities

Bar chart: UN progress on eradication of world hunger

I’ve tweeted a couple of times recently about frustrations with BBC stories that are based on data but treat it poorly. As any journalist knows, two occasions of anything in close proximity warrant an overreaction about a “worrying trend”. So here it is.

“One in four council homes fails ‘Decent Homes Standard'”

This is a good piece of newsgathering, but a frustrating piece of online journalism. “Almost 100,000 local authority dwellings have not reached the government’s Decent Homes Standard,” it explained. But according to what? Who? “Government figures seen by BBC London”. Ah, right. Any chance of us seeing those too? No.

The article is scattered with statistics from these figures: “In Havering, east London, 56% of properties do not reach Decent Homes Standard – the highest figure for any local authority in the UK … In Tower Hamlets the figure is 55%.”

It’s a great story – if you live in those two local authorities. But it’s a classic example of narrowing a story to fit the space available. This story-centric approach serves readers in those locations, and readers who may be titillated by the fact that someone must always finish bottom in a chart – but the majority of readers will not live in those areas, and will want to know what the figures are for their own area. The article does nothing to help them do this. There are only 3 links, and none of them are deep links: they go to the homepages for Havering Council, Tower Hamlets Council, and the Department of Communities and Local Government.
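As a rough illustration of what helping readers find their own area could look like, here is a minimal sketch. It holds only the two figures quoted in the article, because the full dataset was not published; the function name `figure_for_area` and the wording of its output are assumptions made for this example, not anything the BBC uses.

```python
# The two figures quoted in the BBC article. Every other authority is absent
# because the underlying data was not published alongside the story.
FAILING_DECENT_HOMES_PCT = {
    "Havering": 56,
    "Tower Hamlets": 55,
}


def figure_for_area(area, figures=FAILING_DECENT_HOMES_PCT):
    """Return a sentence for the reader's own area, or say the figure is missing."""
    wanted = area.strip().lower()
    for name, pct in figures.items():
        # Match case-insensitively so that reader input need not be exact.
        if name.lower() == wanted:
            return f"{name}: {pct}% of council homes fall short of the Decent Homes Standard"
    return f"No figure published for {area.strip()}"


print(figure_for_area("tower hamlets"))
print(figure_for_area("Camden"))
```

With the complete figures loaded into the same dictionary, the lookup would serve every reader rather than only those in the two boroughs named in the story.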

In the world of print and broadcast, narrowing a story to fit space was a regrettable limitation of the medium; in the online world, linking to your sources is a fundamental quality of the medium. Not doing so looks either ignorant or arrogant.

“Uneven progress of UN Millennium Development Goals”

An impressive piece of data journalism that deserves credit, this looks at the UN’s goals and how close they are to being achieved, based on a raft of stats, which are presented in bar chart after bar chart (see image above). Each chart gives the source of the data, which is good to see. However, that source is simply given as “UN”: there is no link either on the charts or in the article (there are 2 links at the end of the piece – one to the UN Development Programme and the other to the official UN Millennium Development Goals website).

This lack of a link to the specific source of the data raises a number of questions: did the journalist or journalists (in both of these stories there is no byline) find the data themselves, or was it simply presented to them? What is it based on? What was the methodology?

The real missed opportunity here, however, is around visualisation. The relentless onslaught of bar charts makes this feel like a UN report itself, and leaves a dry subject still looking dry. This needed more thought.

Off the top of my head, one option might have been an overarching visualisation of how funding shortfalls overall differ between different parts of the world (allowing you to see that, for example, South America is coming off worst). This ‘big picture’ would then draw in people to look at the detail behind it (with an opportunity for interactivity).

Had they published a link to the data someone else might have done this – and other visualisations – for them. I would have liked to try it myself, in fact.

UPDATE: Since this post was published, a link to the report (PDF) has been added.

Compare this article, for example, with the Guardian Datablog’s treatment of the coalition agreement: a harder set of goals to measure, and they’ve had to compile the data themselves. But they’re transparent about the methodology (it’s subjective) and the data is there in full for others to play with.

It’s another dry subject matter, but The Guardian have made it a social object.

No excuses

The BBC is not a print outlet, so it does not have the excuse of these stories being written for print (although I will assume they were researched with broadcast as the primary outlet in mind).

It should also, in theory, be well resourced for data journalism. Martin Rosenbaum, for example, is a pioneer in the field, and the team behind the BBC website’s Special Reports section does some world class work. The corporation was one of the first in the world to experiment with open innovation with Backstage, and runs a DataArt blog too. But the core newsgathering operation is missing some basic opportunities for good data journalism practice.

In fact, it’s missing just one basic opportunity: link to your data. It’s as simple as that.

On a related note, the BBC Trust wants your opinions on science reporting. On this subject, David Colquhoun raises many of the same issues: absence of links to sources, and anonymity of reporters. This is clearly more a cultural issue than a technical one.

Of all the UK’s news organisations, the BBC should be at the forefront of transparency and openness in journalism online. Thinking politically, allowing users to access the data that the BBC has spent public money to acquire also strengthens its ideological hand in the Big Society bunfight.

UPDATE: Credit where it’s due: the website for tonight’s Panorama on public pay includes a link to the full data.

When crowdsourcing is your only option

Crowdsourced map - the price of weed

PriceOfWeed.com is a great example of when you need to turn to crowdsourcing to obtain data for your journalism. As Paul Kedrosky writes, it’s “Not often that you get to combine economics, illicit substances, map mashups and crowd-sourcing in one post like this.” The resulting picture is surprisingly clear.

And news organisations could learn a lot from the way this has been executed. Although the default map view is of the US, the site detects your location and offers you prices nearest to you. It’s searchable and browsable. Sadly, the raw data isn’t available – although it would be relatively straightforward to scrape it.
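For a sense of what scraping involves, here is a minimal sketch using only Python’s standard library. The HTML below is invented for the example, as are the place names and prices: the real site’s markup is not known here, so the table structure and the names `TableScraper` and `scrape_prices` are assumptions, and a real scraper would need to be written against the actual pages.

```python
from html.parser import HTMLParser

# Invented markup and invented values: the real site's HTML is not known here.
SAMPLE_HTML = """
<table>
  <tr><th>Location</th><th>Price</th></tr>
  <tr><td>Exampleville</td><td>10.50</td></tr>
  <tr><td>Sampleton</td><td>12.00</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of each <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            # Header rows contain only <th> cells, so they end up empty and are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())


def scrape_prices(html):
    """Turn the two-column table into a list of location/price records."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return [{"location": loc, "price": float(price)} for loc, price in parser.rows]


print(scrape_prices(SAMPLE_HTML))
```

The parsing is the easy part. Fetching the pages politely, and checking whether the site’s terms allow it, would come first.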

As the site expands globally it is also adding extra data on the social context – tolerance and law enforcement.