Author Archives: Paul Bradshaw

How to: convert XML or JSON into spreadsheets using Open Refine

Curly brackets pattern by Dan McCullough

One of the most useful applications of the data cleaning tool Open Refine (formerly Google Refine) is converting XML and JSON files into spreadsheets that you can interrogate in Excel.

Surprisingly, I’ve never blogged about it. Until now.
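Open Refine handles this through its import screens. For readers who would rather script the same conversion, here is a minimal Python sketch using only the standard library. It is not how Open Refine works internally. It assumes the JSON is an array of objects and that each XML record is a repeated element (the `record_tag` parameter). Nested JSON objects become dotted column names, and lists are stored as JSON text in a single cell, which is a simplifying choice. The function names and sample data are hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def flatten(record, prefix=""):
    """Flatten nested dicts into one dict with dotted column names.

    Lists are kept in a single cell as JSON text (a simplifying assumption).
    """
    flat = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=column + "."))
        elif isinstance(value, list):
            flat[column] = json.dumps(value)
        else:
            flat[column] = value
    return flat


def json_to_rows(text):
    """Parse a JSON array of objects into a list of flat row dicts."""
    return [flatten(item) for item in json.loads(text)]


def xml_to_rows(text, record_tag):
    """Treat every <record_tag> element as one row; its attributes and
    child elements become columns."""
    root = ET.fromstring(text)
    rows = []
    for element in root.iter(record_tag):
        row = dict(element.attrib)
        for child in element:
            row[child.tag] = (child.text or "").strip()
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Write rows as CSV text; the header is the union of all keys,
    and missing values are left blank."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only
    sample_json = ('[{"name": "Ward A", "spend": {"total": 120, "year": 2014}},'
                   ' {"name": "Ward B", "spend": {"total": 95}}]')
    sample_xml = ("<wards><ward id='1'><name>Ward A</name><total>120</total></ward>"
                  "<ward id='2'><name>Ward B</name><total>95</total></ward></wards>")
    print(rows_to_csv(json_to_rows(sample_json)))
    print(rows_to_csv(xml_to_rows(sample_xml, "ward")))
```

The resulting CSV text can be saved to a file and opened in Excel. Taking the union of keys as the header means records with missing fields simply get empty cells rather than breaking the table.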

If you’re worried about the future of FOI, here’s what you can do about it

So, the commission that has been formed to look into Freedom of Information in the UK is worrying a lot of people, particularly journalists. From the selection of its members and lack of transparency to suggestions of vetoes and charges, there’s a strong signal of an intention to curtail the ‘free’ in ‘freedom’.

But there is an opportunity to have an input into the commission through its call for evidence. The call lets you email your views on improving FOI to the commission, and there is also an online form you can fill in.

If you take the form route there are six key questions:

  1. What protection should there be for internal deliberations of public bodies?
  2. What protection should there be for information which relates to the process of collective Cabinet discussion and agreement?
  3. What protection should there be for information which involves candid assessment of risks?
  4. Should the executive have a veto (subject to judicial review) over the release of information?
  5. What is the appropriate enforcement and appeal system?
  6. Is the burden imposed on public authorities under the Act justified by the public interest in the public’s right to know?

Whether or not you think it’s a foregone conclusion, this is a key opportunity to make your case.

Giving a voice to the (literally) voiceless: data journalism and the dead

In the Bureau’s Naming the Dead visualisation, blue indicates civilian victims and red alleged militants

Giving a voice to the voiceless is one of the core principles of journalism. Traditionally this means those without the power or money to amplify their own voices, but in recent years a strand of work has developed in data journalism that deserves particular attention: projects which give a voice to people who literally don’t have one, because they are dead.

Research: regional publishers may be risking their sources and their brands

Journalists say sources are less willing to talk because they are afraid of employers. Image by Terry Border

Local journalists don’t know how to protect their social media accounts, or the law regarding sources, and they don’t know what their employers are doing about online security.

That’s the upshot of research that I conducted with dozens of reporters around the UK – and it’s so important I’ve organised an event to tackle it.

Here are some of the key findings…

Journalists could be compromising colleagues – but they don’t think security affects them

Over the past year it’s been revealed that UK police forces have been accessing regional journalists’ communications, and at least one local authority has used its powers to spy on journalists meeting an employee: security isn’t just about GCHQ and Edward Snowden.

Social media accounts that have been hacked in the past few years include those reporting on subjects as innocuous as entertainment and the weather, while commercial organisations including Microsoft and Vodafone have hacked journalists’ communications when they wrote about them. This week a journalist was found guilty of helping hackers access a newspaper CMS, causing almost $1m in damage.

But local journalists and editors see security as “another planet”: there is no strategy for protecting branded social media accounts, and it is assumed that reporters who routinely need to protect their sources are “usually pretty conversant with that kind of issue”.

Unfortunately, on the whole they are not. More than one experienced crime reporter that I spoke to operated on the basis that police requests to access their sources would come through the newspaper. “They’ve never taken action to gain that information from me,” one said.

But the key finding is that networked working practices in modern newsrooms mean information about sensitive stories can still be accessed through communications with colleagues who do not think security affects them.

1 in 5 lack even basic password security

Despite feeling that security issues did not affect them, around half of journalists had made some changes to their behaviour online in the past year.

But more than one in five journalists were not even using different passwords for different accounts, one of the most basic security practices.

22% of journalists do not use different passwords for different accounts


16% of journalists did not do any of the following: use different passwords, clear their browser history, turn off cookies, turn off geolocation or use enhanced privacy settings on social media.

What are publishers doing about information security?

Despite hundreds of journalists and many editors signing Press Gazette’s Save Our Sources petition last year, there is no indication of leadership or communication from the top on the issue of source protection.

Journalists overwhelmingly said that they did not know what their organisation was doing about internet security. But perhaps more importantly, editors did not know either. “I should know the answer to that,” said one, “and it’s worrying that I don’t.”

88% of journalists do not know what their employers are doing regarding security

31% of journalists said their employer was doing enough to protect employees and sources


Strangely, even though only 4% of respondents said their employers had taken steps on the issue in the last 12 months, almost a third made the leap of faith to say that their employers were “doing enough”.

Newsroom processes aren’t set up for modern law and technology

One thing became very clear: newsrooms and work processes are still set up for an analogue world where protecting sources is a reactive process. Discussions about sensitive sources focus on a potential legal defence if approached directly. No processes are in place to anticipate or prevent sources’ identities being accessed indirectly.

Likewise, IT policies focus on protecting email, but little consideration is given to securing social media accounts.

And journalists felt unable to advise sources who were unwilling to talk because of workplace surveillance and contracts with ‘gagging’ clauses.

What I’m doing about it

I’ve organised an event to begin addressing these issues, with people who have been directly affected, experts on law (including employment law) and people who can advise on the technical side. It’s in Salford at the BBC in Media City on Friday November 6. You can sign up here.

Copying images from the web: £6,000 damages awarded in copyright case

Copyright Football by Banksy, image by Duncan Hull

A recent decision by the Intellectual Property Enterprise Court (IPEC) might make journalists, marketers and bloggers think twice before reproducing images from other people’s websites, reports Cleland Thom.

The presiding judge, Richard Hacon, awarded £6,000 damages against a home improvements company that copied and pasted photos from another site.

The damages were on top of the basic cost of the images, and reflected the fact that the company’s action was ‘flagrant’.

How The Telegraph liveblog historical anniversaries

A public domain image from Laurence’s Battle of Britain liveblog

The Telegraph’s Laurence Dodds has an unusual claim to fame: he has liveblogged not just one but four historical anniversaries: the fall of the Berlin Wall, the funeral of Winston Churchill, the Battle of Waterloo, and the 75th anniversary of the Battle of Britain.

Anniversary liveblogging is a particularly under-recognised sub-genre which can be enormously successful, and yet there’s very little written about it.

So I asked Laurence what it involved, and what he’s learned from his experiences.

Periodismo de datos: Un golpe rápido

My ebook Data Journalism Heist is now available in a specially reduced Spanish translation.

Periodismo de datos: Un golpe rápido was translated by Cuban journalist Barbara Maseda, and is available in PDF, iPad and Kindle formats. The recommended price is $5.99 but a special minimum price of $1.19 is available for journalists working in countries where the full price would be too expensive.

The publication follows the release of the Spanish version of my book on Excel for journalists, Excel para periodistas, earlier this year.