SFTW: How to grab useful political data with the They Work For You API

It’s been over 2 years since I stopped doing the ‘Something for the Weekend’ series. I thought I would revive it with a tutorial on They Work For You and Google Refine…

If you want to add political context to a spreadsheet (say you need to know which political parties a list of constituencies voted for, or the MPs for those constituencies), the They Work For You API can save you hours of fiddling, if you know how to use it.

An API is – for the purposes of journalists – a way of asking questions and getting reams of data back. For example, you can use an API to ask “What constituency is each of these postcodes in?” or “When did these politicians enter office?” or even “Can you show me an image of these people?”

The They Work For You API will give answers to a range of UK political questions on subjects including Lords, MLAs (Members of the Legislative Assembly in Northern Ireland), MPs, MSPs (Members of the Scottish Parliament), select committees, debates, written answers, statements and constituencies.

When you combine that API with Google Refine you can fill a whole spreadsheet with additional political data, allowing you to answer questions you might otherwise not be able to.

I’ve written before on how to use Google Refine to pull data into a spreadsheet from the Google Maps API and the UK Postcodes API, but this post takes things a bit further because the They Work For You API requires something called a ‘key’. This is quite common with APIs so knowing how to use them is – well – key. If you need extra help, try those tutorials first.
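As a rough illustration of what using a ‘key’ involves, here is a minimal Python sketch that builds one request URL per postcode, with the key passed as just another parameter. The base URL, the `getConstituency` function name and the `postcode`, `key` and `output` parameters reflect my understanding of the They Work For You API and should be checked against its documentation; `build_twfy_url` and `YOUR_API_KEY` are hypothetical names used only for this example. Nothing is fetched here: the sketch only shows the shape of the URLs that a tool like Google Refine would then request for each row.

```python
from urllib.parse import urlencode

# Assumed base URL for the API; check it against the API documentation.
API_BASE = "https://www.theyworkforyou.com/api/"


def build_twfy_url(function, api_key, **params):
    """Return a request URL for one API function (hypothetical helper).

    The key is sent as an ordinary query parameter alongside the question
    being asked (for example, a postcode).
    """
    query = {"key": api_key, "output": "js", **params}
    # urlencode escapes spaces and other unsafe characters in the values.
    return API_BASE + function + "?" + urlencode(query)


# One URL per row of the spreadsheet column (sample postcodes only).
postcodes = ["SW1A 1AA", "B42 2SU"]
urls = [
    build_twfy_url("getConstituency", "YOUR_API_KEY", postcode=p)
    for p in postcodes
]
```

The only difference from a key-less API is the extra `key` parameter in every URL; the rest of the process is the same as in the earlier tutorials.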

How to collaborate (or crowdsource) by combining Delicious and Google Docs

RSS girl by HeatherWeaver on Flickr

During some training in open data I was doing recently, I ended up explaining (it’s a long story) how to pull a feed from Delicious into a Google Docs spreadsheet. I promised I would put it down online, so: here it is.

In a Google Docs spreadsheet the formula =importfeed will pull information from an RSS feed and put it into that spreadsheet. Titles, links, datestamps and other parts of the feed will each be separated into their own columns.
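To make the idea of a feed being split into columns more concrete, here is a minimal Python sketch of roughly what happens: each item in an RSS feed becomes a row, with title, link and datestamp in separate columns. This is not how Google Docs implements the formula; the sample feed, its URLs and the `feed_to_rows` function are invented for illustration. In the spreadsheet itself the equivalent is a single cell containing something like =importfeed("feed URL here").

```python
import xml.etree.ElementTree as ET

# A tiny made-up RSS feed standing in for a real one (hypothetical data).
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example bookmarks</title>
<item><title>First page</title><link>http://example.com/1</link>
<pubDate>Mon, 04 Jul 2011 10:00:00 GMT</pubDate></item>
<item><title>Second page</title><link>http://example.com/2</link>
<pubDate>Tue, 05 Jul 2011 09:30:00 GMT</pubDate></item>
</channel></rss>"""


def feed_to_rows(rss_text):
    """Turn each <item> in an RSS feed into a [title, link, date] row."""
    root = ET.fromstring(rss_text)
    rows = []
    for item in root.iter("item"):
        rows.append([
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("pubDate", default=""),
        ])
    return rows


# Each row corresponds to one spreadsheet row, each element to one column.
for row in feed_to_rows(SAMPLE_RSS):
    print(row)
```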

When combined with Delicious, this can be a useful way to collect together pages that have been bookmarked by a group of people, or any other feed that you want to analyse.

Here’s how you do it:

Learning about community strategies: 10 lessons

Back in February I blogged about the process of teaching journalism students to think about working with communities. The results have been positive: even where the strategy itself wasn’t successful, the individuals have learned from its execution, its research, or both. And so, for those who were part of this process – and anyone else who’s interested – I thought I would summarise 10 key themes that came through the resulting work.

1. A community strategy isn’t something you can execute effectively in one month

Perhaps the number one lesson that people drew from the experience was that they should have started early, and done little, often, rather than a lot all at once. There was a tendency to underestimate the needs of community management and a need for better time management.

Communities needed time to “grow organically”, wrote one; it wasn’t a top down approach. Members might also have felt they were being “manipulated” when weeks of inactivity were followed by a flood of posts, links and questions.

FAQ: How can broadcasters benefit from online communities?

Here’s another set of questions I’m answering in public in case anyone wants to ask the same:

How can broadcasters benefit from online communities?

Online communities contain many individuals who will be able to contribute different kinds of value to news production. Most obviously, expertise, opinion, and eyewitness testimony. In addition, they will be able to more effectively distribute parts of a story to ensure that it reaches the right experts, opinion-formers and eyewitnesses. The difference from an audience is that a community tends to be specialised, and connected to each other.

If you rephrase the question as ‘How can broadcasters benefit from people?’ it may be clearer.

How does a broadcaster begin to develop an engaged online community, any tips?

Over time. Rather than asking about how you develop an online community ask yourself instead: how do you begin to develop relationships? Waiting until a major news event happens is a bad strategy: it’s like waiting until someone has won the lottery to decide that you’re suddenly their friend.

Journalists who do this well do a little bit every so often – following people in their field, replying to questions on social networks, contributing to forums and commenting on blogs, and publishing blog posts which are helpful to members of that community rather than simply being about ‘the story’ (for instance, ‘Why’ and ‘How’ questions behind the news).

In case you are aware of networks in the Middle East, do you think they are tapping into online communities and social media adequately?

I don’t know the networks well enough to comment – but I do think it’s hard for corporations to tap into communities; it works much better at an individual reporter level.

Can you mention any models, whether news channels or entertainment television, which have developed successful online communities? Why do they work?

The most successful examples tend to be newspapers: I think Paul Lewis at The Guardian has done this extremely successfully, and I think Simon Rogers’ Data Blog has also developed a healthy community around data and visualisation. Both of these are probably due in part to the work of Meg Pickard there around community in general.

The BBC’s UGC unit is a good example from broadcasting – although that is less about developing a community than about providing platforms for others to contribute, and a way for journalists to quickly find expertise in those communities. More specifically, Robert Peston and Rory Cellan-Jones use their blogs and Twitter accounts well to connect with people in their fields.

Then of course there’s Andy Carvin at NPR, who is an exemplar of how to do it in radio. There’s so much written about what he does that I won’t repeat it here.

What are the reasons that certain broadcasters cannot connect successfully with online communities?

I expect a significant factor is regulation which requires objectivity from broadcasters but not from newspapers. If you can’t express an opinion then it is difficult to build relationships, and if you are more firmly regulated (which broadcasting is) then you take fewer risks.

Also, there are more intermediaries in broadcasting and fewer reporters who are public-facing, which for some journalists in broadcasting makes the prospect of speaking directly to the former audience that much more intimidating.

When information is power, these are the questions we should be asking

Various commentators over the past year have made the observation that “Data is the new oil”. If that’s the case, journalists should be following the money. But they’re not.

Instead it’s falling to the likes of Tony Hirst (an Open University academic), Dan Herbert (an Oxford Brookes academic) and Chris Taggart (a developer who used to be a magazine publisher) to fill the scrutiny gap. Recently all three have shone a light on the move towards transparency and open data, in work which anyone with an interest in information would be advised to read.

Hirst wrote a particularly detailed post breaking down the results of a consultation about higher education data.

Herbert wrote about the publication of the first Whole of Government Accounts for the UK.

And Taggart made one of the best presentations I’ve seen on the relationship between information and democracy.

What all three highlight is how control of information still represents the exercise of power, and how shifts in that control as a result of the transparency/open data/linked data agenda are open to abuse, gaming, or spin.

In Spanish: The inverted pyramid of data journalism part 2

Mauro Accurso has followed up his rapid translation of last week’s inverted pyramid of data journalism with a Spanish version of part 2: the 6 C’s of communicating data journalism. It’s copied in full below.

Last week I translated for you the first part of Paul Bradshaw’s Inverted Pyramid of Data Journalism, which he promised to extend on the communication side of the lengthy process that data journalism involves.

In this second part Paul runs through 6 different ways of communicating in data journalism, which you can see in the chart above, and at the end you will find a graphic that summarises the whole theory (which is still in development, and Bradshaw is asking for contributions, comments and suggestions):

6 ways of communicating data journalism (The inverted pyramid of data journalism part 2)

UPDATE: A new version of the inverted pyramid, with new stages and resources for each, is now available.

Last week I published an inverted pyramid of data journalism which attempted to map processes from initial compilation of data through cleaning, contextualising, and combining it. The final stage – communication – needed a post of its own, so here it is.

UPDATE: Now in Spanish too.

Below is a diagram illustrating 6 different types of communication in data journalism. (I may have overlooked others, so please let me know if that’s the case.)

Communicate: visualise, narrate, socialise, humanise, personalise, utilise

Modern data journalism has grown up alongside an enormous growth in visualisation, and this can sometimes lead us to overlook different ways of telling stories involving big numbers. The intention of the following is to act as a primer for ensuring all options are considered.

An experiment in creating an ‘Auto-Debunker’ twitter account

As the conspiracy theories flew around last Friday, one in particular caught fire: the idea that the News Of The World might have been closed down because it would then allow for its assets – i.e. incriminating evidence – to be destroyed.

Perhaps because it was published under the Reuters brand (although the byline absolved them of any responsibility for its contents), by the end of the day it had accumulated over 4,000 retweets.

I had already personally tweeted a couple of those users to point out that comments on the article had quickly debunked its argument. And by 6.26 that evening David Allen Green had published an explanation of the flaws in a piece at the New Statesman.

But people were still retweeting: how to connect the two?

Creating @autodebunker

It took me all of 20 minutes to hack together a simple automated service that would reply to people retweeting the Reuters blog post.
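The general logic of such a service can be sketched in a few lines of Python. This is not the setup actually used for @autodebunker, only an illustration of the idea under stated assumptions: the tweets are supplied as a simple list of dictionaries, nothing is fetched from or posted to Twitter, and the `build_replies` function, the sample tweets and the example.com URLs are all hypothetical.

```python
def build_replies(tweets, rumour_url, debunk_url):
    """Compose one reply per user who shared the rumour link.

    `tweets` is a list of dicts with "user" and "text" keys (an assumed
    format for this sketch, not a real Twitter API response).
    """
    replies = []
    seen_users = set()
    for tweet in tweets:
        user = tweet["user"]
        # Reply only to tweets containing the link, and only once per user.
        if rumour_url in tweet["text"] and user not in seen_users:
            seen_users.add(user)
            replies.append(
                f"@{user} That story has been questioned, see {debunk_url}"
            )
    return replies


# Made-up sample data for illustration only.
sample_tweets = [
    {"user": "alice", "text": "RT Wow http://example.com/rumour"},
    {"user": "bob", "text": "Nothing to see here"},
    {"user": "alice", "text": "Again: http://example.com/rumour"},
]
replies = build_replies(
    sample_tweets, "http://example.com/rumour", "http://example.com/debunk"
)
```

Replying once per user, rather than once per tweet, is a deliberate choice in this sketch to avoid flooding people who retweet more than once.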

The inverted pyramid of data journalism – in Spanish

Barely 7 hours after I published yesterday’s ‘Inverted pyramid of data journalism’, it had been translated into Spanish – by the wonderful Mauro Accurso. The post is copied in full below.

Some time ago I translated the whole Model for the 21st Century Newsroom, whose main part is the News Diamond, as opposed to the classic inverted pyramid taught in any journalism school (later we looked at the life cycle of digital news: the news diamond reimagined, and once again the news diamond reinterpreted).

But now Paul Bradshaw once again brings us an interesting diagram, in this case to explain the process of creating data journalism. This inverted pyramid of data journalism shows in a simple way how you move from a large amount of information, which is narrowed down step by step, to the point of communicating the results to the audience as clearly as possible. Below is the translation of the article, where we can see the different stages of the data journalism process: