Monthly Archives: January 2011

Getting Started With Local Council Spending Data

With more and more councils doing as they were told and opening up their spending data in the name of transparency, it’s maybe worth a quick review of how the data is currently being made available.

To start with, I’m going to consider the Isle of Wight Council’s data, which was opened up earlier this week. The first data release can be found (though not easily?!) as a pair of Excel spreadsheets, each just over 1 MB in size, at http://www.iwight.com/council/transparency/. (This URL reminds me that it might be time to review my post on “Top Level” URL Conventions in Local Council Open Data Websites!)

The data has also been released via Spikes Cavell at Spotlight on Spend: Isle of Wight.

The Spotlight on Spend site offers a hierarchical table based view of the data; value add comes from the ability to compare spend with national averages and that of other councils. Links are also provided to monthly datasets available as a CSV download.

Uploading these datasets to Google Fusion tables shows the following columns are included in the CSV files available from Spotlight on Spend (click through the image to see the data):

Note that the Expense Area column appears to be empty, and that the transaction dates appear to be “clumped”. Also note that each row, column and cell can be commented upon.

The Excel spreadsheets on the Isle of Wight Council website are a little more complete – here’s the data in Google Fusion tables again (click through the image to see the data):

(It would maybe be worth comparing these columns with those identified as Mandatory or Desirable in the Local Spending Data Guidance? A comparison with the format esd use for their Linked Data cross-council local spending data demo might also be interesting?)

Note that because the Excel files on the Isle of Wight Council website were larger than the 1MB size limit on XLS spreadsheet uploads to Google Fusion Tables, I had to open the spreadsheets in Excel and then export them as CSV documents. (Google Fusion Tables accepts CSV uploads for files up to 100MB.) So if you’re writing an open data sabotage manual, this may be something worth bearing in mind (i.e. publish data in very large Excel spreadsheets)!
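That conversion step can also be scripted rather than done by hand in Excel; here’s a minimal sketch using the pandas library (the file names are hypothetical placeholders, and a real council spreadsheet may need a sheet name or header row specifying):

```python
import pandas as pd

def xls_to_csv(xls_path, csv_path):
    # Read the first worksheet of the spreadsheet and re-save it as
    # CSV, which Fusion Tables accepts up to 100MB rather than 1MB.
    df = pd.read_excel(xls_path)
    df.to_csv(csv_path, index=False)
    return len(df)  # number of data rows converted

# e.g. xls_to_csv("iw_spending_nov2010.xls", "iw_spending_nov2010.csv")
```

Batch-converting a folder of monthly releases then becomes a three-line loop rather than a repetitive manual chore.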

It’s also worth noting that if different councils use similar column headings and CSV file formats, and include a column stating the name of the council, it should be trivial to upload all their data to a common Google Fusion Table allowing comparisons to be made across councils, contractors with similar names to be identified across councils, and so on… (i.e. Google Fusion tables would probably let you do as much as Spotlight on Spend, though in a rather clunkier interface… but then again, I think there is a fusion table API…?;-)
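The cross-council aggregation described above is also straightforward to sketch in code. Here’s a rough pandas version, assuming (hypothetically) that each council’s CSV shares the same column headings, including a column naming the council:

```python
import pandas as pd

def combine_councils(tables):
    # Stack per-council spending tables that share the same column
    # headings into one table, then total spend by council/supplier
    # so the same contractor can be spotted across councils.
    combined = pd.concat(tables, ignore_index=True)
    totals = combined.groupby(["Council", "Supplier"])["Amount"].sum()
    return combined, totals
```

The same stacked table is exactly what you would upload to a common Google Fusion Table for cross-council views.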

Although the data hasn’t appeared there yet, I’m sure it won’t be long before it’s made available on OpenlyLocal:

In the meantime, the Isle of Wight’s hyperlocal news site, Ventnorblog, teamed up with a local developer to revise Adrian Short’s Armchair Auditor code and released the OnTheWight Armchair Auditor site:

So that’s a round up of where the data is, and how it’s presented. If I get a chance, the next step is to:
– compare the offerings with each other in more detail, e.g. the columns each view provides;
– compare the offerings with the guidance on release of council spending data;
– see what interesting Google Fusion table views we can come up with as “top level” reports on the Isle of Wight data;
– explore the extent to which Google Fusion Tables can be used to aggregate and compare data from across different councils.

PS related – Nodalities blog: Linked Spending Data – How and Why Bother Pt2

PPS for a list of local councils and the data they have released, see Guardian datastore: Local council spending over £500, OpenlyLocal Council Spending Dashboard


While you’re waiting for Yahoo! to make its mind up about Delicious, sign up to Trunk.ly

Despite the incredible work done on the spreadsheet comparing social bookmarking services, I have yet to find one that does everything I use Delicious for (background here). One service I have been using, however, is Trunk.ly.

Once you’ve imported your existing bookmarks from Delicious, Trunk.ly stores any new ones you bookmark on Delicious, keeping the backup up to date. In addition, it can store any links you’ve shared on Twitter, Facebook, Google Reader or any RSS feed.

It is essentially a search engine for links you may have shared at some point – but its technical limitations stop it from being much more. For example, there do not appear to be any RSS feeds for tags*, and there is no facility to combine tags to find items that are, for example, tagged with both ‘privacy’ and ‘tools’. (It would also be nice if it tagged links shared on Twitter with any hashtags in the tweet.)
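That missing “combine tags” search is simple enough to run yourself over an exported list of bookmarks. A sketch, assuming a home-made bookmark structure rather than Trunk.ly’s actual export format:

```python
def bookmarks_with_all_tags(bookmarks, required):
    # Keep only the bookmarks that carry every one of the required
    # tags, e.g. both 'privacy' and 'tools'.
    wanted = set(required)
    return [b for b in bookmarks if wanted <= set(b["tags"])]

# Example: of these two links, only the first matches both tags.
links = [
    {"url": "http://example.com/a", "tags": ["privacy", "tools"]},
    {"url": "http://example.com/b", "tags": ["privacy"]},
]
matches = bookmarks_with_all_tags(links, ["privacy", "tools"])
```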

That said, if, like me, you want to continue using Delicious but with an ongoing backup just in case, Trunk.ly appears a sound choice. And it’s early days, so here’s hoping they add those features soon… *cough*.

*Planned, apparently. See Trunk.ly’s reply in the comments below.

Investigations tool DocumentCloud goes public (PS: documents drive traffic)

The rather lovely DocumentCloud – a tool that allows journalists to share, annotate, connect and organise documents – has finally emerged from its closet and made itself available to public searches.

This means that anyone can now search the powerful database (some tips here) of newsworthy documents. If you want to add your own, however, you still need approval.

If you are approved, you’ll find it’s quite a powerful tool, with quick conversion of PDFs into text files, analytic tools and semantic tagging (so you can connect all documents mentioning a particular person or organisation) among its best features. The site is open source and has an API too.
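By way of illustration, public searches can be run against the API without any approval. A minimal sketch – the search.json endpoint and q parameter are as documented at the time of writing, but treat the details as assumptions:

```python
import json
from urllib.parse import urlencode

# DocumentCloud's public search endpoint (as documented at the time).
SEARCH_URL = "https://www.documentcloud.org/api/search.json"

def search_url(query, page=1):
    # Build the URL for a public full-text search of the catalogue.
    return SEARCH_URL + "?" + urlencode({"q": query, "page": page})

def document_titles(response_text):
    # Pull the document titles out of a search.json response body.
    return [doc["title"] for doc in json.loads(response_text)["documents"]]

# The query could then be fetched with urllib.request.urlopen(search_url("..."))
```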

I asked Program Director Amanda B Hickman what she’s learned on the project so far. Her response suggests that documents have a particular appeal for online readers:

“If we’ve learned anything, it is that people really love documents. It is pretty clear that when there’s something interesting going on in the news, plenty of people want to dig a little deeper. When Arizona Republic posted an annotated version of that state’s new immigration law, it got more traffic than their weekly entertainment round up. WNYC told us that the page listing the indictments in last week’s mob roundup was still getting more traffic than any other single news story even a week later.

“These were big news documents, to be sure, but it still seems pretty clear that people do want to dig deeper and explore the documents behind the news, which is great for us and great for news.”

Content, context and code: verifying information online


When the telephone first entered the newsroom journalists were sceptical. “How can we be sure that the person at the other end is who they say they are?” The question seems odd now, because we have become so used to phone technology that we barely think of it as technology at all – and there are a range of techniques we use, almost unconsciously, to verify what the person on the other end of the phone is saying, from their tone of voice, to the number they are ringing from, and the information they are providing.

Dealing with online sources is no different. How do you know the source is telling the truth? You’re a journalist, for god’s sake: it’s your job to find out.

In many ways the internet gives us extra tools to verify information – certainly more than the phone ever did. The apparent ‘facelessness’ of the medium is misleading: every piece of information, and every person, leaves a trail of data that you can use to build a picture of its reliability.

The following is a three-level approach to verification: starting with the content itself, moving on to the context surrounding it, and finishing with the technical information underlying it. Most of the techniques outlined take very little time at all, but the key thing is to look for warning signs and follow those up.

Sri Lanka war crimes and the future of international journalism

Here’s a quick thought about a problem of international reporting: sources. Your viewers and readers are in your country, while your sources are largely not (there are exceptions, such as CNN or the BBC, but humour me).

In order to make contact with the people and evidence that can help answer your questions, you have to rely far more on your personal network than, for example, a home affairs or education correspondent does.

But the globalisation of modern news – and the ability of people to search on the internet for information related to their own experiences – has changed this. Now, if you report on an issue in another country, people in that country can see what you’ve written and contact you with further information.

In a nutshell this reflects the way that journalism has moved from a ‘push’ medium limited by transmission and distribution infrastructure, to a ‘pull’ (search) and ‘pass’ (social media) one.

There are three particularly strong examples of this. The first is Channel 4’s ongoing reporting on the civil war in Sri Lanka and evidence of war crimes. Video footage that was obtained as part of that journalism was, eventually, seen by someone who recognised one of the bodies. (A particularly good lesson for budding journalists is how photos of those bodies were dated using EXIF data, and correlated with documentary evidence from the Sri Lankan MOD – material that doesn’t lend itself to broadcast, but can be put online.)
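That EXIF-dating trick is easy to reproduce yourself. A sketch using the Pillow imaging library (the date is only as trustworthy as the camera’s clock, of course, and EXIF data can be stripped or edited):

```python
from PIL import Image
from PIL.ExifTags import TAGS

def photo_taken(path):
    # Return the camera's 'DateTimeOriginal' timestamp from a JPEG's
    # EXIF metadata, or None if the file carries no EXIF date.
    exif = Image.open(path)._getexif() or {}
    named = {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}
    return named.get("DateTimeOriginal")
```

Cross-checking that timestamp against documentary evidence is exactly the kind of correlation described above.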

Second, Paul Lewis’ investigation into the death of a man being deported to Angola. One of the passengers on the plane where he died was a US citizen who works in Angola. He contacted Lewis after coming across a tweet calling for witnesses.

Third, Paul Lewis again, and the death of Ian Tomlinson at the G20 protests. The crucial footage was again provided by a US citizen, who happened to be in the UK at the time and came across the story after he returned home.

Curiously, of course, the latter two stories are not examples of international journalism in terms of their subject – but they do highlight how the web can make international newsgathering part of home affairs stories too.