Where do I get that data? New Q&A site launched

Get the Data

Well here’s another gap in the data journalism process ever-so-slightly plugged: Tony Hirst blogs about a new Q&A site that Rufus Pollock has built. Get the Data allows you to “ask your data related questions, including, but not limited to, the following:

  • “where to find data relating to a particular issue;
  • “how to query Linked Data sources to get just the data set you require;
  • “what tools to use to explore a data set in a visual way;
  • “how to cleanse data or get it into a format you can work with using third party visualisation or analysis tools.”

As Tony explains (the site came out of a conversation between him and Rufus):

“In some cases the data will exist in a queryable and machine readable form somewhere, if only you knew where to look. In other cases, you might have found a data source but lack the query writing expertise to get hold of just the data you want in a format you can make use of.”

He also invites people to help populate the site:

“If you publish data via some sort of API or queryable interface, why not consider posting self-answered questions using examples from your FAQ?

“If you’re running a hackday, why not use GetTheData.org to post questions arising in the scoping of the hacks, tweet a link to the question to your event backchannel and give the remote participants a chance to contribute back, at the same time adding to the online legacy of your event.”

Off you go then.

Bootstrapping GetTheData.org for All Your Public Open Data Questions and Answers

Where can I find a list of hospitals in the UK along with their location data? Or historical weather data for the UK? Or how do I find the county from a postcode, or a book title from its ISBN? And is there any way you can give me RDF Linked Data in a format I can actually use?!

With increasing amounts of data available, it can still be hard to:

– find the data you want;
– query a datasource to return just the data you want;
– get the data from a datasource in a particular format;
– convert data from one format to another (Excel to RDF, for example, or CSV to JSON);
– get data into a representation that means it can be easily visualised using a pre-existing tool.
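As a small illustration of the format-conversion item in that list, here is a minimal sketch of turning CSV into JSON using only the Python standard library. The function name csv_to_json and the sample row are made up for the example, and it assumes the first row of the CSV holds the column headers; real data will often need more cleaning than this.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text to a JSON array of objects.

    Assumes the first row contains the column headers.
    All values are kept as strings; no type guessing is attempted.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return json.dumps(rows, indent=2)


# Hypothetical sample data, purely for illustration.
sample = "name,postcode\nExample Hospital,AB1 2CD\n"
print(csv_to_json(sample))
```

Each CSV row becomes one JSON object keyed by the header names, which is the shape many visualisation tools expect.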

In some cases the data will exist in a queryable and machine readable form somewhere, if only you knew where to look. In other cases, you might have found a data source but lack the query writing expertise to get hold of just the data you want in a format you can make use of. Or maybe you know the data is in a Linked Data store on data.gov.uk, but you just can’t figure out how to get it out?

This is where GetTheData.org comes in. Get The Data arose out of a conversation between myself and Rufus Pollock at the end of last year, which resulted in Rufus setting up the site now known as GetTheData.org.

getTheData.org

The idea behind the site is to field questions and answers relating to the practicalities of working with public open data: from discovering data sets, to combining data from different sources in appropriate ways, getting data into formats you can happily work with, or that will play nicely with visualisation or analysis tools you already have, and so on.

At the moment, the site is in its startup/bootstrapping phase, although there is already some handy information up there. What we need now are your questions and answers…

So, if you publish data via some sort of API or queryable interface, why not consider posting self-answered questions using examples from your FAQ?

If you’re running a hackday, why not use GetTheData.org to post questions arising in the scoping of the hacks, tweet a link to the question to your event backchannel and give the remote participants a chance to contribute back, at the same time adding to the online legacy of your event.

If you’re looking for data as part of a research project, but can’t find it or can’t get it in an appropriate form that lets you link it to another data set, post a question to GetTheData.

If you want to do some graphical analysis on a data set, but don’t know what tool to use, or how to get the data in the right format for a particular tool, that’d be a good question to ask too.

Which is to say: if you want to GetTheData, but can’t do so for whatever reason, just ask… GetTheData.org

The Independent’s Facebook innovation


The Independent newspaper has introduced a fascinating new feature on the site that allows users to follow articles by individual writers and news about specific football teams via Facebook.

It’s one of those ideas so simple you wonder why no one else appears to have done it before*: instead of just ‘liking’ individual articles, or having to trudge off to Facebook to see if there’s a relevant page you can become a fan of, the Indie have applied the technology behind the ‘Like’ button to make the process of following specific news feeds more intuitive.

To that end, you can pick your favourite football team from this page or click on the ‘Like’ button at the head of any commentator’s homepage. The Independent’s Jack Riley says that the feature will be rolled out to columnists next, followed by public figures, places, political parties, and countries.

The move is likely to pour extra fuel on the overblown ‘RSS is dying’ discussion that has been taking place recently. The Guardian’s hugely impressive hackable RSS feeds (with full content) are somewhat put in the shade by this move – but then the Guardian have generated enormous goodwill in the development community for that, and continue to innovate. Both strategies have benefits.

At the moment the Independent’s new Facebook feature is plugged at the end of each article by the relevant commentator or about a particular club. It’s not the best place to put it, given how few people read articles through to the end, nor is it the best designed to catch the eye, and it will be interesting to see whether the placement and design change as the feature is rolled out.

It will also be interesting to see how quickly other news organisations copy the innovation.

*If I told you I said this deliberately in the hope someone would point me to a previous example – would you believe me? Martin Stabe in the comments points to The Sporting News as one organisation that got here first. And David Moynihan points out that NME have ‘Like’ buttons for each artist on their site.

More coverage at Read Write Web and Future of Media.

A portal for European government data: PublicData.eu plans

The Open Knowledge Foundation have published a blog post with notes on a site they’re developing to gather together data from across Europe. The post notes the growth of data catalogues at a national level (mentioning the Digitalisér.dk data portal run by the Danish National IT and Telecom Agency) alongside “countless city level initiatives across Europe as well – from Helsinki to Munich, Paris to Zaragoza”, with many more initiatives “in the pipeline with plans to launch in the next 6 to 12 months.”

PublicData.eu will, it says:

“Provide a single point of access to open, freely reusable datasets from numerous national, regional and local public bodies throughout Europe.

“[It] will harvest and federate this information to enable users to search, query, process, cache and perform other automated tasks on the data from a single place. This helps to solve the “discoverability problem” of finding interesting data across many different government websites, at many different levels of government, and across the many governments in Europe.”

What is perhaps even more interesting for journalists is that the site plans to:

“Capture (proposed) edits, annotations, comments and uploads from the broader community of public data users.”

That might include anything from cleaner versions of data, to instances where developers match datasets together, or where users add annotations that add context to a particular piece of information.

Finally there’s a general indication that the site hopes to further lower the bar for data and collaborative journalism by:

“Providing basic data analysis and visualisation tools together with more in-depth resources for those looking to dig deeper into the data. Users will be able to personalise their data browsing experience by being able to save links and create notes and comments on datasets.”

More in the post itself. Worth keeping an eye on.

‘UGC’ and journalism: the Giffords shooting and Facebook page moderation

SarahPalinFacebook

The Obama London blog has a post looking at the moderation of comments on Sarah Palin’s Facebook page (following the Giffords shooting) which raises a couple of key points for journalists dealing with user generated content.

Editorially selected, not UGC

The first point is that it can be easy to assume user generated content is an unadulterated reflection of one community’s point of view, but in many cases it is not. A political page like Palin’s is, in many ways, no different to any piece of campaigning literature, with quotes carefully selected to reflect well on the candidate.

Political blogs – where critical comments can also be removed – should be subject to the same scepticism (MP Nadine Dorries’ claim that 70% of her blog was fiction is a good example of blog-as-political-pamphlet).

Taking a virtual trip to a Facebook page, then, is not comparable to treading the streets – or even a particular politician’s campaign team – in search of ‘the feeling on the ground’.

Inaction can be newsworthy

The second point, however, is that this very moderation can generate stories itself.

The Obama London post notes that while even constructively critical comments were removed almost instantly, one comment was left to stand (shown in the image above). And it appeared to condone the killing of 9-year-old Christina Taylor Green:

“It’s ok. Christina Taylor Green was probably going to end up a left wing bleeding heart liberal anyway. Hey, as ‘they’ say, what would you do if you had the chance to kill Hitler as a kid? Exactly.”

Drawing on the campaign literature analogy again, you can see the newsworthiness of Palin staffers leaving this comment to stand (even when other commenters highlight its offensiveness).

Had Obama London been so inclined they could have led more strongly on something like: ‘Palin staff endorse comments condoning killing of 9-year-old’, or chased up a response from the team on why the comment was not removed.

But regardless of the nature of this individual example, you can see the broader point about comments on heavily moderated Facebook pages and blogs: they represent views that the politician’s camp is prepared to condemn or condone.

Comments

By the way, the extensive comment thread on that post is well worth exploring – it details how users can flag comments for moderation, removing them from their own view of the page but not that of others, as well as users’ experiences of being barred from Facebook groups for posting mildly critical comments.

Dylan Reeve in particular expresses my point more succinctly for moderators:

“The problem with the type of moderation policy that Sarah Palin (and others) utilise in places with user-contributed content is that they effectively appear to endorse any comments that do remain published.”

In the case of Facebook pages, admins are not named, but security lapses can lead to them being revealed and recorded, as is the case with Palin’s Facebook pages.

Oh, and on the more general thread of ‘analysis’ in the wake of the Giffords shooting, this post is well worth reading.

UPDATE: More discussion of the satirical nature of the comment on Reddit (thanks Mary Hamilton)

h/t Umair Haque

Hyperlocal voices: James Hatts, SE1

This week’s Hyperlocal Voices interview looks at the long-running SE1 website, which boasts half a million page views every month. Despite being over 12 years old, the site remains at the cutting edge of online journalism, being among the first experimenters with the Google Maps API and Audioboo.

Who were the people behind the site, and what were their backgrounds?

The London SE1 website is a family-run enterprise. My father, Leigh Hatts, has a background in outdoors, arts and religious affairs journalism. I was still studying for A-levels when we started the website back in 1998. I went on to study History and Spanish at Royal Holloway, University of London, and continued to run the SE1 website even whilst living and studying in Madrid.

What made you decide to set up the site?

My father was editing a monthly what’s on guide for the City of London (ie the Square Mile) with an emphasis on things that City workers could do in their lunch hour such as attending free lectures and concerts. The publication was funded by the City of London Corporation and in later years by the Diocese of London because many of these events and activities happened in the City churches.

Our own neighbourhood – across the Thames from the City – was undergoing a big change. Huge new developments such as Tate Modern and the London Eye were being planned and built. There was lots of new cultural and community activity in the area, but no-one was gathering information about all of the opportunities available to local residents, workers and visitors in a single place.

In the 1970s and 1980s there was a community newspaper called ‘SE1’ but that had died out, and our neighbourhood was just a small part of the coverage areas of the established local papers (South London Press and Southwark News).

We saw that there was a need for high quality local news and information and decided that together we could produce something worthwhile.

When did you set up the site and how did you go about it?

We launched an ad-funded monthly printed what’s on guide called ‘in SE1’ in May 1998. At the same time we launched a website which soon grew into a product that was distinct from (but complementary to) the printed publication.

The earliest version of the site was hosted on free web space from Tripod (a Geocities rival) and was very basic.

By the end of 1998 we had registered the london-se1.co.uk domain and the site as it is today began to evolve.

In 2001 we moved from flat HTML files to a news CMS called WMNews. We still use a much-customised version. The current incarnation of our forum dates from a similar time, and our events database was developed in 2006.

What other websites influenced you?

When we started there weren’t many local news and community websites.

chiswickw4.com started at about the same time as we did and I’ve always admired it. There used to be a great site for the Paddington area called Newspad (run by Brian Jenner) which was another example of a good hyperlocal site before the term was coined.

More recently I’ve enjoyed following the development of some of the local news and listings sites in the USA, like Pegasus News and Edhat.

I also admire Ventnor Blog for the way it keeps local authorities on their toes.

How did – and do – you see yourself in relation to a traditional news operation?

I think we have quite old-fashioned news values – we place a strong emphasis on local government coverage and the importance of local democracy. That means a lot of evenings sitting in long meetings at Southwark and Lambeth town halls.

Quite often the main difference is simply speed of delivery – why should people wait a week for something to appear in a local paper when we can publish within hours or minutes?

We are able to be much more responsive to changes in technology than traditional news operations – we were one of the first news sites in the UK to integrate the Google Maps API into our content management system, and one of the earliest users of Audioboo.

What have been the key moments in the blog’s development editorially?

It’s very difficult to pinpoint ‘key moments’. I think our success has more to do with quiet persistence and consistency of coverage than any particular breakthrough. Our 12-year track record gives us an advantage over the local papers because their reporters covering our patch rarely last more than a year or two before moving on, so they’re constantly starting again from a clean slate in terms of contacts and background knowledge.

There are also several long-running stories that we’ve followed doggedly for a long time – for example the stop-start saga of the regeneration of the Elephant & Castle area, and various major developments along the riverside.

Twitter has changed things a lot for us, both in terms of newsgathering, and being able to share small bits of information quickly that wouldn’t merit writing a longer article.

Some of the key moments in our 12-year history have been as much about technical achievement as editorial.

In 2006 I developed our CMS for events listings. Since then we have carried details of more than 10,000 local events from jumble sales to public meetings and exhibitions of fine art. As well as powering a large part of the website, this system can also output InDesign tagged text ready to be imported straight onto the pages of our printed publication. How many publications have such an integrated online and print workflow?

What sort of traffic do you get and how has that changed over time?

The site consistently gets more than 500,000 page views a month.

We have a weekly email newsletter which has 7,200 subscribers, and we have about 7,500 followers on Twitter.

For us the big growth in traffic came four or five years ago. Since then there have been steady, unspectacular year-on-year increases in visitor numbers.

Consequences of covert recording of MPs’ advice surgeries

This article is a cross-post from the Wardman Wire.

We have a new Cablegate, in which Vince Cable, the Business Minister, has revealed in the presence of undercover Daily Telegraph reporters that he was not carrying out his quasi-judicial role in a takeover bid by News Corporation objectively:

“I have blocked it, using the powers that I have got. And they are legal powers that I have got. I can’t politicise it, but for the people who know what is happening, this is a big thing. His whole empire is now under attack. So there are things like that, that being in Government…All we can do in opposition is protest.”

There are two angles which interest me around the intrusion of covert reporting into the Constituency Surgeries of MPs. Firstly, whether the covert reporting done was justified in the context, and then whether there will be a significant political impact.

Covert Reporting

The PCC Code of Practice states:

(*) 10 Clandestine devices and subterfuge

i) The press must not seek to obtain or publish material acquired by using hidden cameras or clandestine listening devices; or by intercepting private or mobile telephone calls, messages or emails; or by the unauthorised removal of documents or photographs; or by accessing digitally-held private information without consent.

ii) Engaging in misrepresentation or subterfuge, including by agents or intermediaries, can generally be justified only in the public interest and then only when the material cannot be obtained by other means.

but adds

The public interest

There may be exceptions to the clauses marked * where they can be demonstrated to be in the public interest.

1. The public interest includes, but is not confined to:
i) Detecting or exposing crime or serious impropriety.
ii) Protecting public health and safety.
iii) Preventing the public from being misled by an action or statement of an individual or organisation.

2. There is a public interest in freedom of expression itself.

3. Whenever the public interest is invoked, the PCC will require editors to demonstrate fully that they reasonably believed that publication, or journalistic activity undertaken with a view to publication, would be in the public interest.

4. The PCC will consider the extent to which material is already in the public domain, or will become so.

5. In cases involving children under 16, editors must demonstrate an exceptional public interest to over-ride the normally paramount interest of the child.

Given that the story has resulted in Vince Cable’s political role being heavily limited, and has exposed his bias in a decision where he is required to be objective, I’d suggest that the subterfuge is very probably justified.

Impact on Constituency Surgeries

Politics is buried in mass lobbying from single issue campaigns by email (ask the people who run the House of Commons email system), demonstrations, and the rest. In this, the Constituency Surgery had given MPs at least one foot partly in touch with the ground.

As far as I am aware, this is the first time that covert recording has been used in a Constituency Surgery seriously to embarrass an MP, and I hope that MPs won’t be tempted to become much more cautious.

David Allen Green (aka Jack of Kent) has an interesting angle over at the New Statesman, pointing out that the newspapers would be fully aware of the views of Lib Dem Ministers:

… the Daily Telegraph’s lobby correspondents routinely hear what Liberal Democrat MPs are “really saying” about the Coalition. But because these conversations are on lobby terms, any criticisms will not be attributed to the MP in question.

but that it was therefore necessary to record covertly somewhere else in order for direct ‘evidence’ to be obtained, and that this may form the thin end of a very long wedge. That “somewhere else” was the Constituency Surgery.

As a general rule, the constituency surgery of an MP should not be the place for secret recordings. That said, the confidentiality of the constituency surgery is there to protect the constituent, and not the MP (just as legal professional privilege is there to protect the client and not the lawyer). And so it is open for any constituent (real or supposed) to disclose what is said by an MP. So, on this basis, the Daily Telegraph’s secret recordings do not so far breach any grand political or legal principle.

However, there is some cause for concern. One suspects that the first use of interceptions of voicemails by tabloid reporters had a solid public interest basis; but it was quickly realised that such material was a rich seam to be mined just for trivial stories. Similarly, one hopes that newspapers do not now see constituency surgeries as “fair game”. The secret recording of a constituent would never be appropriate: there will always need to be a private space where a constituent can speak candidly to his or her Member of Parliament.

A variety of security measures would be available, ranging from verification of home addresses as being in the MP’s constituency to metal detectors and searches. Portable fingerprint scanners are now in use by the police routinely.

In a different context there was a conversation several years ago about whether the full Islamic veil was appropriate for an MP’s surgery, sparked off by Jack Straw.

Now corporations get the open data treatment

OpenCorporates: The Open Database Of The Corporate World

In September I blogged about Chris Taggart’s website Open Charities, which opened up data from the Charity Commission website.

Today Taggart – along with Rob McKinnon – launches Open Corporates, which opens up company information. This is a huge undertaking, but a vital one. As the site’s About page explains:

“Few parts of the corporate world are limited to a single country, and so the world needs a way of bringing the information together in a single place, and more than that, a place that’s accessible to anyone, not just those who subscribe to proprietary datasets.”

Taggart and McKinnon are well placed to do this. In addition to charities data, Taggart has created websites that make it easier to interrogate council spending data and hyperlocal websites; McKinnon has done the same for the New Zealand parliament and UK lobbying.

Below is a video explaining how you can interrogate data from the site using Google Refine. The site promises an API soon.

Games, systems and context in journalism at News Rewired

I went to News Rewired on Thursday, along with dozens of other journalists and folk concerned in various ways with news production. Some threads that ran through the day for me were discussions of how we publish our data (and allow others to do the same), how we link our stories together with each other and the rest of the web, and how we can help our readers to explore context around our stories.


Leaving Delicious – which replacement service will you use? (Comment call)

Leaving Delicious - other services already being bookmarked on my network

UPDATE: I’ve created a spreadsheet where you can add information about the various services and requirements. Please add what you can.

Delicious, it appears, is going to be closed down. I am hugely sad about this – Delicious is possibly the most useful tool I use as a journalist, academic and writer. Not just because of the way it makes it possible for me to share, store and retrieve information very easily – but because of the network of other users doing just the same whose overlapping fields of information I can share.

I follow over 100 people in my Delicious network, and my biggest requirement of any service that I might switch to is that as many of those people as possible move there too.

So I’d like to ask: if Delicious does shut down, where will you move to? Publish2? Pinboard.in? Diigo? Google Reader (sorry, not functional enough for me)? Or something else? (Here are some ideas.) Please post your comments.