A portal for European government data: PublicData.eu plans

The Open Knowledge Foundation have published a blog post with notes on a site they’re developing to gather together data from across Europe. The post notes the growth of data catalogues at national level (mentioning the Digitalisér.dk data portal run by the Danish National IT and Telecom Agency) and at city level, with “countless city level initiatives across Europe as well – from Helsinki to Munich, Paris to Zaragoza” and many more initiatives “in the pipeline with plans to launch in the next 6 to 12 months.”

PublicData.eu will, it says:

“Provide a single point of access to open, freely reusable datasets from numerous national, regional and local public bodies throughout Europe.

“[It] will harvest and federate this information to enable users to search, query, process, cache and perform other automated tasks on the data from a single place. This helps to solve the “discoverability problem” of finding interesting data across many different government websites, at many different levels of government, and across the many governments in Europe.”

What is perhaps even more interesting for journalists is that the site plans to:

“Capture (proposed) edits, annotations, comments and uploads from the broader community of public data users.”

That might include anything from cleaner versions of data, to instances where developers match datasets together, or where users add annotations that add context to a particular piece of information.

Finally there’s a general indication that the site hopes to further lower the bar for data and collaborative journalism by:

“Providing basic data analysis and visualisation tools together with more in-depth resources for those looking to dig deeper into the data. Users will be able to personalise their data browsing experience by being able to save links and create notes and comments on datasets.”

More in the post itself. Worth keeping an eye on.

‘UGC’ and journalism: the Giffords shooting and Facebook page moderation

[Image: screenshot of a comment on Sarah Palin’s Facebook page]

The Obama London blog has a post looking at the moderation of comments on Sarah Palin’s Facebook page (following the Giffords shooting) which raises a couple of key points for journalists dealing with user generated content.

Editorially selected, not UGC

The first point is that it can be easy to assume user generated content is an unadulterated reflection of one community’s point of view, but in many cases it is not. A political page like Palin’s is, in many ways, no different to any piece of campaigning literature, with quotes carefully selected to reflect well on the candidate.

Political blogs – where critical comments can also be removed – should be subject to the same scepticism (MP Nadine Dorries’ claim that 70% of her blog was fiction is a good example of blog-as-political-pamphlet).

Taking a virtual trip to a Facebook page, then, is not comparable to treading the streets – or even a particular politician’s campaign team – in search of ‘the feeling on the ground’.

Inaction can be newsworthy

The second point, however, is that this very moderation can generate stories itself.

The Obama London post notes that while even constructively critical comments were removed almost instantly, one comment was left to stand (shown in the image above). And it appeared to condone the killing of 9-year-old Christina Taylor Green:

“It’s ok. Christina Taylor Green was probably going to end up a left wing bleeding heart liberal anyway. Hey, as ‘they’ say, what would you do if you had the chance to kill Hitler as a kid? Exactly.”

Drawing on the campaign literature analogy again, you can see the newsworthiness of Palin staffers leaving this comment to stand (even when other commenters highlight its offensiveness).

Had Obama London been so inclined they could have led more strongly on something like: ‘Palin staff endorse comments condoning killing of 9-year-old’, or chased up a response from the team on why the comment was not removed.

But regardless of the nature of this individual example, you can see the broader point about comments on heavily moderated Facebook pages and blogs: they represent views that the politician’s camp is prepared to condemn or condone.

Comments

By the way, the extensive comment thread on that post is well worth exploring – it details how users can flag comments for moderation, removing them from their own view of the page but not that of others, as well as users’ experiences of being barred from Facebook groups for posting mildly critical comments.

Dylan Reeve in particular expresses my point more succinctly for moderators:

“The problem with the type of moderation policy that Sarah Palin (and others) utilise in places with user-contributed content is that they effectively appear to endorse any comments that do remain published.”

In the case of Facebook pages, admins are not named, but security lapses can lead to them being revealed and recorded, as is the case with Palin’s Facebook pages.

Oh, and on the more general thread of ‘analysis’ in the wake of the Giffords shooting, this post is well worth reading.

UPDATE: More discussion of the satirical nature of the comment on Reddit (thanks Mary Hamilton)

h/t Umair Haque

Hyperlocal voices: James Hatts, SE1

This week’s Hyperlocal Voices interview looks at the long-running SE1 website, which boasts half a million visits every month. Despite being over 12 years old, the site remains at the cutting edge of online journalism, being among the first experimenters with the Google Maps API and Audioboo.

Who were the people behind the site, and what were their backgrounds?

The London SE1 website is a family-run enterprise. My father, Leigh Hatts, has a background in outdoors, arts and religious affairs journalism. I was still studying for A-levels when we started the website back in 1998. I went on to study History and Spanish at Royal Holloway, University of London, and continued to run the SE1 website even whilst living and studying in Madrid.

What made you decide to set up the site?

My father was editing a monthly what’s on guide for the City of London (ie the Square Mile) with an emphasis on things that City workers could do in their lunch hour such as attending free lectures and concerts. The publication was funded by the City of London Corporation and in later years by the Diocese of London because many of these events and activities happened in the City churches.

Our own neighbourhood – across the Thames from the City – was undergoing a big change. Huge new developments such as Tate Modern and the London Eye were being planned and built. There was lots of new cultural and community activity in the area, but no-one was gathering information about all of the opportunities available to local residents, workers and visitors in a single place.

In the 1970s and 1980s there was a community newspaper called ‘SE1’ but that had died out, and our neighbourhood was just a small part of the coverage areas of the established local papers (South London Press and Southwark News).

We saw that there was a need for high quality local news and information and decided that together we could produce something worthwhile.

When did you set up the site and how did you go about it?

We launched an ad-funded monthly printed what’s on guide called ‘in SE1’ in May 1998. At the same time we launched a website which soon grew into a product that was distinct from (but complementary to) the printed publication.

The earliest version of the site was hosted on free web space from Tripod (a Geocities rival) and was very basic.

By the end of 1998 we had registered the london-se1.co.uk domain and the site as it is today began to evolve.

In 2001 we moved from flat HTML files to a news CMS called WMNews. We still use a much-customised version. The current incarnation of our forum dates from a similar time, and our events database was developed in 2006.

What other websites influenced you?

When we started there weren’t many local news and community websites.

chiswickw4.com started at about the same time as we did and I’ve always admired it. There used to be a great site for the Paddington area called Newspad (run by Brian Jenner) which was another example of a good hyperlocal site before the term was coined.

More recently I’ve enjoyed following the development at some of the local news and listings sites in the USA, like Pegasus News and Edhat.

I also admire Ventnor Blog for the way it keeps local authorities on their toes.

How did – and do – you see yourself in relation to a traditional news operation?

I think we have quite old-fashioned news values – we place a strong emphasis on local government coverage and the importance of local democracy. That means a lot of evenings sitting in long meetings at Southwark and Lambeth town halls.

Quite often the main difference is simply speed of delivery – why should people wait a week for something to appear in a local paper when we can publish within hours or minutes?

We are able to be much more responsive to changes in technology than traditional news operations – we were one of the first news sites in the UK to integrate the Google Maps API into our content management system, and one of the earliest users of Audioboo.

What have been the key moments in the blog’s development editorially?

It’s very difficult to pinpoint ‘key moments’. I think our success has more to do with quiet persistence and consistency of coverage than any particular breakthrough. Our 12-year track record gives us an advantage over the local papers because their reporters covering our patch rarely last more than a year or two before moving on, so they’re constantly starting again from a clean slate in terms of contacts and background knowledge.

There are also several long-running stories that we’ve followed doggedly for a long time – for example the stop-start saga of the regeneration of the Elephant & Castle area, and various major developments along the riverside.

Twitter has changed things a lot for us, both in terms of newsgathering, and being able to share small bits of information quickly that wouldn’t merit writing a longer article.

Some of the key moments in our 12-year history have been as much about technical achievement as editorial.

In 2006 I developed our CMS for events listings. Since then we have carried details of more than 10,000 local events from jumble sales to public meetings and exhibitions of fine art. As well as powering a large part of the website, this system can also output InDesign tagged text ready to be imported straight onto the pages of our printed publication. How many publications have such an integrated online and print workflow?
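The interview doesn’t describe the export beyond naming the format, but InDesign tagged text is essentially plain text with style markup, so an events export of this kind can be sketched roughly as follows. The `<ASCII-WIN>` header and `<ParaStyle:…>` tags follow InDesign’s tagged-text convention; the style names and event fields here are invented for illustration, not SE1’s actual setup:

```python
def events_to_tagged_text(events):
    """Sketch of an events-database-to-print export: emit InDesign
    'tagged text', where each line is prefixed with a paragraph style.
    Style names (EventTitle, EventDetail) are made up for this example."""
    lines = ["<ASCII-WIN>"]  # tagged-text header declaring platform/encoding
    for title, venue, date in events:
        lines.append("<ParaStyle:EventTitle>" + title)
        lines.append("<ParaStyle:EventDetail>" + venue + ", " + date)
    return "\n".join(lines)

print(events_to_tagged_text([("Jumble sale", "Waterloo Action Centre", "14 May")]))
```

Importing a file like this into InDesign would apply the named paragraph styles automatically, which is what makes a single database able to feed both web pages and print layout.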

What sort of traffic do you get and how has that changed over time?

The site consistently gets more than 500,000 page views a month.

We have a weekly email newsletter which has 7,200 subscribers, and we have about 7,500 followers on Twitter.

For us the big growth in traffic came four or five years ago. Since then there have been steady, unspectacular year-on-year increases in visitor numbers.

Consequences of covert recording of MPs’ advice surgeries

This article is a cross-post from the Wardman Wire.

We have a new Cablegate, in which Vince Cable the Business Minister has revealed that he was not carrying out his quasi-judicial role in a takeover bid by News Corporation objectively, in the presence of Daily Telegraph undercover reporters:

“I have blocked it, using the powers that I have got. And they are legal powers that I have got. I can’t politicise it, but for the people who know what is happening, this is a big thing. His whole empire is now under attack. So there are things like that, that being in Government… All we can do in opposition is protest.”

There are two angles which interest me around the intrusion of covert reporting into the Constituency Surgeries of MPs. Firstly, whether the covert reporting done was justified in the context, and then whether there will be a significant political impact.

Covert Reporting

The PCC Code of Practice states:

(*) 10 Clandestine devices and subterfuge

i) The press must not seek to obtain or publish material acquired by using hidden cameras or clandestine listening devices; or by intercepting private or mobile telephone calls, messages or emails; or by the unauthorised removal of documents or photographs; or by accessing digitally-held private information without consent.

ii) Engaging in misrepresentation or subterfuge, including by agents or intermediaries, can generally be justified only in the public interest and then only when the material cannot be obtained by other means.

but adds

The public interest

There may be exceptions to the clauses marked * where they can be demonstrated to be in the public interest.

1. The public interest includes, but is not confined to:
i) Detecting or exposing crime or serious impropriety.
ii) Protecting public health and safety.
iii) Preventing the public from being misled by an action or statement of an individual or organisation.

2. There is a public interest in freedom of expression itself.

3. Whenever the public interest is invoked, the PCC will require editors to demonstrate fully that they reasonably believed that publication, or journalistic activity undertaken with a view to publication, would be in the public interest.

4. The PCC will consider the extent to which material is already in the public domain, or will become so.

5. In cases involving children under 16, editors must demonstrate an exceptional public interest to over-ride the normally paramount interest of the child.

Given that the story has resulted in Vince Cable’s political role being heavily limited, and exposing his bias in a decision where he is required to be objective, I’d suggest that the subterfuge is very probably justified.

Impact on Constituency Surgeries

Politics is buried in mass lobbying from single issue campaigns by email (ask the people who run the House of Commons email system), demonstrations, and the rest. Amid all this, the Constituency Surgery has kept MPs at least partly in touch with the ground.

As far as I am aware, this is the first time that covert recording has been used in a Constituency Surgery to seriously embarrass an MP, and I hope that MPs won’t be tempted to become much more cautious.

David Allen Green (aka Jack of Kent) has an interesting angle over at the New Statesman, pointing out that the newspapers would be fully aware of the views of Lib Dem Ministers:

… the Daily Telegraph’s lobby correspondents routinely hear what Liberal Democrat MPs are “really saying” about the Coalition. But because these conversations are on lobby terms, any criticisms will not be attributed to the MP in question.

but that it was therefore necessary to record covertly somewhere else in order for direct ‘evidence’ to be obtained, and that this may form the thin end of a very long wedge. That “somewhere else” was the Constituency Surgery.

As a general rule, the constituency surgery of an MP should not be the place for secret recordings. That said, the confidentiality of the constituency surgery is there to protect the constituent, and not the MP (just as legal professional privilege is there to protect the client and not the lawyer). And so it is open for any constituent (real or supposed) to disclose what is said by an MP. So, on this basis, the Daily Telegraph’s secret recordings do not so far breach any grand political or legal principle.

However, there is some cause for concern. One suspects that the first use of interceptions of voicemails by tabloid reporters had a solid public interest basis; but it was quickly realised that such material was a rich seam to be mined just for trivial stories. Similarly, one hopes that newspapers do not now see constituency surgeries as “fair game”. The secret recording of a constituent would never be appropriate: there will always need to be a private space where a constituent can speak candidly to his or her Member of Parliament.

A variety of security measures would be available, ranging from verification of home addresses as being in the MP’s constituency to metal detectors and searches. Portable fingerprint scanners are now in use by the police routinely.

In a different context there was a conversation several years ago about whether the full Islamic veil was appropriate for an MP’s surgery, sparked off by Jack Straw.

Now corporations get the open data treatment

[Image: OpenCorporates – The Open Database of the Corporate World]

In September I blogged about Chris Taggart’s website Open Charities, which opened up data from the Charity Commission website.

Today Taggart – along with Rob McKinnon – launches Open Corporates, which opens up companies information. This is a huge undertaking, but a vital one. As the site’s About page explains:

“Few parts of the corporate world are limited to a single country, and so the world needs a way of bringing the information together in a single place, and more than that, a place that’s accessible to anyone, not just those who subscribe to proprietary datasets.”

Taggart and McKinnon are well placed to do this. In addition to charities data, Taggart has created websites that make it easier to interrogate council spending data and hyperlocal websites; McKinnon has done the same for the New Zealand parliament and UK lobbying.

Below is a video explaining how you can interrogate data from the site using Google Refine. The site promises an API soon.

Games, systems and context in journalism at News Rewired

I went to News Rewired on Thursday, along with dozens of other journalists and folk concerned in various ways with news production. Some threads that ran through the day for me were discussions of how we publish our data (and allow others to do the same), how we link our stories together with each other and the rest of the web, and how we can help our readers to explore context around our stories.

Continue reading

Leaving Delicious – which replacement service will you use? (Comment call)

[Image: Leaving Delicious – other services already being bookmarked on my network]

UPDATE: I’ve created a spreadsheet where you can add information about the various services and requirements. Please add what you can.

Delicious, it appears, is going to be closed down. I am hugely sad about this – Delicious is possibly the most useful tool I use as a journalist, academic and writer. Not just because of the way it makes it possible for me to share, store and retrieve information very easily – but because of the network of other users doing just the same whose overlapping fields of information I can share.

I follow over 100 people in my Delicious network, and my biggest requirement of any service that I might switch to is that as many of those people move there too.

So I’d like to ask: if Delicious does shut down, where will you move to? Publish2? Pinboard.in? Diigo? Google Reader (sorry, not functional enough for me)?  Or something else? (Here are some ideas) Please post your comments.

Adding geographical information to a spreadsheet based on postcodes – Google Refine and APIs

If you have a spreadsheet containing geographical data such as postcodes you may want to know what constituency they are in, or convert them to local authority. That was a question that Bill Thompson asked on Twitter this week – and this is how I used Google Refine to do that: adding extra columns to a spreadsheet with geographic information.

You can watch a video tutorial of this here.

1. Find a website that gives information based on a postcode

First, I needed to find an API which would return a page of information on any postcode in JSON…

If that sounds like double-dutch, don’t worry, try this instead.

Translation: First, I needed either of these websites: http://www.uk-postcodes.com/ or http://mapit.mysociety.org/

Both of these will generate a page giving you details about any given postcode, and the URLs of those pages follow a consistent format. (The first removes the space between the two parts of the postcode and adds .json; the second replaces the space with %20 – although I’m told by Matthew Somerville that it will work with spaces and postcodes without spaces.)
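The two URL patterns can be sketched as a small Python helper. The hostnames come from the post; the exact path segments (`/postcode/…`) are my assumption of the sites’ conventions, so check the live URLs before relying on them:

```python
def postcode_urls(postcode):
    """Build the two lookup URLs for a UK postcode, as described above."""
    # uk-postcodes.com: remove the space between the two parts, append .json
    uk_postcodes = ("http://www.uk-postcodes.com/postcode/"
                    + postcode.replace(" ", "") + ".json")
    # MapIt: keep the postcode but URL-encode the space as %20
    mapit = "http://mapit.mysociety.org/postcode/" + postcode.replace(" ", "%20")
    return uk_postcodes, mapit

print(postcode_urls("SE1 9GF"))
```

This is exactly the string manipulation the Refine expressions below perform, just written outside Refine.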

This information will be important when we start to use Google Refine…

2. Create a new column that has text in the same format as the webpages you want to fetch

In Google Refine click on the arrow at the top of your postcode column and follow the instructions here to create a new column which has the same postcode information, but with no spaces. To replace the space with %20 instead, you would use the expression

value.split(" ").join("%20")

Let’s name this column ‘SpacesRemoved’ and click OK.

Now that we’ve got postcodes in the same format as the webpages above, we can start to fetch pages of JSON giving us extra information on those postcodes.

3. Write some code that goes to a webpage and fetches information about each postcode

In Google Refine click on the arrow at the top of your ‘SpacesRemoved’ column and create a new column by selecting ‘Edit column’ > ‘Add column by fetching URLs…’

You can read more about this functionality here.

This time you will type the expression:

"http://www.uk-postcodes.com/postcode/"+value+".json"

That basically creates a URL that inserts ‘value’ (the value in the previous column) where you want it.

Call this column ‘JSON for postcode’ and click OK.

Each cell will now be filled with the results of that webpage. This might take a while.
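The fetch-one-page-per-row step can be sketched in Python as well. The URL pattern is the one used in the Refine expression above; the fetching function is injected so the logic can be tested (or swapped for a cached or rate-limited fetcher) without hitting the network:

```python
import urllib.request

def fetch_url(url):
    # Plain HTTP GET, returning the response body as text
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

def add_json_column(postcodes, fetcher=fetch_url):
    """Mimic Refine's 'Add column by fetching URLs': fetch one
    document per row, keyed by the space-stripped postcode."""
    column = []
    for pc in postcodes:
        url = "http://www.uk-postcodes.com/postcode/" + pc.replace(" ", "") + ".json"
        column.append(fetcher(url))
    return column

# Demo with a stub fetcher, so nothing is actually downloaded here:
column = add_json_column(["SE1 9GF"], fetcher=lambda url: '{"stub": true}')
```

As in Refine, this is one HTTP request per row, which is why the step can take a while on a large spreadsheet.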

4. Write some code that pulls out a specific piece of information from that JSON

In Google Refine click on the arrow at the top of your ‘SpacesRemoved’ column and create a new column by selecting ‘Edit column’ > ‘Add column based on this column…’

Write the following expression:

value.parseJson()["administrative"]["district"]["title"]

Look at the preview as you type this and you’ll see information become more specific as you add each term in square brackets.

Call this ‘Council’ and click OK.

This column will now be populated with the council names for each postcode. You can repeat this process for other information, adapting the expression for different pieces of information such as constituency, easting and northing, and so on.
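The same extraction can be sketched in Python. The nested keys mirror the GREL expression above; the sample document is a cut-down guess at the API’s response shape (the real response contains many more fields):

```python
import json

def council_from_json(doc):
    """Python equivalent of the GREL expression
    value.parseJson()["administrative"]["district"]["title"]"""
    try:
        return json.loads(doc)["administrative"]["district"]["title"]
    except (ValueError, KeyError, TypeError):
        return None  # malformed cell, or postcode not found

sample = '{"administrative": {"district": {"title": "Southwark"}}}'
print(council_from_json(sample))  # prints Southwark
```

Returning None for bad cells mirrors how Refine leaves a blank where an expression fails, rather than halting the whole column.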

5. Export as a standard spreadsheet

Click Export in the top right corner and save your spreadsheet in the format you prefer. You can then upload this to Google Docs and share it publicly.

Other possibilities

Although this post is about postcode data you can use the same principles to add information based on any data that you can find an API for. For example if you had a column of charities you could use the Open Charities API to pull further details (http://opencharities.org/info/about). For local authority data you could pull from the OpenlyLocal API (http://openlylocal.com/info/api).
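The recipe generalises because only two things change: the URL expression and the JSON path. A minimal sketch of the URL half, where the endpoint patterns for Open Charities and OpenlyLocal are my assumptions based on the sites’ usual conventions, so verify them against each API’s documentation:

```python
def lookup_url(pattern, key):
    """Build a per-row lookup URL from a pattern with one slot,
    just as the GREL expression splices `value` into a URL."""
    return pattern.format(key)

# Hypothetical endpoint patterns -- check each API's docs for the real ones.
CHARITY_JSON = "http://opencharities.org/charities/{0}.json"
COUNCIL_JSON = "http://openlylocal.com/councils/{0}.json"

print(lookup_url(CHARITY_JSON, "1099776"))
```

Once the fetched column is in place, a `parseJson()` expression with the appropriate keys does the rest, exactly as in step 4.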

If you know of other similarly useful APIs let me know.


Case Study – Two political blog articles which went viral

One of the areas which interests me is how independent publishers can cut through to build an audience, or drive a story into the wider public arena. This is a cross-post from the Wardman Wire.

Two articles from the last month by the Heresiarch and Anna Raccoon form an interesting study in articles by political bloggers which gained widespread attention. Both of these pieces went viral via Twitter, rather than Facebook or any other social network.

Firstly, a piece which caught the moment when the conviction of “Twitter Terrorist” Paul Chambers was confirmed. This piece achieved almost 1,000 retweets.

This is the headline and abstract:

Heresy Corner: With the Conviction of Paul Chambers, it is now illegal to be English.

There is something deeply and shockingly offensive about the conviction of Paul Chambers for his Twitter joke, almost unbelievably reaffirmed today at the Crown Court in Doncaster. It goes beyond the normal anger anyone would feel at a blatant injustice, at a piece of prosecutorial and judicial overkill that sees the might of the state pitted against a harmless, unthreatening individual for no good reason.

Secondly, a piece from Anna Raccoon last week, about the case of Stephen Neary, who seems to have been caught up in a bureaucratic whirlpool through his autism:

The Orwellian Present – Never Mind the Future.

Steven Neary, Deprivation of Liberty Safeguards, Welfare Deputyships and The Court of Protection

These retweet numbers are 50–100 times what a reasonably well-received article will achieve. As a comparison, the last six articles on the Heresy Corner homepage this morning are showing 3, 5, 4, 9, 40 and 2 retweets.

My observations:

1 – Both are non party-aligned writers embedded in the political blog niche, but also cover political questions from a position of non-political knowledge, with a degree of authority/respect which has come from their own work over two years or more.

2 – In these instances, both are amateur or professional subject specialists in the areas they cover here, and have an established readership who are able to give a boost to a piece in the social media nexus. As a comparison, in the world of Internet Consultancy much time (and money) is spent trying to build initial traction for articles and websites to give them a boost into wider internet prominence.

3 – The importance of “connectors”. Anna Raccoon’s piece received a significant boost from Charon QC, who provides an important hub-site in the legal niche – which of course is one place where a real difference can be made to Stephen Neary’s situation.

4 – The “edge of the political blogosphere” has become very important – both for specialist sites writing about political questions, and political blogs who “do more than politics”.

5 – These are two different types of article. Heresy Corner summarised the online reaction to the “I’ll blow your airport sky high” Twitter Joke Trial case at the right time to catch the Zeitgeist, while Anna Raccoon’s piece is a campaigning piece trying to direct attention to a particular case, in an area of society she has written about on perhaps a dozen occasions.

6 – Several legal commentators (eg Jack of Kent, in addition to Charon) have pointed out (correctly) that for a campaigning piece to convert attention into action, there needs to be more complete information about both sides of the story. A spotlight can be directed onto a perceived abuse, but there needs to be objective investigation afterwards.

That is a good distinction; but the rub is that officialdom can prevent both sides of the story being made available to the public, and often reacts only to media spotlights – not to problems it has not been embarrassed about.

7 – Neither of these bloggers is deeply embedded in the Facebook ecosystem, which is a distinct difference from some other mainly political sites, which report Facebook as a major source of traffic (example). I’ll write more on this another time, because I think it is important.

8 – During November, when the Paul Chambers piece was published, Heresy Corner jumped from 134 in the Wikio blog ranks to number 15 (illustrated). This was after changes which introduced a “Twitter” factor into the Wikio rankings. I’d suggest that this level of volatility may illustrate that they’ve overdone it.

Wrapping Up

The missing link for independent publishers is the ability to translate incisive observation or reporting into an effective influence.

I’ll return to that subject soon.

Can I ask a favour of brave souls who’ve reached the end of this article? I need a couple of dozen Facebook “Likes” for my own site’s new Facebook page to gain access to all features. You can “Like” me at the bottom of the right-hand sidebar here.