Sockpuppetry and Wikipedia – a PR transparency project

Wikipedia image by Octavio Rojas

Last month you may have read the story of lobbyists editing Wikipedia entries to remove criticism of their clients and smear critics. The story, which came out of investigations by SEO expert Tim Ireland, was a follow-up to an undercover report by the Bureau of Investigative Journalism and The Independent on claims of political access by Bell Pottinger.

Ireland was particularly interested in reported boasts by executives that they could “manipulate Google results to ‘drown out’ negative coverage of human rights violations and child labour”. His subsequent digging resulted in the identification of a number of Wikipedia edits made by accounts that he was able to connect with Bell Pottinger, an investigation by Wikipedia itself, and the removal of edits made by suspect accounts (also discussed on Wikipedia itself here).

This month the story reverted to an old-fashioned he-said-she-said report on conflict between Wikipedia and the PR industry as Jimmy Wales spoke to Bell Pottinger employees and was criticised by co-founder Tim (Lord) Bell.

More insightfully, Bell’s lack of remorse has led Tim Ireland to launch a campaign to change the way the PR industry uses Wikipedia, by demonstrating directly to Lord Bell the dangers of trying to covertly shape public perception:

“Mr Bell needs to learn that the age of secret lobbying is over, and while it may be difficult to change the mind of someone as obstinate as he, I think we have a jolly good shot at changing the landscape that surrounds him in the attempt.

“I invite you to join an informal lobbying group with one simple demand; that PR companies/professionals declare any profile(s) they use to edit Wikipedia, name and link to them plainly in the ‘About Us’ section of their website, and link back to that same website from their Wikipedia profile(s).”

The lobbying group will be drawing attention to Bell Pottinger’s techniques by displacing some of the current top ten search results for ‘Tim Bell’ (“absurd puff pieces”) with “factually accurate and highly relevant material that Tim Bell would much rather faded into the distance” – specifically, the contents of an unauthorised biography of Bell, currently “largely invisible” to Google.

Ireland writes that:

“I am hoping that the prospect of dealing with an unknown number of anonymous account holders based in several different countries will help him to better appreciate his own position, if only to the extent of having him revise his policy on covert lobbying.”

…and from there to the rest of the PR industry.

It’s a fascinating campaign (Ireland’s been here before, using Google techniques to demonstrate factual inaccuracies to a Daily Mail journalist) and one that we should be watching closely. The PR industry is closely tied to the media industry, and sockpuppetry in all its forms is something journalists should do more than merely complain about.

It also highlights again how distribution has become a role of the journalist: if a particular piece of public interest reporting is largely invisible to Google, we should care about it.

UPDATE: See the comments for further exploration of the issues raised by this, in particular: if you thought someone had edited a Wikipedia entry to promote a particular cause or point of view, would you seek to correct it? Is that what Tim Ireland is doing here, but on the level of search results?


The test of data journalism: checking the claims of lobbyists via government

Day 341 - Pull The Wool Over My Eyes - image by Simon James

While the public image of data journalism tends to revolve around big data dumps and headline-grabbing leaks, there is a more important day-to-day application of data skills: scrutinising the claims regularly made in support of spending public money.

I’m blogging about this now because I recently came across a particularly good illustration of politicians being dazzled by numbers from lobbyists (that journalists should be checking) in this article by Simon Jenkins, from which I’ll quote at length:

“This government, so draconian towards spending in public, is proving as casual towards dodgy money in private as were Tony Blair and Gordon Brown. Earlier this month the Olympics boss, Lord Coe, moseyed into Downing Street and said that his opening and closing ceremonies were looking a bit mean at £40m. Could he double it to £81m for more tinsel? Rather than scream and kick him downstairs, David Cameron said: my dear chap, but of course. I wonder what the prime minister would have said if his lordship had been asking for a care home, a library or a clinic.

“Much of the trouble comes down to the inexperience of ingenue ministers, and their susceptibility to the pestilence of lobbying now infecting Westminster. On this occasion the hapless Olympics minister, Hugh Robertson, claimed that the extra £41m was “worth £2-5bn in advertising revenue alone”, a rate of return so fanciful as to suggest a lobbyist’s lunch beyond all imagining. Robertson also claimed to need another £271m for games security (not to mention 10,000 troops, warships and surface-to-air missiles), despite it being “not in response to any specific security threat”. It was just money.

“This was merely the climax of naivety. In their first month in office, ministers were told – and believed – that it would be “more expensive” to cancel two new aircraft carriers than to build them. Ministers were told it would cost £2bn to cancel Labour’s crazy NHS computer rather than dump it in the nearest skip. Chris Huhne, darling of the renewables industry, wants to give it £8bn a year to rescue the planet, one of the quickest ways of transferring money from poor consumer to rich landowner yet found. The chancellor, George Osborne, was told by lobbyists he could save £3bn a year by giving away commercial planning permissions. All this was statistical rubbish.

“If local government behaved as credulously as Whitehall it would be summoned before the audit commission and subject to surcharge.”

And if you want to keep an eye on such claims, try a Google News search like this one.

Active Lobbying Through Meetings with UK Government Ministers

In a move that seemed to upset @whoslobbying, who collect UK ministerial meeting data and felt their effort had been wasted, the Guardian datastore published a spreadsheet last night containing data relating to ministerial meetings between May 2010 and March 2011.

(The first release of the spreadsheet actually omitted the column containing who the meeting was with, but that seems to be fixed now… There are, however, still plenty of character encoding issues (apostrophes, accented characters, some sort of em-dash, etc) that might cripple some plug and play tools.)
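As a rough illustration of the sort of character clean-up that helps here, the sketch below swaps typographic punctuation for plain ASCII. The replacement table is my own assumption about which characters cause trouble, not a record of what is actually in the spreadsheet, and it deliberately leaves accented letters alone because they are legitimate parts of names.

```python
# Hypothetical mapping of troublesome punctuation to ASCII stand-ins.
# Accented letters are deliberately NOT mapped: they belong in names.
REPLACEMENTS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (curly apostrophe)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u00a0": " ",   # non-breaking space
}
TABLE = str.maketrans(REPLACEMENTS)


def clean_text(text):
    """Replace typographic punctuation with ASCII equivalents."""
    return text.translate(TABLE)


sample = "Children\u2019s Society \u2014 roundtable"
print(clean_text(sample))
```

This assumes the file has already been decoded into text correctly. If it was read with the wrong encoding in the first place, that needs fixing before any substitution like this will help.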

Looking over the data, we can use it as the basis for a network diagram in which the nodes are actors (Ministers and lobbyists) and the edges represent meetings between Ministers and lobbyists. There is one slight complication: where a Minister met several lobbyists at once, we ideally need to split those lobbyists out into their own nodes.

UK gov meetings spreadsheet

This probably provides an ideal opportunity to have a play with the Stanford Data Wrangler and try forcing these separate lobbyists onto separate rows, but I didn’t allow myself much time for the tinkering (and the requisite learning!), so I resorted to a Python script to read in the data file and split out the different lobbyists. (I also did an iterative step, cleaning the downloaded CSV file in a text editor by replacing nasty characters that caused the script to choke.) You can find the script here (note that it makes use of the networkx network analysis library, which you’ll need to install if you want to run the script).
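For anyone who wants the gist of the splitting step without opening the script, here is a minimal sketch using only the standard library. It is not the original script: the column names (`Minister`, `Organisation`), the separators and the sample rows are all assumptions made up for illustration.

```python
import csv
import io
import re

# Made-up sample rows; the column names are assumptions, not the
# real headers of the Guardian spreadsheet.
SAMPLE = """Minister,Organisation,Purpose
Minister A,"Barclays, Example Bank; Sample Trust",Banking reform
Minister B,Barclays,Introductory meeting
"""


def split_organisations(cell):
    """Split a multi-organisation cell on commas or semicolons."""
    parts = re.split(r"[;,]", cell)
    return [part.strip() for part in parts if part.strip()]


def one_row_per_organisation(rows):
    """Yield (minister, organisation) pairs, one per organisation."""
    for row in rows:
        for org in split_organisations(row["Organisation"]):
            yield row["Minister"].strip(), org


reader = csv.DictReader(io.StringIO(SAMPLE))
pairs = list(one_row_per_organisation(reader))
for minister, org in pairs:
    print(minister, "->", org)
```

Splitting on every comma is crude: an organisation whose name itself contains a comma would be broken in two, so the output needs an eyeball check before it is trusted.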

The script generates a directed graph with links from Ministers to lobbyists and dumps it to a GraphML file (available here) that can be loaded directly into Gephi. Here’s a view – using Gephi – of the heart of the network. If we filter the graph to show nodes that met with at least five different Ministers…

Gephi - k-core filter

we can get a view into the heart of the UK lobbying network:

Active Lobbyists
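The same "met at least five different Ministers" cut can be sketched outside Gephi. The code below is a plain degree-style filter over (minister, organisation) pairs with made-up data. It is an assumption about what the filter amounts to, and it is not the same thing as a k-core filter, which also requires the neighbours themselves to be well connected.

```python
from collections import defaultdict


def ministers_met(pairs):
    """Map each organisation to the set of distinct ministers it met."""
    ministers_by_org = defaultdict(set)
    for minister, org in pairs:
        ministers_by_org[org].add(minister)
    return ministers_by_org


def well_connected(pairs, minimum=5):
    """Keep organisations that met at least `minimum` different ministers."""
    met = ministers_met(pairs)
    return {org: sorted(ms) for org, ms in met.items() if len(ms) >= minimum}


# Made-up data: Org A meets five ministers (one of them twice),
# Org B meets only three.
pairs = [(f"Minister {i}", "Org A") for i in range(1, 6)]
pairs += [("Minister 1", "Org A")]
pairs += [(f"Minister {i}", "Org B") for i in range(1, 4)]

result = well_connected(pairs)
print(result)
```

Because ministers are collected in a set, repeat meetings with the same Minister do not inflate the count.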

I sized the lobbyist nodes according to eigenvector centrality, which gives an indication of how well connected they are in the network.
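For a feel of what eigenvector centrality measures, here is a small power-iteration sketch in plain Python. It treats the graph as undirected and iterates with the adjacency matrix plus the identity so that the scores settle on a two-sided Minister/lobbyist network. Gephi's own calculation may differ in details such as edge direction and normalisation, so treat the numbers as illustrative only. The node names are made up.

```python
import math


def eigenvector_centrality(edges, iterations=200, tolerance=1e-9):
    """Power iteration on an undirected graph given as (a, b) pairs."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    score = {node: 1.0 for node in neighbours}
    for _ in range(iterations):
        # Use (A + I) rather than A: each node keeps its own score and
        # adds its neighbours' scores, which avoids oscillation on
        # bipartite graphs.
        new = {
            node: score[node] + sum(score[m] for m in neighbours[node])
            for node in neighbours
        }
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {node: v / norm for node, v in new.items()}
        change = sum(abs(new[node] - score[node]) for node in neighbours)
        score = new
        if change < tolerance:
            break
    return score


edges = [
    ("Minister 1", "Org A"),
    ("Minister 2", "Org A"),
    ("Minister 3", "Org A"),
    ("Minister 3", "Org B"),
]
scores = eigenvector_centrality(edges)
for node, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {value:.3f}")
```

The point of the measure is that a node scores highly by being connected to other high-scoring nodes, not just by having many connections.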

One of the nice things about Gephi is that it allows for interactive exploration of a graph. For example, I can hover over a lobbyist node – Barclays in this case – to see which Ministers were met:

Bankers connect...

Alternatively, we can see which of the well connected lobbyists met with the Minister for Welfare Reform:

Welfare meetings...

Looking over the data, we also see how some Ministers are inconsistently referenced within the original dataset:

Multiple mentions

Note that different representations of the same name are likely to have met similar lobbyists, so the force directed layout I used will tend to place their nodes in similar locations. Which is to say – we may be able to use visual tools to help us identify fractured representations of the same individual. (Note that multiple meetings between the same parties can be visualised using the thickness of the edges, which are weighted according to the number of times the edge is described in the GraphML file…)
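The edge weighting amounts to counting how often each Minister/lobbyist pair appears. A minimal sketch of that count, using made-up pairs and no graph library, might look like this:

```python
from collections import Counter


def edge_weights(pairs):
    """Count how many times each (minister, organisation) pair occurs."""
    return Counter(pairs)


# Made-up meetings: Minister 1 met Barclays twice, Minister 2 once.
meetings = [
    ("Minister 1", "Barclays"),
    ("Minister 1", "Barclays"),
    ("Minister 2", "Barclays"),
]
weights = edge_weights(meetings)
for (minister, org), weight in weights.items():
    print(minister, "->", org, "weight", weight)
```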

Google Refine’s various clustering tools could help us unify the different representations of the same individual, although it would be nice if the Datastore folk addressed this at source (or at least, as part of an ongoing data quality enhancement process…;-)
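To give a flavour of how key-collision clustering works, here is a simplified sketch loosely modelled on the "fingerprint" idea: lowercase the name, strip punctuation, then sort the unique words so that reorderings and stray commas collapse to the same key. It is my own cut-down version rather than Refine's actual implementation, and the names are invented.

```python
import string
from collections import defaultdict

PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def fingerprint(name):
    """Lowercase, drop ASCII punctuation, and sort the unique tokens."""
    cleaned = name.strip().lower().translate(PUNCTUATION_TABLE)
    return " ".join(sorted(set(cleaned.split())))


def cluster_names(names):
    """Group names that share a fingerprint; keep groups of two or more."""
    groups = defaultdict(set)
    for name in names:
        groups[fingerprint(name)].add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Invented examples of one person written three different ways.
names = [
    "Lord Example, Minister of State",
    "Minister of State Lord Example",
    "lord example  minister of state",
    "Baroness Sample",
]
clusters = cluster_names(names)
print(clusters)
```

A key like this only catches variation in case, punctuation and word order. Abbreviations, missing titles and misspellings need a fuzzier method and a human to confirm each merge.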

I guess we could also try reconciling company names against universal company identifiers, for example by using Google Refine’s reconciliation service and the Open Corporates database? Hmmm, which makes me wonder: do MySociety, or Public Whip, offer an MP or Ministerial position reconciliation service that works with Google Refine?

A couple of things I haven’t done: represented the department (which could be done via a node attribute, maybe, at least for the Ministers); represented actual meetings, and what I guess we might term co-lobbying behaviour, where several organisations are in the same meeting.
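As a starting point for the co-lobbying idea, the sketch below counts how often two organisations turn up in the same meeting. It assumes each meeting has already been reduced to a list of organisation names, which the published data does not directly give us, and the meetings shown are made up.

```python
from collections import Counter
from itertools import combinations


def co_lobbying_counts(meetings):
    """Count, for each pair of organisations, the meetings they shared."""
    counts = Counter()
    for organisations in meetings:
        # Sort and de-duplicate so each pair has one canonical ordering.
        for pair in combinations(sorted(set(organisations)), 2):
            counts[pair] += 1
    return counts


# Made-up meetings, each a list of the organisations present.
meetings = [
    ["Org A", "Org B", "Org C"],
    ["Org B", "Org A"],
    ["Org C"],
]
counts = co_lobbying_counts(meetings)
for (first, second), shared in counts.most_common():
    print(first, "+", second, ":", shared)
```

The resulting pair counts could be used as weighted edges in a separate organisation-to-organisation graph.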

New UK site launches to tackle lobbying data

Who's Lobbying treemap

I’ve been waiting for the launch of Who’s Lobbying ever since they stuck up that little Post-It note on a holding page in the run-up to the general election. Well now the site is live – publishing and visualising lobbying data, beginning with information about “ministerial meetings with outside interests, based on the reports released by UK government departments in October.”

This information is presented on the homepage very simply: with 3 leaderboards and a lovely search interface.

Who's Lobbying homepage

There are also a couple of treemaps to explore, for a more visual (and clickable) kick.

These allow you to see more quickly any points of interest in particular areas. The Who’s Lobbying blog notes, for instance, that “the treemap shows about a quarter of the Department of Energy and Climate Change meetings are with power companies. Only a small fraction are with environmental or climate change organisations.”

It also critically notes in another post that

“The Number 10 flickr stream calls [its index to transparency] a “searchable online database of government transparency information”. However it is really just a page of links to department reports. Each report containing slightly different data. The reports are in a mix of PDF, CSV, and DOC formats.

“Unfortunately Number 10 and the Cabinet Office have not mandated a consistent format for publishing ministerial meeting information.

“The Ministry of Defence published data in a copy-protected PDF format, preventing copy and paste from the document.

“DEFRA failed to publish the name of each minister in its CSV formatted report.

“The Department for Transport is the only department transparent enough to publish the date of each meeting.

“All other departments only provided the month of each meeting – was that an instruction given centrally to departments? Because of this it isn’t possible to determine if two ministers were at the same meeting. Our analysis is likely to be double counting meetings with two ministers in attendance.

“Under the previous Labour government, departments had published dates for individual meetings. In this regard, are we seeing less transparency under the Conservative/Lib Dem coalition?”

When journalists start raising these questions then something will really have been achieved by the open data movement. In the meantime, we can look at Who’s Lobbying as a very welcome addition to a list of sites that feels quite weighty now: MySociety’s family of tools as the granddaddy, and ElectionLeaflets.org (formerly The Straight Choice), OpenlyLocal, Scraperwiki, Where Does My Money Go? and OpenCharities as the new breed (not to mention all the data-driven sites that sprang up around this year’s election). When they find their legs, they could potentially be quite powerful.