Author Archives: Paul Bradshaw

Magazines and digital: a report from the PTC Academies and Industry Forum

Suzanne Kavanagh reports on key insights and highlights from the Periodicals Training Council (PTC) Academies and Industry Forum, held at Bauer Media’s central London office.

Editorial is at the heart of management at Bauer, said the company’s CEO, Paul Keenan, who explained how they work across media and events for brands and are embracing digital.

Keenan provided several insights into the industry and Bauer’s business – helpful information for anyone applying to get into the industry: Continue reading

Quackwatch sued by Doctor’s Data

A familiar story. Here’s the rundown from The Quackometer:

“Stephen Barrett [of Quackwatch] has been very critical of [Doctor’s Data] and has written that the diagnostic health tests it provides are used to defraud patients. One test in particular stood out for his criticism where patients are given a “provoking agent” that flushes out heavy metals into the urine. A urine test is then analysed by DDI and the concentration of heavy metals is compared with standards. Except the standards used are for patients who have not had the provoking agent. The levels of metals are going to be much higher than normal and this ‘elevated result’ is then used to sell expensive and unnecessary treatments.”

Sounds like a valid subject to investigate. Then:
Continue reading

When Open Public Data Isn’t…?

This year was always going to be an exciting year for open data. The launch of data.gov.uk towards the end of last year, along with commitments from both sides of the political divide before the election that continue to be acted upon now, means data is starting to be opened up – scruffily at first, but that’s okay – and commercial enterprises are maybe starting to get interested too…

…which was always the plan…

…but how is it starting to play out?

The story so far…

A couple of weeks ago, the first meeting of the Public Sector Transparency Board was convened, which discussed – and opened up for further discussion – a set of draft public data principles. (Papers relating to the meeting can be found here.)

In a letter to the responsible Minister prior to the meeting (commentable extracts here), Professor Nigel Shadbolt suggested that:

4. … The economic analysis, and the views we regularly hear from the business community themselves, are unequivocal: data must be released for free re-use so that the private sector can add new value and develop innovative new business services from government information. …

8. Transparency principles need to be extended to those who operate public services on a franchised, regulated or subsidised basis. If the state is controlling a service to the public, or is franchising or regulating its delivery, the data about that activity should be treated as public data and made available. …

11. We need to support the development of licences and supporting policies to ensure that data released by all public bodies can be freely re-used and is interoperable with the internationally recognised Creative Commons model. …

12. A key Government objective is to realise significant economic benefits by enabling businesses and non-profit organisations to build innovative applications and websites using public data. …

The business imperative is further reinforced by the second of three reasons given by the Open Government Data tracking project in Why Open Government Data?:

Releasing social and commercial value. In a digital age, data is a key resource for social and commercial activities. Everything from finding your local post office to building a search engine requires access to data, much of which is created or held by government. By opening up data, government can help drive the creation of innovative business and services that deliver social and commercial value.

So how has business been getting involved? As several local councils start to act on a request – contained in a letter from the Prime Minister published at the end of May – that they open up their financial data, Chris Taggart/@countculture, developer of OpenlyLocal, posted a piece, The open spending data that isn’t… this is not good, in which he described how apparently privileged access to financial data from several councils was being used to drive Spikes Cavell’s SpotlightOnSpend website (for a related open equivalent, see Adrian Short’s Armchair Auditor). Downstream use of the data was hampered by a “personal use only” licence, and by a CAPTCHA that requires a human in the loop in order to access the data. The Public Sector Transparency Board promptly responded to Chris’ post (Work on Local Spending Data), quoting the principle that:

“Public data will be released under the same open licence which enables free reuse, including commercial reuse – all data should be under the same easy to understand licence. Data released under the Freedom of Information Act or the new Right to Data should be automatically released under that licence.”

and further commenting: “We have already reminded those involved of this principle and the existing availability of the ‘data.gov.uk’ licence which meets its criteria, and we understand that urgent measures are already taking place to rectify the problems identified by Chris.”

Spikes Cavell chief executive Luke Spikes responded via an interview with Information Age (SpotlightOnSpend reacts to open criticism):

[Spikes Cavell] is first and foremost a spend analysis software and consultancy supplier, and it publishes data through SpotlightOnSpend as a free, optional and supplementary service for its local government customers. The hope is that this might help the company to win business, he explains, but it is not a money-spinner in itself.

“The contribution we’re making to transparency is less about what the purists would like to see, it’s simply putting the information out there in a form that is useful for the audience for which it is intended [i.e. citizens and small businesses]” he said. “But there are a few things we haven’t done right, and we’ll fix that.”

Following the criticism, Cavell says that SpotlightOnSpend will make the data available for download in its raw form. “That’s what we thought was the most sensible solution to overcoming this obstacle,” he told me.

Adrian Short, developer of the open Armchair Auditor, then picked up the baton in a comment to the Information Age article:

There is room for Spikes Cavell to develop their applications and I doubt that anyone has any objection to them offering their services to councils commercially just like thousands of other businesses. But they do not have a monopoly of ideas, talent and resources to build great applications with public spending data. Nor does anyone else.

The concerns that @CountCulture raised were not that Spikes Cavell were trading with councils or trying to attract their business but that they are doing so in a way that precludes anyone else developing applications with this data. By legally and technically locking the data into the Spotlight on Spend platform, everyone else is excluded.

It’s understandable that most councils have no understanding of the culture, legalities or technicalities of open data. This is new territory for nearly all of them. Those councils that have put their data straight onto Spotlight on Spend – bypassing the part where it is made genuinely open – cannot be criticised for not complying with what to them must be a very unusual requirement. But that’s why @CountCulture and I and others want to be very clear that the end result of this process is having effective scrutiny of council finances through multiple websites and applications, not just Spotlight on Spend or any other single website or application. The way we get there is with open data.

And Chris Taggart’s response? (Update on the local spending data scandal… the empire strikes back):

Lest we forget, Spikes Cavell is not an agent for change here, not part of those pushing to open public data, but in fact has a business model which seems to be predicated on data being closed, and the maintenance of the existing procurement model which has served us so badly.

(For recommendations on how councils might publish financial data in an open way, see Publishing itemised local authority expenditure – advice for comment (reiterated here: Open Government Data: Finances). The Office for National Statistics occasionally releases summary statistics (e.g. as republished in Local spending data in OpenlyLocal, and some thoughts on standards), but only at a coarse resolution. As to how much it might cost to do that, see the claims in Cost of publishing ‘£500 and above’ council expenditure prohibitive.)

From my own perspective, I would also add that should consultants like Spikes Cavell create derived data works from open public data, there should be some transparency in declaring how the derived work was created (see for example: So Where Do the Numbers in Government Reports Come From? and Data is not Binary).

Another example of how once-open data is being “closed” behind a paywall comes from Paul Geraghty (“Closed Data Now” SOCITM does a “Times”):

If my memory serves me well the e-Gov Register (eGR) hosted by Brent has been showing every IT supplier sortable by product type, supplier, local government type and even on maps for about 6 or 7 years (some links below if you hurry up).

I am aware that there are problems with this data: from my own past employer, I know that some of it is out of date.

But it is there, it is useful and informative and it is OPEN to all, even SMEs like me researching on niche markets in local government.

The latest move by SOCITM (and presumably with the knowledge of the LGA and the IDeA) means all that data is going to go behind the SOCITM paywall.

And the response from Socitm, via a comment from Vicky Sargent:

First of all, I’d like to clear up some points of fact. No local authority or other public sector service provider that provides data to the Applications Register will have to pay for their subscription, and for them access to the data will be free, regardless of whether they subscribe to Socitm Insight (as 95% of local authorities do). Anyone who is employed in an organisation that is an Applications Register subscriber – f-o-c or paid – will be able to access the data.
Then there is the question of who pays. Clearly an information service like this, one that adds value, has to cover the costs of development and delivery. Unlike government departments, the LGA, the IDeA and local councils, Socitm is not directly funded by the taxpayer, and needs to fund the services it delivers from money raised from fees, subscriptions, events and other services.
The business model we use for the Applications Register is that public bodies that contribute should not pay to use the service, but those that do not contribute pay in cash. Private sector bodies can only pay in cash.

Your article also suggests that Socitm’s support for the move towards open data is hypocritical, set against our business model for the Applications Register. I think this misunderstands the thinking behind ‘open data’, which is to get raw data out of government systems for transparency purposes, and also so that it can be re-used. Socitm has been a long-term, strong supporter of this.
The open data agenda explicitly acknowledges that ‘re-use’ includes adding value and selling on. If councils were to routinely publish the sort of data we will collect for the Applications Register, there would still be work to be done aggregating and manipulating and re-publishing the information to make it useful, and that is what we do, recovering our costs in the way described.

Adrian Short (can you see how it’s the same few players engaging in these debates?! ;-) develops the “keep it free” argument in a further comment:

Your argument presupposes your conclusion, which is that Socitm is the best organisation to be managing/publishing the applications register. Because, as you correctly say, you don’t receive any direct funding from the taxpayer, you have to find other ways of paying for that work. Inevitably this means charging non-contributing users.

What you’re missing is that millions of pounds of public money is spent every year supporting businesses, helping to create new markets and generally oiling the parts of the economy that don’t easily oil themselves. That’s what BIS and the economic development departments of local authorities do. The public interest and private benefit aren’t easily distinguishable unless you contrive that private benefit for a small group to the exclusion of others. But as Paul rightly points out, the potential market for this information is enormous — essentially every business and individual that works for, supplies or wants to work for the public sector, from the individual IT worker to the massive global consultancies, manufacturers and software firms.

Currently it’s a small number of incumbent suppliers that benefit from this relatively inefficient market. Other businesses lose. Public sector buyers lose. The taxpayer loses.

Keeping this information free for everyone to use – and enabling it to be combined with the enormous amount of data that will be released soon – is likely to produce economic benefits to the public, through market efficiencies, that outstrip its cost by several orders of magnitude. If Socitm can’t publish this data in the most useful, non-discriminatory way then it’s not the best organisation for the job. I can see no reason in principle or practice why it shouldn’t be fully funded by the taxpayer and free at the point of use for everyone. To do otherwise would be an extremely false economy.

(Note that “free vs. open” debates have also been played out in the open source software arena. Maybe it’s worth revisiting them…?)

The previously quoted comment from Vicky Sargent also contains what might be described as an example case study:

This brings me to Better Connected, the annual survey of council websites carried out by Socitm. You say:
Just about every council in the UK has little option but to pay SOCITM hundreds of pounds annually to join their club to find out the exact details of how their website is being ranked.
The data for Better connected only exists because Socitm has devised a methodology for evaluating websites, pays for a team of reviewers to collect the data each year, and then analyses and publishes the results. No one has to subscribe; they choose to do so because the information is valuable to them.
Information about how we do the evaluation and ranking is freely available on our website, in our press releases and in our free-to-join Website Usage and Improvement community. The 2010 headline results for all councils are published on socitm.net as open data under a creative commons licence and are linked from data.gov.uk.
If the Better connected report has become a ‘must read’, that is because the investment Socitm has made in the product has led to it being a more cost-effective investment for councils than alternative sources of advice on improving their website. Many users have told us Better connected (cover price £415 for non-subscribers or free as part of the Socitm Insight subscription that starts at £670 pa for a small district council) is worth many days’ consultancy, even when that consultancy is purchased from lower cost SME providers.

As these examples show, the licence under which data is originally released can have significant consequences for its downstream use and commercialisation. The open source software community has known this for years, of course, which is why organisations like GNU maintain two different licences – the GPL, which keeps software open by requiring that software incorporating GPL libraries is itself released under the GPL, and the LGPL, which allows libraries to be used in closed/proprietary code. There is a good argument that by combining data from different open sources in a particular way, valuable results may be created; but it should also be recognised that work may be expended doing this, and that a financial return may need to be generated (so maybe companies shouldn’t have to open up their aggregated datasets?). Just how we balance commercial exploitation with ongoing openness and access to raw public data remains to be seen.

(The academic research area – which also has its own open data movement (e.g. the Panton Principles) – suggests a different sort of tension, arising from the “potential value” of a dataset or aggregated dataset. For example, research groups analysing data in one particular way may be loath to release it to others because they want to analyse it in another, value-creating way at a later date.)

Getting the licensing right is particularly important if councils become obliged to use third party services to publish their data. For example, the grand vision of the Public Sector Transparency Board, identified in this paragraph of Shadbolt’s letter to Maude, states:

13. We must promote and support the development and application of open, linked data standards for public data, including the development of appropriate skills in the public services. …

But as a recent report, again from Chris Taggart, on Publishing Local Open Data – Important Lessons from the Open Election Data project suggests, there are certain challenges associated with web-related development in local authorities, and in particular a significant lack of experience and expertise in dealing with Linked Data (which is not surprising – it is a relatively new, and so far arcane, technology). Here are the first four lessons, for example:

– There is a lack of ‘corporate’ awareness/understanding of open data issues, and this will inhibit take-up of open, linked data publishing unless it is addressed
– There is a lack of even basic web skills at some councils
– Many councils lack web publishing resources, never mind the resources to implement open, linked data publishing
– The understanding of even the basics of linked data and the steps to publishing public data in this way is very, very limited

What this suggests to me is that, in the short term at least, the capability for publishing Linked Data is likely to reside in specialist third party companies – possibly only a handful of them. As Paul Geraghty discovers from the e-Gov Register in If #localgovweb supplier says “RDF WTF?” Sack em #opendata #spending:

[I]t seems to me that of 450 or so local government organisations, 357 are listed as having a “Financials” supplier **.

There are only 18 suppliers listed, and of those there are 6 Big Ones.

Between them the 6 Big Ones supply “Financials” to 326 Councils.

Don’t you think that the first one of those 6 Big Ones who natively supports LOD [Linked Open Data] as an export option (or agrees to within, say, 8 months) really ought to be favoured when bidding for new business?

Let’s go further: let’s say that it should be mandated that all new contracts with “Financials” suppliers include an LOD clause.

Perhaps Mr Pickles could dispatch someone to have a chat with one or two of these suppliers, or have someone check that future contracts for Financial products being sold to Local Government all contain the necessary wording to make this happen?

So instead of trying to train and cajole 450 councils to FTP assorted CSV files into localdata.gov.uk (FFS) all the way through to grokking RDF, namespaces, LOD et al – why does the government not get on and make a strategy to bully and coerce 6 suppliers instead – and potentially get 326 councils teed up to produce useful LOD a bit sharpish?

Another technology option is for councils to publish their own linked data to a commercially hosted datastore. At the moment, the two companies I know of that offer “datastore” services for publishing Linked Data at scale are Talis and the Stationery Office (in partnership with Garlik). It is, of course, open knowledge that one Professor Nigel Shadbolt is a director of Garlik Limited.
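To make the Linked Data option a little more concrete, here is a minimal sketch – purely my own illustration, with an invented vocabulary and invented URIs, not anything a council or datastore actually publishes – of how a single line of council spending data might be expressed as RDF using Python’s rdflib library:

```python
# A minimal sketch of expressing one council spending line as RDF.
# The namespace, URIs and values below are hypothetical, for illustration only.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

SPEND = Namespace("http://example.org/def/spending/")  # invented vocabulary

g = Graph()
g.bind("spend", SPEND)

payment = URIRef("http://example.org/id/payment/2010-07-0001")  # invented URI
g.add((payment, RDF.type, SPEND.Payment))
g.add((payment, SPEND.payer, Literal("Example Borough Council")))
g.add((payment, SPEND.supplier, Literal("Acme Paving Ltd")))
g.add((payment, SPEND.amount, Literal("12500.00", datatype=XSD.decimal)))
g.add((payment, SPEND.date, Literal("2010-07-15", datatype=XSD.date)))
g.add((payment, SPEND.category, Literal("Highways maintenance")))

# Serialise as Turtle (recent versions of rdflib return a string here).
print(g.serialize(format="turtle"))
```

A “Financials” supplier exporting something along these lines natively – against shared, well-documented vocabularies rather than my invented one – is essentially what Geraghty is arguing for above.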

An introduction to data scraping with Scraperwiki

Last week I spent a day playing with the screen-scraping website Scraperwiki, alongside a class of MA Online Journalism students and a local blogger or two, in a session led by Scraperwiki’s own Anna Powell-Smith. I thought I’d take the opportunity to explain, in journalistic terms, what screen scraping is through the functionality of Scraperwiki.

It’s pretty good.
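If you’ve never seen a scraper, here’s a back-of-an-envelope sketch of the basic pattern – fetch a page, pull the structured bits out of the HTML, save them somewhere reusable. This is plain illustrative Python rather than Scraperwiki’s own code, and the URL and table layout are invented:

```python
# A minimal screen-scraping sketch: fetch a page, extract an HTML table, save as CSV.
# The URL and the table structure are invented; a real scraper targets a real page.
import csv
import urllib.request

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

url = "http://example.gov.uk/spending.html"  # hypothetical page
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")

rows = []
table = soup.find("table")  # assumes the page has a single data table
for tr in table.find_all("tr")[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:
        rows.append(cells)

# Save the structured result so it can be sorted, mashed up or analysed later.
with open("spending.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```

Scraperwiki wraps this sort of pattern in a browser-based editor and a shared datastore, so both the scraping code and the scraped data stay out in the open.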
Continue reading

Don't stop us digging into public spending data

A disturbing discovery by Chris Taggart last week: a number of councils in the UK are handing over their ‘open’ data to a company which only allows it to be downloaded for “personal” use.

As Chris himself points out, this runs completely against the spirit of the push to release public data in a number of ways:

  • Data cannot be used for “commercial gain”. This includes publishers wanting to present the information in ways that make most sense to the reader, and startups wanting to find innovative ways to involve people in their local area. Oh, and that whole ‘Big Society‘ stuff.
  • The way the sites are built means you couldn’t scrape this information with a computer anyway
  • It’s only a part of the data. “Download the data from SpotlightOnSpend and it’s rather different from the published data [on the Windsor & Maidenhead site]. Different in that it is missing core data that is in the W&M published data (e.g. categories), and in that it includes data that isn’t in the published data (e.g. data from 2008).”

It’s a worrying path. As Chris sums it up: “Councils hand over all their valuable financial data to a company which aggregates it for its own purposes and, er, doesn’t open up the data – shooting down all those goals of mashing up the data and using the community to analyse it, and undermining much of the good work that’s been done.”

The Transparency Board quickly issued a statement about this issue saying that “urgent” measures are taking place to rectify the problem.

And Spikes Cavell, who make the software, responded in Information Age, pointing out that “it is first and foremost a spend analysis software and consultancy supplier, and that it publishes data through SpotlightOnSpend as a free, optional and supplementary service for its local government customers. The hope is that this might help the company to win business, he explains, but it is not a money-spinner in itself.”

They are now promising to make the data available for download in its “raw form”, although it’s not clear what that will be. Adrian Short’s comment to the piece is worth reading.

Nevertheless, this is an issue that anyone interested in holding power to account should keep a close eye on. And to that end, Chris has started an investigation on Help Me Investigate to find out how and why councils are giving access to their spending data. Please join it and help here.

(Comment or email me on paul at helpmeinvestigate.com if you want an invitation.)

77,000 pageviews and multimedia archive journalism (MA Online Journalism multimedia projects pt4)

(Read part 1 here; part 2 here and part 3 here)

The ‘breadth portfolio’ was only worth 20% of the Multimedia Journalism module, and was largely intended to be exploratory, but Alex Gamela used it to produce work that most journalists would be proud of.

Firstly, he worked with maps and forms to cover the Madeira Island mudslides:

“When on the 20th of February a storm hit Madeira Island, causing mudslides and floods, the silence on most news websites, radio and TV stations was deafening. But on Twitter there were accounts from local people about what was going on and, above all, they had videos. The event was being tagged as #tempmad, so it was easy to follow all the developments, but the information seemed too scattered to get a real picture of what was going on on the island, and since there was no one organizing the information available, I decided to create a map on Google, to place videos, pictures and other relevant information.

“It got 10,000 views in the first hours and reached 30,000 in just two days. One month later, it has the impressive number of 77,000 visits.”

Not bad, then.

Secondly, Alex experimented with data visualisation to look at newspaper brand values and the online traffic of Portuguese news websites.

“My goal was to understand the relative and proportional position of each one, regarding visits, page views, and how those two values relate to each other. The data I got also has portals, specialized websites, and entertainment magazines so it has a broad range of themes (all charts are available live here – http://is.gd/aZLXs)”

And finally, he produced a beautiful Flash interactive on Moseley Road Baths (which he talks about here).

All of which was produced and submitted within the first six weeks of the Multimedia Journalism module.

The other 80%: multimedia archive journalism

Alex was particularly interested in archive journalism and using multimedia to bring archives to life. As a way of exploring this he produced the Paranoia Timeline, a website exploring “all the events that caused some type of social hysteria throughout the world in the last 20 years.

“Some of the situations presented here were real dangers, others not really. But all caused disturbances in our daily lives … Why does that happen? Why are we caught in these bursts of information, sometimes based on speculative data and other times borne out of the imagination of a few and fed by the beliefs of many?”

The site – which is an ongoing project in its earliest stages – combines video, visualisation, a Dipity timeline, mapping and the results of some fascinating data and archive journalism. Alex explains:

“The swine flu data came from Wolfram Alpha, which generated a rather reliable (after cross-checking with other official websites) amount of data, with the number of cases and deaths per country. I had to make a choice about what would be highlighted, but discrepancies in the logical amount of cases between countries made me go just for the death numbers. The conclusion I drew from the map is that swine flu was either more serious, or more reported, in developed countries. Traditionally considered Third World countries do not have many reports, which reflects either the lack of structures to deal with the problem or how overhyped it was in the Western world. But France on its own had almost 3 million cases reported against 57,000 in the United States, which led me to check other sources closely. It seems Wolfram Alpha had the number wrong – there were only about 5,000 reports – which proves that outliers in data are either news stories or just input errors.

“For the credit crunch, I researched the FDIC – Federal Deposit Insurance Corporation – database. They have a considerable amount of statistical data available for download. My idea was to chart the evolution of loans in the United States over the last few years, and the main finding was that overall loans slowed down since 2009 but individual credits rose, meaning an increase in personal debt to cope with the overall difficulties caused by the crunch. I selected the items that seemed most relevant and went for a simple line chart. My purpose was served.”

“Though the current result falls short of my initial goals,” says Alex, “it is a prototype for a more involved experience, and I consider it to be a work in progress. What I’ll be defending here is a concept with a few examples using interactive tools, but I realize this is just a small sample of what it can really be: an immersive, ongoing project, with more interactive features, providing a journalistic approach to issues highly debated and prone to partisanship, many of them used by religious and political groups to spin their own ideologies to the general audience. The purpose is to create context.”
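Alex’s aside that outliers in data are either news stories or just input errors deserves a quick illustration. A crude sanity check – sketched below with invented figures echoing the Wolfram Alpha error he describes – is to compare each country against the middle of the distribution and flag anything wildly out of line:

```python
# A rough sketch of flagging outliers in per-country case counts.
# All figures are invented; France's is deliberately wrong, echoing the
# Wolfram Alpha error described above.
cases = {
    "United States": 57000,
    "United Kingdom": 14000,
    "Mexico": 29000,
    "Germany": 12000,
    "France": 3000000,  # suspicious: orders of magnitude above the rest
}

values = sorted(cases.values())
median = values[len(values) // 2]  # middle value of the sorted counts

for country, n in cases.items():
    if n > 10 * median:  # crude rule of thumb: ten times the median
        print(f"Check {country}: {n:,} cases is either a story or an error")
```

Anything the check flags still needs a human judgement: verify against a second source before deciding whether you have a correction to file or a story to write.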

Alex is currently back in Portugal as he completes the final MA by Production part of his Masters. You might want to hire him, or Caroline, Dan, Ruihua, Chiara, Natalie or Andy.

Using data to scrutinise local swimming facilities (MA Online Journalism multimedia projects pt3)

(Read part 1 here and part 2 here)

The third student to catch the data journalism bug was Andy Brightwell. Through his earlier reporting on swimming pool facilities in Birmingham, Andy had developed an interest in the issue, and wanted to use data journalism techniques to dig further.

The result was a standalone site – Where Can We Swim? – which documented exactly how he did that digging, and presented the results.

He also blogged about the results for social media firm Podnosh, where he has been working.
Continue reading

Announcing the Birmingham Hacks & Hackers day

If you are a journalist, blogger or developer interested in the possibilities of public data, I’d be very happy if you came to a Hack Day I’m involved in, here in Birmingham on Friday July 23.

The idea is very simple: we get a bunch of public data, and either find stories in it, or ways to help others find stories.

You don’t need technical expertise because that’s why the hackers are there; and you don’t need journalistic expertise because that’s why the hacks are there.

What I’m particularly excited about in Birmingham is that we’ve got a real mix of people coming – from press and broadcast, and local bloggers, and hopefully a mix of people with backgrounds in various programming languages and even gaming.

And apart from all that there should be free beer and pizza. Which is the important thing.

So come.

The day is being organised by Scraperwiki and we’ve already got a whole bunch of interesting people signed up.

You can register for the day here.

Local history as a game (MA Online Journalism multimedia projects pt2)

Following on from the previous post on serious music journalism using data, here’s some more detail on how MA Online Journalism students have been exploring multimedia journalism.

Using data to shed light on dangers for cyclists

Dan Davies explored video and audio mapping before catching the data bug – in this case, around cycling collisions. Like Caroline, he gathered data from a range of sources, including media reports, an RSS feed from FixMyStreet, another RSS feed from Google News and Freedom of Information requests – as well as getting out there and collecting it himself.

He’s visualised the data in a range of ways at Birmingham Cycle Data, using tools such as Yahoo! Pipes and ManyEyes, and collaborated with cycling communities too. The results provide a range of insights into transport issues for cyclists: Continue reading

Music journalism and data (MA Online Journalism multimedia projects pt1)

I’ve just finished looking at the work from the Diploma stage of my MA in Online Journalism, and – if you’ll forgive the effusiveness – boy is it good.

The work includes data visualisation, Flash, video, mapping and game journalism – in short, everything you’d want from a group of people who are not merely learning how to do journalism but exploring what journalism can become in a networked age.

But before I get to the detail, a bit of background… Continue reading