Monthly Archives: September 2009

Data and the future of journalism panel discussion: Linked Data London

Tonight I had the pleasure of chairing an extremely informative panel discussion on data and the future of journalism at the first London Linked Data Meetup. On the panel were:

What follows is a series of notes from the discussion, which I hope are of some use.

For a primer on Linked Data there is A Skim-Read Introduction to Linked Data; Linked Data: The Story So Far (PDF) by Tom Heath, Christian Bizer and Tim Berners-Lee; and this TED video by Sir Tim Berners-Lee (who was on the panel before this one).

To set some brief context, I talked about how 2009 was, for me, a key year in data and journalism – largely because it has been a year of crisis in both publishing and government. The seminal point in all of this has been the MPs’ expenses story, which demonstrated both the power of data in journalism and the need for transparency from government – for example, the government appointment of Sir Tim Berners-Lee, seeking developers to suggest things to do with public data, and the imminent launch of around the same issue.

Even before then, the New York Times and the Guardian both launched APIs at the beginning of the year; MSN Local and the BBC have both been working with Wikipedia; and we’ve seen the launch of a number of startups and mashups around data, including Timetric, Verifiable, BeVocal, OpenlyLocal, MashTheState, the open source release of Everyblock, and Mapumental.

Q: What are the implications of paywalls for Linked Data?

The general view was that Linked Data – specifically standards like RDF – would allow users and organisations to access information about content even if they couldn’t access the content itself. To give a concrete example, rather than linking to a ‘wall’ that simply requires payment, it would be clearer what the content beyond that wall related to (e.g. key people, organisations, author, etc.)

Leigh Dodds felt that using standards like RDF would allow organisations to more effectively package content in commercially attractive ways, e.g. ‘everything about this organisation’.
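To make that a little more concrete, here is a minimal sketch of what "information about the content" might look like, written as plain subject-predicate-object triples in Python. The article URL, the people and organisation URLs and the values are all made up for illustration; only the Dublin Core term names are real vocabulary. It is not how any of the panellists' organisations actually publish data.

```python
# Hypothetical article and identifiers; only the Dublin Core terms are a real vocabulary.
DCT = "http://purl.org/dc/terms/"
ARTICLE = "http://example.org/news/2009/09/mps-expenses"
COMMONS = "http://example.org/org/house-of-commons"

# Each statement is a (subject, predicate, object) triple, the basic shape of RDF.
triples = [
    (ARTICLE, DCT + "title", "MPs' expenses: the full data"),
    (ARTICLE, DCT + "creator", "http://example.org/people/jane-reporter"),
    (ARTICLE, DCT + "subject", COMMONS),
    (ARTICLE, DCT + "accessRights", "subscribers only"),
]


def describe(subject, statements):
    """Collect everything stated about one subject, keyed by predicate."""
    description = {}
    for s, p, o in statements:
        if s == subject:
            description.setdefault(p, []).append(o)
    return description


def about(topic, statements):
    """List every item whose subject is the given topic."""
    return sorted({s for s, p, o in statements
                   if p == DCT + "subject" and o == topic})


# A reader who hits the paywall can still see what lies behind it...
print(describe(ARTICLE, triples))
# ...and a publisher can package 'everything about this organisation'.
print(about(COMMONS, triples))
```

The point of the sketch is that the description travels separately from the article body, so it can be shared even when the body cannot.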

Q: What can bloggers do to tap into the potential of Linked Data?

This drew some blank responses, but Leigh Dodds was most forthright, arguing that the onus lay with developers to do things that would make it easier for bloggers to, for example, visualise data. He also pointed out that currently if someone does something with data it is not possible to track that back to the source and that better tools would allow, effectively, an equivalent of pingback for data included in charts (e.g. the person who created the data would know that it had been used, as could others).

Q: Given that the problem for publishing lies in advertising rather than content, how can Linked Data help solve that?

Dan Brickley suggested that OAuth technologies (where you use a single login identity for multiple sites that contains information about your social connections, rather than creating a new ‘identity’ for each) would allow users to specify more precisely how they experience content, for instance: ‘I only want to see article comments by users who are also my Facebook and Twitter friends.’

The same technology would allow for more personalised, and therefore more lucrative, advertising.
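The comment filter in Dan's example can be sketched very simply, assuming the site has already been given the reader's two friend lists with their permission. The function name, the comment structure and the usernames below are all hypothetical, and the step of actually fetching the lists from each service is left out entirely.

```python
def visible_comments(comments, facebook_friends, twitter_friends):
    """Keep only comments written by people on BOTH friend lists."""
    trusted = set(facebook_friends) & set(twitter_friends)
    return [c for c in comments if c["author"] in trusted]


# Made-up data for illustration.
comments = [
    {"author": "alice", "text": "Great piece."},
    {"author": "bob", "text": "Source?"},
    {"author": "carol", "text": "First!"},
]
shown = visible_comments(
    comments,
    facebook_friends=["alice", "bob"],
    twitter_friends=["alice", "carol"],
)
print([c["author"] for c in shown])
```

Here only alice appears on both lists, so only her comment is shown.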

John O’Donovan felt the same could be said about content itself – more accurate data about content would allow for more specific selling of advertising.

Martin Belam quoted James Cridland on radio: “[The different operators] agree on technology but compete on content”. The same was true of advertising but the advertising and news industries needed to be more active in defining common standards.

Leigh Dodds pointed out that semantic data was already being used by companies serving advertising.

Other notes

I asked members of the audience who they felt were the heroes and villains of Linked Data in the news industry. The Guardian and BBC came out well – The Daily Mail were named as repeat offenders who would simply refer to “a study” and not say which, nor link to it.

Martin Belam pointed out that The Guardian is increasingly asking itself ‘How will that look through an API’ when producing content, representing a key shift in editorial thinking. If users of the platform are swallowing up significant bandwidth or driving significant traffic then that would probably warrant talking to them about more formal relationships (either customer-provider or partners).

A number of references were made to the problem of provenance – being able to identify where a statement came from. Dan Brickley specifically spoke of the problem with identifying the source of Twitter retweets.

Dan also felt that the problem of journalists not linking would be solved by technology. In conversation previously, he also talked of “subject-based linking” and the impact of SKOS and linked data style identifiers. He saw a problem in that, while new articles might link to older reports on the same issue, older reports were not updated with links to the new updates. Tagging individual articles was problematic in that you then had the equivalent of an overflowing inbox.

(I’ve invited all 4 participants to correct any errors and add anything I’ve missed)

Finally, here’s a bit of video from the very last question addressed in the discussion (filmed with thanks by @countculture):

Linked Data London 090909 from Paul Bradshaw on Vimeo.

Online Journalism lesson #6: Interactivity

I’ve been rather tardy about getting all of these online, so here’s the 6th of my presentations from the Online Journalism class of Spring 2009, looking at Interactivity. Much of what I talk about here is also in my lengthy post on the topic:

Is the Mirror selling links to

The Mirror wants to watch out – as it looks like it’s selling links, even if it isn’t (as I first posted here and which later went hot on Sphinn). Several stories on the site share all these characteristics, and must look extremely suspicious to Google:

  • All the stories contain three links to the same MoneyExtra page.
  • All the links use different anchor text.
  • The text happens to be competitive search terms.
  • MoneyExtra isn’t mentioned in the article itself.
  • They were all published in August.

There’s nothing wrong or illegal about selling links if that is what they’re doing. But it’s likely to get you penalized by Google if they spot it, because selling links is done to manipulate their search results for SEO reasons (Google counts the number of links to a page as a measure of its importance).

Pages on from August

Now let’s look at several pages from

Headline: Sorting out the best credit card rate

This page from 20th August contains three links to the MoneyExtra credit cards page, using the link text “best credit card rate in the UK”, “best credit card” and “credit cards”. There is no mention of MoneyExtra in the article.

Headline: Why do credit card providers offer credit cards with 0% interest?

This page from 20th August contains three links to the MoneyExtra credit cards page, using the link text “credit card providers”, “0% credit card interest rates”, and “0% credit card deal”. No mention of MoneyExtra in the article.

Headline: Best credit card transfer: Does one size fit all?

This page from 5th August for once contains, er, three links to the MoneyExtra credit cards page, using the link text “best credit card”, “0% balance transfer rate” and “best credit card balance transfer rate”. Again, no mention of MoneyExtra in the article.

Headline: Is it too late for debt management in England?

This page from 20th August contains, er, three links to the MoneyExtra debt page, using the link text “debt management”, “debt” and “debt advice”. There is no mention of MoneyExtra in the article.

Headline: What is ‘government debt management’?

This page from 20th August contains, guess what, three links to the MoneyExtra debt page, using the link text “Government debt solution”, “debt management plans” and “debt”. There’s no mention of MoneyExtra in the article.

Something a bit different!

This page is a bit different. It’s from the 20th August, naturally. But it contains FOUR links to the MoneyExtra car insurance quotes page – and mentions MoneyExtra in the article!

Some other pages

Other pages from August (not the 20th this time) which contain three links to a specific MoneyExtra page but which don’t mention MoneyExtra in the article include: this one and this one and this one (OK, that one’s only got two links) and this one (as has that one) and this one.


As I say, there’s nothing wrong with selling links, and there’s no actual evidence that that’s what the Mirror is doing. However, this looks like the sort of pattern you’d see with sold links – so the Mirror wants to watch out it doesn’t get hit by a penalty by Google.

Data and the future of journalism: what questions should I ask?

Tomorrow I’m chairing a discussion panel on the Future of Journalism at the first London Linked Data Meetup. On the panel are:

What questions would you like me to ask them about data and the future of journalism?

Gawker offers a personalised news experience

Gawker, the popular news aggregator network, has launched an aggregator for its internal sites. Users can specify which topics from each site they wish to see, and they are then given a unique URL with that content aggregated.

There are two places news will be consumed in the future: editorially aggregated sites like Gawker or The Huffington Post, and machine-aggregated platforms like Techmeme and Google News. Gawker Hybrid appears to be a splendid blend of the benefits of both worlds.

You can sign in via Facebook Connect, or your Gawker profile – and then you’re instantly presented with an easy to customize view:

Customizing my Gawker Experience

Very cool. You can even see my hybrid site here. (Sign up for the service by clicking “Hybrid” in the menu bar on any Gawker site.)

That’s really interesting – what does that profile say about me? That I’m an avid gamer, interested in tech gossip and geeky stuff. That profile actually sums me up really well [sadly] – is this a fantastic method of pushing ‘this is who I am’?

Gawker Hybrid is a great addition for users – not only do you get to consume all the news you care about from a single page, you can also avoid content you don’t want to see. Perhaps you hate the NFL? Or don’t want to see Valleywag coverage on Gawker?

This will, however, have a negative impact on Gawker’s public traffic numbers: you need to visit fewer sites, so Gawker’s page views will decrease. Naturally Gawker can counter this with better targeted (= more valuable) adverts. Gawker can target specialist adverts to exact types of people. Maybe that’s Xbox 360 Games + Car Reviews, or NBA + PC Games. Factor in logging in via Facebook Connect [sex – age – etc] and you have a stellar targeting platform.

It’s a pity Gawker left it to the user to decide which content topics they read – it feels like the next step for this service is a machine learning “you voted this story up, how about this one?” feature that gives the user complete personalization without being constrained by arbitrary topics.
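To give a flavour of what “you voted this story up, how about this one?” could mean in practice, here is a deliberately simple sketch that scores candidate stories by how much their tags overlap with the tags of stories you have voted up. It is an assumption on my part, not a description of anything Gawker runs, and real systems would use far richer signals; the story titles and tags are invented.

```python
def jaccard(a, b):
    """Overlap between two tag sets: shared tags divided by all tags."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(upvoted, candidates, top_n=3):
    """Rank candidate stories by tag overlap with everything the user voted up."""
    liked_tags = set()
    for story in upvoted:
        liked_tags |= set(story["tags"])
    scored = [(jaccard(liked_tags, s["tags"]), s["title"]) for s in candidates]
    scored = [pair for pair in scored if pair[0] > 0]  # drop unrelated stories
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [title for _, title in scored[:top_n]]


# Invented stories for illustration.
upvoted = [{"title": "Xbox 360 price cut", "tags": ["xbox", "games"]}]
candidates = [
    {"title": "New Xbox reviews", "tags": ["xbox", "reviews"]},
    {"title": "NFL round-up", "tags": ["nfl"]},
    {"title": "Best Xbox games", "tags": ["games", "xbox"]},
]
print(recommend(upvoted, candidates))
```

Note that nobody had to pick a topic: the NFL story simply never surfaces because it shares nothing with what was voted up.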

My key question is who powers this service, and can any blog network utilize this technology?

Do you find these services useful? Leave a comment with your Gawker Hybrid profile!
(Disclaimer: I am the CEO of a personalised newspaper called Broadersheet.)

How much local council coverage is there in your local newspaper? – help crowdsource the answer

Are local newspapers really wimping out on council coverage? Sarah Hartley would like you to help her investigate council coverage in local newspapers:

“After responses to the debate about council “newspapers” prompted so many comments … about local papers dumbing down and failing to cover civic issues at the expense of celebrity trivia, I suggested on this blog carrying out some sort of a survey to see whether that was truly the case.

“This alleged withdrawal of bread-and-butter reporting hasn’t been my experience of working on regional papers in northern England and Scotland, but, maybe times have changed or other regions have different stories to tell?”

Sarah’s investigation began on her blog with the Darlington & Stockton Times (of 7 eligible pages, the equivalent of 2 are concerned with local council stories) before I suggested she use Help Me Investigate to crowdsource the research.

If you’d like to help and need an invite contact Sarah, leave a comment here, or request an invite on Help Me Investigate itself.

UK newspapers add 213,892 Twitter followers in a month

National UK newspapers had 1,471,936 Twitter followers at the start of September – up 213,892 or 17% on August 1 (when they had 1,258,044 followers).
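For anyone who wants to check the sums, the increase and the percentage follow directly from the two totals above. This short snippet just reproduces that arithmetic; the variable names are my own.

```python
august_followers = 1_258_044      # total on August 1
september_followers = 1_471_936   # total at the start of September

gain = september_followers - august_followers
percent_change = gain / august_followers * 100

print(gain)                   # 213892
print(round(percent_change))  # 17
```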

You can see the September figures (originally posted here) below or here.

I have more Twitter statistics here.

Taking cues from Citizen Science

One rap against citizen journalism is that there is always a possibility that it isn’t accurate or credible. Unmonitored, unmoderated blogs can get it wrong. Well, so can traditional journalists, but with blogs, it’s harder to hold someone accountable, and erroneous information is that much trickier to retract.

Would it help then, to look for ideas in a field where inaccuracy is barely tolerated, if at all? The media should be able to tap into crowd wisdom for credible content if, as Dan Schultz notes, “members of the scientific community, a professional group that arguably maintains higher standards for verification than journalism, are trying to harness the crowd in the same way that we are.”

Citizen science has been effectively used in one main way – collection of data, which is then used by scientists for contextualization, analysis and consolidation with experiments and previous scientific literature.

Be it recording the dates of Spring’s first lilac blossoms, or counting the number of eggs in bird nests, citizens are contributing in meaningful ways, so scientists can then use this for more specialized tasks, like assessing the information thus obtained to study the impact of global warming or the influence of human activity on wildlife.

Perhaps the closest counterpart to this use in journalism is something akin to WNYC’s crowdsourced project to track price gouging in New York City or the Shropshire Star’s map of fuel prices. In both these exercises, citizens were not expected to do much more than report their daily observations.

Since scientific research usually requires a high level of education and training, the tasks get divided neatly between professionals and dabblers. As Schultz points out, in the case of science, “professionals have bigger and better things to do; it doesn’t make sense for a PhD to use a million-dollar telescope to look at something that a hobbyist could view using a thousand-dollar one, especially when there is so much of the universe left to unlock.”

This is not to say that such a clear definition would not work for journalism. In fact, citizen journalism pioneer Jay Rosen has often said that division of labor is essential for crowdsourced journalism projects. In WNYC’s case, citizens were responsible for collecting information that was put together in a story. In more complex investigative projects, the public is given the task of perusing documents, as is happening with The Guardian’s investigation of the MPs’ expenses scandal.

Another idea would be to outsource so-called “fluff” journalism to the public (self plug warning). Many sites are already implementing this, by allowing citizens to post blogs and articles on lifestyle and recreational topics. Schultz suggests hyperlocal content as one such department where citizens can often do as good a job as reporters, if not better.

One of the main problems is that unlike scientists, journalists–irrationally or not–are in constant fear of being replaced by amateurs. Hence, they seem more hesitant to solicit citizen help. The fact that journalists are losing jobs, however, has more to do with the lack of revenue-generating mechanisms on the Internet than it has to do with bloggers posting content online. In fact, by recruiting audiences to act as eyes and ears for news organizations, the latter would actually save costs and be able to divert resources toward more specialized reporting.

Secondly, in the case of scientific crowdsourcing or citizen science, there is a distinct classification of contributors and their scope of contribution–as identified by what professionals, amateurs and citizens can do. This leads to a clear division of labor, which is not quite possible in journalism, at least in the way it is being practiced right now. While there is no doubt that journalism needs a special set of skills and training, it’s not rocket science, quite literally.

Amateurs contribute toward citizen science in significant ways by performing unspecialized tasks. In the case of bloggers, on the other hand, short of traveling to a war zone (with some exceptions) they are pretty much doing–or attempting to do–what professional journalists routinely do.

The solution is not to curb bloggers and independent journalists, however. It is to produce the sort of in-depth, high-quality journalism that makes newsroom journalism “special.” In order to have clear-cut division of labor, professionals merely have to offer a product that makes use of the creativity and resources that are available to them. And in the process, they can implement projects that involve the lay public so the latter can do what they do best.

Maps on news websites – an overview

The following is part of a chapter for a forthcoming book on online journalism. Contributions welcome.

Maps have become a familiar part of the news language online due to a number of advantages:

  • They provide an easy way to grasp a story at a glance
  • They allow users to drill down to relevant information local to them very quickly
  • Maps can be created very easily, and added to relatively easily by non-journalists
  • Maps draw on structured data, making them a very useful way to present data such as schools tables, crime statistics or petrol prices
  • They can be automated, updating in response to real-time information
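To illustrate the point about structured data, here is a rough sketch of how a table of petrol prices could be turned into GeoJSON, a common format that mapping tools can read. The station names, coordinates and prices are invented, and the property names are my own choice; the one fixed rule worth knowing is that GeoJSON lists longitude before latitude.

```python
import json


def to_feature(name, lat, lon, price_pence):
    """Turn one table row into a GeoJSON point feature."""
    return {
        "type": "Feature",
        # GeoJSON coordinate order is [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name, "price_pence": price_pence},
    }


def to_feature_collection(rows):
    """Wrap all rows in a FeatureCollection ready to hand to a map."""
    return {
        "type": "FeatureCollection",
        "features": [to_feature(**row) for row in rows],
    }


# Invented rows standing in for a reader-submitted prices table.
rows = [
    {"name": "High Street Garage", "lat": 52.71, "lon": -2.75, "price_pence": 104.9},
    {"name": "Ring Road Services", "lat": 52.68, "lon": -2.45, "price_pence": 106.9},
]
collection = to_feature_collection(rows)
print(json.dumps(collection, indent=2))
```

Because the map is generated from the rows, updating the table and re-running the conversion is all it takes to refresh the map.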

News organisations have used maps in a number of ways: