Tag Archives: john o’donovan

BBC moves to more structured data in its relaunch

[Image: code behind BBC pages]

Behind the story of the BBC website’s recent relaunch is, among other things, an update to their content management system. In a post on the changes, John O’Donovan explains how the changes mean that webpages will have a more structured and semantic quality:

“We will … no longer be using tables to layout the content, instead we will be rendering the pages using CSS layout and only using tables for data.

“There are lots of reasons to do this, but some include making the content more efficient, more standards compliant and faster to render. It also allows us to publish semantic XHTML, which means that content blocks are better marked up to describe what they are and has benefits like creating a better header structure to help screen readers.

“Better structure also means you will see a more consistent presentation of stories in Google and search engines with, for example, story dates and author information showing more clearly.

“This reflects a new content model which is now largely based around a simple and generic data model of assets and groups of assets which are typed (meaning we don’t just manage blocks of content, we use metadata to describe what is in the blocks of content) and publishing through templates and services based around Velocity.”

In addition, code that now looks like the image above will mean that the site is better optimised for search engines (as if a PageRank of 9 wasn’t good enough) and more accessible, and that it will be easier for developers to do interesting things with BBC content.

On the subject of SEO, the site is simplifying URLs but still won’t be including descriptive words in them. However, “there is more work to do yet on how we might use even shorter URLs (such as http://www.bbc.co.uk/10250603) and longer more descriptive ones http://www.bbc.co.uk/story-about-something-interesting.”
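To picture the “assets and groups of assets which are typed” model that O’Donovan describes, here is a minimal sketch in Python. It is not the BBC’s implementation: the class names, type names and metadata keys are all hypothetical, chosen only to show content blocks carrying a type and descriptive metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One typed block of content plus the metadata that describes it."""
    asset_type: str                      # hypothetical type name, e.g. "story"
    body: str
    metadata: dict = field(default_factory=dict)


@dataclass
class AssetGroup:
    """A typed collection of assets, e.g. everything that makes up one page."""
    group_type: str                      # hypothetical type name
    assets: list = field(default_factory=list)

    def of_type(self, asset_type):
        """Return the assets in this group that have the given type."""
        return [a for a in self.assets if a.asset_type == asset_type]


# Placeholder values throughout; none of this is real BBC data.
story = Asset("story", "Full text of the story...",
              {"author": "Example Author", "date": "2000-01-01"})
photo = Asset("image", "photo.jpg", {"caption": "Example caption"})
page = AssetGroup("story_page", [story, photo])

# A template can ask for what it needs by type instead of parsing markup.
for asset in page.of_type("story"):
    print(asset.metadata["author"], asset.metadata["date"])
```

Because the author and date live in metadata rather than in the body text, a template (or a search engine reading its output) can pick them out reliably.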


Data and the future of journalism panel discussion: Linked Data London

Tonight I had the pleasure of chairing an extremely informative panel discussion on data and the future of journalism at the first London Linked Data Meetup. On the panel were Leigh Dodds, Dan Brickley, John O’Donovan and Martin Belam.

What follows is a series of notes from the discussion, which I hope are of some use.

For a primer on Linked Data there is A Skim-Read Introduction to Linked Data; Linked Data: The Story So Far (PDF) by Tom Heath, Christian Bizer and Tim Berners-Lee; and this TED video by Sir Tim Berners-Lee (who was on the panel before this one).

To set some brief context, I talked about how 2009 has been, for me, a key year in data and journalism, largely because it has been a year of crisis in both publishing and government. The seminal point in all of this has been the MPs’ expenses story, which demonstrated both the power of data in journalism and the need for transparency from government. Signs of the latter include the government’s appointment of Sir Tim Berners-Lee, its call for developers to suggest things to do with public data, and the imminent launch of Data.gov.uk around the same issue.

Even before then, the New York Times and Guardian both launched APIs at the beginning of the year; MSN Local and the BBC have both been working with Wikipedia; and we’ve seen the launch of a number of startups and mashups around data, including Timetric, Verifiable, BeVocal, OpenlyLocal, MashTheState, the open source release of Everyblock, and Mapumental.

Q: What are the implications of paywalls for Linked Data?

The general view was that Linked Data – specifically standards like RDF – would allow users and organisations to access information about content even if they couldn’t access the content itself. To give a concrete example, rather than linking to a ‘wall’ that simply requires payment, it would be clearer what the content beyond that wall related to (e.g. key people, organisations, author, etc.).

Leigh Dodds felt that using standards like RDF would allow organisations to more effectively package content in commercially attractive ways, e.g. ‘everything about this organisation’.
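To make the paywall point concrete, here is a rough sketch in Python. It uses plain (subject, predicate, object) tuples to stand in for RDF triples rather than a real RDF library. The example.com URLs, the `ex:fullText` predicate and the function names are all hypothetical; `dcterms:title`, `dcterms:creator` and `dcterms:subject` are written as Dublin Core style names purely for illustration.

```python
# Each statement is a (subject, predicate, object) triple, the basic shape of RDF.
triples = [
    ("http://example.com/articles/1", "dcterms:title", "First example headline"),
    ("http://example.com/articles/1", "dcterms:creator", "Example Author"),
    ("http://example.com/articles/1", "dcterms:subject", "Example Organisation"),
    ("http://example.com/articles/1", "ex:fullText", "Paid-for body text..."),
    ("http://example.com/articles/2", "dcterms:title", "Second example headline"),
    ("http://example.com/articles/2", "dcterms:subject", "Example Organisation"),
    ("http://example.com/articles/2", "ex:fullText", "More paid-for body text..."),
]

# Hypothetical: predicates whose values sit behind the paywall.
PAYWALLED = {"ex:fullText"}


def public_view(statements):
    """Keep the descriptive statements and drop the paid-for ones."""
    return [t for t in statements if t[1] not in PAYWALLED]


def everything_about(statements, subject_label):
    """Return the URLs of all articles tagged with the given subject."""
    return sorted({s for s, p, o in statements
                   if p == "dcterms:subject" and o == subject_label})


visible = public_view(triples)
print(everything_about(visible, "Example Organisation"))
```

A reader without a subscription still learns what each article is about, and a publisher could package ‘everything about this organisation’ from the same statements.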

Q: What can bloggers do to tap into the potential of Linked Data?

This drew some blank responses, but Leigh Dodds was most forthright, arguing that the onus lay with developers to do things that would make it easier for bloggers to, for example, visualise data. He also pointed out that, currently, if someone does something with data it is not possible to track that back to the source. Better tools would allow, effectively, an equivalent of pingback for data included in charts (i.e. the person who created the data would know that it had been used, as could others).
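The ‘pingback for data’ idea could work along these lines. This is only a sketch of the concept in Python, not an existing tool or protocol: the class, its methods and the URLs are hypothetical.

```python
from collections import defaultdict


class DataPingbacks:
    """Hypothetical registry recording which pages have used which dataset."""

    def __init__(self):
        self._uses = defaultdict(list)   # dataset URL -> pages that used it

    def notify(self, dataset_url, used_by_url):
        """Record that the page at used_by_url drew on the dataset."""
        if used_by_url not in self._uses[dataset_url]:
            self._uses[dataset_url].append(used_by_url)

    def uses_of(self, dataset_url):
        """Let the data's creator (or anyone else) see where it was used."""
        return list(self._uses.get(dataset_url, []))


registry = DataPingbacks()
registry.notify("http://example.org/data/spending.csv",
                "http://example.com/blog/chart-of-spending")
print(registry.uses_of("http://example.org/data/spending.csv"))
```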

Q: Given that the problem for publishing lies in advertising rather than content, how can Linked Data help solve that?

Dan Brickley suggested that OAuth technologies (where you use a single login identity for multiple sites that contains information about your social connections, rather than creating a new ‘identity’ for each) would allow users to specify more precisely how they experience content, for instance: ‘I only want to see article comments by users who are also my Facebook and Twitter friends.’

The same technology would allow for more personalised, and therefore more lucrative, advertising.

John O’Donovan felt the same could be said about content itself – more accurate data about content would allow for more specific selling of advertising.

Martin Belam quoted James Cridland on radio: “[The different operators] agree on technology but compete on content”. The same was true of advertising, but the advertising and news industries needed to be more active in defining common standards.

Leigh Dodds pointed out that semantic data was already being used by companies serving advertising.
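Dan Brickley’s comment-filtering example is simple to sketch once a site knows who your connections are. The Python below assumes that information has already been obtained with the user’s permission; the function name, the data shapes and the user names are hypothetical, and ‘Facebook and Twitter friends’ is read here as friends on either service.

```python
def visible_comments(comments, friend_ids):
    """Keep only the comments whose author is one of the reader's friends."""
    return [c for c in comments if c["author_id"] in friend_ids]


# Hypothetical friend lists, as a site might receive them after the user
# has granted access to their social connections.
facebook_friends = {"alice", "bob"}
twitter_friends = {"bob", "carol"}

# Union: friends on either service. Use & instead of | to require both.
friends = facebook_friends | twitter_friends

comments = [
    {"author_id": "alice", "text": "Great piece."},
    {"author_id": "dave", "text": "First!"},
    {"author_id": "carol", "text": "Is there a source for the figures?"},
]

for comment in visible_comments(comments, friends):
    print(comment["author_id"], comment["text"])
```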

Other notes

I asked members of the audience who they felt were the heroes and villains of Linked Data in the news industry. The Guardian and BBC came out well – The Daily Mail were named as repeat offenders who would simply refer to “a study” and not say which, nor link to it.

Martin Belam pointed out that The Guardian is increasingly asking itself ‘How will that look through an API’ when producing content, representing a key shift in editorial thinking. If users of the platform are swallowing up significant bandwidth or driving significant traffic then that would probably warrant talking to them about more formal relationships (either customer-provider or partners).

A number of references were made to the problem of provenance – being able to identify where a statement came from. Dan Brickley specifically spoke of the problem with identifying the source of Twitter retweets.

Dan also felt that the problem of journalists not linking would be solved by technology. In an earlier conversation, he also talked of “subject-based linking” and the impact of SKOS and linked data style identifiers. He saw a problem in that, while new articles might link to older reports on the same issue, older reports were not updated with links to the new updates. Tagging individual articles was problematic in that you then had the equivalent of an overflowing inbox.

(I’ve invited all four participants to correct any errors and add anything I’ve missed)
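One way to read “subject-based linking” is that articles point at a shared subject identifier instead of at each other, so related links are worked out when the page is viewed. The Python sketch below illustrates that reading only; the class, the `subject:` identifiers (standing in for SKOS-style concept URIs) and the article paths are hypothetical.

```python
from collections import defaultdict


class SubjectIndex:
    """Hypothetical index from subject identifiers to the articles about them."""

    def __init__(self):
        self._by_subject = defaultdict(list)   # subject id -> article URLs
        self._subjects_of = {}                 # article URL -> set of subject ids

    def add(self, url, subjects):
        """Register an article under each of its subject identifiers."""
        self._subjects_of[url] = set(subjects)
        for subject in subjects:
            self._by_subject[subject].append(url)

    def related(self, url):
        """Other articles sharing a subject with this one, old or new."""
        found = []
        for subject in sorted(self._subjects_of.get(url, set())):
            for other in self._by_subject[subject]:
                if other != url and other not in found:
                    found.append(other)
        return found


index = SubjectIndex()
index.add("/2009/05/old-report", {"subject:example-issue"})
index.add("/2009/09/new-update", {"subject:example-issue"})

# The old report now "links" to the update without anyone editing it.
print(index.related("/2009/05/old-report"))
```

Because the links are derived from the index each time, publishing a new update is enough to make it appear alongside every older report on the same subject.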

Finally, here’s a bit of video from the very last question addressed in the discussion (filmed with thanks by @countculture):

Linked Data London 090909 from Paul Bradshaw on Vimeo.

Data and the future of journalism: what questions should I ask?

Tomorrow I’m chairing a discussion panel on the Future of Journalism at the first London Linked Data Meetup. On the panel are:

What questions would you like me to ask them about data and the future of journalism?

BBC and Google juice: the BBC responds

Demonstrating once again why journalists should not only blog but monitor incoming links, the BBC’s response to the recent story about ‘holding back Google juice’ in its linking came to my attention as I was scanning the incoming links to this blog. John O’Donovan, Chief Architect, BBC FM&T Journalism, says “nothing sinister”, and:

“We are rolling out improvements to the way this works, as already used on some other parts of the website. Essentially we use JavaScript to retain SEO (“Search Engine Optimisation”) and Google juice for external sites, while we will still be able to track external links. Search Engines, casual observers and those without JavaScript will still see the original URL.”