Linked data and structured journalism at the BBC

Last month Basile Simon from BBC News Labs gave a talk at the CSV conference in Berlin: a two-day “community conference for data makers” (notes here). I invited Basile to publish his talk here in a special guest post.

At BBC News Labs, we’ve been pushing for more linked data in news for years now. We built a massive international news aggregator based on linked data, and spent years making it better, but it is our production and live services that do the core of the job today.

We’re trying to stay relevant and to model our massive dataset of facts, quotes, news and articles. The answer to this may lie in structured journalism.

News Labs was founded in 2012 to play with linked data. The original team, which included many data architects, strongly believed this was a revolution in the way we approach our journalism.

BBC moves to more structured data in its relaunch

Behind the story of the BBC website’s recent relaunch is, among other things, an update to its content management system. In a post on the relaunch, John O’Donovan explains how the changes will give webpages a more structured and semantic quality:

“We will … no longer be using tables to layout the content, instead we will be rendering the pages using CSS layout and only using tables for data.

“There are lots of reasons to do this, but some include making the content more efficient, more standards compliant and faster to render. It also allows us to publish semantic XHTML, which means that content blocks are better marked up to describe what they are and has benefits like creating a better header structure to help screen readers.

“Better structure also means you will see a more consistent presentation of stories in Google and search engines with, for example, story dates and author information showing more clearly.

“This reflects a new content model which is now largely based around a simple and generic data model of assets and groups of assets which are typed (meaning we don’t just manage blocks of content, we use metadata to describe what is in the blocks of content) and publishing through templates and services based around Velocity.”

In addition, code that now looks like the image above will mean that the site is better optimised for search engines (as if a PageRank of 9 wasn’t good enough) and more accessible, and that it will be easier for developers to do interesting things with BBC content.
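To make the content model in O’Donovan’s post a little more concrete, here is a minimal sketch in Python of typed assets, groups of assets, and per-type templates. It is an illustration only, not the BBC’s implementation: the class names, the type names ("story", "image"), and the metadata fields are all hypothetical, and Python’s string.Template stands in for Velocity.

```python
from dataclasses import dataclass, field
from html import escape
from string import Template


@dataclass
class Asset:
    """A typed piece of content; the type and metadata describe what it is."""
    asset_type: str  # hypothetical type names, e.g. "story" or "image"
    body: str
    metadata: dict = field(default_factory=dict)


@dataclass
class AssetGroup:
    """A typed group of assets, e.g. everything that makes up one page."""
    group_type: str
    assets: list = field(default_factory=list)


# One template per asset type. string.Template is a stand-in for Velocity.
TEMPLATES = {
    "story": Template('<div class="story"><h1>$headline</h1><p>$body</p></div>'),
    "image": Template('<img src="$body" alt="$alt" />'),
}


def render(group: AssetGroup) -> str:
    """Render each asset with the template selected by its type."""
    parts = []
    for asset in group.assets:
        # Escape every value so content cannot break the markup.
        values = {"body": asset.body, **asset.metadata}
        safe = {key: escape(str(value)) for key, value in values.items()}
        parts.append(TEMPLATES[asset.asset_type].substitute(safe))
    return "\n".join(parts)


page = AssetGroup(
    group_type="story_page",
    assets=[
        Asset("story", "Body text.", {"headline": "Example headline"}),
        Asset("image", "/img/example.jpg", {"alt": "An example image"}),
    ],
)
html = render(page)
```

Because each asset carries its own type and metadata, the same content could be passed through a different set of templates (a mobile page or a feed, say) without touching the content itself.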

On the subject of SEO, the site is simplifying its URLs but still won’t include descriptive words in them. However, “there is more work to do yet on how we might use even shorter URLs (such as http://www.bbc.co.uk/10250603) and longer more descriptive ones http://www.bbc.co.uk/story-about-something-interesting.”
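The two URL styles mentioned in that quote can be illustrated with a short Python sketch. The function names and the slug rules here are assumptions made for the example, not a description of how the BBC generates its URLs; only the base address and the two example URLs come from the quote.

```python
import re

BASE_URL = "http://www.bbc.co.uk"  # base taken from the URLs quoted above


def short_url(story_id: int) -> str:
    """Build the short form: the base address plus a numeric story ID."""
    return f"{BASE_URL}/{story_id}"


def slugify(headline: str) -> str:
    """Lower-case the headline and join its words with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", headline.lower())
    return slug.strip("-")


def descriptive_url(headline: str) -> str:
    """Build the longer form: the base address plus a readable slug."""
    return f"{BASE_URL}/{slugify(headline)}"


short = short_url(10250603)
long_form = descriptive_url("Story about something interesting")
```

With these inputs, short is http://www.bbc.co.uk/10250603 and long_form is http://www.bbc.co.uk/story-about-something-interesting, matching the two examples in the quote.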