Linked data and structured journalism at the BBC

Last month Basile Simon from BBC News Labs gave a talk at the CSV conference in Berlin: a two-day “community conference for data makers” (notes here). I invited Basile to publish his talk here in a special guest post.

At BBC News Labs, we’ve been pushing for more linked data in news for years now. We built a massive international news aggregator based on linked data, and spent years making it better… but it’s our production and live services that do the core of the job today.

We’re trying to stay relevant and to model our massive dataset of facts, quotes, news and articles. The answer to this may lie in structured journalism.

News Labs was founded in 2012 to play with linked data. The original team, which included many data architects, strongly believed this was a revolution in the way we approached our journalism.

They were right.

Effortlessly updating the Olympics and elections

The BBC’s 2012 Olympics pages were generated using linked data

The BBC powered its delivery of the 2012 London Olympics with linked data: effortlessly updating hundreds of athlete, sport, and competition pages with all the recent related news.

We use linked data to curate indexes, such as constituency pages during the elections – imagine keeping 650 pages up to date at any given moment – or the massive repository of all our programmes.

We also have a lot of fun, and attract academic research interest, with our Juicer, a tool we built to ingest news from more than 600 international sources, extract entities, and match them against DBpedia. Juicer is a fantastic example of semantic technology.

But now you see, there’s a bit of an issue.

‘Trusting in articles’

We humans are extremely good at extracting knowledge out of information, and at making connections between things in our heads.

This “web of meanings,” as my colleague Paul Rissen calls it, is what makes you think of the Prime Minister when you read “Cameron warns prices would rise if the UK leaves the EU.”

Your brain instinctively tells you that we’re probably not talking about James Cameron, the film director, here. Or about the Scottish clan Cameron, for that matter.
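To make that instinct concrete, here is a toy sketch in Python of how a machine might guess which Cameron a headline means, by counting context words. The candidate list and the hand-picked context words are assumptions made up for illustration; this is not how Juicer or any BBC system actually works.

```python
import re

# Toy candidate list with hand-picked context words (illustrative only).
CANDIDATES = {
    "David Cameron": {"prime", "minister", "uk", "eu", "conservative"},
    "James Cameron": {"film", "director", "avatar", "titanic"},
    "Clan Cameron": {"scottish", "clan", "highlands"},
}


def disambiguate(sentence):
    """Pick the candidate whose context words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return max(CANDIDATES, key=lambda name: len(CANDIDATES[name] & words))


headline = "Cameron warns prices would rise if the UK leaves the EU"
best_guess = disambiguate(headline)
```

A real system would need far richer context than a handful of keywords, which is exactly why the connections humans make so easily are hard to reproduce.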

The way we do online journalism is by publishing articles, which remain the basic unit of journalism (leaving aside broadcast journalism, of course).

Besides our live blogs, our interactives, our Snapchats and bots, we still write articles: massive walls of text, ranging from 300 to a couple of thousand words, that we just put online in the hope that, somehow, people will find them and read them.

That means that we probably trust that articles are currently the best way to contain and deliver the knowledge we produce.

Knowledge is association and links

This knowledge is the aforementioned mental graph, the connections made between topics.

It isn’t isolated facts: a random collection of things you know. Knowledge is association and links.

The whole point of writing an article about an event or a person is to convey knowledge: to provide context, insights gained from this knowledge.

We journalists pride ourselves on our unbiased and balanced knowledge, which we deem more valuable than our opinions.

The real issue is the cost of producing these articles. From a newsroom perspective, the research that goes into each of them is made inefficient by the fact that journalists have to go back to old pieces of content and parse them again to find what they’re looking for: a date, a spelling, a piece of information, etc.

But many stories develop. They’re in fact composed of a series of events, and oftentimes news organisations follow these developments and publish new articles as events unfold.

The costs multiply and add up.

Repeated context

Each of these articles follows the inverted pyramid structure: the most recent and newsworthy bits at the top; details, context and background further down.

This background and context information is in fact repeated, if only slightly differently, from article to article.

You probably see what I’m on about here:

Don’t repeat yourself

There is a fundamental programming principle that comes from Hunt and Thomas’ bible, “The Pragmatic Programmer”:

DRY – don’t repeat yourself.

And when I see how much we repeat ourselves in news, I cringe.

The costs multiply and add up every time you repeat yourself.

Worse, this is not even beneficial to our cherished audiences. The accumulation of articles and pieces about a topic offers a new reader nothing but unstructured chaos to contemplate.

Chaos in reverse chronological order, where articles, comment pieces, live blogs, columns, videos and social media are mixed together.

“Where to start?” asks our reader. “How can I inform myself and get a grasp of this story?”

Starting all over again, again

This is all because, every time we publish something, we let all the knowledge we invested in producing the piece go to waste, and we start all over again the next time.

We need to start saving knowledge.

We’ve seen that linked data has proven to be quite an amazing tool for news publishers in the past. The point I want to make now is that we could – and probably should – go further.

We could produce knowledge and facts only once, in a way that makes them re-usable next time.

In such a system, we could take events as the basic unit of reporting.

Journalism is about reporting on events, after all. Events implicate people, places, organisations, or things.

If you think about it, it’s also clear that a piece of journalism contains a mini graph in itself: it is about something, it mentions people and companies… and you can see that we’ve got a web, a graph, here.

Events and ontologies

Events can be big events or small events, and some ontologies nest these events – as if some of them could contain, or rather be composed of other events.

Take an election, for example. That’s an event in itself, but there are also many other, smaller events that make up the election:

  • The campaign;
  • A candidate saying something at a press conference;
  • The results coming in…

A candidate touring a factory somewhere during their campaign is an event that we humans would see as linked to the bigger one: the election.

But it’s also linked to this factory’s story, and that’s a web again, because there are links everywhere.
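A minimal sketch in Python of what such nesting could look like. The `Event` class and its field names are hypothetical, chosen for illustration rather than taken from any BBC ontology, and the people and places are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """A hypothetical event node; field names are illustrative only."""
    label: str
    agents: List[str] = field(default_factory=list)   # people or organisations involved
    places: List[str] = field(default_factory=list)
    sub_events: List["Event"] = field(default_factory=list)

    def walk(self, depth=0):
        """Yield (depth, event) pairs for this event and everything nested in it."""
        yield depth, self
        for child in self.sub_events:
            yield from child.walk(depth + 1)


factory_visit = Event("Candidate tours a factory", agents=["Candidate A"], places=["Factory X"])
campaign = Event("The campaign", sub_events=[factory_visit])
election = Event(
    "The election",
    sub_events=[campaign, Event("Press conference statement"), Event("Results coming in")],
)

# The same visit also belongs to the factory's own story: a web, not a tree.
factory_story = Event("The story of Factory X", sub_events=[factory_visit])

outline = [(depth, event.label) for depth, event in election.walk()]
```

Because `factory_visit` is one object referenced from two parents, anything recorded about it once is available to both the election and the factory's story.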


At the BBC, we’ve been playing with the concept of storylines for a little while now. A storyline is supported by an ontology, like the one above, and is no more and no less than a way to represent an event-based narrative.

Events and information are entered in a structured way and can then be ordered into a narrative.

Obviously, events are often part of several stories and narratives. They can also have several interpretations.

For example, our candidate’s visit to a factory is, according to this ontology, an event; the story is the editorial perspective on this event: the election itself and how successful this candidate is being so far.

The reporter’s task is then to input facts and events into the database, as well as to connect and explain to the machine the relationships between them and earlier ones.

The storyline is made of components, such as the events we just mentioned, or datasets, or even other storylines!
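As a rough illustration, a storyline built from such components might be sketched like this in Python. The class names, fields and dates are all assumptions invented for the example; they do not reproduce the BBC's storyline ontology.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Union


@dataclass
class Event:
    label: str
    when: date


@dataclass
class Dataset:
    title: str


@dataclass
class Storyline:
    title: str
    components: List[Union[Event, Dataset, "Storyline"]] = field(default_factory=list)

    def timeline(self) -> List[Event]:
        """Collect events from this storyline and nested ones, oldest first."""
        events: List[Event] = []
        for component in self.components:
            if isinstance(component, Event):
                events.append(component)
            elif isinstance(component, Storyline):
                events.extend(component.timeline())
        return sorted(events, key=lambda e: e.when)


# Made-up sample values, for illustration only.
visit = Event("Candidate visits a factory", date(2016, 5, 3))
results = Event("Results come in", date(2016, 5, 6))
campaign = Storyline("The campaign", [visit])
election = Storyline("The election", [results, campaign, Dataset("Constituency results")])
factory = Storyline("The factory's future", [visit])  # same event, different story

election_labels = [e.label for e in election.timeline()]
```

The visit is entered once and reused by two storylines, each free to frame it from its own editorial perspective.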

This all looks tidy and rich, right?

But there’s even more: because the information and the content are now all tidily structured, they can be exposed with a bit of magic through an API.

And in as many formats as we want.

A presentation issue

After all, it’s just a presentation issue now that the content is there and structured. It can be delivered in many languages: you just need to plug a translation pipeline such as Google Translate or IBM Watson into the middle.

It can be snack-sized for people on their smartphones, expanded into a fully-fledged feature piece on desktop, and queried by our new fancy Facebook or Slack bot.
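Here is a small Python sketch of that idea: one structured list of events, three different presentations. The sample events and the function names are invented for the example and do not describe a real BBC API.

```python
from typing import Dict, List

# Made-up structured content: each event carries a headline plus some detail.
EVENTS: List[Dict[str, str]] = [
    {"headline": "Candidate visits a factory", "detail": "The visit was part of the campaign."},
    {"headline": "Results come in", "detail": "Counting continued overnight."},
]


def render_snack(events):
    """Short form for small screens: only the latest headline."""
    return events[-1]["headline"]


def render_feature(events):
    """Long form: every event with its detail, in order."""
    return "\n\n".join(f"{e['headline']}\n{e['detail']}" for e in events)


def render_bot_answer(events, keyword):
    """Answer a chat query by returning headlines that mention the keyword."""
    keyword = keyword.lower()
    return [e["headline"] for e in events if keyword in e["headline"].lower()]


snack = render_snack(EVENTS)
feature = render_feature(EVENTS)
bot_reply = render_bot_answer(EVENTS, "factory")
```

None of the three renderers changes the underlying content; each one only decides how much of it to show and in what shape.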

BBC R&D are playing around this idea with the Elastic News project, a way to deliver the same piece of content with variable depth to strengthen our audiences’ understanding of stories.

It’s up to the user to explore and dive into the details of a story should they want to do so.

Similarly, we’ve been looking at tuning up our content with in-line immediate explanations of some concepts about which we’ve got structured information (these are the people, places, organisations, and things I mentioned earlier, if you remember).

Object-Based Broadcasting

There are now lots of things that can be done with content made available through an API. R&D have a large workstream called Object-Based Broadcasting. The idea is that the content is a set of individual assets whose relationships and associations are described through metadata.

Content can be adjusted, re-versioned, made longer or shorter, and even explorable by the audiences.

We’ve separated the content from its delivery and consumption, and structured this ideal content around events.

Not only are the possibilities infinite and appealing, but if you think about it, it kind of makes sense to think this way.

It makes sense because news never stops. It happens all the time: it is a continuous flow and accumulation of events and facts.

As I said, stories develop and evolve, and journalism follows this.

We don’t have to repeat ourselves every time we’ve got to write an article

The only structure that makes sense is the narrative: “a spoken or written account of connected events; a story.” And see: “connected events.”

And we can invest our efforts into the creation and the curation of such narratives, now that we don’t have to repeat ourselves every time we’ve got to write an article.

These narratives make sense to somebody who stumbles upon them, because we think and understand the world in narrative forms.

“Oh, but what led to this event?” asks the reader. A click later, the narrative expands to include another helpful, causal event.

“Ah, that makes sense, I get it now.” Job done.
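That click could be sketched in Python as follows, assuming each event stores a link to the event that led to it. The `CAUSED_BY` mapping, the `expand` function and the sample events are hypothetical, made up for illustration.

```python
from typing import Dict, List

# Hypothetical "led to" links: each event maps to the event that caused it.
CAUSED_BY: Dict[str, str] = {
    "Results come in": "Polling day",
    "Polling day": "Election is called",
}


def expand(narrative: List[str], event: str) -> List[str]:
    """Return a copy of the narrative with the cause of `event` placed before it."""
    cause = CAUSED_BY.get(event)
    if cause is None or cause in narrative or event not in narrative:
        return list(narrative)
    position = narrative.index(event)
    return narrative[:position] + [cause] + narrative[position:]


one_click = expand(["Results come in"], "Results come in")
two_clicks = expand(one_click, "Polling day")
```

Each call adds one step of background, so readers pull in only as much context as they need.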

Narratives evolve

These narratives are not frozen: they’re constantly evolving as stories develop. They can even represent the different views of the world we have. Chancellor Merkel thought it was only human and decent to open Germany’s doors to Syrian and Iraqi refugees; across the Channel the Daily Mail thinks it’s madness.

Same facts, different narratives.

Manifesto for structured journalism

Behind the scenes, and I quote Jacqui Maher and Paul Rissen’s Manifesto for Structured Journalism:

“Such a database of knowledge – which already exists in the collective knowledge of our newsroom staff – could be used to provide context at scale across all our output. Structured journalism is a way of preserving a reporter’s expertise so that it isn’t lost once aired or published, and instead, is surfaced in related coverage.”

