Tag Archives: linked data

VIDEO: Big data, open data, linked data and other big ideas that data journalists need to know about

Three key terms you might hear used in data journalism circles are “open data”, “linked data” and “big data”. This video, made for students on the MA in Data Journalism at Birmingham City University, explores definitions of the three terms, explains some of the jargon used in relation to them, and outlines the critical and ethical issues to consider in relation to open and big data in particular.

Three other video clips are mentioned in the video, and these are embedded below. First of all, Tim Berners-Lee’s 2009 call for “raw data now”, where he outlined the potential of open and linked data…

Continue reading →

The most-read posts on Online Journalism Blog — and on Medium — in 2016

Rounding up the best posts of the year is a good habit to get into, but one that I’ve failed to acquire. In 2014 – the ten-year anniversary of this site – I rounded up the year’s best-performing posts, which does give you a flavour of what was happening that year, but I forgot to repeat it for 2015.

Here, then, are some reflections on the 10 pieces which did best in 2016 (there were 100 posts across the year), plus the older posts which keep on giving, and a comparison of some pieces which did far better on Medium than on OJB. Continue reading →

Linked data and structured journalism at the BBC

4 Replies

Don’t repeat yourself

Last month Basile Simon from BBC News Labs gave a talk at the CSV conference in Berlin: a two-day “community conference for data makers” (notes here). I invited Basile to publish his talk here in a special guest post.

At BBC News Labs, we’ve been pushing for more linked data in news for years now. We built a massive international news aggregator based on linked data, and spent years making it better… but it’s our production and live services that do the core of the job today.

We’re trying to stay relevant and to model our massive dataset of facts, quotes, news and articles. The answer to this may lie in structured journalism.

News Labs was founded in 2012 to play with linked data. The original team, made up largely of data architects, strongly believed this was a revolution in the way we approached our journalism.

They were right. Continue reading →

Linked data and why the current approach to archives is “just not working” – David Caswell on Structured Stories

By Agustin Palacio

Structured Stories is a news database under construction which intends to empower everyone to collect, use and improve a permanent record of news events. Creator David Caswell wants to switch the current approach to archives, which “is just not working”, for “some form of structured information that can be networked.”

According to Caswell, adding value to the structured narrative could be a way to return to something similar to the economic mechanism of the 20th century: a distribution-based bundle.

And as for journalists? Caswell believes it could be a powerful tool: Continue reading →

When information is power, these are the questions we should be asking

Various commentators over the past year have made the observation that “Data is the new oil”. If that’s the case, journalists should be following the money. But they’re not.

Instead it’s falling to the likes of Tony Hirst (an Open University academic), Dan Herbert (an Oxford Brookes academic) and Chris Taggart (a developer who used to be a magazine publisher) to fill the scrutiny gap. Recently all three have published pieces shining a light on the move towards transparency and open data, and anyone with an interest in information would be well advised to read them.

Hirst wrote a particularly detailed post breaking down the results of a consultation about higher education data.

Herbert wrote about the publication of the first Whole of Government Accounts for the UK.

And Taggart made one of the best presentations I’ve seen on the relationship between information and democracy.

What all three highlight is how control of information still represents the exercise of power, and how shifts in that control as a result of the transparency/open data/linked data agenda are open to abuse, gaming, or spin. Continue reading →

Is Ice Cream Strawberry? Part 4: Human Capital

7 Replies

This is the fourth part of my inaugural lecture at City University London, ‘Is Ice Cream Strawberry?’. You can find part one here, part two here, and part three here.

Human capital

So here’s person number 4: Gary Becker, a Nobel prize-winning economist.

Fifty years ago he used the phrase ‘human capital’ to refer to the economic value that companies should ascribe to their employees.

These days, of course, it is common sense to invest time in recruiting, training and retaining good employees. But at the time employees were seen as a cost.

We need a similar change in the way we see our readers – not as a cost on our time but as a valuable part of our operations that we should invest in recruiting, developing and retaining. Continue reading →

Games, systems and context in journalism at News Rewired

8 Replies

I went to News Rewired on Thursday, along with dozens of other journalists and folk concerned in various ways with news production. Some threads that ran through the day for me were discussions of how we publish our data (and allow others to do the same), how we link our stories together with each other and the rest of the web, and how we can help our readers to explore context around our stories.

Continue reading →

Charities data opened up – journalists: say thanks.

1 Reply

Having made significant inroads in opening up council and local election data, Chris Taggart has now opened up charities data from the less-than-open Charity Commission website. The result: a new website – Open Charities.

The man deserves a round of applause. Charity data is enormously important in all sorts of ways – and is likely to become more so as the government leans on the third sector to take on a bigger role in providing public services. Making it easier to join the dots between charitable organisations, the private and public sector, contracts and individuals – which is what Open Charities does – will help journalists and bloggers enormously.

A blog post by Chris explains the site and its background in more depth. In it he explains that:

“For now, it’s just the simplest of things, a web application with a unique URL for every charity based on its charity number, and with the basic information for each charity available as data (XML, JSON and RDF). It’s also searchable, and sortable by most recent income and spending, and for linked data people there are dereferenceable Resource URIs.

“The entire database is available to download and reuse (under an open, share-alike attribution licence). It’s a compressed CSV file, weighing in at just under 20MB for the compressed version, and should probably only be attempted by those familiar with manipulating large datasets (don’t try opening it up in your spreadsheet, for example). I’m also in the process of importing it into Google Fusion Tables (it’s still churning away in the background) and will post a link when it’s done.”
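For anyone wondering what the alternative to a spreadsheet looks like, here is a minimal sketch of reading a compressed CSV one row at a time in Python, so the whole file never has to sit in memory. It rests on assumptions: the quote only says “compressed”, so gzip is a guess, and the column names (`charity_number`, `title`, `income`) and the sample rows are invented for illustration rather than taken from the Open Charities download.

```python
import csv
import gzip
import heapq
import io


def top_by_income(fileobj, n=3, income_field="income"):
    """Stream a gzip-compressed CSV and return the n rows with the largest income.

    Rows are read one at a time; only the current top n are kept in memory.
    """
    with gzip.open(fileobj, mode="rt", encoding="utf-8", newline="") as text:
        reader = csv.DictReader(text)
        # Skip rows where the (hypothetical) income column is blank.
        rows = (row for row in reader if row[income_field].strip())
        return heapq.nlargest(n, rows, key=lambda row: int(row[income_field]))


def make_sample():
    """Build a tiny in-memory gzip file standing in for the real download."""
    raw = (
        "charity_number,title,income\n"
        "1001,Alpha Trust,2500\n"
        "1002,Beta Fund,\n"
        "1003,Gamma Aid,90000\n"
        "1004,Delta Care,400\n"
    )
    return io.BytesIO(gzip.compress(raw.encode("utf-8")))


if __name__ == "__main__":
    for row in top_by_income(make_sample(), n=2):
        print(row["charity_number"], row["title"], row["income"])
```

With a real file you would pass its path to `top_by_income` instead of the in-memory sample, and swap in whatever column names the download actually uses.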

Chris promises to add more features “if there’s any interest”.

Well, go on…

Experiments in online journalism

10 Replies

Last month the first submissions by students on the MA in Online Journalism landed on my desk. I had set two assignments. The first was a standard portfolio of online journalism work as part of an ongoing, live news project. But the second was explicitly branded ‘Experimental Portfolio’ – you can see the brief here. I wanted students to have a space to fail. I had no idea how brave they would be, or how successful. The results, thankfully, surpassed any expectations I had. They included:

Dan Davies did a number of experiments around covering cycling collisions in Birmingham that involved mapping, RSS feeds, FOI requests, data, Help Me Investigate, and eventually an idea for a game of sorts.
Alex Gamela constructed the Hashbrum website, experimenting with mapping plugins and other content management technologies. His series of posts on hyperlocal publishing provide an excellent insight into his processes.
Caroline Beavon experimented with Google Wave.
Natalie Chillington experimented with a self-updating gig map. Although she didn’t succeed in achieving what she’d set out to do, she gained knowledge of web tools and technologies such as KML.
Ruihua Yao experimented with recruiting members of the Chinese community in Birmingham to contribute to a Chinese community blog.
Andy Brightwell looked into the ways linked data can be used to uncover political relationships in local councils. There’s a good reason why there’s no blog post to link to, but I’m not telling you what it is…
Ioana Epure (studying MA Freelancing and Journalism Enterprise, which has some overlap with Online Journalism) looked at music communities and different ways of producing music journalism.
One student launched the map-based social network Blomap.
Mikel Plana was exploring lifestreaming, but was offered a job before the deadline (congratulations Mikel).

There are a range of things that I found positive about the results. Firstly, the sheer variety – students seemed to either instinctively or explicitly choose areas distinct from each other. The resulting reservoir of knowledge and experience, then, has huge promise for moving into the second and final parts of the MA, providing a foundation to learn from each other. Continue reading →

Why news organisations should start thinking seriously about their data

I don’t often post a simple link-and-quote to another post these days, but Martin Belam’s article on the value of linked data to the news industry is worth blogging about. In it he makes the clearest argument I’ve yet seen for linked data. First, the commercial argument:

“Pages [on a non-news BBC project using linked data] are performing very well in SEO terms. They sometimes even outrank Wikipedia in Google when people make one word searches for animals, which is no mean feat … And the ongoing maintenance cost of organising this wealth of content is reduced.”

Second, the editorial one:

“Let us picture a scenario where each school has a unique canonical identifier, which is applied to all Government data relating to that school. Or – more likely perhaps – that we have mappings of all the different ways that one school might be uniquely identified, depending on the data source. Now picture that news organisations have also tagged any content about that school with the same unique or a similarly interoperable identifier.

“Suddenly, when a newsworthy event takes place, a researcher within a news organisation has at their fingertips a wealth of data – was the school failing, had the people involved been in any coverage of the school before, does the school have a ‘history’ of related incidents that might build up to a story. We have here a potential application of linked civic and news data that improves the tools in our newsrooms.

“And just because we share some common identifiers for data, it doesn’t necessarily mean producing homogeneous content. It is perfectly possible to imagine one news group producing an application that works out the greenest place to live if you want your child to be in the catchment area of a particular school, and another newspaper to use different sets of data to produce an application to tell you where you need to buy a house if you want to get your child into school x, and have the least chance of being burgled. And then news organisations repackaging these services and syndicating them to estate agent and property websites as part of their B2B activities.”

(With a commercial flourish there.) It’s worth reading from start to finish.
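To make the editorial scenario in the quote a little more concrete, here is a rough sketch of the “mappings of all the different ways that one school might be uniquely identified” idea. Everything in it is hypothetical: the source names, the local IDs, the canonical `school/0001` identifiers and the sample facts and headlines are invented for illustration, and this is one simple way of doing it rather than how any news organisation or government dataset actually works.

```python
from collections import defaultdict

# Hypothetical mapping from (data source, local ID) to one canonical school ID.
ID_MAP = {
    ("inspections", "INS-77"): "school/0001",
    ("exam_results", "100001"): "school/0001",
    ("exam_results", "100002"): "school/0002",
}


def build_index(records, articles):
    """Group civic data and news coverage under each canonical school ID.

    records:  iterable of (source, local_id, fact) tuples from different datasets
    articles: iterable of (canonical_id, headline) tuples tagged by the newsroom
    """
    index = defaultdict(lambda: {"data": [], "articles": []})
    for source, local_id, fact in records:
        canonical = ID_MAP.get((source, local_id))
        if canonical is None:
            continue  # no known mapping for this record, so leave it out
        index[canonical]["data"].append((source, fact))
    for canonical, headline in articles:
        index[canonical]["articles"].append(headline)
    return index


# Invented sample data, standing in for government datasets and a news archive.
RECORDS = [
    ("inspections", "INS-77", "rated inadequate"),
    ("exam_results", "100001", "pass rate 41%"),
    ("exam_results", "100002", "pass rate 78%"),
    ("exam_results", "999999", "pass rate 50%"),
]
ARTICLES = [
    ("school/0001", "Head teacher resigns"),
    ("school/0001", "Parents protest over closure plan"),
]

index = build_index(RECORDS, ARTICLES)

if __name__ == "__main__":
    # Everything a researcher would see for one school when a story breaks.
    print(index["school/0001"])
```

The point of the mapping table is that no dataset has to change its own identifiers: each source keeps its local ID, and the join happens through the shared canonical one.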