Three key terms you might hear used in data journalism circles are “open data”, “linked data” and “big data”. This video, made for students on the MA in Data Journalism at Birmingham City University, explores definitions of the three terms, explains some of the jargon used in relation to them, and outlines the critical and ethical issues to consider in relation to open and big data in particular.
Three other video clips are mentioned in the video, and these are embedded below. First of all, Tim Berners-Lee’s 2009 call for “raw data now”, where he outlined the potential of open and linked data…
Here, then, are some reflections on the 10 pieces which did best in 2016 (there were 100 posts across the year), plus the older posts which keep on giving, and a comparison of some pieces which did far better on Medium than on OJB.
Last month Basile Simon from BBC News Labs gave a talk at the CSV conference in Berlin: a two-day “community conference for data makers” (notes here). I invited Basile to publish his talk here in a special guest post.
At BBC News Labs, we’ve been pushing for more linked data in news for years now. We built a massive international news aggregator based on linked data, and spent years making it better… but it’s our production and live services that do the core of the job today.
We’re trying to stay relevant and to model our massive dataset of facts, quotes, news and articles. The answer to this may lie in structured journalism.
News Labs was founded in 2012 to play with linked data. The original team, which included many data architects, strongly believed this was a revolution in the way we approached our journalism.
Structured Stories is a news database under construction which intends to empower everyone to collect, use and improve a permanent record of news events. Creator David Caswell wants to switch the current approach to archives, which “is just not working”, for “some form of structured information that can be networked.”
According to Caswell, adding value to the structured narrative could be a way to return to something similar to the economic mechanism of the 20th century: a distribution-based bundle.
Instead it’s falling to the likes of Tony Hirst (an Open University academic), Dan Herbert (an Oxford Brookes academic) and Chris Taggart (a developer who used to be a magazine publisher) to fill the scrutiny gap. Recently all three have shone a light on the move towards transparency and open data, in pieces that anyone with an interest in information would be well advised to read.
What all three highlight is how control of information still represents the exercise of power, and how shifts in that control as a result of the transparency/open data/linked data agenda are open to abuse, gaming, or spin.
So here’s person number 4: Gary Becker, a Nobel prize-winning economist.
Fifty years ago he used the phrase ‘human capital’ to refer to the economic value that companies should ascribe to their employees.
These days, of course, it is common sense to invest time in recruiting, training and retaining good employees. But at the time employees were seen as a cost.
We need a similar change in the way we see our readers – not as a cost on our time but as a valuable part of our operations that we should invest in recruiting, developing and retaining.
I went to News Rewired on Thursday, along with dozens of other journalists and folk concerned in various ways with news production. Some threads that ran through the day for me were discussions of how we publish our data (and allow others to do the same), how we link our stories together with each other and the rest of the web, and how we can help our readers to explore context around our stories.
The man deserves a round of applause. Charity data is enormously important in all sorts of ways – and is likely to become more so as the government leans on the third sector to take on a bigger role in providing public services. Making it easier to join the dots between charitable organisations, the private and public sector, contracts and individuals – which is what Open Charities does – will help journalists and bloggers enormously.
“For now, it’s just the simplest of things, a web application with a unique URL for every charity based on its charity number, and with the basic information for each charity available as data (XML, JSON and RDF). It’s also searchable, and sortable by most recent income and spending, and for linked data people there are dereferenceable Resource URIs.
“The entire database is available to download and reuse (under an open, share-alike attribution licence). It’s a compressed CSV file, weighing in at just under 20MB, and should probably only be attempted by those familiar with manipulating large datasets (don’t try opening it up in your spreadsheet, for example). I’m also in the process of importing it into Google Fusion Tables (it’s still churning away in the background) and will post a link when it’s done.”
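The warning about spreadsheets is really about memory: a spreadsheet tries to load every row at once, whereas a streaming reader handles one row at a time. The Python sketch below illustrates that approach. It assumes the download is gzip-compressed with a header row (the post does not say which compression format is used), and the column name `income` is a hypothetical placeholder, not the real file's schema.

```python
import csv
import gzip
from typing import Iterator


def iter_rows(path: str) -> Iterator[dict]:
    """Yield one CSV row at a time from a gzip-compressed file.

    Assumes gzip compression and a header row; the real download's
    format and column names may differ.
    """
    # "rt" decompresses and decodes on the fly, so only a small buffer
    # is held in memory rather than the whole uncompressed file.
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f)


def total_income(path: str, column: str = "income") -> float:
    """Sum a numeric column, skipping missing, blank or non-numeric cells."""
    total = 0.0
    for row in iter_rows(path):
        try:
            total += float(row[column])
        except (KeyError, ValueError, TypeError):
            # Hypothetical column may be absent, empty or malformed.
            continue
    return total
```

Usage would be something like `total_income("charities.csv.gz")`, where the filename is again hypothetical. Because rows are never collected into a list, memory use stays roughly constant however large the file is.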
Chris promises to add more features “if there’s any interest”.
Last month the first submissions by students on the MA in Online Journalism landed on my desk. I had set two assignments. The first was a standard portfolio of online journalism work as part of an ongoing, live news project. But the second was explicitly branded ‘Experimental Portfolio’ – you can see the brief here. I wanted students to have a space to fail. I had no idea how brave they would be, or how successful. The results, thankfully, surpassed any expectations I had. They included:
Alex Gamela constructed the Hashbrum website, experimenting with mapping plugins and other content management technologies. His series of posts on hyperlocal publishing provides an excellent insight into his processes.
Ruihua Yao experimented with recruiting members of the Chinese community in Birmingham to contribute to a Chinese community blog.
Andy Brightwell looked into the ways linked data can be used to uncover political relationships in local councils. There’s a good reason why there’s no blog post to link to, but I’m not telling you what it is…
There were a number of things I found positive about the results. Firstly, the sheer variety – students seemed, whether instinctively or explicitly, to choose areas distinct from each other. The resulting reservoir of knowledge and experience, then, has huge promise for moving into the second and final parts of the MA, providing a foundation to learn from each other.
“Pages [on a non-news BBC project using linked data] are performing very well in SEO terms. They sometimes even outrank Wikipedia in Google when people make one word searches for animals, which is no mean feat … And the ongoing maintenance cost of organising this wealth of content is reduced.”
Second, the editorial one:
“Let us picture a scenario where each school has a unique canonical identifier, which is applied to all Government data relating to that school. Or – more likely perhaps – that we have mappings of all the different ways that one school might be uniquely identified, depending on the data source. Now picture that news organisations have also tagged any content about that school with the same unique or a similarly interoperable identifier.
“Suddenly, when a newsworthy event takes place, a researcher within a news organisation has at their fingertips a wealth of data – was the school failing, had the people involved been in any coverage of the school before, does the school have a ‘history’ of related incidents that might build up to a story. We have here a potential application of linked civic and news data that improves the tools in our newsrooms.
“And just because we share some common identifiers for data, it doesn’t necessarily mean producing homogeneous content. It is perfectly possible to imagine one news group producing an application that works out the greenest place to live if you want your child to be in the catchment area of a particular school, and another newspaper to use different sets of data to produce an application to tell you where you need to buy a house if you want to get your child into school x, and have the least chance of being burgled. And then news organisations repackaging these services and syndicating them to estate agent and property websites as part of their B2B activities.”
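The scenario above hinges on two steps: mapping each source's own identifier for a school onto one canonical ID, and then gathering civic data and tagged news coverage under that ID. The Python sketch below shows the idea with plain dictionaries. Every identifier, source name and record in it is hypothetical and made up for illustration; real government datasets would need their own, carefully maintained mappings.

```python
from collections import defaultdict

# Hypothetical crosswalk: (source, local identifier) -> canonical school ID.
CROSSWALK = {
    ("inspections", "INSP-0042"): "school:1001",
    ("exam_results", "ER/77"): "school:1001",
    ("inspections", "INSP-0099"): "school:1002",
}


def canonical_id(source, local_id):
    """Resolve a source-specific identifier, or return None if unmapped."""
    return CROSSWALK.get((source, local_id))


def build_profile(records, articles):
    """Group civic records and tagged news stories under canonical IDs.

    records:  iterable of (source, local_id, fact) tuples
    articles: iterable of (headline, list of canonical IDs it is tagged with)
    """
    profile = defaultdict(lambda: {"facts": [], "coverage": []})
    for source, local_id, fact in records:
        cid = canonical_id(source, local_id)
        if cid is not None:  # records with no mapping cannot be linked
            profile[cid]["facts"].append((source, fact))
    for headline, tags in articles:
        for cid in tags:
            profile[cid]["coverage"].append(headline)
    return dict(profile)
```

The crosswalk is the "mappings of all the different ways that one school might be uniquely identified" option from the quote; if every source already used one canonical identifier, `canonical_id` would simply return its input.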