ITV News’s new website – welcome to the news stream

The new ITV website

A few months ago I saw an early version of ITV News’ new website, and was pretty impressed. Now it’s finally live – and I’m still impressed.

Why? Because this doesn’t just fiddle around the edges like so many news website relaunches have done since 2007. This reinvents the whole thing – and the result is more like a social network or a blog than a newspaper front page.

Crowdsourcing investigative journalism: a case study (part 1)

As I begin a new Help Me Investigate project, I thought it was a good time to share some research I conducted into the first year of the site, and the key factors in how that project tried to crowdsource investigative and watchdog journalism.

The findings of this research have been key to the development of this new project. They also form the basis of a chapter in the book Face The Future, and another due to be published in the Handbook of Online Journalism next year (not to be confused with my own Online Journalism Handbook). Here’s the report:

In both academic and mainstream literature about the world wide web, one theme consistently recurs: the lowering of the barrier allowing individuals to collaborate in pursuit of a common goal. Whether it is creating the world’s biggest encyclopedia (Lih, 2009), spreading news about a protest (Morozov, 2011) or tracking down a stolen phone (Shirky, 2008), the rise of the network has seen a decline in the role of the formal organisation, including news organisations.

Two examples of this phenomenon were identified while researching a book chapter on investigative journalism and blogs (De Burgh, 2008). The first was an experiment by The Florida News-Press: when it started receiving calls from readers complaining about high water and sewage connection charges for newly constructed homes, the newspaper, short of in-house resources to investigate the leads, decided to ask its readers to help. The result is by now familiar as a textbook example of “crowdsourcing” – outsourcing a project to ‘the crowd’, or what Brogan & Smith (2009, p136) describe as “the ability to have access to many people at a time and to have them perform one small task each”:

“Readers spontaneously organized their own investigations: Retired engineers analyzed blueprints, accountants pored over balance sheets, and an inside whistle-blower leaked documents showing evidence of bid-rigging.” (Howe, 2006a)

The second example concerned contaminated pet food in the US, and did not involve a mainstream news organisation. In fact, it was frustration with poor mainstream ‘churnalism’ (see Davies, 2009) that motivated bloggers and internet users to start digging into the story. The resulting output from dozens of blogs ranged from useful information for pet owners and the latest news to the compilation of a database which suggested that the official number of pet deaths recorded by the US Food and Drug Administration was short by several thousand. One site, Itchmo.com, became so popular that it was banned in China, the source of the pet food in question.

What was striking about both examples was not simply that people could organise to produce investigative journalism, but that this practice of ‘crowdsourcing’ had two key qualities particularly relevant to journalism’s role in a democracy. The first was engagement: in the case of the News-Press, for six weeks the story generated more traffic to its website than “ever before, excepting hurricanes” (Weise, 2007). Given that investigative journalism often concerns very ‘dry’ subject matter that has to be made appealing to a wider audience, these figures were surprising – and encouraging for publishers.

The second quality was subject: the contaminated pet food story was, in terms of mainstream news values, unfashionable and unjustifiable in terms of investment of resources. It appeared that the crowdsourcing model of investigation might provide a way to investigate stories which were in the public interest but which commercial and public service news organisations would not consider worth their time. More broadly, research on crowdsourcing suggested that it worked “best in areas that are not core to your product or central to your business model” (Tapscott and Williams, 2006, p82).

Investigative journalism: its history and discourses

De Burgh (2008, p10) defines investigative journalism as “distinct from apparently similar work [of discovering truth and identifying lapses from it] done by police, lawyers and auditors and regulatory bodies in that it is not limited as to target, not legally founded and usually earns money for media publishers.” The term is notoriously problematic and contested: some argue that all journalism is investigative, or that the recent popularity of the term indicates the failure of ‘normal’ journalism to maintain investigative standards. This contestation is a symptom of the various factors underlying the growth of the genre, which range from journalists’ own sense of a democratic role to professional ambition and publishers’ commercial and marketing objectives.

More recently investigative journalism has been used to defend traditional print journalism against online publishing, with publishers arguing that true investigative journalism cannot be maintained without the resources of a print operation. This position has become harder to defend as online-only operations and journalists have won increasing numbers of awards for their investigative work – Clare Sambrook in the UK, and VoiceOfSanDiego.com and Talking Points Memo in the US, are three examples – while new organisations have been established to pursue investigations without any associated print operation, including Canada’s OpenFile, the UK’s Bureau of Investigative Journalism, and a number of bodies in the US such as ProPublica, The Florida Center for Investigative Reporting, and the Huffington Post’s investigative unit.

In addition, computer technology has started to play an increasingly important role in print investigative journalism: Stephen Grey’s investigation into the CIA’s ‘extraordinary rendition’ programme (Grey, 2006) was facilitated by the use of software such as Analyst’s Notebook, which allowed him to analyse large amounts of flight data and identify leads. The Telegraph’s investigation into MPs’ expenses was made possible by the digitisation of data and the ability to store large amounts of it on a small memory stick. And newspapers around the world collaborated with the Wikileaks website to analyse ‘warlogs’ from Iraq and Afghanistan, and hundreds of thousands of diplomatic cables. More broadly, the success of Wikipedia inspired a raft of examples of ‘wiki journalism’ where users were invited to contribute to editorial coverage of a particular issue or field, with varying degrees of success.

Meanwhile, investigative journalists such as The Guardian’s Paul Lewis have been exploring a more informal form of crowdsourcing, working with online communities to break stories including the role of police in the death of newspaper vendor Ian Tomlinson; the existence of undercover agents in the environmental protest movement; and the death of a man being deported to Angola (Belam, 2011b).

This is part of a broader move to networked journalism explored by Charlie Beckett (2008):

“In a world of ever-increasing media manipulation by government and business, it is even more important for investigative journalists to use technology and connectivity to reveal hidden truths. Networked journalists are open, interactive and share the process. Instead of gatekeepers they are facilitators: the public become co-producers. Networked journalists “are ‘medium agnostic’ and ‘story-centric’”. The process is faster and the information sticks around longer.” (2008, p147)

As one of its best-known practitioners, Paul Lewis talks particularly of the role of technology in his investigations – specifically Twitter – but also of the importance of the crowd itself and of journalistic method:

“A crucial factor that makes crowd-sourcing a success [was that] there was a reason for people to help, in this case a perceived sense of injustice and that the official version of events did not tally with the truth. Six days after Tomlinson’s death, Paul had twenty reliable witnesses who could be placed on a map at the time of the incident – and only one of them had come from the traditional journalistic tool of a contact number in his notebook.” (Belam, 2011b)

A further key skill identified by Lewis is listening to the crowd – although he sounds a note of caution about its vulnerability to deliberately placed misinformation, and stresses the need for verification.

“Crowd-sourcing doesn’t always work […] The most common thing is that you try, and you don’t find the information you want […] The pattern of movement of information on the internet is something journalists need to get their heads around. Individuals on the web in a crowd seem to behave like a flock of starlings – and you can’t control their direction.” (Belam, 2011b)

Conceptualising Help Me Investigate

The first plans for Help Me Investigate were made in 2008 and were further developed over the next 18 months. They built on research into crowdsourced investigative journalism, as well as other research into online journalism and community management. In particular the project sought to explore concepts of “P2P journalism” which enables “more engaged interaction between and amongst users” (Bruns, 2005, p120, emphasis in original) and of “produsage”, whose affordances included probabilistic problem solving, granular tasks, equipotentiality, and shared content (Bruns, 2008, p19).

A key feature in this was the ownership of the news agenda by users themselves (who could be either members of the public or journalists). This was partly for the reasons identified above in the research into the crowdsourced contaminated pet food investigation: it would allow the site to identify questions that would not be considered viable for investigation within a traditional newsroom. But the feature was also implemented because ‘ownership’ was a key area of contestation identified within crowdsourcing research (Lih, 2009; Benkler, 2006; Surowiecki, 2005) – ‘outsourcing’ a project to a group of people raises obvious issues regarding claims of authorship, direction and benefits (Bruns, 2005).

These issues were considered carefully by the founders. The site adopted a user interface with three main modes of navigation for investigations: most-recent-top; most popular (those investigations with the most members); and two ‘featured’ investigations chosen by site staff, on the basis that they were the most interesting editorially or that they were attracting particular interest and activity from users at that moment. There was therefore an editorial role, but it was limited to only two of the 18 investigations listed on the ‘Investigations’ page, and was at least partly guided by user activity.

In addition there were further pages where users could explore investigations through different criteria, such as those that had been completed, or those with particular tags (e.g. ‘environment’, ‘Bristol’, ‘FOI’, etc.).

A second feature of the site was that ‘journalism’ was intended to be a by-product: the primary objective was the investigation process itself, which would inform users. Research suggested that if users were to be attracted to the site, it had to perform the function that they needed it to (Porter, 2008) – which, as became apparent, was one of project management. The ‘problem’ that the site was attempting to ‘solve’ needed to be user-centric rather than publisher-centric: ‘telling stories’ would clearly be lower down the priority list for users than it was for journalists and publishers. Of higher priority were the needs to break down a question into manageable pieces; to find others to investigate those with; and to get answers. This was eventually summarised in the strapline to the site: “Connect, mobilise, uncover”.

Thirdly, there was a decision to use ‘game mechanics’ that would make the process of investigation inherently rewarding. As the site and its users grew, the interface was changed so that challenges started on the left hand side of the screen, coloured red, then moved to the middle when accepted (the colour changing to amber), and finally to the right column when complete (now with a green border and tick icon). This made it easier to see at a glance what needed doing and what had been achieved, and also introduced a level of innate satisfaction in the task. Users, the idea went, might grow to like the feeling of moving those little blocks across the screen, and the positive feedback (see Graham, 2010 and Dondlinger, 2007) provided by the interface.
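
In software terms, that column-and-colour workflow amounts to a simple state machine. Here is a minimal sketch in Python – the class and field names are hypothetical illustrations, not Help Me Investigate’s actual code:

    from enum import Enum

    class Status(Enum):
        """The stages a challenge moved through, left to right on screen."""
        OPEN = ("red", "left column")          # newly posted, needs volunteers
        ACCEPTED = ("amber", "middle column")  # someone has taken it on
        COMPLETE = ("green", "right column")   # done, shown with a tick icon

    class Challenge:
        def __init__(self, title):
            self.title = title
            self.status = Status.OPEN

        def accept(self):
            self.status = Status.ACCEPTED

        def complete(self):
            self.status = Status.COMPLETE

    def board(challenges):
        """Group challenges by status, mimicking the at-a-glance columns."""
        columns = {status: [] for status in Status}
        for challenge in challenges:
            columns[challenge.status].append(challenge.title)
        return columns

    c = Challenge("Find the council's 2009 spending data")
    c.accept()
    print(board([c])[Status.ACCEPTED])  # ["Find the council's 2009 spending data"]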

Similar techniques were coincidentally explored at the same time by The Guardian’s MPs’ expenses app (Bradshaw, 2009). This provided an interface for users to investigate MPs’ expense claim forms which used many conventions of game design, including a ‘progress bar’, leaderboards, and button-based interfaces. A second iteration of the app – created when a second batch of claim forms was released – saw a redesigned interface with a stronger emphasis on positive feedback. As developer Martin Belam explains (2011a):

“When a second batch of documents were released, the team working on the app broke them down into much smaller assignments. That meant it was easier for a small contribution to push the totals along, and we didn’t get bogged down with the inertia of visibly seeing that there was a lot of documents still to process.

“By breaking it down into those smaller tasks, and staggering their start time, you concentrated all of the people taking part on one goal at a time. They could therefore see the progress dial for that individual goal move much faster than if you only showed the progress across the whole set of documents.”
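
The mechanic Belam describes – smaller batches, each with its own progress dial – is easy to illustrate with a toy calculation (the numbers are invented, not the Guardian’s actual figures):

    def progress(done, total):
        """Percentage shown on a progress dial."""
        return 100 * done / total if total else 100.0

    # One undivided pool: 200 reviews against 40,000 documents barely
    # registers, which is the 'inertia' Belam describes.
    print(progress(200, 40_000))  # 0.5

    # The same 200 reviews against a single staggered 1,000-document
    # assignment moves that batch's dial visibly.
    print(progress(200, 1_000))   # 20.0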

These game mechanics are not limited to games: many social networking sites have borrowed the conventions to provide similar positive feedback to users. Jon Hickman (2010) describes how Help Me Investigate uses these genre codes and conventions:

“In the same way that Twitter records numbers of “followers”, “tweets”, “following” and “listed”, Help Me Investigate records the number of “things” which the user is currently involved in investigating, plus the number of “challenges”, “updates” and “completed investigations” they have to their credit. In both Twitter and Help Me Investigate these labels have a mechanistic function: they act as hyperlinks to more information related to the user’s profile. They can also be considered culturally as symbolic references to the user’s social value to the network – they give a number and weight to the level of activity the user has achieved, and so can be used in informal ranking of the user’s worth, importance and usefulness within the network.” (2010, p8)

This was indeed the aim of the site design, and it related to a further aim: to allow users to build ‘social capital’ within and through the site. Users could add links to web presences and Twitter accounts, as well as add biographies and ‘tag’ themselves. They were also ranked in a ‘Most active’ table, and each investigation had its own graph of user activity. This meant that users might use the site not simply for information-gathering reasons, but also for reputation-building ones, a characteristic of open source communities identified by Bruns (2005) and Leadbeater (2008), among others.

There were plans to take these ideas much further, which were shelved during the proof of concept phase as the team concentrated on core functionality. For example, it was clear that users needed to be able to give other users praise for positive contributions, and they used the ‘update’ feature to do so. A more intuitive function allowing users to give a ‘thumbs up’ to a contribution would have made this easier, provided a way to establish the reputation of individual users, and encouraged further use.

Another feature of the site’s construction was a networked rather than centralised design. The bid document to 4iP proposed to aggregate users’ material:

“via RSS and providing support to get users onto web-based services. While the technology will facilitate community creation around investigations, the core strategy will be community-driven, ‘recruiting’ and supporting alpha users who can drive the site and community forward.”

Again, this aggregation functionality was dropped as part of focusing the initial version of the site. However, the basic principle of working within a network was retained, with many investigations including a challenge to blog about progress on other sites, or use external social networks to find possible contributors. The site included guidance on using tools elsewhere on the web, and many investigations linked to users’ blog posts.
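
Had the aggregation been built, its basic mechanism might have looked something like this sketch, using the feedparser library (the feed URLs are invented):

    import time

    import feedparser  # pip install feedparser

    # Hypothetical feeds belonging to an investigation's members
    feeds = [
        "http://example.org/investigator-one/feed",
        "http://example.org/investigator-two/feed",
    ]

    updates = []
    for url in feeds:
        for entry in feedparser.parse(url).entries:
            stamp = entry.get("published_parsed") or time.gmtime(0)
            updates.append((stamp, entry.get("title", ""), entry.get("link", "")))

    # Newest first, ready to show alongside the investigation itself
    updates.sort(reverse=True)
    for stamp, title, link in updates[:10]:
        print(time.strftime("%Y-%m-%d", stamp), title, link)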

In the second part I discuss the building of the site and reflections on the site’s initial few months.

Why we need open courts data – and newspapers need to improve too

Justice photo by mira66

Few things sum up the division of the UK around the riots like the sentencing of those involved. Some think courts are too lenient, while others gape at six-month sentences for people who stole a bottle of water.

These judgments are often made on the basis of a single case, rather than any overall view. And you might think, in such a situation, that a journalist’s role would be to find out just how harsh or lenient sentencing has been – not just across the 1,600 or more people who have been arrested during the riots, but also in comparison to previous civil disturbances – or indeed, to similar crimes outside of a riot situation.

As Martin Belam argues:

“Really good data journalism will help us untangle the truth from those prejudiced assumptions. But this is data journalism that needs to stay the course, and seems like an ideal opportunity to do “long-form data journalism”. How long will these looters serve? What is the ethnic make-up and age range of those convicted? How many other criminals will get an early release because our jails are newly full of looters? How many people convicted this week will go on to re-offend?”

And yet, amazingly, we cannot reliably answer these questions – because it is still not possible to get raw data on sentencing in UK courts, not even through FOI.

3 things that BBC Online has given to online journalism

It’s now 3 weeks since the BBC announced 360 online staff were to lose their jobs as part of a 25% cut to the online budget. It’s a sad but unsurprising part of a number of cuts which John Naughton summarises as: “It’s not television”, a sign that “The past has won” in the internal battle between those who saw consumers as passive vessels for TV content, and those who credited them with some creativity.

Dee Harvey likewise poses the question: “In the same way that openness is written into the design of the Internet, could it be that closedness is written into the very concept of the BBC?”

If it is, I don’t think it can remain that way for ever. Those who have been part of the BBC’s work online will feel rightly proud of what has been achieved since the corporation went online in 1997. Here are just 3 ways that the corporation has helped to define online journalism as we know it – please add others that spring to mind:

1. Web writing style

The BBC’s way of writing for the web has always been a template for good web writing, not least because of the BBC’s experience of meeting similar challenges with Ceefax – the two shared a content management system, and journalists writing for the website would see the first few pars of their content cross-published on Ceefax too.

Even now it is difficult to find an online publisher who writes better for the web.

2. Editors’ blogs

Thanks to the likes of Robin Hamman, Martin Belam, Jem Stone and Tom Coates – to name just a few – when the BBC did begin to adopt blogs (it was not an early adopter) it did so with a spirit that other news organisations lacked.

In particular, the Editors’ Blogs demonstrated a desire for transparency that many other news organisations have yet to repeat, while the likes of Robert Peston, Kevin Anderson and Rory Cellan-Jones have played a key role in showing sceptical journalists how engaging with the former audience on blogs can form a key part of the newsgathering process.

Unfortunately, many of those innovators later left the BBC, and the earlier experimentation was replaced with due process.

3. Backstage

While so many make a song and dance about the APIs of The Guardian and The New York Times, Ian Forrester’s BBC Backstage project was well ahead of the game when it opened up the corporation’s API and started hosting hack days and meetups way back in 2005.

Backstage closed at the end of last year, just as the rest of the UK’s media were starting to catch up. You can read an e-book on its history here.

What else?

I’m sure you can add others – the iPlayer and their on-demand team; Special Reports; the UGC hub (the biggest in the world as far as I know); and even their continually evolving approach to linking (still not ideal, but at least they think about it) are just some that spring to mind. What parts of BBC Online have influenced or inspired you?

PCC gets SEO in new ruling on online corrections


Mirror URL which could land them in court

More from the PCC following yesterday’s Twitter ruling: new guidance on online corrections shows a surprising awareness of search engine optimisation techniques.

Among the points made in the guidance are that:

  • “Care must be taken that the URL of an article does not contain information that has been the subject of successful complaint. If an article is amended, then steps should be taken to amend the URL, as necessary.”
  • “Online corrections and apologies should be tagged when published to ensure that they are searchable.”

The guidance addresses a recurring problem with news reports which are corrected after subs see sense, but whose HTML and URL continue to display information which could land the publisher in court – for example, that shown in the image above (from here) and below, from this post. (Thanks to Martin Belam for finding the main image.) If you can recall others, let me know.
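
In web terms, amending a URL without breaking old links means issuing a redirect. Here is a minimal, hypothetical sketch of the idea – the slugs and helper functions are invented, and nothing here is prescribed by the PCC:

    import re

    def slugify(text):
        """Lower-case, hyphen-separated form of a phrase, as used in URLs."""
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    def amend_url(old_url, offending_phrase, corrected_phrase):
        """Replace the offending part of a URL slug after a correction."""
        return old_url.replace(slugify(offending_phrase), slugify(corrected_phrase))

    # Map old URLs to amended ones so the server can issue 301 redirects
    # rather than leaving the original wording live in the address.
    redirects = {}
    old = "/news/local-man-guilty-of-fraud-12345"
    new = amend_url(old, "guilty of fraud", "cleared of fraud")
    redirects[old] = new
    print(new)  # /news/local-man-cleared-of-fraud-12345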

UPDATE: Thanks to Malcolm Coles for pointing me to some prime candidates at the end of this robots.txt file

UPDATE 2: Here’s another one from Malcolm: even newspapers that change their URL can still be found out.

Daily Mail article – corrected text, but original HTML

UK general election 2010 – online journalism is ordinary

Has online journalism become ordinary? Are the approaches starting to standardise? Little has stood out in the online journalism coverage of this election – the innovation of previous years has been replaced by consolidation.

Here are a few observations on how the media approached their online coverage.

Why news organisations should start thinking seriously about their data

I don’t often post a simple link-and-quote to another post these days, but Martin Belam’s article on the value of linked data to the news industry is worth blogging about. In it he makes the clearest argument I’ve yet seen for linked data. First, the commercial argument:

“Pages [on a non-news BBC project using linked data] are performing very well in SEO terms. They sometimes even outrank Wikipedia in Google when people make one word searches for animals, which is no mean feat … And the ongoing maintenance cost of organising this wealth of content is reduced.”

Second, the editorial one:

“Let us picture a scenario where each school has a unique canonical identifier, which is applied to all Government data relating to that school. Or – more likely perhaps – that we have mappings of all the different ways that one school might be uniquely identified, depending on the data source. Now picture that news organisations have also tagged any content about that school with the same unique or a similarly interoperable identifier.

“Suddenly, when a newsworthy event takes place, a researcher within a news organisation has at their fingertips a wealth of data – was the school failing, had the people involved been in any coverage of the school before, does the school have a ‘history’ of related incidents that might build up to a story. We have here a potential application of linked civic and news data that improves the tools in our newsrooms.

“And just because we share some common identifiers for data, it doesn’t necessarily mean producing homogeneous content. It is perfectly possible to imagine one news group producing an application that works out the greenest place to live if you want your child to be in the catchment area of a particular school, and another newspaper to use different sets of data to produce an application to tell you where you need to buy a house if you want to get your child into school x, and have the least chance of being burgled. And then news organisations repackaging these services and syndicating them to estate agent and property websites as part of their B2B activities.”

(With a commercial flourish there.) It’s worth reading from start to finish.
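
To make the shared-identifier idea concrete, here is a minimal sketch using Python’s rdflib library – the school URI, property names and data are invented for illustration, not anything Belam’s post specifies:

    from rdflib import Graph, Literal, Namespace, URIRef

    EX = Namespace("http://example.org/")
    # One canonical identifier for the school, shared across datasets
    school = URIRef("http://example.org/school/1234")

    g = Graph()
    # Triples as they might arrive from government data...
    g.add((school, EX.ofstedRating, Literal("Satisfactory")))
    g.add((school, EX.catchmentArea, Literal("Anytown North")))
    # ...and from a news archive tagged with the same identifier
    article = URIRef("http://example.org/news/987")
    g.add((article, EX.mentionsSchool, school))

    # A researcher can now pull everything known about the school...
    for predicate, obj in g.predicate_objects(school):
        print(predicate, obj)
    # ...and every article that has ever mentioned it
    for story in g.subjects(EX.mentionsSchool, school):
        print(story)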

Data and the future of journalism panel discussion: Linked Data London

Tonight I had the pleasure of chairing an extremely informative panel discussion on data and the future of journalism at the first London Linked Data Meetup. On the panel were:

What follows is a series of notes from the discussion, which I hope are of some use.

For a primer on Linked Data there is A Skim-Read Introduction to Linked Data; Linked Data: The Story So Far (PDF) by Tom Heath, Christian Bizer and Tim Berners-Lee; and this TED video by Sir Tim Berners-Lee (who was on the panel before this one).

To set some brief context, I talked about how 2009 was, for me, a key year in data and journalism – largely because it has been a year of crisis in both publishing and government. The seminal point in all of this has been the MPs’ expenses story, which demonstrated both the power of data in journalism and the need for transparency from government – seen, for example, in the government’s appointment of Sir Tim Berners-Lee, its call for developers to suggest things to do with public data, and the imminent launch of Data.gov.uk around the same issue.

Even before then, the New York Times and the Guardian had both launched APIs at the beginning of the year, MSN Local and the BBC had both been working with Wikipedia, and we had seen the launch of a number of startups and mashups around data, including Timetric, Verifiable, BeVocal, OpenlyLocal, MashTheState, the open source release of Everyblock, and Mapumental.

Q: What are the implications of paywalls for Linked Data?

The general view was that Linked Data – specifically standards like RDF – would allow users and organisations to access information about content even if they couldn’t access the content itself. To give a concrete example: rather than linking to a ‘wall’ that simply requires payment, it would be clearer what the content beyond that wall related to (e.g. key people, organisations, author, etc.).

Leigh Dodds felt that using standards like RDF would allow organisations to more effectively package content in commercially attractive ways, e.g. ‘everything about this organisation’.
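
A rough sketch of what that might look like in practice – describing a paywalled article with RDF metadata using Dublin Core terms; the article URL and details are invented:

    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DC

    # A hypothetical article that sits behind a paywall
    article = URIRef("http://example.org/paywalled/article/42")

    g = Graph()
    g.add((article, DC.title, Literal("Council spending under scrutiny")))
    g.add((article, DC.creator, Literal("A. Reporter")))
    g.add((article, DC.subject, Literal("Anytown City Council")))

    # The publisher can expose this description publicly, so machines and
    # readers know what lies beyond the wall without seeing the body text.
    print(g.serialize(format="turtle"))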

Q: What can bloggers do to tap into the potential of Linked Data?

This drew some blank responses, but Leigh Dodds was most forthright, arguing that the onus lay with developers to do things that would make it easier for bloggers to, for example, visualise data. He also pointed out that currently, if someone does something with data, it is not possible to track that back to the source; better tools would effectively allow an equivalent of pingback for data included in charts (the person who created the data would know that it had been used, as could others).
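
A ‘pingback for data’ along those lines might look roughly like this – an entirely hypothetical sketch, since no such standard existed (the /pingback endpoint is an invented convention):

    import urllib.request

    def ping_source(data_url, chart_url):
        """Notify a dataset's publisher that their data was used in a
        chart, in the spirit of blog pingbacks (hypothetical endpoint)."""
        request = urllib.request.Request(
            data_url.rstrip("/") + "/pingback",
            data=chart_url.encode("utf-8"),
        )
        try:
            urllib.request.urlopen(request, timeout=5)
        except OSError:
            pass  # the source may not support notifications at all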

Q: Given that the problem for publishing lies in advertising rather than content, how can Linked Data help solve that?

Dan Brickley suggested that OAuth technologies (where you use a single login identity for multiple sites that contains information about your social connections, rather than creating a new ‘identity’ for each) would allow users to specify more precisely how they experience content, for instance: ‘I only want to see article comments by users who are also my Facebook and Twitter friends.’

The same technology would allow for more personalised, and therefore more lucrative, advertising.
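
As a toy illustration of Brickley’s comment-filtering idea (the friend sets below are invented; a real implementation would fetch them via each network’s OAuth-authorised API):

    def visible_comments(comments, friends):
        """Keep only comments whose authors are in the reader's social graph."""
        return [c for c in comments if c["author"] in friends]

    # Friend sets as they might arrive from OAuth-connected accounts
    facebook_friends = {"alice", "bob"}
    twitter_friends = {"bob", "carol"}
    friends = facebook_friends | twitter_friends

    comments = [
        {"author": "bob", "text": "Good piece."},
        {"author": "stranger", "text": "First!"},
    ]
    print(visible_comments(comments, friends))  # only bob's comment survives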

John O’Donovan felt the same could be said about content itself – more accurate data about content would allow for more specific selling of advertising.

Martin Belam quoted James Cridland on radio: “[The different operators] agree on technology but compete on content”. The same was true of advertising, but the advertising and news industries needed to be more active in defining common standards.

Leigh Dodds pointed out that semantic data was already being used by companies serving advertising.

Other notes

I asked members of the audience who they felt were the heroes and villains of Linked Data in the news industry. The Guardian and BBC came out well – The Daily Mail were named as repeat offenders who would simply refer to “a study” and not say which, nor link to it.

Martin Belam pointed out that The Guardian is increasingly asking itself ‘How will that look through an API?’ when producing content, representing a key shift in editorial thinking. If users of the platform were swallowing up significant bandwidth or driving significant traffic, that would probably warrant talking to them about more formal relationships (either customer-provider or partnership).

A number of references were made to the problem of provenance – being able to identify where a statement came from. Dan Brickley specifically spoke of the problem with identifying the source of Twitter retweets.

Dan also felt that the problem of journalists not linking would be solved by technology. In an earlier conversation he had also talked of “subject-based linking” and the impact of SKOS and Linked Data-style identifiers. He saw a problem in that, while new articles might link to older reports on the same issue, older reports were not updated with links to the new updates. Tagging individual articles was problematic in that you then had the equivalent of an overflowing inbox.

(I’ve invited all 4 participants to correct any errors and add anything I’ve missed)

Finally, here’s a bit of video from the very last question addressed in the discussion (filmed with thanks by @countculture):

Linked Data London 090909 from Paul Bradshaw on Vimeo.

Data and the future of journalism: what questions should I ask?

Tomorrow I’m chairing a discussion panel on the Future of Journalism at the first London Linked Data Meetup. On the panel are:

What questions would you like me to ask them about data and the future of journalism?