This, says FSB, is a lie that demonstrates the “pretence” that “‘crunching the numbers’ is somehow an abstract, scientific, mathematical task”.
UPDATE [Feb 14 2012]: Full Fact picked up the challenge and dug into the data:
“The crucial difference is in methodology – while the TPA used individuals as its basis, the IFS used households as provided by the Government data.
“This led to substantially different conclusions. The IFS note that using household income as a measure demonstrates increased gains for households with two or more earners. As they state:
“‘families with two taxpayers would gain more than families with one taxpayer, who tend to be worse off. Thus, overall, better-off families (although not the very richest) would tend to gain most in cash terms from this reform…’”
Here’s a great test for eagle-eyed journalists, tweeted by Guardian’s James Ball. It’s a tale of two charts that claim to show the impact of a change in the income tax threshold to £10,000. Here’s the first:
And here’s the second:
So: same change, very different stories. In one story (Institute for Fiscal Studies) it is the wealthiest that appear to benefit the most; but in the other (Taxpayers’ Alliance via Guido Fawkes) it’s the poorest who are benefiting.
Did you spot the difference? The different y axis is a slight clue – the first chart covers a wider range of change – but it’s the legend that gives the biggest hint: one is measuring change as a percentage of gross income (before, well, taxes); the other as a change in net income (after tax).
James’s colleague Mary Hamilton put it like this: “4.5% of very little is of course much less than 1% of loads.” Or, more specifically: 4.6% of £10,853 (the second decile mentioned in Fawkes’ post) is £499.24; 1.1% of £47,000 (the 9th decile according to the same ONS figures) is £517. (Without raw data, it’s hard to judge what figures are being used – if you include earnings over that £47k marker then it changes things, for example, and there’s no link to the net earnings).
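The arithmetic above can be checked with a short Python sketch. It uses only the two percentages and decile incomes quoted in this post; the function name `cash_gain` is made up for illustration, and because the raw data behind either chart is not available, this shows only how a larger percentage of a small income can be a smaller cash sum, not which chart is right.

```python
# Figures as quoted in the post: gain as a percentage of gross income,
# applied to the decile incomes cited from the ONS.
deciles = {
    "2nd decile": {"gross_income": 10_853, "gain_pct": 4.6},
    "9th decile": {"gross_income": 47_000, "gain_pct": 1.1},
}


def cash_gain(gross_income, gain_pct):
    """Convert a percentage-of-gross-income gain into pounds (hypothetical helper)."""
    return round(gross_income * gain_pct / 100, 2)


gains = {name: cash_gain(**row) for name, row in deciles.items()}

for name, gain in gains.items():
    # Amounts are in pounds sterling.
    print(f"{name}: {gain:,.2f} GBP")
```

The lower decile gains more as a share of income but slightly less in cash, which is why the choice of measure changes the story each chart tells.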
In a nutshell, like James, I’m not entirely sure why they differ so strikingly. So, further statistical analysis welcome.
UPDATE: Seems a bit of a Twitter fight erupted between Guido Fawkes and James Ball over the source of the IFS data. James links to this pre-election document containing the chart and this one on ‘Budget 2011’. Guido says the chart’s “projections were based on policy forecasts that didn’t pan out”. I’ve not had the chance to properly scrutinise the claims of either James or Guido. I’ve also yet to see a direct link to the Taxpayers’ Alliance data, so that is equally in need of unpicking.
In this post, however, my point isn’t to do with the specific issue (or who is ‘right’) but rather how it can be presented in different ways, and the importance of having access to the raw data to ‘unspin’ it.
As many readers of this blog will have received a Kindle for Christmas I thought I should share my list of the free ebooks that I recommend stocking up on.
Online journalism and multimedia ebooks
Starting with more general books, Mark Briggs’s book Journalism 2.0 (PDF*) is a few years old but still provides a good overview of online journalism to have by your side. Mindy McAdams’s 42-page Reporter’s Guide to Multimedia Proficiency (PDF) adds some more on that front, and Adam Westbrook’s Ideas on Digital Storytelling and Publishing (PDF) provides a larger focus on narrative, editing and other elements.
After the first version of this post, MA Online Journalism student Franzi Baehrle suggested this free book on DSLR Cinematography, as well as Adam Westbrook on multimedia production (PDF). And Guy Degen recommends the free ebook on news and documentary filmmaking from ImageJunkies.com.
The Participatory Documentary Cookbook [PDF] is another free resource on using social media in documentaries.
The Traffic Factories is an ebook that explores how a number of prominent US news organisations use metrics, and Chartbeat’s role in that. You can download it in mobi, PDF or epub format here.
Continuing the serialisation of the research underpinning a new Help Me Investigate project, in this fourth part I describe how one particular investigation took shape. Previous parts are linked below:
- Part 1: Investigative journalism; conceptualising Help Me Investigate
- Part 2: Building the site
- Part 3: Reflections on the Proof of Concept phase
Case study: the London Weekly investigation
In early 2010 Andy Brightwell and I conducted some research into one particular successful investigation on the site. The objective was to identify what had made the investigation successful – and how (or if) those conditions might be replicated for other investigations both on the site and elsewhere online.
The investigation chosen for the case study was ‘What do you know about The London Weekly?’ – an investigation into a free newspaper that was, the owners claimed (part of the investigation was to establish if the claim was a hoax), about to launch in London.
The people behind The London Weekly had made a number of claims about planned circulation, staffing and investment which went unchallenged in specialist media. Journalists Martin Stabe, James Ball and Judith Townend, however, wanted to dig deeper. So, after an exchange on Twitter, Judith logged onto Help Me Investigate and started an investigation.
A month later members of the investigation (most of whom were non-journalists) had unearthed a wealth of detail about the people behind The London Weekly and the facts behind their claims. Some of the information was reported in MediaWeek and The Guardian podcast Media Talk; some formed the basis for posts on James Ball’s blog, Journalism.co.uk and the Online Journalism Blog. Some has, for legal reasons, remained unpublished.
Andrew Brightwell conducted a number of semi-structured interviews with contributors to the investigation. The sample was randomly selected but representative of the mix of contributors, who were categorised as ‘alpha’ contributors (over 6 contributions), ‘active’ (2-6 contributions) or ‘lurkers’ (whose only contribution was to join the investigation). These interviews formed the qualitative basis for the research.
Complementing this data was quantitative information about users of the site as a whole. This was taken from two user surveys – one conducted when the site was three months old and another at 12 months – and analysis of analytics taken from the investigation (such as numbers and types of actions, frequency, etc.).
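The contributor categories used for the interview sample can be sketched in a few lines of Python. This is an illustration only, not the code used in the research: the function name and sample counts are invented, and it assumes that joining an investigation counts as one contribution, so that a count of 1 means a lurker.

```python
from collections import Counter


def categorise(contributions):
    """Map a contribution count to a category.

    Assumption: joining the investigation counts as one contribution,
    so anyone with a count of 1 or less is treated as a lurker.
    """
    if contributions > 6:
        return "alpha"
    if contributions >= 2:
        return "active"
    return "lurker"


# Hypothetical contribution counts for a handful of members.
sample_counts = [1, 1, 1, 3, 6, 7, 12]
mix = Counter(categorise(n) for n in sample_counts)
print(mix)
```

A tally like `mix` is the sort of breakdown a sample would need to mirror in order to be representative of the mix of contributors.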
As I begin on a new Help Me Investigate project, I thought it was a good time to share some research I conducted into the first year of the site, and the key factors in how that project tried to crowdsource investigative and watchdog journalism.
The findings of this research have been key to the development of this new project. They also form the basis of a chapter in the book Face The Future, and another due to be published in the Handbook of Online Journalism next year (not to be confused with my own Online Journalism Handbook). Here’s the report:
In both academic and mainstream literature about the world wide web, one theme consistently recurs: the lowering of the barrier allowing individuals to collaborate in pursuit of a common goal. Whether it is creating the world’s biggest encyclopedia (Lih, 2009), spreading news about a protest (Morozov, 2011) or tracking down a stolen phone (Shirky, 2008), the rise of the network has seen a decline in the role of the formal organisation, including news organisations.
Two examples of this phenomenon were identified while researching a book chapter on investigative journalism and blogs (De Burgh, 2008). The first was an experiment by The Florida News Press: when it started receiving calls from readers complaining about high water and sewage connection charges for newly constructed homes, the newspaper, short on in-house resources to investigate the leads, decided to ask its readers to help. The result is by now familiar as a textbook example of “crowdsourcing” – outsourcing a project to ‘the crowd’ or what Brogan & Smith (2009, p136) describe as “the ability to have access to many people at a time and to have them perform one small task each”:
“Readers spontaneously organized their own investigations: Retired engineers analyzed blueprints, accountants pored over balance sheets, and an inside whistle-blower leaked documents showing evidence of bid-rigging.” (Howe, 2006a)
The second example concerned contaminated pet food in the US, and did not involve a mainstream news organisation. In fact, it was frustration with poor mainstream ‘churnalism’ (see Davies, 2009) that motivated bloggers and internet users to start digging into the story. The resulting output from dozens of blogs ranged from useful information for pet owners and the latest news to the compilation of a database that suggested the official number of pet deaths recorded by the US Food and Drug Administration was short by several thousand. One site, Itchmo.com, became so popular that it was banned in China, the source of the pet food in question.
What was striking about both examples was not simply that people could organise to produce investigative journalism, but that this practice of ‘crowdsourcing’ had two key qualities that were particularly relevant to journalism’s role in a democracy. The first was engagement: in the case of the News-Press for six weeks the story generated more traffic to its website than “ever before, excepting hurricanes” (Weise, 2007). Given that investigative journalism often concerns very ‘dry’ subject matter that has to be made appealing to a wider audience, these figures were surprising – and encouraging for publishers.
The second quality was subject: the contaminated pet food story was, in terms of mainstream news values, unfashionable and unjustifiable in terms of investment of resources. It appeared that the crowdsourcing model of investigation might provide a way to investigate stories which were in the public interest but which commercial and public service news organisations would not consider worth their time. More broadly, research on crowdsourcing suggested that it worked “best in areas that are not core to your product or central to your business model” (Tapscott and Williams, 2006, p82).
Investigative journalism: its history and discourses
De Burgh (2008, p10) defines investigative journalism as “distinct from apparently similar work [of discovering truth and identifying lapses from it] done by police, lawyers and auditors and regulatory bodies in that it is not limited as to target, not legally founded and usually earns money for media publishers.” The term is notoriously problematic and contested: some argue that all journalism is investigative, or that the recent popularity of the term indicates the failure of ‘normal’ journalism to maintain investigative standards. This contestation is a symptom of the various factors underlying the growth of the genre, which range from journalists’ own sense of a democratic role, to professional ambition and publishers’ commercial and marketing objectives.
More recently investigative journalism has been used to defend traditional print journalism against online publishing, with publishers arguing that true investigative journalism cannot be maintained without the resources of a print operation. This position has become harder to defend as online-only operations and journalists have won increasing numbers of awards for their investigative work – Clare Sambrook in the UK and VoiceOfSanDiego.com and Talking Points Memo in the US are three examples – while new organisations have been established to pursue investigations without any associated print operation, including Canada’s OpenFile, the UK’s Bureau of Investigative Journalism, and a number of bodies in the US such as ProPublica, The Florida Center for Investigative Reporting, and the Huffington Post’s investigative unit.
In addition, computer technology has started to play an increasingly important role in print investigative journalism: Stephen Grey’s investigation into the CIA’s ‘extraordinary rendition’ programme (Grey, 2006) was facilitated by the use of software such as Analyst’s Notebook, which allowed him to analyse large amounts of flight data and identify leads. The Telegraph’s investigation into MPs’ expenses was made possible by digitisation of data and the ability to store large amounts on a small memory stick. And newspapers around the world collaborated with the Wikileaks website to analyse ‘warlogs’ from Iraq and Afghanistan, and hundreds of thousands of diplomatic cables. More broadly the success of Wikipedia inspired a raft of examples of ‘Wiki journalism’ where users were invited to contribute to editorial coverage of a particular issue or field, with varying degrees of success.
Meanwhile, investigative journalists such as The Guardian’s Paul Lewis have been exploring a more informal form of crowdsourcing, working with online communities to break stories including the role of police in the death of newspaper vendor Ian Tomlinson; the existence of undercover agents in the environmental protest movement; and the death of a man being deported to Angola (Belam, 2011b).
This is part of a broader move to networked journalism explored by Charlie Beckett (2008):
“In a world of ever-increasing media manipulation by government and business, it is even more important for investigative journalists to use technology and connectivity to reveal hidden truths. Networked journalists are open, interactive and share the process. Instead of gatekeepers they are facilitators: the public become co-producers. Networked journalists “are ‘medium agnostic’ and ‘story-centric’”. The process is faster and the information sticks around longer.” (2008, p147)
As one of its best-known practitioners, Paul Lewis talks particularly of the role of technology in his investigations – specifically Twitter – but also the importance of the crowd itself and journalistic method:
“A crucial factor that makes crowd-sourcing a success [was that] there was a reason for people to help, in this case a perceived sense of injustice and that the official version of events did not tally with the truth. Six days after Tomlinson’s death, Paul had twenty reliable witnesses who could be placed on a map at the time of the incident – and only one of them had come from the traditional journalistic tool of a contact number in his notebook.” (Belam, 2011b)
A further key skill identified by Lewis is listening to the crowd – although he sounds a note of caution in its vulnerability to deliberately placed misinformation, and the need for verification.
“Crowd-sourcing doesn’t always work […] The most common thing is that you try, and you don’t find the information you want […] The pattern of movement of information on the internet is something journalists need to get their heads around. Individuals on the web in a crowd seem to behave like a flock of starlings – and you can’t control their direction.” (Belam, 2011b)
Conceptualising Help Me Investigate
The first plans for Help Me Investigate were made in 2008 and were further developed over the next 18 months. They built on research into crowdsourced investigative journalism, as well as other research into online journalism and community management. In particular the project sought to explore concepts of “P2P journalism” which enables “more engaged interaction between and amongst users” (Bruns, 2005, p120, emphasis in original) and of “produsage”, whose affordances included probabilistic problem solving, granular tasks, equipotentiality, and shared content (Bruns, 2008, p19).
A key feature in this was the ownership of the news agenda by users themselves (who could be either members of the public or journalists). This was partly for reasons identified above in research into the crowdsourced investigation into contaminated pet food. It would allow the site to identify questions that would not be considered viable for investigation within a traditional newsroom; but the feature was also implemented because ‘ownership’ was a key area of contestation identified within crowdsourcing research (Lih, 2009; Benkler, 2006; Surowiecki, 2005) – ‘outsourcing’ a project to a group of people raises obvious issues regarding claims of authorship, direction and benefits (Bruns, 2005).
These issues were considered carefully by the founders. The site adopted a user interface with three main modes of navigation for investigations: most-recent-top; most popular (those investigations with the most members); and two ‘featured’ investigations chosen by site staff: these were chosen on the basis that they were the most interesting editorially, or because they were attracting particular interest and activity from users at that moment. There was therefore an editorial role, but this was limited to only two of the 18 investigations listed on the ‘Investigations’ page, and was at least partly guided by user activity.
In addition there were further pages where users could explore investigations through different criteria such as those investigations that had been completed, or those investigations with particular tags (e.g. ‘environment’, ‘Bristol’, ‘FOI’, etc.).
A second feature of the site was that ‘journalism’ was intended to be a by-product: the investigation process itself, which would inform users, was the primary objective. Research suggested that if users were to be attracted to the site, it must perform the function that they needed it to (Porter, 2008), which was – as became apparent – one of project management. The ‘problem’ that the site was attempting to ‘solve’ needed to be user-centric rather than publisher-centric: ‘telling stories’ would clearly be lower down the priority list for users than it was for journalists and publishers. Of higher priority were the needs to break down a question into manageable pieces; find others to investigate those with; and get answers. This was eventually summarised in the strapline to the site: “Connect, mobilise, uncover”.
Thirdly, there was a decision to use ‘game mechanics’ that would make the process of investigation inherently rewarding. As the site and its users grew, the interface was changed so that challenges started on the left hand side of the screen, coloured red, then moved to the middle when accepted (the colour changing to amber), and finally to the right column when complete (now with green border and tick icon). This made it easier to see at a glance what needed doing and what had been achieved, and also introduced a level of innate satisfaction in the task. Users, the idea went, might grow to like the feeling of moving those little blocks across the screen, and the positive feedback (see Graham, 2010 and Dondlinger, 2007) provided by the interface.
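The three-column flow described above amounts to a small state machine. The sketch below is a hypothetical Python model of it, not the site’s actual code: the class and function names are invented, and only the columns and colours come from the description.

```python
from enum import Enum


class ChallengeState(Enum):
    """Each state records its (column, colour) as described in the post."""
    OPEN = ("left", "red")
    ACCEPTED = ("middle", "amber")
    COMPLETE = ("right", "green")

    @property
    def column(self):
        return self.value[0]

    @property
    def colour(self):
        return self.value[1]


def advance(state):
    """Move a challenge one step to the right; a complete challenge stays put."""
    order = list(ChallengeState)  # members iterate in definition order
    position = order.index(state)
    return order[min(position + 1, len(order) - 1)]


state = ChallengeState.OPEN
state = advance(state)  # a user accepts the challenge
print(state.name, state.column, state.colour)
```

Keeping the position and colour on the state itself means the interface only needs to know which state a challenge is in to decide where and how to draw it.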
Similar techniques were coincidentally explored at the same time by The Guardian’s MPs’ expenses app (Bradshaw, 2009). This provided an interface for users to investigate MP expense claim forms that used many conventions of game design, including a ‘progress bar’, leaderboards, and button-based interfaces. A second iteration of the app – created when a second batch of claim forms was released – saw a redesigned interface based on a stronger emphasis on positive feedback. As developer Martin Belam explains (2011a):
“When a second batch of documents were released, the team working on the app broke them down into much smaller assignments. That meant it was easier for a small contribution to push the totals along, and we didn’t get bogged down with the inertia of visibly seeing that there was a lot of documents still to process.
“By breaking it down into those smaller tasks, and staggering their start time, you concentrated all of the people taking part on one goal at a time. They could therefore see the progress dial for that individual goal move much faster than if you only showed the progress across the whole set of documents.”
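The effect Belam describes is easy to see with numbers. The Python sketch below uses invented figures (the document total, batch size and number reviewed are all hypothetical, as is the function name) to compare progress measured against the whole set with progress measured against one small assignment.

```python
def progress(done, total):
    """Percentage of a goal completed."""
    return 100 * done / total


# Hypothetical figures, chosen only to illustrate the point.
TOTAL_DOCUMENTS = 10_000  # the full release
BATCH_SIZE = 500          # one smaller assignment
reviewed = 50             # documents reviewed so far

overall = progress(reviewed, TOTAL_DOCUMENTS)
current_batch = progress(reviewed, BATCH_SIZE)

print(f"Whole set: {overall:.1f}% done")
print(f"Current assignment: {current_batch:.1f}% done")
```

The same 50 reviews barely register against the full set but move the dial for a single assignment a visible distance, which is the positive feedback the redesign was after.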
These game mechanics are not limited to games: many social networking sites have borrowed the conventions to provide similar positive feedback to users. Jon Hickman (2010, p2) describes how Help Me Investigate uses these genre codes and conventions:
“In the same way that Twitter records numbers of “followers”, “tweets”, “following” and “listed”, Help Me Investigate records the number of “things” which the user is currently involved in investigating, plus the number of “challenges”, “updates” and “completed investigations” they have to their credit. In both Twitter and Help Me Investigate these labels have a mechanistic function: they act as hyperlinks to more information related to the user’s profile. They can also be considered culturally as symbolic references to the user’s social value to the network – they give a number and weight to the level of activity the user has achieved, and so can be used in informal ranking of the user’s worth, importance and usefulness within the network.” (2010, p8)
This was indeed the aim of the site design, and was related to a further aim of the site: to allow users to build ‘social capital’ within and through the site. Users could add links to web presences and Twitter accounts, as well as add biographies and ‘tag’ themselves. They were also ranked in a ‘Most active’ table; and each investigation had its own graph of user activity. This meant that users might use the site not simply for information-gathering reasons, but also for reputation building ones, a characteristic of open source communities identified by Bruns (2005) and Leadbeater (2008) among others.
There were plans to take these ideas much further which were shelved during the proof of concept phase as the team concentrated on core functionality. For example, it was clear that users needed to be able to give other users praise for positive contributions, and they used the ‘update feature’ to do so. A more intuitive function allowing users to give a ‘thumbs up’ to a contribution would have made this easier, and also provided a way to establish the reputation of individual users, and encourage further use.
Another feature of the site’s construction was a networked rather than centralised design. The bid document to 4iP proposed to aggregate users’ material:
“via RSS and providing support to get users onto use web-based services. While the technology will facilitate community creation around investigations, the core strategy will be community-driven, ‘recruiting’ and supporting alpha users who can drive the site and community forward.”
Again, this aggregation functionality was dropped as part of focusing the initial version of the site. However, the basic principle of working within a network was retained, with many investigations including a challenge to blog about progress on other sites, or use external social networks to find possible contributors. The site included guidance on using tools elsewhere on the web, and many investigations linked to users’ blog posts.
Earlier this year I was asked to write a chapter for a book on the future of investigative journalism – ‘Investigative Journalism: Dead Or Alive?‘. I’m reproducing it here. The chapter was originally published on my Facebook page. An open event around the book’s launch, with a panel discussion, is being held at the Frontline Club next month.
We may finally be moving past the troubled youth of the internet as a medium for investigative journalism. For more than a decade observers looked at this ungainly form stumbling its way around journalism, and said: “It will never be able to do this properly.”
They had short memories, of course. Television was an equally awkward child: the first news broadcast was simply a radio bulletin on a black screen, and for decades print journalists sneered at the idea that this fleeting, image-obsessed medium could ever do justice to investigative journalism. But it did. And it did it superbly, finding a new way to engage people with the dry, with the political, and the complex.
There have been quite a few scraping-related stories that I’ve been meaning to blog about – so many I’ve decided to write a round up instead. It demonstrates the increasing role that scraping is playing in journalism – and the possibilities for those who don’t yet know them:
Scraping company information
Chris Taggart explains how he built a database of corporations which will be particularly useful to journalists and anyone looking at public spending:
“Let’s have a look at one we did earlier: the Isle of Man (there’s also one for Gibraltar, Ireland, and in the US, the District of Columbia) … In the space of a couple of hours not only have we liberated the data, but both the code and the data are there for anyone else to use too, as well as being imported in OpenCorporates.”
OpenCorporates are also offering a bounty for programmers who can scrape company information from other jurisdictions.