Category Archives: twitter

What you need to know about the laws on harassment, data protection and hate speech {UPDATED: Stalking added}

The following is taken from the law chapter of The Online Journalism Handbook. The book blog and Facebook page contain updates and additions – those specifically on law can be found here.

Harassment

The Protection From Harassment Act 1997 is occasionally used to prevent journalists from reporting on particular individuals. Specifically, any conduct which amounts to harassment of someone can be considered a criminal act, for which the victim can seek an injunction (followed by arrest if it is broken) or damages.

One example of a blogger’s experience is illustrative of the way the act can be used with regard to online journalism, even if no case reaches court.

Generation AudioBoo: how journalism students are interacting online

This post is by Judith Townend (@jtownend).

The journalism class of 2012 has a pretty enviable opportunity to get their stuff out there; the development of online platforms like Twitter, Google+, Storify, Tumblr, Posterous, AudioBoo, Pinterest, Facebook, Instagram, CoverItLive and Vimeo allows piecemeal dissemination of content to relevant and engaged audiences, without necessarily needing to set up a specific site.

Free technology allows them to find and do journalism outside journalism, in productive and creative ways. To adapt David Carr’s description of Brian Stelter, his browser tab-flicking colleague at the New York Times, we’re seeing the rise of the ‘robots in the basement‘.

The New Online Journalists #12: Michael Greenfield

As part of an ongoing series on recent graduates who have gone into online journalism, Michael Greenfield talks about how he won a job as a Sky News Graduate Trainee, the different roles he’s experiencing across the organisation, and how he sees his career developing as the industry changes.

I’m on a 2-year rotational contract, meaning that every 10 weeks or so I move onto a different position and am trained up in that role. By the end of the scheme I should have a thorough overview of what Sky News does across all platforms, in both input and output.

Much of what I do is ‘on the job’ training, so I am fully immersed in that particular role and quickly pick up the skills along the way. For me it’s by far the best way of learning and getting the job done.

So far I’ve worked as a Researcher on the Planning Desk, a role which takes instructions and ideas from editorial meetings and sets about practically making them happen in advance so that we effectively cover a story.

This involves finding the right experts, case studies and locations to film, arranging interviews and logistically making sure that we will have reporters and crews in the right places.

Currently I’m training as a Field Producer, so I am out on the road either getting pre-recorded material or at live news events making sure, above all, that we get the shot. I am in constant communication with the reporter, crew and news desk so that all sides know what is needed and what is happening on the ground. Tweeting is now a big part of the role; for instance, I have been providing live updates from the Leveson Inquiry.

What factors helped you land the job?

I was offered an interview after I was recommended to Sky News by someone I was doing freelance work for.

The main factors that helped me get to that point were:

  • having a Broadcast Journalism MA from City University London;
  • having a substantial amount of work experience in the industry;
  • going straight into work wherever I could get it straight off the back of my MA;
  • and applying myself as best I could when given the chance of bits of freelance work.

The whole process proved to me that you really don’t know how things will fall so you just have to get yourself out there.

Where do you see your career developing?

Well the scheme finishes at the end of August 2013 and I’m hoping that I will continue to work at Sky News. They are the pioneers in news coverage – they were the first UK news broadcaster to go HD, their iPad app has been awarded for its innovation and they are constantly looking to embrace new ideas and different approaches to how we see news.

I see my career and its relative success revolving around my ability to be a multi-platform journalist. The notion of TV, radio and online journalism being mutually exclusive is becoming increasingly outdated, and so I must strive to be a good journalist across all multi-media platforms.

Audiences expect news in many different formats now, so the more skilled I am at delivering the story through pictures, audio, online copy and social media outlets, the better I will be able to serve a public hungry for information.

I am keen to stress, however, that despite all the technological change, I will stick to the core principles of journalism that I have been taught and now exercise every day.

Are Sky and BBC leaving the field open to Twitter competitors?

At first glance, Sky’s decision that its journalists should not retweet information that has “not been through the Sky News editorial process” and the BBC’s policy to prioritise filing “written copy into our newsroom as quickly as possible” seem logical.

For Sky it is about maintaining editorial control over all content produced by its staff. For the BBC, it seems to be about making sure that the newsroom, and by extension the wider organisation, takes priority over the individual.

But there are also blind spots in these strategies that they may come to regret.

Our content?

The Sky policy articulates an assumption about ‘content’ that’s worth picking apart.

We accept as journalists that what we produce is our responsibility. When it comes to retweeting, however, it’s not entirely clear what we are doing. Is that news production, in the same way that quoting a source is? Is it newsgathering, in the same way that you might repeat a lead to someone to find out their reaction? Or is it merely distribution?

The answer, as I’ve written before, is that retweeting can be, and often is, all three.

Writing about a similar policy at the Oregonian late last year, Steve Buttry made the point that retweets are not endorsements. Jeff Jarvis argued that they were “quotes”.

I don’t think it’s as simple as that (as I explain below), but I do think it’s illustrative: if Sky News were to prevent journalists from using any quote on air or online where they could not verify its factual basis, then nothing would get broadcast. Live interviews would be impossible.

The Sky policy, then, seems to treat retweets as pure distribution, and – crucially – to treat the tweet in isolation. Not as a quote, but as a story, consisting entirely of someone else’s content, which has not been through Sky editorial processes but which is branded or endorsed as Sky journalism.

There’s a lot to admire in the pride in their journalism that this shows – indeed, I would like to see the same rigour applied to the countless quotes that are printed and broadcast by all media without being compared with any evidence.

But do users really see retweets in the same way? And if they do, will they always do so?

Curation vs creation

There’s a second issue here which is more about hard commercial success. Research suggests that successful users of Twitter tend to combine curation with creation. Preventing journalists from retweeting  leaves them – and their employers – without a vital tool in their storytelling and distribution.

The tension surrounding retweeting can be illustrated in the difference between two broadcast journalists who use Twitter particularly effectively: Sky’s own Neal Mann, and NPR’s Andy Carvin. Andy retweets habitually as a way of seeking further information. Neal, as he explained in this Q&A with one of my classes, feels that he has a responsibility not to retweet information he cannot verify (from 2 mins in).

Both approaches have their advantages and disadvantages. But both combine curation with creation.

Network effects

A third issue that strikes me is how these policies fit uncomfortably alongside the networked ways that news is experienced now.

The BBC policy, for example, appears at first glance to prevent journalists from diving right into the story as it develops online. Social media editor Chris Hamilton does note, importantly, that they have “a technology that allows our journalists to transmit text simultaneously to our newsroom systems and to their own Twitter accounts”. However, this is coupled with the position that:

“Our first priority remains ensuring that important information reaches BBC colleagues, and thus all our audiences, as quickly as possible – and certainly not after it reaches Twitter.”

This is an interesting line of argument, and there are a number of competing priorities underlying it that I want to understand more clearly.

Firstly, it implies a separation of newsroom systems and Twitter. If newsroom staff are not following their own journalists on Twitter as part of their systems, why not? Sky pioneered the use of Twitter as an internal newswire, and the man responsible, Julian March, is now doing something similar at ITV. The connection between internal systems and Twitter is notable.

Then there’s that focus on “all our audiences” in opposition to those early adopter Twitter types. If news is “breaking news, an exclusive or any kind of urgent update”, being first on Twitter can give you strategic advantages that waiting for the six o’clock – or even typing a report that’s over 140 characters – won’t. For example:

  • Building a buzz (driving people to watch, listen to or search for the fuller story)
  • Establishing authority on Google (which ranks first reports over later ones)
  • Establishing the traditional authority in being known as the first to break the story
  • Making it easier for people on the scene to get in touch (if someone’s just experienced a newsworthy event or heard about it from someone who was, how likely is it that they search Twitter to see who else was there? You want to be the journalist they find and contact)

“When the technology [to inform the newsroom and generate a tweet at the same time] isn’t available, for whatever reason, we’re asking them to prioritise telling the newsroom before sending a tweet.

“We’re talking a difference of a few seconds. In some situations.

“And we’re talking current guidance, not tablets of stone. This is a landscape that’s moving incredibly quickly, inside and outside newsrooms, and the guidance will evolve as quickly.”

Everything at the same time

There’s another side to this, which is evidence of news organisations taking a strategic decision that, in a world of information overload, they should stop trying to be the first (an increasingly hard task), and instead seek to be more authoritative. To be able to say, confidently, “Every atom we distribute is confirmed”, or “We held back to do this spectacularly as a team”.

There’s value in that, and a lot to be admired. I’m not saying that these policies are inherently wrong. I don’t know the full thinking that went into them, or the subtleties of their implementation (as Rory Cellan-Jones illustrates in his example, which contrasts with what can actually happen). I don’t think there is a right and a wrong way to ‘do Twitter’. Every decision is a trade off, because so many factors are in play. I just wanted to explore some of those factors here.

As soon as you digitise information you remove the physical limitations that necessitated the traditional distinctions between the editorial processes of newsgathering, production, editing and distribution.

A single tweet can be doing all at the same time. Social media policies need to recognise this, and journalists need to be trained to understand the subtleties too.

Leveson: the Internet Pops In

The following post was originally published by Gary Herman on the NUJ New Media blog. It’s reproduced here with permission.

Here at Newmedia Towers we are being swamped by events which at long last are demonstrating that the internet is really rather relevant to the whole debate about media ethics and privacy. So this is by way of a short and somewhat belated survey of the news tsunami – Google, Leveson, Twitter, ACTA, the EU and more.

When Camilla Wright, founder of celebrity gossip site Popbitch (which some years ago broke the news of Victoria Beckham’s pregnancy possibly before she even knew about it), testified before Leveson last week (26 January 2012) [Guardian liveblog; Wright’s official written statement (PDF)] the world found out (if it could be bothered) how Popbitch is used by newspaper hacks to plant stories so that they can then be said to have appeared on the internet. Anyone remember the Drudge report, over a decade ago?

Wright, of course, made a somewhat lame excuse that Popbitch is a counterweight to gossip magazines which are full of stories placed by the PR industry.

But most interesting is the fact that Wright claimed that Popbitch is self-regulated and that it works.

Leveson pronounced that he is not sure there is ‘so much of a difference’ between what Popbitch does and what newspapers do – which is somehow off the point. Popbitch – like other websites – has a global reach by definition and Wright told the Inquiry that Popbitch tries to comply with local laws wherever it was available – claims also made more publicly by Google and Yahoo! when they have in the past given in to Chinese pressure to release data that actually or potentially incriminated users and, more recently, by Twitter when it announced its intention to regulate tweets on a country-by-country basis.

Trivia – like the stuff Popbitch trades – aside, the problem is real. A global medium will cross many jurisdictions and be accessible within many different cultures. What one country welcomes, another may ban. And who should judge the merits of each?

Confusing the internet with its applications

The Arab Spring showed us that social media – like mobile phones, CB radios, fly-posted silkscreen prints, cheap offset litho leaflets and political ballads before them – have the power to mobilise and focus dissent. Twitter’s announcement should have been expected – after all, tweeting was never intended to be part of the revolutionaries’ tool-kit.

There are already alternatives to Twitter – Vibe, Futubra, Plurk, Easy Chirp and Blackberry Messenger, of course – and the technology itself will not be restrained by the need to expand into new markets. People confuse the internet with its applications – a mistake often made by those authorities who seek to impose a duty to police content on those who convey it.

Missing the point again, Leveson asked whether it would be useful to have an external ombudsman to advise Popbitch on stories and observed that a common set of standards across newspapers and websites might also help.

While not dismissing the idea, Wright made the point that the internet made it easy for publications to bypass UK regulators.

This takes us right into the territory of Google, Facebook and the various attempts by US and international authorities to introduce regulation and impose duties on websites themselves to police them.

ACTA, SOPA and PIPA

The latest example is the Anti-Counterfeit Trade Agreement (ACTA) – a shadowy international treaty which, according to Google’s legal director, Daphne Keller, speaking over a year ago, has ‘metastasized’ from a proposal on border security and counterfeit goods to an international legal framework covering copyright and the internet.

According to a draft of ACTA, released for public scrutiny after pressure from the European Union, internet providers who disable access to pirated material and adopt a policy to counter unauthorized ‘transmission of materials protected by copyright’ will be protected against legal action.

Fair use rights would not be guaranteed under the terms of the agreement.

Many civil liberty groups have protested the process by which ACTA has been drafted as anti-democratic and ACTA’s provisions as draconian.

Google’s Keller described ACTA as looking ‘a lot like cultural imperialism’.

Google later became active in the successful fight against the US Stop Online Piracy Act (SOPA) and the related Protect Intellectual Property Act (PIPA), which contained similar provisions to ACTA.

Google has been remarkably quiet on the Megaupload case, however. This saw the US take extraterritorial action against a Hong Kong-based company operating a number of websites accused of copyright infringement.

The arrest of all Megaupload’s executives and the closure of its sites may have the effect of erasing perfectly legitimate and legal data held on the company’s servers – something which would on the face of it be an infringement of the rights of Megaupload users who own the data.

Privacy

Meanwhile, Google – in its growing battle with Facebook – has announced its intention to introduce a single privacy regime for 60 or so of its websites and services which will allow the company to aggregate all the data on individual users the better to serve ads.

Facebook already does something similar, although the scope of its services is much, much narrower than Google’s.

Privacy is at the heart of the current action against Google by Max Mosley, who wants the company to take down all links to external websites from its search results if those sites cover the events at the heart of his successful libel suit against News International.

Mosley is suing Google in the UK, France and Germany, and Daphne Keller popped up at the Leveson Inquiry, together with David-John Collins, head of corporate communications and public affairs for Google UK, to answer questions about the company’s policies on regulation and privacy.

Once again, the argument regarding different jurisdictions and the difficulty of implementing a global policy was raised by Keller and Collins.

Asked about an on-the-record comment by former Google chief executive, Eric Schmidt, that ‘only miscreants worry about net privacy’, Collins responded that the comment was not representative of Google’s policy on privacy, which it takes ‘extremely seriously’.

There is, of course, an interesting disjuncture between Google’s theoretical view of privacy and its treatment of its users. When it comes to examples like Max Mosley, Google pointed out – quite properly – that it can’t police the internet, that it does operate across jurisdictions and that it does ensure that there are comprehensive if somewhat esoteric mechanisms for removing private data and links from the Google listings and caches.

Yet it argues that, if individuals choose to use Google, whatever data they volunteer to the company is fair game for Google – even where that data involves third persons who may not have assented to their details being known or when, as happened during the process of building Google’s StreetView application, the company collected private data from domestic wi-fi routers without the consent or knowledge of the householders.

Keller and Collins brought their double-act to the UK parliament a few days later when they appeared before the joint committee on privacy and injunctions, chaired by John Whittingdale MP.

When asked why Google did not simply ‘find and destroy’ all instances of the images and video that Max Mosley objected to, they repeated their common mantras – Google is not the internet, and neither can nor should control the websites its search results list.

Accused by committee member Lord MacWhinney of ‘ducking and diving’ and by former culture minister Ben Bradshaw of being ‘totally unconvincing’, Keller noted that Google could in theory police the sites it indexed, but that ‘doing so is a bad idea’.

No apparatus disinterested and qualified enough

That seems indisputable – regulating the internet should not be the job of providers like Google, Facebook or Twitter. On the contrary, the providers are the ones to be regulated, and this should be the job of legislatures equipped (unlike the Whittingdale committee) with the appropriate level of understanding and coordinated at a global level.

The internet requires global oversight – but we have no apparatus that is disinterested and qualified enough to do the job.

A new front has been opened in this battle by the latest draft rules on data protection issued by Viviane Reding’s Justice Directorate at the European Commission on 25 January.

Reding is no friend of Google or the big social networks and is keen to draw them into a framework of legislation that will – should the rules pass into national legislation – be coordinated at EU level.

Reding’s big ideas include a ‘right to be forgotten’ which will apply to online data only and an extension of the scope of personal data to cover a user’s IP address. Confidentiality should be built into online systems according to the new rules – an idea called ‘privacy by design’.

These ideas are already drawing flak from corporates like Google who point out that the ‘right to be forgotten’ is something that the company already upholds as far as the data it holds is concerned.

Reding’s draft rules include an obligation on so-called ‘data controllers’ such as Google to notify third parties when someone wishes their data to be removed, so that links and copies can also be removed.

Not surprisingly, Google objects to this requirement which, if not exactly a demand to police the internet, is at least a demand to ‘help the police with their enquiries’.

The problem will not go away: how do you make sure that a global medium protects privacy, removes defamation and respects copyright while preserving its potential to empower the oppressed and support freedom of speech everywhere?

Answers on a postcard, please.

Twitter’s ‘censorship’ is nothing new – but it is different

Over the weekend thousands of Twitter users boycotted the service in protest at the announcement that the service will begin withholding tweets based on the demands of local governments and law enforcement.

Protesting against censorship is laudable, but it is worth pointing out that most online services already do the same, whether it’s Google’s Orkut; Apple removing apps from its store; or Facebook disabling protest groups.

Evgeny Morozov’s book The Net Delusion provides a good indicative list of examples:

“In the run-up to the Olympic torch relay passing through Hong Kong in 2008, [Facebook] shut down several groups, while many pro-Tibetan activists had their accounts deactivated for “persistent misuse of the site … Twitter has been accused of silencing online tribute to the 2008 Gaza War. Apple has been bashed for blocking Dalai Lama–related iPhone apps from its App Store for China … Google, which owns Orkut, a social network that is surprisingly popular in India, has been accused of being too zealous in removing potentially controversial content that may be interpreted as calling for religious and ethnic violence against both Hindus and Muslims.”

What’s notable about the Twitter announcement is that it suggests that censorship will be local rather than global, and transparent rather than secret. Techdirt have noted this, and Mireille Raad explains the distinction particularly well:

  • “Censorship is not silent and will not go un-noticed like most other censoring systems
  • The official twitter help center article includes the way to bypass it – simply – all you have to do is change your location to another country and overwrite the IP detection.
    Yes, that is all, and it is included in the help center
  • Quantity – can you imagine a govt trying to censor on a tweet by tweet basis a trending topic like Occupy or Egypt or Revolution – the amount of tweets can bring up the fail whale despite the genius twitter architecture , so imagine what is gonna happen to a paper work based system.
  • Speed – twitter, probably one of the fastest updating systems online –  and legislative bodies move at glaringly different speeds – It is impossible for a govt to be able to issue enough approval for a trending topic or anything with enough tweets/interest on.
  • Curiosity kills the cat  and with such an one-click-bypass process, most people will become interested in checking out that “blocked” content. People are willing to sit through endless hours of tech training and use shady services to access blocked content – so this is like doing them a service.”

I’m also reminded of Ethan Zuckerman’s ‘Cute Cats Theory’ of censorship and revolution, as explained by Cory Doctorow:

“When YouTube is taken off your nation’s internet, everyone notices, not just dissidents. So if a state shuts down a site dedicated to exposing official brutality, only the people who care about that sort of thing already are likely to notice.

“But when YouTube goes dark, all the people who want to look at cute cats discover that their favourite site is gone, and they start to ask their neighbours why, and they come to learn that there exists video evidence of official brutality so heinous and awful that the government has shut out all of YouTube in case the people see it.”

What Twitter have announced (and since clarified) perhaps makes this all-or-nothing censorship less likely, but it also adds to the ‘Don’t look at that!’ effect. The very act of censorship, online, can create a signal that is counter-productive. As journalists we should be more attuned to spotting those signals.

FAQ: Niche blogs vs mainstream media outlets

Here’s another collection of questions answered here to avoid duplication. This time from a final year student at UCLAN:

Blogs are often based on niche subject areas and created by individuals from a community. Do you think mainstream media outlets are limited by resources to compete? Or are there signs they are adapting?

I think they are more limited by passion, and by commercial imperatives. Niche blogs tend to be driven by passion initially, and sometimes by the commercial imperative to target those niches, whereas mainstream outlets are built on scale and mass audiences – or affluent audiences who still don’t really qualify as a niche.

They are adapting as the commercial drive changes and advertisers look for measurements of engagement, but it’s hard, as your next question fleshes out…

Communities by nature need conversation, and this is often visible online in forums, blog comments etc. Can it be argued that niche blogs are better at engaging communities and providing a platform for conversation?

…yes, but more because they often build those communities from the ground up, whereas established media platforms are having to start with a mass audience and carve niches out of those. It’s like trying to hold a community meeting in the middle of a busy high street, compared to doing it in a community centre.

… If so, do you think the success of blogs is a result of people wanting conversation instead of a ‘lecture’ from journalists?

Not necessarily – I think blogs succeed (and fail) for all sorts of reasons. One of those is that blogs have made it easier to connect with likeminded people across the platform (in comments, for example, without having to fight through hundreds of comments from idiots), another is the ability for users to input into the journalistic process rather than merely consuming a story, and another is the ability to focus on elements of an issue which may not be accessible enough to justify coverage by a mass audience publication – and I’m sure there are as many other reasons as there are blogs.

Finally, with the emergence of Twitter, along with other methods of contact, are journalists now becoming more involved in conversation with communities of interest or is there still a reluctance from journalists to be involved?

Some recent research in the US suggested that Twitter is still being used overwhelmingly as a broadcast platform by journalists and news brands. But there are also an increasing number of journalists who are using it particularly effectively as a way to talk with users. My own research into blogging suggested a similar effect. So yes, there is reluctance (talking to sources is hard work, after all, whether it’s on Twitter, the phone, or face to face – and for many journalists it’s easier to avoid it) but the culture is changing slowly.

The strikes and the rise of the liveblog

Liveblogging the strikes: Twitter's #n30 stream

Today sees the UK’s biggest strike in decades as public sector workers protest against pension reforms. Most news organisations are covering the day’s events through liveblogs: that web-native format which has so quickly become the automatic choice for covering rolling news.

To illustrate just how dominant the liveblog has become, take a look at the BBC, Channel 4 News, The Guardian’s ‘Strikesblog’ or The Telegraph. The Independent’s coverage is hosted on their own live.independent.co.uk subdomain while Sky have embedded their liveblog in other articles. There’s even a separate Storify liveblog for The Guardian’s Local Government section, and on Radio 5 Live you can find an example of radio reporters liveblogging.

Regional newspapers such as the Chronicle in the north east and the Essex County Standard are liveblogging the local angle; while the Huffington Post liveblog the political face-off at Prime Minister’s Question Time and the PoliticsHome blog liveblogs both. Leeds Student are liveblogging too. And it’s not just news organisations: campaigning organisation UK Uncut have their own liveblog, as do the public sector workers union UNISON and Pensions Justice (on Tumblr).

So dominant so quickly

The format has become so dominant so quickly because it satisfies both editorial and commercial demands: liveblogs are sticky – people stick around on them much longer than on traditional articles, in the same way that they tend to leave the streams of information from Twitter or Facebook on in the background of their phone, tablet or PC – or indeed, the way that they leave on 24 hour television when there are big events.

It also allows print outlets to compete in the 24-hour environment of rolling news. The updates of the liveblog are equivalent to the ‘time-filling’ of 24-hour television, with this key difference: that updates no longer come from a handful of strategically-placed reporters, but rather (when done well) hundreds of eyewitnesses, stakeholders, experts, campaigners, reporters from other news outlets, and other participants.

The results (when done badly) can be more noise than signal – incoherent, disconnected, fragmented. When done well, however, a good liveblog can draw clarity out of confusion, chase rumours down to facts, and draw multiple threads into something resembling a canvas.

At this early stage liveblogging is still a form finding its feet. More static than broadcast, it does not require the same cycle of repetition; more dynamic than print, it does, however, demand regular summarising.

Most importantly, it takes place within a network. The audience are not sat on their couches watching a single piece of coverage; they may be clicking between a dozen different sources; they may be present at the event itself; they may have friends or family there, sending them updates from their phone. If they are hearing about something important that you’re not addressing, you have a problem.

The list of liveblogs above demonstrates this particularly well, and it doesn’t include the biggest liveblog of all: the #n30 thread on Twitter (and as Facebook users we might also be consuming a liveblog of sorts of our friends’ updates).

More than documenting

In this situation the journalist is needed less to document what is taking place, and more to build on the documentation that is already being done: by witnesses, and by other journalists. That might mean aggregating the most important updates, or providing analysis of what they mean. It might mean enriching content by adding audio, video, maps or photography. Most importantly, it may mean verifying accounts that hold particular significance.

Liveblogging: adding value to the network

These were the lessons that I sought to teach my class last week when I reconstructed an event in the class and asked them to liveblog it (more in a future blog post). Without any briefing, they made predictable (and planned) mistakes: they thought they were there purely to document the event.

But now, more than ever, journalists are not there solely to document.

On a day like today you do not need to be a journalist to take part in the ‘liveblog’ of #n30. If you are passionate about current events, if you are curious about news, you can be out there getting experience in dealing with those events – not just reporting them, but speaking to the people involved, recording images and audio to enrich what is in front of you, creating maps and galleries and Storify threads to aggregate the most illuminating accounts, and seeking reaction and verification to the most challenging ones.

The story is already being told by hundreds of people, some better than others. It’s a chance to create good journalism, and be better at it. I hope every aspiring journalist takes it, and the next chance, and the next one.

Getting Started With Twitter Analysis in R

Earlier today, I saw via the aggregating R-Bloggers service a post on Using Text Mining to Find Out What @RDataMining Tweets are About. The post provides a walkthrough of how to grab tweets into an R session using the twitteR library, and then do some text mining on them.

I’ve been meaning to have a look at pulling Twitter bits into R for some time, so I couldn’t resist having a quick play…

Starting from @RDataMiner’s lead, here’s what I did… (Notes: I use R in an R-Studio context. If you follow through the example and a library appears to be missing, from the Packages tab search for the missing library and import it, then try to reload the library in the script. The # denotes a commented out line.)

require(twitteR)
#The original example used the twitteR library to pull in a user stream
#rdmTweets <- userTimeline("psychemedia", n=100)
#Instead, I'm going to pull in a search around a hashtag.
rdmTweets <- searchTwitter('#mozfest', n=500)
# Note that the Twitter search API only goes back 1500 tweets (I think?)

#Create a dataframe based around the results
df <- do.call("rbind", lapply(rdmTweets, as.data.frame))
#Here are the columns
names(df)
#And some example content
head(df,3)

So what can we do out of the can? One thing is to look at who was tweeting most in the sample we collected:

counts=table(df$screenName)
barplot(counts)

# Let's do something hacky:
# Limit the data set to show only folk who tweeted twice or more in the sample
cc=subset(counts,counts>1)
barplot(cc,las=2,cex.names =0.3)
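As an optional extra step (my tweak, not part of the original walkthrough), you could also sort those filtered counts before plotting so the busiest tweeters appear first – something like this should do it:

# Not in the original post: sort the filtered counts so the most
# prolific tweeters appear at the left of the chart
ccSorted=sort(cc,decreasing=TRUE)
barplot(ccSorted,las=2,cex.names=0.3)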

Now let’s have a go at parsing some tweets, pulling out the names of folk who have been retweeted or who have had a tweet sent to them:

#Whilst tinkering, I came across some errors that seemed
# to be caused by unusual character sets
#Here's a hacky defence that seemed to work...
df$text=sapply(df$text,function(row) iconv(row,to='UTF-8'))

#A helper function to remove @ symbols from user names...
trim <- function (x) sub('@','',x)

#A couple of tweet parsing functions that add columns to the dataframe
#We'll be needing this, I think?
library(stringr)
#Pull out who a message is to
df$to=sapply(df$text,function(tweet) str_extract(tweet,"^(@[[:alnum:]_]*)"))
df$to=sapply(df$to,function(name) trim(name))

#And here's a way of grabbing who's been RT'd
df$rt=sapply(df$text,function(tweet) trim(str_match(tweet,"^RT (@[[:alnum:]_]*)")[2]))
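At this point a quick sanity check doesn’t hurt – something along these lines (reusing the columns of the dataframe we built earlier) should show a few tweets where an RT’d user was picked out:

#Optional sanity check: peek at a few rows where an RT'd user was found
head(df[!is.na(df$rt),c("screenName","rt","text")],3)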

So for example, now we can plot a chart showing how often a particular person was RT’d in our sample. Let’s use ggplot2 this time…

require(ggplot2)
ggplot()+geom_bar(aes(x=na.omit(df$rt)))+opts(axis.text.x=theme_text(angle=-90,size=6))+xlab(NULL)
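(If that line throws an error, note that later versions of ggplot2 replaced opts() and theme_text() with theme() and element_text(), so it may need tweaking depending on the version you have installed.) And if you just want the numbers rather than a chart, something along these lines should also work, reusing the df$rt column we created above:

#Tally how many times each user was RT'd in the sample...
rtCounts=table(na.omit(df$rt))
#...and show the ten most retweeted
head(sort(rtCounts,decreasing=TRUE),10)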

Okay – enough for now… if you’re tempted to have a play yourself, please post any other avenues you explored in a comment, or in your own post with a link in my comments ;-)

Crowdsourcing investigative journalism: a case study (part 1)

As I begin on a new Help Me Investigate project, I thought it was a good time to share some research I conducted into the first year of the site, and the key factors in how that project tried to crowdsource investigative and watchdog journalism.

The findings of this research have been key to the development of this new project. They also form the basis of a chapter in the book Face The Future, and another due to be published in the Handbook of Online Journalism next year (not to be confused with my own Online Journalism Handbook). Here’s the report:

In both academic and mainstream literature about the world wide web, one theme consistently recurs: the lowering of the barrier allowing individuals to collaborate in pursuit of a common goal. Whether it is creating the world’s biggest encyclopedia (Lih, 2009), spreading news about a protest (Morozov, 2011) or tracking down a stolen phone (Shirky, 2008), the rise of the network has seen a decline in the role of the formal organisation, including news organisations.

Two examples of this phenomenon were identified while researching a book chapter on investigative journalism and blogs (De Burgh, 2008). The first was an experiment by The Florida News Press: when it started receiving calls from readers complaining about high water and sewage connection charges for newly constructed homes the newspaper, short on in-house resources to investigate the leads, decided to ask their readers to help. The result is by now familiar as a textbook example of “crowdsourcing” – outsourcing a project to ‘the crowd’ or what Brogan & Smith (2009, p136) describe as “the ability to have access to many people at a time and to have them perform one small task each”:

“Readers spontaneously organized their own investigations: Retired engineers analyzed blueprints, accountants pored over balance sheets, and an inside whistle-blower leaked documents showing evidence of bid-rigging.” (Howe, 2006a)

The second example concerned contaminated pet food in the US, and did not involve a mainstream news organisation. In fact, it was frustration with poor mainstream ‘churnalism’ (see Davies, 2009) that motivated bloggers and internet users to start digging into the story. The resulting output from dozens of blogs ranged from useful information for pet owners and the latest news to the compilation of a database that suggested the official numbers of pet deaths recorded by the US Food and Drug Administration was short by several thousand. One site, Itchmo.com, became so popular that it was banned in China, the source of the pet food in question.

What was striking about both examples was not simply that people could organise to produce investigative journalism, but that this practice of ‘crowdsourcing’ had two key qualities that were particularly relevant to journalism’s role in a democracy. The first was engagement: in the case of the News-Press for six weeks the story generated more traffic to its website than “ever before, excepting hurricanes” (Weise, 2007). Given that investigative journalism often concerns very ‘dry’ subject matter that has to be made appealing to a wider audience, these figures were surprising – and encouraging for publishers.

The second quality was subject: the contaminated pet food story was, in terms of mainstream news values, unfashionable and unjustifiable in terms of investment of resources. It appeared that the crowdsourcing model of investigation might provide a way to investigate stories which were in the public interest but which commercial and public service news organisations would not consider worth their time. More broadly, research on crowdsourcing suggested that it worked “best in areas that are not core to your product or central to your business model” (Tapscott and Williams, 2006, p82).

Investigative journalism: its history and discourses

DeBurgh (2008, p10) defines investigative journalism as “distinct from apparently similar work [of discovering truth and identifying lapses from it] done by police, lawyers and auditors and regulatory bodies in that it is not limited as to target, not legally founded and usually earns money for media publishers.” The term is notoriously problematic and contested: some argue that all journalism is investigative, or that the recent popularity of the term indicates the failure of ‘normal’ journalism to maintain investigative standards. This contestation is a symptom of the various factors underlying the growth of the genre, which range from journalists’ own sense of a democratic role, to professional ambition and publishers’ commercial and marketing objectives.

More recently investigative journalism has been used to defend traditional print journalism against online publishing, with publishers arguing that true investigative journalism cannot be maintained without the resources of a print operation. This position has become harder to defend as online-only operations and journalists have won increasing numbers of awards for their investigative work – Clare Sambrook in the UK and VoiceOfSanDiego.com and Talking Points Memo in the US are three examples – while new organisations have been established to pursue investigations without any associated print operation including Canada’s OpenFile; the UK’s Bureau of Investigative Journalism and a number of bodies in the US such as ProPublica, The Florida Center for Investigative Reporting, and the Huffington Post’s investigative unit.

In addition, computer technology has started to play an increasingly important role in print investigative journalism: Stephen Grey’s investigation into the CIA’s ‘extraordinary rendition’ programme (Grey, 2006) was facilitated by the use of software such as Analyst’s Notebook, which allowed him to analyse large amounts of flight data and identify leads. The Telegraph’s investigation into MPs’ expenses was made possible by digitisation of data and the ability to store large amounts on a small memory stick. And newspapers around the world collaborated with the Wikileaks website to analyse ‘warlogs’ from Iraq and Afghanistan, and hundreds of thousands of diplomatic cables. More broadly the success of Wikipedia inspired a raft of examples of ‘Wiki journalism’ where users were invited to contribute to editorial coverage of a particular issue or field, with varying degrees of success.

Meanwhile, investigative journalists such as The Guardian’s Paul Lewis have been exploring a more informal form of crowdsourcing, working with online communities to break stories including the role of police in the death of newspaper vendor Ian Tomlinson; the existence of undercover agents in the environmental protest movement; and the death of a man being deported to Angola (Belam, 2011b).

This is part of a broader move to networked journalism explored by Charlie Beckett (2008):

“In a world of ever-increasing media manipulation by government and business, it is even more important for investigative journalists to use technology and connectivity to reveal hidden truths. Networked journalists are open, interactive and share the process. Instead of gatekeepers they are facilitators: the public become co-producers. Networked journalists “are ‘medium agnostic’ and ‘story-centric’”. The process is faster and the information sticks around longer.” (2008, p147)

As one of its best-known practitioners Paul Lewis talks particularly of the role of technology in his investigations – specifically Twitter – but also the importance of the crowd itself and journalistic method:

“A crucial factor that makes crowd-sourcing a success [was that] there was a reason for people to help, in this case a perceived sense of injustice and that the official version of events did not tally with the truth. Six days after Tomlinson’s death, Paul had twenty reliable witnesses who could be placed on a map at the time of the incident – and only one of them had come from the traditional journalistic tool of a contact number in his notebook.” (Belam, 2011b)

A further key skill identified by Lewis is listening to the crowd – although he sounds a note of caution about its vulnerability to deliberately placed misinformation, and the need for verification.

“Crowd-sourcing doesn’t always work […] The most common thing is that you try, and you don’t find the information you want […] The pattern of movement of information on the internet is something journalists need to get their heads around. Individuals on the web in a crowd seem to behave like a flock of starlings – and you can’t control their direction.” (Belam, 2011b)

Conceptualising Help Me Investigate

The first plans for Help Me Investigate were made in 2008 and were further developed over the next 18 months. They built on research into crowdsourced investigative journalism, as well as other research into online journalism and community management. In particular the project sought to explore concepts of “P2P journalism” which enables “more engaged interaction between and amongst users” (Bruns, 2005, p120, emphasis in original) and of “produsage”, whose affordances included probabilistic problem solving, granular tasks, equipotentiality, and shared content (Bruns, 2008, p19).

A key feature in this was the ownership of the news agenda by users themselves (who could be either members of the public or journalists). This was partly for reasons identified above in research into the crowdsourced investigation into contaminated pet food. It would allow the site to identify questions that would not be considered viable for investigation within a traditional newsroom; but the feature was also implemented because ‘ownership’ was a key area of contestation identified within crowdsourcing research (Lih, 2009; Benkler, 2006; Surowiecki, 2005) – ‘outsourcing’ a project to a group of people raises obvious issues regarding claims of authorship, direction and benefits (Bruns, 2005).

These issues were considered carefully by the founders. The site adopted a user interface with three main modes of navigation for investigations: most-recent-top; most popular (those investigations with the most members); and two ‘featured’ investigations chosen by site staff: these were chosen on the basis that they were the most interesting editorially, or because they were attracting particular interest and activity from users at that moment. There was therefore an editorial role, but this was limited to only two of the 18 investigations listed on the ‘Investigations’ page, and was at least partly guided by user activity.

In addition there were further pages where users could explore investigations through different criteria such as those investigations that had been completed, or those investigations with particular tags (e.g. ‘environment’, ‘Bristol’, ‘FOI’, etc.).

A second feature of the site was that ‘journalism’ was intended to be a by-product: the primary objective was the investigation process itself, which would inform users. Research suggested that if users were to be attracted to the site, it must perform the function that they needed it to (Porter, 2008) – which was, as became apparent, one of project management. The ‘problem’ that the site was attempting to ‘solve’ needed to be user-centric rather than publisher-centric: ‘telling stories’ would clearly be lower down the priority list for users than it was for journalists and publishers. Of higher priority were the need to break down a question into manageable pieces; find others to investigate those with; and get answers. This was eventually summarised in the strapline to the site: “Connect, mobilise, uncover”.

Thirdly, there was a decision to use ‘game mechanics’ that would make the process of investigation inherently rewarding. As the site and its users grew, the interface was changed so that challenges started on the left hand side of the screen, coloured red, then moved to the middle when accepted (the colour changing to amber), and finally to the right column when complete (now with green border and tick icon). This made it easier to see at a glance what needed doing and what had been achieved, and also introduced a level of innate satisfaction in the task. Users, the idea went, might grow to like to feeling of moving those little blocks across the screen, and the positive feedback (see Graham, 2010 and Dondlinger, 2007) provided by the interface.

Similar techniques were coincidentally explored at the same time by The Guardian’s MPs’ expenses app (Bradshaw, 2009). This provided an interface for users to investigate MP expense claim forms that used many conventions of game design, including a ‘progress bar’, leaderboards, and button-based interfaces. A second iteration of the app – created when a second batch of claim forms were released – saw a redesigned interface based on a stronger emphasis on positive feedback. As developer Martin Belam explains (2011a):

“When a second batch of documents were released, the team working on the app broke them down into much smaller assignments. That meant it was easier for a small contribution to push the totals along, and we didn’t get bogged down with the inertia of visibly seeing that there was a lot of documents still to process.

“By breaking it down into those smaller tasks, and staggering their start time, you concentrated all of the people taking part on one goal at a time. They could therefore see the progress dial for that individual goal move much faster than if you only showed the progress across the whole set of documents.”

These game mechanics are not limited to games: many social networking sites have borrowed the conventions to provide similar positive feedback to users. Jon Hickman (2010, p2) describes how Help Me Investigate uses these genre codes and conventions:

“In the same way that Twitter records numbers of “followers”, “tweets”, “following” and “listed”, Help Me Investigate records the number of “things” which the user is currently involved in investigating, plus the number of “challenges”, “updates” and “completed investigations” they have to their credit. In both Twitter and Help Me Investigate these labels have a mechanistic function: they act as hyperlinks to more information related to the user’s profile. They can also be considered culturally as symbolic references to the user’s social value to the network – they give a number and weight to the level of activity the user has achieved, and so can be used in informal ranking of the user’s worth, importance and usefulness within the network.” (2010, p8)

This was indeed the aim of the site design, and was related to a further aim of the site: to allow users to build ‘social capital’ within and through the site: users could add links to web presences and Twitter accounts, as well as add biographies and ‘tag’ themselves. They were also ranked in a ‘Most active’ table; and each investigation had its own graph of user activity. This meant that users might use the site not simply for information-gathering reasons, but also for reputation building ones, a characteristic of open source communities identified by Bruns (2005) and Leadbeater (2008) among others.

There were plans to take these ideas much further which were shelved during the proof of concept phase as the team concentrated on core functionality. For example, it was clear that users needed to be able to give other users praise for positive contributions, and they used the ‘update feature’ to do so. A more intuitive function allowing users to give a ‘thumbs up’ to a contribution would have made this easier, and also provided a way to establish the reputation of individual users, and encourage further use.

Another feature of the site’s construction was a networked rather than centralised design. The bid document to 4iP proposed to aggregate users’ material:

“via RSS and providing support to get users onto use web-based services. While the technology will facilitate community creation around investigations, the core strategy will be community-driven, ‘recruiting’ and supporting alpha users who can drive the site and community forward.”

Again, this aggregation functionality was dropped as part of focusing the initial version of the site. However, the basic principle of working within a network was retained, with many investigations including a challenge to blog about progress on other sites, or use external social networks to find possible contributors. The site included guidance on using tools elsewhere on the web, and many investigations linked to users’ blog posts.

In the second part I discuss the building of the site and reflections on the site’s initial few months.