Author Archives: Paul Bradshaw

Social Interest Positioning – Visualising Facebook Friends’ Likes With Data Grabbed Using Google Refine

What do my Facebook friends have in common in terms of the things they have Liked, or in terms of their music or movie preferences? (And does this say anything about me?!) Here’s a recipe for visualising that data…

After discovering via Martin Hawksey that the recent (December, 2011) 2.5 release of Google Refine allows you to import JSON and XML feeds to bootstrap a new project, I wondered whether it would be able to pull in data from the Facebook API if I was logged in to Facebook (Google Refine does run in the browser after all…)

Looking through the Facebook API documentation whilst logged in to Facebook, it’s easy enough to find exemplar links to things like your friends list (https://graph.facebook.com/me/friends?access_token=A_LONG_JUMBLE_OF_LETTERS) or the list of likes someone has made (https://graph.facebook.com/me/likes?access_token=A_LONG_JUMBLE_OF_LETTERS); replacing me with the Facebook ID of one of your friends should pull down a list of their friends, or likes, etc.

(Note that validity of the access token is time limited, so you can’t grab a copy of the access token and hope to use the same one day after day.)
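The URL patterns above can be captured in a small helper if you want to script against the Graph API directly. This is just a sketch: the `graph_url` function name is my own, and the token shown is the placeholder from above, not a real value.

```python
def graph_url(obj, connection, access_token):
    """Build a Graph API URL like the friends/likes examples above."""
    return "https://graph.facebook.com/%s/%s?access_token=%s" % (
        obj, connection, access_token)

# "me" works for the logged-in user; a friend's Facebook ID works the same way
print(graph_url("me", "likes", "A_LONG_JUMBLE_OF_LETTERS"))
```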

Grabbing the link to your friends on Facebook is simply a case of opening a new project, choosing to get the data from a Web Address, and then pasting in the friends list URL:

Google Refine - import Facebook friends list

Click on next, and Google Refine will download the data, which you can then parse as a JSON file, and from which you can identify individual record types:

Google Refine - import Facebook friends

If you click the highlighted selection, you should see the data that will be used to create your project:

Google Refine - click to view the data

You can now click on Create Project to start working on the data – the first thing I do is tidy up the column names:

Google Refine - rename columns

We can now work some magic – such as pulling in the Likes our friends have made. To do this, we need to create the URL for each friend’s Likes using their Facebook ID, and then pull the data down. We can use Google Refine to harvest this data for us by creating a new column containing the data pulled in from a URL built around the value of each cell in another column:

Google Refine - new column from URL

The Likes URL has the form https://graph.facebook.com/me/likes?access_token=A_LONG_JUMBLE_OF_LETTERS which we’ll tinker with as follows:

Google Refine - crafting URLs for new column creation

The throttle control tells Refine how long to wait between calls. I set this to 500ms (that is, half a second), so it takes a few minutes to pull in my couple of hundred or so friends (I don’t use Facebook a lot;-). I’m not sure what rate the Facebook API is happy with, but if you hit it too fast (i.e. set the throttle time too low), you may find the Facebook API stops returning data to you for a cooling-off period…
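Under the hood, what Refine’s “Add column by fetching URLs” is doing is roughly a throttled fetch per row. A minimal Python sketch of the same idea (the function name is mine, and the fetcher is stubbed out, since a live access token would be needed for real calls):

```python
import time

def fetch_likes_for_ids(ids, fetch, throttle=0.5):
    """Fetch a likes document for each Facebook ID, pausing between
    calls so as not to hammer the API (mirrors Refine's throttle delay)."""
    results = {}
    for fb_id in ids:
        results[fb_id] = fetch(fb_id)
        time.sleep(throttle)
    return results

# Stub fetcher for illustration; a real one would urlopen the Graph API URL
demo = fetch_likes_for_ids(["123", "456"],
                           lambda i: {"id": i, "data": []}, throttle=0)
print(len(demo))  # one result per friend
```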

Having imported the data, you should find a new column:

Google Refine - new data imported

At this point, it should in theory be possible to generate a new column from each of the records/Likes in the imported data… I found this caused Refine to hang, though, so instead I exported the data using the default Templating… export format, which produces a JSON-style output…

I then used this Python script to generate a two column data file where each row contained a (new) unique identifier for each friend and the name of one of their likes:

import simplejson, csv

# Filename of the Templating... export from Google Refine
fn = 'my-fb-friends-likes.txt'

writer = csv.writer(open('fbliketest.csv', 'wb+'), quoting=csv.QUOTE_ALL)

data = simplejson.load(open(fn, 'r'))
id = 0
for d in data['rows']:
	id = id + 1
	#'interests' is the column name containing the Likes data
	interests = simplejson.loads(d['interests'])
	for i in interests['data']:
		print str(id), i['name'], i['category']
		writer.writerow([str(id), i['name'].encode('ascii', 'ignore')])

[I think this R script, in answer to a related @mhawksey Stack Overflow question, also does the trick: R: Building a list from matching values in a data.frame]

I could then import this data into Gephi and use it to generate a network diagram of what they commonly liked:

Sketching common likes amongst my facebook friends

Rather than returning Likes, I could equally have pulled back lists of the movies, music or books they like, their own friends lists (permissions settings allowing), etc etc, and then generated friends’ interest maps on that basis.

[See also: Getting Started With The Gephi Network Visualisation App – My Facebook Network, Part I and how to visualise Google+ networks]

PS dropping out of Google Refine and into a Python script is a bit clunky, I have to admit. What would be nice would be to be able to do something like a “create new rows with new column from column” pattern that would let you set up an iterator through the contents of each of the cells in the column you want to generate the new column from, and for each pass of the iterator: 1) duplicate the original data row to create a new row; 2) add a new column; 3) populate the cell with the contents of the current iteration state. Or something like that…
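For what it’s worth, the pattern described in the PS is essentially an “explode” operation — one new row per item in a list-valued cell. A minimal sketch (function, column and row names are all made up for illustration):

```python
def explode_rows(rows, list_col, new_col):
    """For each row, duplicate it once per item in rows[list_col],
    putting the current item into a new column."""
    out = []
    for row in rows:
        for item in row[list_col]:
            new_row = dict(row)       # 1) duplicate the original data row
            del new_row[list_col]
            new_row[new_col] = item   # 2+3) add the new column and populate it
            out.append(new_row)
    return out

friends = [{"name": "Alice", "likes": ["Tea", "Zippy"]},
           {"name": "Bob", "likes": ["Tea"]}]
print(explode_rows(friends, "likes", "like"))
```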

PPS Related to the PS request, there is a sort of related feature in the 2.5 release of Google Refine that lets you merge data from across rows with a common key into a newly shaped data set: Key/value Columnize. Seeing this, it got me wondering what a fusion of Google Refine and RStudio might be like (or even just R support within Google Refine?)

PPPS this could be interesting – looks like you can test to see if a friendship exists given two Facebook user IDs.

2011: the UK hyper-local year in review

In this guest post, Damian Radcliffe highlights some topline developments in the hyper-local space during 2011. He also asks for your suggestions of great hyper-local content from 2011. His more detailed slides looking at the previous year are cross-posted at the bottom of this article.

2011 was a busy year across the hyper-local sphere, with a flurry of activity online as well as more traditional platforms such as TV, Radio and newspapers.

The Government’s plans for Local TV have developed considerably, following the Shott Review just over a year ago. We now have a clearer indication of the areas which will be first on the list for these new services and how Ofcom might award these licences. What we don’t know is who will apply for these licences, or what their business models will be. But, this should become clear in the second half of the year.

Whilst the Leveson Inquiry hasn’t directly been looking at local media, it has been a part of the debate. Claire Enders outlined some of the challenges facing the regional and local press in a presentation showing declining revenue, jobs and advertising over the past five years. Her research suggests that the impact of “the move to digital” has been greater at a local level than at the nationals.

Across the board, funding remains a challenge for many. But new models are emerging, with Daily Deals starting to form part of the revenue mix alongside money from foundations and franchising.

And on the content front, we saw Jeremy Hunt cite a number of hyper-local examples at the Oxford Media Convention, as well as record coverage for regional press and many hyper-local outlets as a result of the summer riots.

I’ve included more on all of these stories in my personal retrospective for the past year.

One area where I’d really welcome feedback is examples of hyper-local content you produced – or read – in 2011. I’m conscious that a lot of great material may not necessarily reach a wider audience, so do post your suggestions below and hopefully we can begin to redress that.

Mapping the New Year Honours List – Where Did the Honours Go?

When I get a chance, I’ll post a (not totally unsympathetic) response to Milo Yiannopoulos’ post The pitiful cult of ‘data journalism’, but in the meantime, here’s a view over some data that was released a couple of days ago – a map of where the New Year Honours went [link]

New Year Honours map

[Hmm… so WordPress.com doesn’t seem to want to let me embed a Google Fusion Table map iframe, and Google Maps (which are embeddable) just shows an empty folder when I try to view the Fusion Table KML (the Fusion Table export KML doesn’t seem to include lat/lng data either?). Maybe I need to explore some hosting elsewhere this year…]

Note that I wouldn’t make the claim that this represents an example of data journalism. It’s a sketch map showing the parts of the country where various recipients of honours this time round presumably live. Just by posting the map, I’m not reporting any particular story. Instead, I’m trying to find a way of looking at the data to see whether or not there may be any interesting stories suggested by viewing it in this way.

There was a small element of work involved in generating the map view, though… Working backwards, when I used Google Fusion Tables to geocode the locations of the honoured, some of the points were incorrectly located:

Google Fusion Tables - correcting faulty geocoding

(It would be nice to be able to force a locale to the geocoder, maybe telling it to use maps.google.co.uk as the base, rather than (presumably) maps.google.com?)

The approach I took to tidying these was rather clunky, first going into the table view and filtering on the mispositioned locations:

Google Fusion Tables - correcting geocoding errors

Then correcting them:

Google Fusion Table, Correct Geocode errors

What would be really handy would be if Google Fusion Tables let you see a tabular view of data within a particular map view – so for example, if I could zoom in to the US map and then get a tabular view of the records displayed on that particular local map view… (If it does already support this and I just missed it, please let me know via the comments..;-)

So how did I get the data into Google Fusion Tables? The original data was posted as a PDF on the DirectGov website (New Year Honours List 2012 – in detail)…:

New Year Honours data

…so I used Scraperwiki to preview and read through the PDF and extract the honours list data. (My scraper is a little clunky and doesn’t pull out 100% of the data, missing the occasional name and contribution details when they’re split over several lines; but I think it does a reasonable enough job for now, particularly as I am currently more interested in focusing on the possible high-level process for extracting and manipulating the data, rather than its correctness…!;-)

Here’s the scraper (feel free to improve upon it….:-): Scraperwiki: New Year Honours 2012

I then did a little bit of tweaking in Google Refine, normalising some of the facets and crudely attempting to separate out each person’s role and the contribution for which the award was made.

For example, in the case of Dr Glenis Carole Basiro DAVEY, given column data of the form “The Open University, Science Faculty and Health Education and Training Programme, Africa. For services to Higher and Health Education.“, we can use the following expressions to generate new sub-columns:

value.match(/.*(For .*)/)[0] to pull out things like “For services to Higher and Health Education.”
value.match(/(.*)For .*/)[0] to pull out things like “The Open University, Science Faculty and Health Education and Training Programme, Africa.”
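The same split can be sanity-checked outside Refine with Python’s re module (GREL and Python regex syntaxes differ slightly, but these particular patterns carry over directly; the variable names here are mine):

```python
import re

cell = ("The Open University, Science Faculty and Health Education and "
        "Training Programme, Africa. For services to Higher and Health Education.")

# Equivalent of value.match(/.*(For .*)/)[0]
contribution = re.match(r".*(For .*)", cell).group(1)
# Equivalent of value.match(/(.*)For .*/)[0]
role = re.match(r"(.*)For .*", cell).group(1).strip()

print(contribution)  # For services to Higher and Health Education.
print(role)
```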

I also ran each person’s record through the Reuters Open Calais service using Google Refine’s ability to augment data with data from a URL (“Add column by fetching URLs”), pulling the data back as JSON. Here’s the URL format I used (polling once every 500ms in order to stay within the maximum calls-per-second threshold mandated by the API):

"http://api.opencalais.com/enlighten/rest/?licenseID=MY_LICENSE_KEY&content=" + escape(value,'url') + "&paramsXML=%3Cc%3Aparams%20xmlns%3Ac%3D%22http%3A%2F%2Fs.opencalais.com%2F1%2Fpred%2F%22%20xmlns%3Ardf%3D%22http%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%22%3E%20%20%3Cc%3AprocessingDirectives%20c%3AcontentType%3D%22TEXT%2FRAW%22%20c%3AoutputFormat%3D%22Application%2FJSON%22%20%20%3E%20%20%3C%2Fc%3AprocessingDirectives%3E%20%20%3Cc%3AuserDirectives%3E%20%20%3C%2Fc%3AuserDirectives%3E%20%20%3Cc%3AexternalMetadata%3E%20%20%3C%2Fc%3AexternalMetadata%3E%20%20%3C%2Fc%3Aparams%3E"

Unpicking this a little:

licenseID is set to my license key value
content is the URL escaped version of the text I wanted to process (in this case, I created a new column from the name column that also pulled in data from a second column (the contribution column). The GREL formula I used to join the columns took the form: value+', '+cells["contribution"].value)
paramsXML is the URL encoded version of the following parameters, which set the content encoding for the result to be JSON (the default is XML):

<c:params xmlns:c="http://s.opencalais.com/1/pred/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<c:processingDirectives c:contentType="TEXT/RAW" c:outputFormat="Application/JSON"  >
</c:processingDirectives>
<c:userDirectives>
</c:userDirectives>
<c:externalMetadata>
</c:externalMetadata>
</c:params>
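If you want to check the encoding for yourself, the paramsXML value is just that XML run through URL encoding. A quick Python 3 sketch (the license key and content text are placeholders; `safe=''` makes quote() encode slashes too, matching the %2F sequences in the URL above):

```python
from urllib.parse import quote

# The params XML from above, flattened to a single string
params_xml = ('<c:params xmlns:c="http://s.opencalais.com/1/pred/" '
              'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
              '<c:processingDirectives c:contentType="TEXT/RAW" '
              'c:outputFormat="Application/JSON"></c:processingDirectives>'
              '<c:userDirectives></c:userDirectives>'
              '<c:externalMetadata></c:externalMetadata></c:params>')

url = ("http://api.opencalais.com/enlighten/rest/?licenseID=MY_LICENSE_KEY"
       "&content=" + quote("Some text to analyse", safe='') +
       "&paramsXML=" + quote(params_xml, safe=''))
print(url[:80])
```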

So much for process – now where are the stories? That’s left, for now, as an exercise for the reader. An obvious starting point is just to see who received honours in your locale. Remember, Google Fusion Tables lets you generate all sorts of filtered views, so it’s not too hard to map where the MBEs vs OBEs are based, for example, or have a stab at where awards relating to services to Higher Education went. Some awards also have a high correspondence with a particular location, as for example in the case of Enfield…

If you do generate any interesting views from the New Year Honours 2012 Fusion Table, please post a link in the comments. And if you find a problem with/fix for the data or the scraper, please post that info in a comment too:-)

2 guest posts: 2012 predictions and “Social media and the evolution of the fourth estate”

Memeburn logo

I’ve written a couple of guest posts for Nieman Journalism Lab and the tech news site Memeburn. The Nieman post is part of a series looking forward to 2012. I’m never a fan of futurology so I’ve cheated a little and talked about developments already in progress: new interface conventions in news websites; the rise of collaboration; and the skilling up of journalists in data.

Memeburn asked me a few months ago to write about social media’s impact on journalism’s role as the Fourth Estate, and it took me until this month to find the time to do so. Here’s the salient passage:

“But the power of the former audience is a power that needs to be held to account too, and the rise of liveblogging is teaching reporters how to do that: reacting not just to events on the ground, but the reporting of those events by the people taking part: demonstrators and police, parents and politicians all publishing their own version of events — leaving journalists to go beyond documenting what is happening, and instead confirming or debunking the rumours surrounding that.

“So the role of journalist is moving away from that of gatekeeper and — as Axel Bruns argues — towards that of gatewatcher: amplifying the voices that need to be heard, factchecking the MPs whose blogs are 70% fiction or the Facebook users scaremongering about paedophiles.

“But while we are still adapting to this power shift, we should also recognise that that power is still being fiercely fought-over. Old laws are being used in new ways; new laws are being proposed to reaffirm previous relationships. Some of these may benefit journalists — but ultimately not journalism, nor its fourth estate role. The journalists most keenly aware of this — Heather Brooke in her pursuit of freedom of information; Charles Arthur in his campaign to ‘Free Our Data’ — recognise that journalists’ biggest role as part of the fourth estate may well be to ensure that everyone has access to information that is of public interest, that we are free to discuss it and what it means, and that — in the words of Eric S. Raymond — “Given enough eyeballs, all bugs are shallow“.”

Comments, as always, very welcome.

A Quick Peek at Three Content Analysis Services

A long, long time ago, I tinkered with a hack called Serendipitwitterous (long since rotted, I suspect), that would look through a Twitter stream (personal feed, or hashtagged tweets), use the Yahoo term extraction service to try to identify concepts or key words/phrases in each tweet, and then use these as a search term on Slideshare, Youtube and so on to find content that may or may not be loosely related to each tweet.

The Yahoo Term Extraction service is still hanging in there – just – but I think it finally gets deprecated early next year. From my feeds today, however, it seems there may be a replacement in the form of a new content analysis service via YQL – Yahoo! Opens Content Analysis Technology to all Developers:

[The Y! Content Analysis service will] extract key terms from the content, and, more importantly, rank them based on their overall importance to the content. The output you receive contains the keywords and their ranks along with other actionable metadata.
On top of entity extraction and ranking, developers need to know whether key terms correspond to objects with existing rich metadata. Having this entity/object connection allows for the creation of highly engaging user experiences. The Y! Content Analysis output provides related Wikipedia IDs for key terms when they can be confidently identified. This enables interoperability with linked data on the semantic Web.

What this means is that you can push a content feed through the service, and get an annotated version out that includes identifier based hooks into other domains (i.e. little-l, little-d linked data). You can find the documentation here: Content Analysis Documentation for Yahoo! Search

So how does it fare? As I’ve previously explored using the Reuters Open Calais service to annotate OU/BBC programme listings (e.g. Augmenting OU/BBC Co-Pro Programme Data With Semantic Tags), I thought I’d use a programme feed from The Bottom Line again…

To start, we need to open the YQL developer console: http://developer.yahoo.com/yql/console/

We can then pull in an example programme description from the BBC using a YQL query of the form:

select long_synopsis from xml where url='http://www.bbc.co.uk/programmes/b00vy3l1.xml'

Grabbing a BBC programme feed into YQL

For reference, the text looks like this:

The view from the top of business. Presented by Evan Davis, The Bottom Line cuts through confusion, statistics and spin to present a clearer view of the business world, through discussion with people running leading and emerging companies.
In the week that Facebook launched its own new messaging service, Evan and his panel of top business guests discuss the role of email at work, amid the many different ways of messaging and communicating.
And location, location, location. It’s a cliche that location can make or break a business, but how true is it really? And what are the advantages of being next door to the competition?
Evan is joined in the studio by Chris Grigg, chief executive of property company British Land; Andrew Horton, chief executive of insurance company Beazley; Raghav Bahl, founder of Indian television news group Network 18.
Producer: Ben Crighton
Last in the series. The Bottom Line returns in January 2011.

The content analysis query example provided looks like this:

select * from contentanalysis.analyze where text="Italian sculptors and painters of the renaissance favored the Virgin Mary for inspiration"

but we can nest queries in order to pass the long_synopsis from the BBC programme feed through the service:

select * from contentanalysis.analyze where text in (select long_synopsis from xml where url='http://www.bbc.co.uk/programmes/b00vy3l1.xml')

Here’s the result:

<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
    yahoo:count="2" yahoo:created="2011-12-22T11:03:51Z" yahoo:lang="en-US">
    <diagnostics>
        <publiclyCallable>true</publiclyCallable>
        <url execution-start-time="2" execution-stop-time="370"
            execution-time="368" proxy="DEFAULT"><![CDATA[http://www.bbc.co.uk/programmes/b00vy3l1.xml]]></url>
        <user-time>572</user-time>
        <service-time>565</service-time>
        <build-version>24402</build-version>
    </diagnostics> 
    <results>
        <categories xmlns="urn:yahoo:cap">
            <yct_categories>
                <yct_category score="0.536">Business &amp; Economy</yct_category>
                <yct_category score="0.421652">Finance</yct_category>
                <yct_category score="0.418182">Finance/Investment &amp; Company Information</yct_category>
            </yct_categories>
        </categories>
        <entities xmlns="urn:yahoo:cap">
            <entity score="0.979564">
                <text end="57" endchar="57" start="48" startchar="48">Evan Davis</text>
                <wiki_url>http://en.wikipedia.com/wiki/Evan_Davis</wiki_url>
                <types>
                    <type region="us">/person</type>
                    <type region="us">/place/place_of_interest</type>
                    <type region="us">/place/us/town</type>
                </types>
                <related_entities>
                    <wikipedia>
                        <wiki_url>http://en.wikipedia.com/wiki/Don%27t_Tell_Mama</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Lenny_Dykstra</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Los_Angeles_Police_Department</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Today_%28BBC_Radio_4%29</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Chrisman,_Illinois</wiki_url>
                    </wikipedia>
                </related_entities>
            </entity>
            <entity score="0.734099">
                <text end="265" endchar="265" start="258" startchar="258">Facebook</text>
                <wiki_url>http://en.wikipedia.com/wiki/Facebook</wiki_url>
                <types>
                    <type region="us">/organization</type>
                    <type region="us">/organization/domain</type>
                </types>
                <related_entities>
                    <wikipedia>
                        <wiki_url>http://en.wikipedia.com/wiki/Mark_Zuckerberg</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Social_network_service</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Twitter</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Social_network</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Digital_Sky_Technologies</wiki_url>
                    </wikipedia>
                </related_entities>
            </entity>
            <entity score="0.674621">
                <text end="477" endchar="477" start="450" startchar="450">location, location, location</text>
            </entity>
            <entity score="0.651227">
                <text end="79" endchar="79" start="60" startchar="60">The Bottom Line cuts</text>
                <types>
                    <type region="us">/other/movie/movie_name</type>
                </types>
            </entity>
            <entity score="0.646818">
                <text end="799" endchar="799" start="789" startchar="789">Raghav Bahl</text>
                <wiki_url>http://en.wikipedia.com/wiki/Raghav_Bahl</wiki_url>
                <types>
                    <type region="us">/person</type>
                </types>
                <related_entities>
                    <wikipedia>
                        <wiki_url>http://en.wikipedia.com/wiki/Network_18</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Superpower</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Deng_Xiaoping</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/The_Amazing_Race</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Hare</wiki_url>
                    </wikipedia>
                </related_entities>
            </entity>
            <entity score="0.644349">
                <text end="144" endchar="144" start="133" startchar="133">clearer view</text>
            </entity>
            <entity score="0.54609">
                <text end="675" endchar="675" start="665" startchar="665">Chris Grigg</text>
                <types>
                    <type region="us">/person</type>
                </types>
            </entity>
        </entities>
    </results>
</query>

So, some success in pulling out person names, and limited success on company names. The subject categories look reasonably appropriate too.
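As an aside, YQL queries could also be issued outside the console via its public REST endpoint (that endpoint has long since been retired; the sketch below just shows the shape of the request, with the endpoint URL as it was at the time):

```python
from urllib.parse import urlencode

# The nested query from above, as a single string
yql = ("select * from contentanalysis.analyze where text in "
       "(select long_synopsis from xml where "
       "url='http://www.bbc.co.uk/programmes/b00vy3l1.xml')")

# urlencode percent-escapes the query for use as the q= parameter
endpoint = "http://query.yahooapis.com/v1/public/yql?" + urlencode({"q": yql})
print(endpoint[:60])
```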

[UPDATE: I should have run the desc contentanalysis.analyze query before publishing this post to pull up the docs/examples… As well as the where text= argument, there is a where url= argument that will pull back semantic information about a URL. Running the query over the OU homepage, for example, using select * from contentanalysis.analyze where url="http://www.open.ac.uk" identifies the OU as an organisation, with links out to Wikipedia, as well as geo-information and a Yahoo woe_id.]

Another related service in this area that I haven’t really explored yet is TSO’s Data Enrichment Service (API).

Here’s how it copes with the same programme synopsis:

TSO Data Enrichment Service

Pretty good… and links in to dbpedia (better for machine readability) compared to the Wikipedia links that the Yahoo service offers.

For completeness, here’s what the Reuters Open Calais service comes up with:

OPen Calais - content analysis

The best of the bunch on this sample of one, I think – albeit admittedly in the sort of domain that Reuters focuses on?

But so what…? What are these services good for? Automatic metadata generation/extraction is one thing, as I’ve demonstrated in Visualising OU Academic Participation with the BBC’s “In Our Time”, where I generated a quick visualisation that showed the sorts of topics that OU academics had talked about as guests on Melvyn Bragg’s In Our Time, along with the topics that other universities had been engaged with on that programme.

VIDEO from the Global Investigative Journalism Conference

Global Investigative Journalism Conference logo

At the Global Investigative Journalism Conference in Kiev earlier this year I interviewed four individuals whose work I admire: Stephen Grey (talking about internet security for journalists), Luuk Sengers and Mark Lee Hunter (on organising your investigation), and Bo Elkjaer (on investigating networks).

I’ve been publishing these videos individually on the Help Me Investigate blog, but thought I would cross-publish them as a group here.

Here’s Mark Lee Hunter with his tips on gathering information before speaking to sources:

Stephen Grey on internet security considerations for journalists:

Luuk Sengers on organising your investigation:

And Bo Elkjaer on how he used computer technology to follow the money through network analysis:

Going ‘web-first’ – extract from Magazine Editing (3rd ed.)

In the final of three extracts from the 3rd edition of Magazine Editing, published by Routledge, I talk about the tension between publishing first online and holding material back for print.

Magazine editors worry about topicality. Stories they send to press on Monday may be out of date by the time the magazine appears on Wednesday or Friday. It is no consolation to know that similar doubts affect the editors of daily newspapers, fated to follow in the wake of television. The print media must play to their strengths. Even a weekly magazine cannot stay on top of a breaking story of national significance. By the time it has appeared, things will have moved on and its readers will have seen more recent material online, on television and radio, or in their daily newspapers.

The internet, however, levels the playing field. TV, radio and newspapers all increasingly begin their reporting online. This is called a ‘Web first’ strategy and has its advantages and disadvantages. Clearly the major advantage is ‘owning’ the story. If you are the first to report it online then you are likely to dominate the search engine results when people look for that story. This in turn is likely to drive readers – and potential subscribers – to your main product (whether that is print or online).

The major fear that publishers have with ‘Web first’ strategies is losing their exclusives to rivals. This, however, is to misunderstand the complexities of multi-platform publishing which should involve playing to the strengths of each medium you publish in. Some publishers, for example, will supply video interviews to broadcasters (and online) just ahead of the publication of the print version of their story. This helps attract interest from people who might not normally buy your publication, without ‘giving away’ the print version of the story itself.

A good example of how not to do this comes from Rolling Stone magazine’s profile of top US commander Gen. Stanley McChrystal. The general was quoted making negative remarks about the vice president and key members of the US cabinet and the publication of these remarks in print led to his dismissal.

The dismissal, of course, increased interest in – and awareness of – the profile piece substantially – but the magazine failed to react to this interest on its website. As the website Talking Points Memo reported in a piece entitled ‘How Rolling Stone Won The News Cycle And Lost The Story’:

“Rolling Stone didn’t even bother putting [the story] online before they rolled it out [in print]. In fact, despite the fact that everyone else’s website led the profile, Rolling Stone’s site led with Lady Gaga … all day and didn’t even put the story online until 11:00.”

Nieman Journalism Lab explained why this cost them:

“The story made its way across the web anyway. Politico posted a PDF of the story and the Associated Press ran a thorough summary. Rolling Stone didn’t get much in the way of traffic out of it … After the piece ran [on Rolling Stone’s website], it started picking up incoming links, presumably driving tremendous traffic to the site. I checked in on the story today, exactly 24 hours later, to find that, despite the story completely dominating the news cycle — TV, blogosphere, Twitter, newspapers — only 16 comments had been posted to the story.

“Why? Of course the late posting was a factor. National security reporter Spencer Ackerman’s first [blog] post on the general’s apology, which went up several hours before Rolling Stone published, attracted 47 comments on his personal blog. Politico’s defense reporter Laura Rozen’s blog post on the AP’s summary of the story, which went up at 10:46 p.m. the night before the story appeared, has about twice as many comments as the Rolling Stone story itself. Twitter was buzzing with comments all day. There was nowhere to discuss at Rolling Stone, so the conversation naturally happened elsewhere.”

Another approach is to play to the community-based strengths of online publishing, by seeding an online debate with the main points of your exclusive, and using the best parts of that online discussion to flesh out the publication in print of your full exclusive.

In other words, do not fall into the trap of overvaluing the ‘exclusive’ at the expense of actual readers. If your objective is to attract the largest number of readers – online and in print – then be strategic in how you publish different parts of your story across different platforms. Can you involve online users at an early stage? Can you produce video or audio that bloggers and broadcasters might want to distribute? How can you give it the richest treatment in print that could not be duplicated in a broadcast or web treatment? And, once published, how can you ensure that discussion of the exclusive takes place on – or directs traffic to – your site (or indeed, where your revenue is coming from, which may include adverts embedded in media on other sites)? All of these elements require thought at the outset of any newsgathering operation.

A magazine has its own strengths, and it should play to them. Instead of trailing behind newspapers and television – whose space and time are more limited, and whose news cycle is more tempestuous – it can provide analytical coverage, based on its trusted relationships within the industry and its in-house expertise. It can also focus its treatment more specifically than the mass media will, as a newspaper with a broader audience cannot assume much prior knowledge on its readers' part.

Some editors, usually of weekly magazines, take the view that monthlies shouldn’t try to compete in the news area. They should simply use the space for something else. This is defeatist, and overlooks the role of the website in providing news updates as they occur. The slower pace of a monthly should mean that it can unearth and research genuinely exclusive stories. That way it will lead everybody else, which is good for morale and sales. It can certainly go deeper, using the sources it has had time to cultivate.

If you are going to do news in a monthly, you must consider the issue of the exclusivity of your stories and whether you wish to lead with the story in print or online. Given the increasing ability of sources to publish themselves (via a company or individual blog, for example – or even Facebook), or the likelihood that someone else might do the same, obtaining cooperation and silence while you wait for the next monthly print run to roll around is becoming increasingly difficult.

Ultimately you must ask yourself where the value lies: in the exclusivity, or in the treatment and distribution of that information? Do people buy your magazine purely for the exclusive news – or largely for other content? Is it better to publish part of the exclusive online, establishing ‘ownership’ of it and promoting further revelations or analysis in print – while also attracting new readers who come across your publication when a link is sent to them?

Publishing a part of an exclusive online – and holding the remainder back for print publication – is a strategy often adopted by publishers. Your own decision will depend in large part upon where your funding comes from, where you are trying to attract it, what sort of people read your publication, and how.

More and more publishers are going for this ‘web-first’ strategy, playing to the strengths of each medium: speed, findability and social distribution online; and analysis and depth in print. It can also extend the life of a story from a single issue to a couple of weeks: breaking online, developing in print, and returning online with further reaction.

Building a relationship with sources often rests on the authority of your magazine and yourself, and the serious treatment you can give to their story. They will have to balance that against the control that they will have if they publish the story themselves, online. One factor that may be worth raising is that ‘exclusivity’ often attracts more interest from those who missed the exclusive, than a source-published story which all journalists can see at the same time. The founder of WikiLeaks understood this when breaking the various ‘war logs’ stories – instead of publishing the logs online as it had with previous leaks, the organisation partnered with individual news organisations in three different countries, attracting wider coverage of the documents not just in those newspapers but also in jealous rivals.

The bulk of your coverage will not be exclusive. Use the focusing power of news design to achieve the right balance. You can give great prominence and projection to your exclusive stories, while covering the stuff most readers may have seen in a round-up box or column of news ‘briefs’.

Aside from the news that makes the printed magazine, a monthly news team tends to produce a continuously updated news page as part of the website. This may include one or more individual, team, or subject-based blogs, and a daily or weekly e-mail update.

Your own news feeds may be syndicated to other news sites and blogs, adding to your publication’s reach. Typically a magazine website’s news section will have an RSS feed of its latest stories; increasingly, they will have a number of RSS feeds for news about different parts of their field.

RSS feeds have enormous flexibility and potential for various uses. If someone uses an RSS reader on their computer or phone, they can read your feeds there; if they publish a blog they can ‘pull’ your feed to show your latest headlines (when clicked, the user will be taken to your site). You can also use RSS feeds to cross-publish your latest headlines to a Twitter account, a Facebook page, and various other places.

RSS feeds can be full (showing the entire story) or partial (showing only a first paragraph – the user then has to click through to the full story on your site – although this introduces an extra step that can reduce readership and create a frustrating user experience), and they can include advertising and multimedia. They are, in effect, one of the delivery vans of internet distribution.
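The full/partial distinction is visible in the feed XML itself: full feeds typically carry the whole story in a `content:encoded` element, while partial feeds offer only a `description`. As a minimal sketch (the feed below is invented for illustration – real magazine feeds will vary), Python’s standard library is enough to read either kind:

```python
# A hedged sketch of consuming an RSS 2.0 feed with Python's standard
# library. The feed XML is inlined for illustration; in practice you
# would fetch it from your publication's feed URL.
import xml.etree.ElementTree as ET

RSS = """<?xml version="1.0"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example Magazine News</title>
    <item>
      <title>Full story</title>
      <link>http://example.com/full-story</link>
      <description>First paragraph only...</description>
      <content:encoded>The entire story text...</content:encoded>
    </item>
    <item>
      <title>Partial story</title>
      <link>http://example.com/partial-story</link>
      <description>First paragraph only...</description>
    </item>
  </channel>
</rss>"""

# Clark notation for the content:encoded element's namespace
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"

def read_items(feed_xml):
    """Return (title, link, is_full) for each item in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title")
        link = item.findtext("link")
        # A 'full' feed carries the whole story in content:encoded;
        # a 'partial' feed offers only the description/first paragraph.
        is_full = item.find(CONTENT_ENCODED) is not None
        items.append((title, link, is_full))
    return items

for title, link, is_full in read_items(RSS):
    print(f"{title} -> {link} ({'full' if is_full else 'partial'})")
```

The same loop is the basis of the ‘delivery van’ uses mentioned above – a blog widget pulling your latest headlines, or a script cross-posting them to a Twitter or Facebook account.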

Magazine editing: managing information overload

In the second of three extracts from the 3rd edition of Magazine Editing, published by Routledge, I talk about dealing with the large amount of information that magazine editors receive.

Managing information overload

A magazine editor now has little problem finding information on a range of topics. It is likely that you will have subscribed to email newsletters, RSS feeds, Facebook groups and pages, YouTube channels and various other sources of news and information both in your field and on journalistic or management topics.

There tend to be two fears driving journalists’ information consumption: the fear that you will miss out on something because you’re not following the right sources; and the fear that you’ll miss out on something because you’re following too many sources. This leads to two broad approaches: people who follow everything of any interest (‘follow, then filter’); and people who are very strict about the number of sources of information they follow (‘filter, then follow’).

A good analogy to use here is of streams versus ponds. A pond is manageable, but predictable. A stream is different every time you step in it, but you can miss things.

As an editor you are in the business of variety: you need to be exposed to a range of different pieces of information, and cannot afford to be caught out. A good strategy for managing your information feeds, then, is to follow a wide variety of sources, but to add filters to ensure you don’t miss the best material.

If you are using an RSS reader one way to do this is to have specific folders for your ‘must-read’ feeds. Andrew Dubber, a music industries academic and author of the New Music Strategies blog, recommends choosing 10 subjects in your area, and choosing five ‘must-read’ feeds for each, for example.

For email newsletters and other email updates you can adopt a similar strategy: must-reads go into your Inbox; others are filtered into subfolders to be read if you have time.

To create a folder in Google Reader, add a new feed (or select an existing one) and under the heading click on Feed Settings… – then scroll to the bottom and click on New Folder… – this will also add the feed to that folder.

If you are following hundreds or thousands of people on Twitter, use Twitter lists to split them into manageable channels: ‘People I know’; ‘journalism’; ‘industry’; and so on. To add someone to a list on Twitter, visit their profile page and click on the list button, which will be around the same area as the ‘Follow’ button.

You can also use websites such as Paper.li to send you a daily email ‘newspaper’ of the most popular links shared by a particular list of friends, so you don’t miss out on the most interesting stories.

Social bookmarking: creating an archive and publishing at the same time

Social bookmarking tools like Delicious, Digg and Diigo can also be useful in managing web-based resources that you don’t have time to read or think might come in useful later. Bookmarking a page essentially ‘files’ it so you can access it quickly when you need it (you do this by giving each page a series of relevant tags, e.g. ‘dieting’, ‘research’, ‘UK’, ‘Jane Jones’).

They also include a raft of other useful features, such as RSS feeds (allowing you to automatically publish selected items to a website, blog, or Twitter or Facebook account), and the ability to see who else has bookmarked the same pages (and what else they have bookmarked, which is likely to be relevant to your interests).

Check the site’s Help or FAQ pages to find out how to use them effectively. Typically this will involve adding a button to your browser’s Links bar (under the web address box) by dragging a link (called ‘Bookmark on Delicious’ or similar) from the relevant page of the site (look for ‘bookmarklets’).

Then, whenever you come across a page you want to bookmark, click on that button. A new window will appear with the name and address of the webpage, and space for you to add comments (a typical tactic is to paste a key quote from the page here), and tags.

Useful things to add as tags include anything that will help you find this later, such as any organisations, locations or people that are mentioned, the author or publisher, and what sort of information is included, such as ‘report’, ‘statistics’, ‘research’, ‘casestudy’ and so on.

If installing a button on your browser is too complicated or impractical many of these services also allow you to bookmark a page by sending the URL to a specific email address. Alternatively, you can just copy the URL and log on to the bookmarking site to bookmark it.

Some bookmarking services double up as blogging sites: Tumblr and StumbleUpon are just two. The process is the same as described above, but these services are more intuitively connected with other services such as Twitter and Facebook, so that bookmarked pages are automatically published on those services too. With one click your research not only forms a useful archive but also becomes an act of publishing and distribution.

Every so often you might want to have a clear out: try diverting mailings and feeds to a folder for a week without looking at them. After seven days, ask which ones, if any, you have missed. You might benefit from unsubscribing and cutting down some information clutter. In general, it may be useful to have background information, but it all occupies your time. Treat such things as you would anything sent to you on paper. If you need it, and it is likely to be difficult to find again, file it or bookmark it. If not, bin it. After a while, you’ll find it gets easier.

Do you have any other techniques for dealing with information overload?


Why we shouldn’t be discouraging students from writing about students

I have a confession: I have never liked student projects aimed at students. They tend to betray a lazy approach to creativity: after all, what can be less imaginative than a project aimed at ‘people like me’?

They also don’t generally develop the skills that journalism degrees aim for: original research, for example; flexibility in style; or an exploration of professional context.

And I’m not alone: most journalism tutors, when looking for an assignment to give or weighing up a student’s proposal, will run a mile from anything aimed at students. “Go write for the student newspaper if you want to do that.”

Lazy

But I think my instinctive aversion has been wrong. I think it’s as lazy as the ideas I’ve criticised. And I think it means missing an enormous opportunity.

Traditionally, one of the biggest strengths of the regional journalist was their connection to the communities they reported on. They knew the issues; they knew who to speak to in those communities (and not just who published the press releases); they knew their readers; and they saw the impact of their work.

University students, in contrast, are perhaps at a stage in their life when they are least connected to any community. They are often living in a town or city they have no history in; they are unlikely to run businesses, or belong to any industrial or professional culture; few have children in the local education and health systems. They are inbetweeners.

It is possibly the worst time in somebody’s life to expect them to do journalism.

And the one thing that they are connected to – student life – we steer them away from.

A New Year’s resolution

So I have a New Year’s resolution for 2012: I’m going to change a habit acquired over a decade of teaching journalism.

For the first time I am going to assign my students – just one group – a project focused on students.

It will still build those essential skills: original research; flexibility of style; professional context. But those skills will be built upon a knowledge that what they will be doing will have a large audience, and can make a real difference to them.

That means that I will be expecting more. Because they already know the community they are writing about, I will be expecting them to hit the ground running with original leads and story ideas – not trying to hit a story quota with press releases or superficial he-said-she-said conflicts.

Because the project will be online-only, I will be expecting them to be exploring new ways of engaging – and collaborating – with the most connected audiences in the country.

And because they are personally affected by the systems they are reporting on – from employment law and tenants’ rights to student councils and representation – I will be expecting them to research the system itself: where power and accountability lies; where the money goes, and why.

As a result, I’m hoping that students will develop an understanding of how to investigate systems in any field – transferring their experiences of investigating education into investigating the health system, welfare system, local government, or anything else.

I’ll be using Help Me Investigate Education as a space to help them build that knowledge, and those connections, and to collaborate with journalism students and others across the UK. If you have a class that you want to get involved, I’d be happy to help.

And there are plenty of stories to be told. Like any transient population, students are subject to many abuses of power. In 2012 I want to see if, given the opportunity, student journalists can hold that power to account.

Magazine editing: social media policies

In the first of three extracts from the 3rd edition of Magazine Editing, published by Routledge, I talk about some basic considerations in drawing up social media policies. If you are aware of any particularly good or bad examples of social media policies in the magazine industry, I’d love to know.

Social media policies

A policy need not be particularly restrictive – the key is that everyone is clear what is acceptable (and in some cases, what is encouraged, or ‘best practice’), as well as what to do in particular situations (such as when they receive abusive or offensive messages).

There are plenty of examples to look at online, including a database of social media policies at socialmediagovernance.com/policies.php. Key issues for you as a publication are making all journalists aware of legal risks such as defamation, contempt and copyright (which they might otherwise assume sub-editors are covering), and of professionalism (for example, not posting inappropriate images on an account used for professional purposes).

Also worth considering carefully are the areas of objectivity and impartiality. US publications are a lot more anxious about their journalists being perceived to be anything but completely neutral in all affairs, leading to some policies that would appear draconian to the more opinionated Brits.

Neutrality, however, is different to objectivity (which is rather more complicated but comes down to a process based on facts rather than simply creating an appearance of balance through presenting conflicting beliefs), and well informed opinion is a key feature in most magazines.

You want to allow your writers to play to their strengths and find their natural ‘voice’ on social media platforms (institutional voices do not work well here), while also guarding against ill-considered comments that might be used against the publication.

What other issues should a social media policy cover? And why should a magazine have one?