Category Archives: online journalism

Comment call: Objectivity and impartiality – a newsroom policy for student projects

I’ve been updating a newsroom policy guide for a project some of my students will be working on, with a particular section on objectivity and impartiality. As this has coincided with the debate on fact-checking stirred by the New York Times public editor Arthur Brisbane, I thought I would reproduce the guidelines here, and invite comments on whether you think it hits the right note:

Objectivity and impartiality: newsroom policy

Objectivity is a method, not an element of style. In other words:

Do not write stories that give equal weight to each ‘side’ of an argument if the evidence behind each side of the argument is not equal. Doing so misrepresents the balance of opinions or facts. Your obligation is to those facts, not to the different camps whose claims may be false.
Do not simply report the assertions of different camps. As a journalist your responsibility is to check those assertions. If someone misrepresents the facts, do not simply say someone else disagrees, make a statement along the lines of “However, the actual wording of the report…” or “The official statistics do not support her argument” or “Research into X contradict this.” And of course, link to that evidence and keep a copy for yourself (which is where transparency comes in).

Lazy reporting of assertions without evidence is called the ‘View From Nowhere’ – you can read Jay Rosen’s Q&A or the Wikipedia entry, which includes this useful explanation:

“A journalist who strives for objectivity may fail to exclude popular and/or widespread untrue claims and beliefs from the set of true facts. A journalist who has done this has taken The View From Nowhere. This harms the audience by allowing them to draw conclusions from a set of data that includes untrue possiblities. It can create confusion where none would otherwise exist.”

Impartiality is dependent on objectivity. It is not (as subjects of your stories may argue) giving equal coverage to all sides, but rather promising to tell the story based on objective evidence rather than based on your own bias or prejudice. All journalists will have opinions and preconceived ideas of what a story might be, but an impartial journalist is prepared to change those opinions, and change the angle of the story. In the process they might challenge strongly-held biases of the society they report on – but that’s your job.

The concept of objectivity comes from the sciences, and this provides a useful guideline: scientists don’t sit between two camps and repeat assertions without evaluating them. They identify a claim (hypothesis) and gather the evidence behind it – both primary and secondary.

Claims may, however, already be in the public domain and attracting a lot of attention and support. In those situations reporting should be open about the information the journalist does not have. For example:

“His office, however, were unable to direct us to the evidence quoted”, or
“As the report is yet to be published, it is not possible to evaluate the accuracy of these claims”, or
“When pushed, X could not provide any documentation to back up her claims”.

Thoughts?

Different Speeches? Digital Skills Aren’t just About Coding…

Secretary of State for Education, Michael Gove, gave a speech yesterday on rethinking the ICT curriculum in UK schools. You can read a copy of the speech variously on the Department for Education website, or, err, on the Guardian website.

Seeing these two copies of what is apparently the same speech, I started wondering:

a) which is the “best” source to reference?
b) how come the Guardian doesn’t add a disclaimer about the provenance of, and link, to the DfE version? [Note the disclaimer in the DfE version – “Please note that the text below may not always reflect the exact words used by the speaker.”]
c) is the Guardian version an actual transcript, maybe? That is, does the Guardian reprint the “exact words” used by the speaker?

And that made me think I should do a diff… About which, more below…

Before that, however, here’s a quick piece of reflection on how these two things – the reinvention of the the IT curriculum, and the provenance of, and value added to, content published on news and tech industry blog sites – collide in my mind…

So for example, I’ve been pondering what the role of journalism is, lately, in part because I’m trying to clarify in my own mind what I think the practice and role of data journalism are (maybe I should apply for a Nieman-Berkman Fellowship in Journalism Innovation to work on this properly?!). It seems to me that “communication” is one important part (raising awareness of particular issues, events, or decisions), and holding governments and companies to account is another. (Actually, I think Paul Bradshaw has called me out on that, before, suggesting it was more to do with providing an evidence base through verification and triangulation, as well as comment, against which governments and companies could be held to account (err, I think? As an unjournalist, I don’t have notes or a verbatim quote against which to check that statement, and I’m too lazy to email/DM/phone Paul to clarify what he may or may not have said…(The extent of my checking is typically limited to what I can find on the web or in personal archives…which appear to be lacking on this point…))

Another thing I’ve been mulling over recently in a couple of contexts relates to the notion of what are variously referred to as digital or information skills.

The first context is “data journalism”, and the extent to which data journalists need to be able to do programming (in the sense of identifying the steps in a process that can be automated and how they should be sequenced or organised) versus writing code. (I can’t write code for toffee, but I can read it well enough to copy, paste and change bits that other people have written. That is, I can appropriate and reuse other people’s code, but can’t write it from scratch very well… Partly because I can’t ever remember the syntax and low level function names. I can also use tools such as Yahoo Pipes and Google Refine to do coding like things…) Then there’s the question of what to call things like URL hacking or (search engine) query building?

The second context is geeky computer techie stuff in schools, the sort of thing covered by Michael Gove’s speech at the BETT show on the national ICT curriculum (or lack thereof), and about which the educational digerati were all over on Twitter yesterday. Over the weekend, houseclearing my way through various “archives”, I came across all manner of press clippings from 2000-2005 or so about the activities of the OU Robotics Outreach Group, of which I was a co-founder (the web presence has only recently been shut down, in part because of the retirement of the sys admin on whose server the websites resided.) This group ran an annual open meeting every November for several years hosting talks from the educational robotics community in the UK (from primary school to HE level). The group also co-ordinated the RoboCup Junior competition in the UK, ran outreach events, developed various support materials and activities for use with Lego Mindstorms, and led the EPSRC/AHRC Creative Robotics Research Network.

At every robotics event, we’d try to involve kids and/or adults in elements of problem solving, mechanical design, programming (not really coding…) based around some sort of themed challenge: a robot fashion show, for example, or a treasure hunt (both variants on edge following/line following;-) Or a robot rescue mission, as used in a day long activity in the “Engineering: An Active Introduction” (TXR120) OU residential school, or the 3 hour “Robot Theme Park” team building activity in the Masters level “Team Engineering” (T885) weekend school. [If you’re interested, we may be able to take bookings to run these events at your institution. We can make them work at a variety of difficulty levels from KS3-4 and up;-)]

Given that working at the bits-atoms interface is where the a lot of the not-purely-theoretical-or-hardcore-engineering innovation and application development is likely to take place over the next few years, any mandate to drop the “boring” Windows training ICT stuff in favour of programming (which I suspect can be taught in not only a really tedious way, but a really confusing and badly delivered way too) is probably Not the Best Plan.

Slightly better, and something that I know is currently being mooted for reigniting interest in computing, is the Raspberry Pi, a cheap, self-contained, programmable computer on a board (good for British industry, just like the BBC Micro was…;-) that allows you to work at the interface between the real world of atoms and the virtual world of bits that exists inside the computer. (See also things like the OU Senseboard, as used on the OU course “My Digital Life” (TU100).)

If schools were actually being encouraged to make a financial investment on a par with the level of investment around the introduction of the BBC Micro, back in the day, I’d suggest a 3D printer would have more of the wow factor…(I’ll doodle more on the rationale behind this in another post…) The financial climate may not allow for that (but I bet budget will manage to get spent anyway…) but whatever the case, I think Gove needs to be wary about consigning kids to lessons of coding hell. And maybe take a look at programming in a wider creative context, such as robotics (the word “robotics” is one of the reason why I think it’s seen as a very specialised, niche subject; we need a better phrase, such as “Creative Technologies”, which could combine elements of robotics, games programming, photoshop, and, yex, Powerpoint too… Hmm… thinks.. the OU has a couple of courses that have just come to the end of their life that between them provide a couple of hundred hours of content and activity on robotics (T184) and games programming (T151), and that we delivered, in part, to 6th formers under the OU’s Young Applicants in Schools Scheme.

Anyway, that’s all as maybe… Because there are plenty of digital skills that let you do coding like things without having to write code. Such as finding out whether there are any differences between the text in the DfE copy of Gove’s BETT speech, and the Guardian copy.

Copy the text from each page into a separate text file, and save it. (You’ll need a text editor for that..) Then, if you haven’t already got one, find yourself a good text editor. I use Text Wrangler on a Mac. (Actually, I think MS Word may have a diff function?)

The difference’s all tend to be in the characters used for quotation marks (character encodings are one of the things that can make all sorts of programmes fall over, or misbehave. Just being aware that they may cause a problem, as well as how and why, would be a great step in improving the baseline level understanding of folk IT. Some of the line breaks don’t quite match up either, but other than that, the text is the same.

Now, this may be because Gove was a good little minister and read out the words exactly as they had been prepared. Or it may be the case that the Guardian just reprinted the speech without mentioning provenance, or the disclaimer that he may not actually have read the words of that speech (I have vague memories of an episode of Yes, Minister, here…;-)

Whatever the case, if you know: a) that it’s even possible to compare two documents to see if they are different (a handy piece of folk IT knowledge); and b) know a tool that does it (or how to find a tool that does it, or a person that may have a tool that can do it), then you can compare the texts for yourself. And along the way, maybe learn that churnalism, in a variety of forms, is endemic in the media. Or maybe just demonstrate to yourself when the media is acting in a purely comms, rather than journalistic, role?

PS other phrases in the area: “computational thinking”. Hear, for example: A conversation with Jeannette Wing about computational thinking

PPS I just remembered – there’s a data journalism hook around this story too… from a tweet exchange last night that I was reminded of by an RT:

josiefraser: RT @grmcall: Of the 28,000 new teachers last year in the UK, 3 had a computer-related degree. Not 3000, just 3. dlivingstone: @josiefraser Source??? Not found it yet. RT @grmcall: 28000 new UK teachers last year, 3 had a computer-related degree. Not 3000, just 3 josiefraser: That ICT qualification teacher stat RT @grmcall: Source is the Guardian http://www.guardian.co.uk/education/2012/jan/09/computer-studies-in-schools

I did a little digging and found the following document on the General Teaching Council of England website – Annual digest of statistics 2010–11 – Profiles of registered teachers in England [PDF] – that contains demographic stats, amongst others, for UK teachers. But no stats relating to subject areas of degree level qualifications held, which is presumably the data referred to in the tweet. So I’m thinking: this is partly where the role of data journalist comes in… They may not be able to verify the numbers by checking independent sources, but they may be able to shed some light on where the numbers came from and how they were arrived at, and maybe even secure their release (albeit as a single point source?)

20 free ebooks on journalism (for your Xmas Kindle)

24 Replies

For some reason there are two versions of this post on the site – please check the more up to date version here.

20 free ebooks on journalism (for your Xmas Kindle) {updated to 65}

15 Replies

As many readers of this blog will have received a Kindle for Christmas I thought I should share my list of the free ebooks that I recommend stocking up on.

Online journalism and multimedia ebooks

Starting with more general books, Mark Briggs‘s book Journalism 2.0 (PDF*) is a few years old but still provides a good overview of online journalism to have by your side. Mindy McAdams‘s 42-page Reporter’s Guide to Multimedia Proficiency (PDF) adds some more on that front, and Adam Westbrook‘s Ideas on Digital Storytelling and Publishing (PDF) provides a larger focus on narrative, editing and other elements.

After the first version of this post, MA Online Journalism student Franzi Baehrle suggested this free book on DSLR Cinematography, as well as Adam Westbrook on multimedia production (PDF). And Guy Degen recommends the free ebook on news and documentary filmmaking from ImageJunkies.com.

The Participatory Documentary Cookbook [PDF] is another free resource on using social media in documentaries.

A free ebook on blogging can be downloaded from Guardian Students when you register with the site, and Swedish Radio have produced this guide to Social Media for Journalists (in English).

The Traffic Factories is an ebook that explores how a number of prominent US news organisations use metrics, and Chartbeat’s role in that. You can download it in mobi, PDF or epub format here.

Continue reading →

Social Interest Positioning – Visualising Facebook Friends’ Likes With Data Grabbed Using Google Refine

What do my Facebook friends have in common in terms of the things they have Liked, or in terms of their music or movie preferences? (And does this say anything about me?!) Here’s a recipe for visualising that data…

After discovering via Martin Hawksey that the recent (December, 2011) 2.5 release of Google Refine allows you to import JSON and XML feeds to bootstrap a new project, I wondered whether it would be able to pull in data from the Facebook API if I was logged in to Facebook (Google Refine does run in the browser after all…)

Looking through the Facebook API documentation whilst logged in to Facebook, it’s easy enough to find exemplar links to things like your friends list (https://graph.facebook.com/me/friends?access_token=A_LONG_JUMBLE_OF_LETTERS) or the list of likes someone has made (https://graph.facebook.com/me/likes?access_token=A_LONG_JUMBLE_OF_LETTERS); replacing me with the Facebook ID of one of your friends should pull down a list of their friends, or likes, etc.

(Note that validity of the access token is time limited, so you can’t grab a copy of the access token and hope to use the same one day after day.)

Grabbing the link to your friends on Facebook is simply a case of opening a new project, choosing to get the data from a Web Address, and then pasting in the friends list URL:

Click on next, and Google Refine will download the data, which you can then parse as a JSON file, and from which you can identify individual record types:

If you click the highlighted selection, you should see the data that will be used to create your project:

You can now click on Create Project to start working on the data – the first thing I do is tidy up the column names:

We can now work some magic – such as pulling in the Likes our friends have made. To do this, we need to create the URL for each friend’s Likes using their Facebook ID, and then pull the data down. We can use Google Refine to harvest this data for us by creating a new column containing the data pulled in from a URL built around the value of each cell in another column:

The Likes URL has the form https://graph.facebook.com/me/likes?access_token=A_LONG_JUMBLE_OF_LETTERS which we’ll tinker with as follows:

The throttle control tells Refine how often to make each call. I set this to 500ms (that is, half a second), so it takes a few minutes to pull in my couple of hundred or so friends (I don’t use Facebook a lot;-). I’m not sure what limit the Facebook API is happy with (if you hit it too fast (i.e. set the throttle time too low), you may find the Facebook API stops returning data to you for a cooling down period…)?

Having imported the data, you should find a new column:

At this point, it is possible to generate a new column from each of the records/Likes in the imported data… in theory (or maybe not..). I found this caused Refine to hang though, so instead I exprted the data using the default Templating… export format, which produces some sort of JSON output…

I then used this Python script to generate a two column data file where each row contained a (new) unique identifier for each friend and the name of one of their likes:

import simplejson,csv

writer=csv.writer(open('fbliketest.csv','wb+'),quoting=csv.QUOTE_ALL)

fn='my-fb-friends-likes.txt'

data = simplejson.load(open(fn,'r'))
id=0
for d in data['rows']:
	id=id+1
	#'interests' is the column name containing the Likes data
	interests=simplejson.loads(d['interests'])
	for i in interests['data']:
		print str(id),i['name'],i['category']
		writer.writerow([str(id),i['name'].encode('ascii','ignore')])

[I think this R script, in answer to a related @mhawksey Stack Overflow question, also does the trick: R: Building a list from matching values in a data.frame]

I could then import this data into Gephi and use it to generate a network diagram of what they commonly liked:

Rather than returning Likes, I could equally have pulled back lists of the movies, music or books they like, their own friends lists (permissions settings allowing), etc etc, and then generated friends’ interest maps on that basis.

PS dropping out of Google Refine and into a Python script is a bit clunky, I have to admit. What would be nice would be to be able to do something like a “create new rows with new column from column” pattern that would let you set up an iterator through the contents of each of the cells in the column you want to generate the new column from, and for each pass of the iterator: 1) duplicate the original data row to create a new row; 2) add a new column; 3) populate the cell with the contents of the current iteration state. Or something like that…

PPS Related to the PS request, there is a sort of related feature in the 2.5 release of Google Refine that lets you merge data from across rows with a common key into a newly shaped data set: Key/value Columnize. Seeing this, it got me wondering what a fusion of Google Refine and RStudio might be like (or even just R support within Google Refine?)

PPPS this could be interesting – looks like you can test to see if a friendship exists given two Facebook user IDs.

2011: the UK hyper-local year in review

12 Replies

In this guest post, Damian Radcliffe highlights some topline developments in the hyper-local space during 2011. He also asks for your suggestions of great hyper-local content from 2011. His more detailed slides looking at the previous year are cross-posted at the bottom of this article.

2011 was a busy year across the hyper-local sphere, with a flurry of activity online as well as more traditional platforms such as TV, Radio and newspapers.

The Government’s plans for Local TV have been considerably developed, following the Shott Review just over a year ago. We now have a clearer indication of the areas which will be first on the list for these new services and how Ofcom might award these licences. What we don’t know is who will apply for these licences, or what their business models will be. But, this should become clear in the second half of the year.

Whilst the Leveson Inquiry hasn’t directly been looking at local media, it has been a part of the debate. Claire Enders outlined some of the challenges facing the regional and local press in a presentation showing declining revenue, jobs and advertising over the past five years. Her research suggests that the impact of “the move to digital” has been greater at a local level than at the nationals.

Across the board, funding remains a challenge for many. But new models are emerging, with Daily Deals starting to form part of the revenue mix alongside money from foundations and franchising.

And on the content front, we saw Jeremy Hunt cite a number of hyper-local examples at the Oxford Media Convention, as well as record coverage for regional press and many hyper-local outlets as a result of the summer riots.

I’ve included more on all of these stories in my personal retrospective for the past year.

One area where I’d really welcome feedback is examples of hyper-local content you produced – or read – in 2011. I’m conscious that a lot of great material may not necessarily reach a wider audience, so do post your suggestions below and hopefully we can begin to redress that.

Mapping the New Year Honours List – Where Did the Honours Go?

When I get a chance, I’ll post a (not totally unsympathetic) response to Milo Yiannopoulos’post The pitiful cult of ‘data journalism’, but in the meantime, here’s a view over some data that was released a couple of days ago – a map of where the New Year Honours went [link]

[Hmm… so WordPress.com doesn’t seem to want to let me embed a Google Fusion Table map iframe, and Google Maps (which are embeddable) just shows an empty folder when I try to view the Fusion Table KML… (the Fusion Table export KML doesn’t seem to include lat/lng data either? Maybe I need to explore some hosting elsewhere this year…]

Note that I wouldn’t make the claim that this represents an example of data journalism. It’s a sketch map showing which parts of the country various recipients of honours this time round presumably live. Just by posting the map, I’m not reporting any particular story. Instead, I’m trying to find a way of looking at the day to see whether or not there may be any interesting stories that are suggested by viewing the data in this way.

There was a small element of work involved in generating the map view, though… Working backwards, when I used Google Fusion tables to geocode the locations of the honoured, some of the points were incorrectly located:

(It would be nice to be able to force a locale to the geocoder, maybe telling it to use maps.google.co.uk as the base, rather than (presumably) maps.google.com?)

The approach I took to tidying these was rather clunky, first going into the table view and filtering on the mispositioned locations:

Then correcting them:

What would be really handy would be if Google Fusion Tables let you see a tabular view of data within a particular map view – so for example, if I could zoom in to the US map and then get a tabular view of the records displayed on that particular local map view… (If it does already support this and I just missed it, please let me know via the comments..;-)

So how did I get the data into Google Fusion Tables? The original data was posted as a PDF on the DirectGov website (New Year Honours List 2012 – in detail)…:

…so I used Scraperwiki to preview and read through the PDF and extract the honours list data (my scraper is a little clunky and doesnlt pull out 100% of the data, missing the occasional name and contribution details when it’s split over several lines; but I think it does a reasonable enough job for now, particularly as I am currently more interested in focussing on the possible high level process for extracting and manipulating the data, rather than the correctness of it…!;-)

Here’s the scraper (feel free to improve upon it….:-): Scraperwiki: New Year Honours 2012

I then did a little bit of tweaking in Google Refine, normalising some of the facets and crudely attempting to separate out each person’s role and the contribution for which the award was made.

For example, in the case of Dr Glenis Carole Basiro DAVEY, given column data of the form “The Open University, Science Faculty and Health Education and Training Programme, Africa. For services to Higher and Health Education.“, we can use the following expressions to generate new sub-columns:

– value.match(/.*(For .*)/)[0] to pull out things like “For services to Higher and Health Education.”
– value.match(/(.*)For .*/)[0] to pull out things like “The Open University, Science Faculty and Health Education and Training Programme, Africa.”

I also ran each person’s record through Reuters Open Calais service using Google Refine’s ability to augment data with data from a URL (“Add column by fetching URLs”), pulling the data back as JSON. Here’s the URL format I used (polling once every 500ms in order to stay with the max. 4 calls per limit threshold mandated by the API.)

"http://api.opencalais.com/enlighten/rest/?licenseID=<strong>MY_LICENSE_KEY</strong>&content=" + escape(value,'url') + "&paramsXML=%3Cc%3Aparams%20xmlns%3Ac%3D%22http%3A%2F%2Fs.opencalais.com%2F1%2Fpred%2F%22%20xmlns%3Ardf%3D%22http%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%22%3E%20%20%3Cc%3AprocessingDirectives%20c%3AcontentType%3D%22TEXT%2FRAW%22%20c%3AoutputFormat%3D%22Application%2FJSON%22%20%20%3E%20%20%3C%2Fc%3AprocessingDirectives%3E%20%20%3Cc%3AuserDirectives%3E%20%20%3C%2Fc%3AuserDirectives%3E%20%20%3Cc%3AexternalMetadata%3E%20%20%3C%2Fc%3AexternalMetadata%3E%20%20%3C%2Fc%3Aparams%3E"

Unpicking this a little:

– licenseID is set to my license key value
– content is the URL escaped version of the text I wanted to process (in this case, I created a new column from the name column that also pulled in data from a second column (the contribution column). The GREL formula I used to join the columns took the form: value+', '+cells["contribution"].value)
– paramsXML is the URL encoded version of the following parameters, which set the content encoding for the result to be JSON (the default is XML):

<c:params xmlns:c="http://s.opencalais.com/1/pred/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<c:processingDirectives c:contentType="TEXT/RAW" c:outputFormat="Application/JSON"  >
</c:processingDirectives>
<c:userDirectives>
</c:userDirectives>
<c:externalMetadata>
</c:externalMetadata>
</c:params>

So much for process – now where are the stories? That’s left, for now, as an exercise for the reader. An obvious starting point is just to see who received honours in your locale. Remember, Google Fusion Tables lets you generate all sorts of filtered views, so it’s not too hard to map where the MBEs vs OBEs are based, for example, or have a stab at where awards relating to services to Higher Education went. Some awards also have a high correspondence with a particular location, as for example in the case of Enfield…

If you do generate any interesting views from the New Year Honours 2012 Fusion Table, please post a link in the comments. And if you find a problem with/fix for the data or the scraper, please post that info in a comment too:-)

A Quick Peek at Three Content Analysis Services

A long, long time ago, I tinkered with a hack called Serendipitwitterous (long since rotted, I suspect), that would look through a Twitter stream (personal feed, or hashtagged tweets), use the Yahoo term extraction service to try to identify concepts or key words/phrases in each tweet, and then use these as a search term on Slideshare, Youtube and so on to find content that may or may not be loosely related to each tweet.

The Yahoo Term Extraction is still hanging in there – just – but I think it finally gets deprecated early next year. From my feeds today, however, it seems there may be a replacement in the form of a new content analysis service via YQL – Yahoo! Opens Content Analysis Technology to all Developers:

[The Y! COntent Analysis service will] extract key terms from the content, and, more importantly, rank them based on their overall importance to the content. The output you receive contains the keywords and their ranks along with other actionable metadata.
On top of entity extraction and ranking, developers need to know whether key terms correspond to objects with existing rich metadata. Having this entity/object connection allows for the creation of highly engaging user experiences. The Y! Content Analysis output provides related Wikipedia IDs for key terms when they can be confidently identified. This enables interoperability with linked data on the semantic Web.

What this means is that you can push a content feed through the service, and get an annotated version out that includes identifier based hooks into other domains (i.e. little-l, little-d linked data). You can find the documentation here: Content Analysis Documentation for Yahoo! Search

So how does it fare? As I’ve previously explored using the Reuters Open Calais service to annotate OU/BBC programme listings (e.g. Augmenting OU/BBC Co-Pro Programme Data With Semantic Tags), I thought I’d use a programme feed from The Bottom Line again…

To start, we need to open the YQL developer console: http://developer.yahoo.com/yql/console/

We can then pull in an example programme description from the BBC using a YQL query of the form:

select long_synopsis from xml where url='http://www.bbc.co.uk/programmes/b00vy3l1.xml'

For reference, the text looks like this:

The view from the top of business. Presented by Evan Davis, The Bottom Line cuts through confusion, statistics and spin to present a clearer view of the business world, through discussion with people running leading and emerging companies.
In the week that Facebook launched its own new messaging service, Evan and his panel of top business guests discuss the role of email at work, amid the many different ways of messaging and communicating.
And location, location, location. It’s a cliche that location can make or break a business, but how true is it really? And what are the advantages of being next door to the competition?
Evan is joined in the studio by Chris Grigg, chief executive of property company British Land; Andrew Horton, chief executive of insurance company Beazley; Raghav Bahl, founder of Indian television news group Network 18.
Producer: Ben Crighton
Last in the series. The Bottom Line returns in January 2011.

The content analysis query example provided looks like this:

select * from contentanalysis.analyze where text="Italian sculptors and painters of the renaissance favored the Virgin Mary for inspiration"

but we can nest queries in order to pass the long_synposis from the BBC programme feed through the service:

select * from contentanalysis.analyze where text in (select long_synopsis from xml where url='http://www.bbc.co.uk/programmes/b00vy3l1.xml')

Here’s the result:

<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
    yahoo:count="2" yahoo:created="2011-12-22T11:03:51Z" yahoo:lang="en-US">
    <diagnostics>
        <publiclyCallable>true</publiclyCallable>
        <url execution-start-time="2" execution-stop-time="370"
            execution-time="368" proxy="DEFAULT"><![CDATA[http://www.bbc.co.uk/programmes/b00vy3l1.xml]]></url>
        <user-time>572</user-time>
        <service-time>565</service-time>
        <build-version>24402</build-version>
    </diagnostics> 
    <results>
        <categories xmlns="urn:yahoo:cap">
            <yct_categories>
                <yct_category score="0.536">Business &amp; Economy</yct_category>
                <yct_category score="0.421652">Finance</yct_category>
                <yct_category score="0.418182">Finance/Investment &amp; Company Information</yct_category>
            </yct_categories>
        </categories>
        <entities xmlns="urn:yahoo:cap">
            <entity score="0.979564">
                <text end="57" endchar="57" start="48" startchar="48">Evan Davis</text>
                <wiki_url>http://en.wikipedia.com/wiki/Evan_Davis</wiki_url>
                <types>
                    <type region="us">/person</type>
                    <type region="us">/place/place_of_interest</type>
                    <type region="us">/place/us/town</type>
                </types>
                <related_entities>
                    <wikipedia>
                        <wiki_url>http://en.wikipedia.com/wiki/Don%27t_Tell_Mama</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Lenny_Dykstra</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Los_Angeles_Police_Department</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Today_%28BBC_Radio_4%29</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Chrisman,_Illinois</wiki_url>
                    </wikipedia>
                </related_entities>
            </entity>
            <entity score="0.734099">
                <text end="265" endchar="265" start="258" startchar="258">Facebook</text>
                <wiki_url>http://en.wikipedia.com/wiki/Facebook</wiki_url>
                <types>
                    <type region="us">/organization</type>
                    <type region="us">/organization/domain</type>
                </types>
                <related_entities>
                    <wikipedia>
                        <wiki_url>http://en.wikipedia.com/wiki/Mark_Zuckerberg</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Social_network_service</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Twitter</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Social_network</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Digital_Sky_Technologies</wiki_url>
                    </wikipedia>
                </related_entities>
            </entity>
            <entity score="0.674621">
                <text end="477" endchar="477" start="450" startchar="450">location, location, location</text>
            </entity>
            <entity score="0.651227">
                <text end="79" endchar="79" start="60" startchar="60">The Bottom Line cuts</text>
                <types>
                    <type region="us">/other/movie/movie_name</type>
                </types>
            </entity>
            <entity score="0.646818">
                <text end="799" endchar="799" start="789" startchar="789">Raghav Bahl</text>
                <wiki_url>http://en.wikipedia.com/wiki/Raghav_Bahl</wiki_url>
                <types>
                    <type region="us">/person</type>
                </types>
                <related_entities>
                    <wikipedia>
                        <wiki_url>http://en.wikipedia.com/wiki/Network_18</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Superpower</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Deng_Xiaoping</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/The_Amazing_Race</wiki_url>
                        <wiki_url>http://en.wikipedia.com/wiki/Hare</wiki_url>
                    </wikipedia>
                </related_entities>
            </entity>
            <entity score="0.644349">
                <text end="144" endchar="144" start="133" startchar="133">clearer view</text>
            </entity>
            <entity score="0.54609">
                <text end="675" endchar="675" start="665" startchar="665">Chris Grigg</text>
                <types>
                    <type region="us">/person</type>
                </types>
            </entity>
        </entities>
    </results>
</query>

So, some success in pulling out person names, and limited success on company names. The subject categories look reasonably appropriate too.

[UPDATE: I should have run the desc contentanalysis.analyze query before publishing this post to pull up the docs/examples… As well as the where text= argument, there is a where url= argument that will pul back semantic information about a URL. Running the query over the OU homepage, for example, using select * from contentanalysis.analyze where url=”http://www.open.ac.uk” identifies the OU as an organisation, with links out to Wikipedia, as well as geo-information and a Yahoo woe_id.]

Another related service in this area that I haven’t really explored yet is TSO’s Data Enrichment Service (API).

Here’s how it copes with the same programme synposis:

Pretty good… and links in to dbpedia (better for machine readability) compared to the Wikipedia links that the Yahoo service offers.

For completeness, here’s what the Reuters Open Calais service comes up with:

The best of the bunch on this sample of one, I think, albeit admittedly in the domain the Reuters focus on?

But so what…? What are these services good for? Automatic metadata generation/extraction is one thing, as I’ve demonstrated in Visualising OU Academic Participation with the BBC’s “In Our Time”, where I generated a quick visualisation that showed the sorts of topics that OU academics had talked about as guests on Melvyn Bragg’s In Our Time, along with the topics that other universities had been engaged with on that programme.

The rise of local media sales partnerships and 19 other recent hyper-local developments you may have missed

1 Reply

In this guest post Ofcom’s Damian Radcliffe cross-publishes his latest presentation on developments in hyperlocal publishing for September-October, and highlights how partnerships are increasingly important for hyper-local, regional and national media in terms of “making it pay”.

When producing my latest bi-monthly update on hyper-local media, I was struck by the fact that media sales partnerships suddenly seem to be all the rage.

In a challenging economic climate, a number of media providers – both big and small – have recently come together to announce initiatives aimed at maximising economies of scale and potentially reducing overheads.

At a hyperlocal level, the launch on 1^st November of the Chicago Independent Advertising Network (CIAN), saw 15 Chicago community news sites coming together to offer a single point of contact for advertisers. These sites “collectively serve more than 1 million page views each month.”

This initiative follows in the footsteps of other small scale advertising alliances including the Seattle Indie Ad Network and Boston Blogs.

These moves – bringing together a range of small scale location based websites – can help address concerns that hyper-local sites are not big enough (on their own) to unlock funding from large advertisers.

CIAN also aims to address a further hyper-local concern: that of sales skills. Rather than having a hyperlocal practitioner add media sales to an ever expanding list of duties, funding from the Chicago Community Trust and the Knight Community Information Challenge allows for a full-time salesperson.

Big Media is also getting in on this act.

In early November Microsoft, Yahoo! and AOL agreed to sell each other’s unsold display ads. The move is a response to Google and Facebook’s increasing clout in this space.

Reuters reported that both Facebook and Google are expected to increase their share of online display advertising in the United States in 2011 by 9.3% and 16.3%.

In contrast, AOL, Microsoft and Yahoo are forecast to lose share, with Facebook expected to surpass Yahoo for the first time.

Similarly in the UK, DMGT’s Northcliffe Media, home to 113 regional newspapers, recently announced it was forging a joint partnership with Trinity Mirror’s regional sales house, AMRA.

This will create a commercial proposition encompassing over 260 titles, including nine of the UK’s 10 biggest regional paid-for titles. Like The Microsoft, Yahoo! and AOL arrangement, this new partnership comes into effect in 2012.

These examples all offer opportunities for economies of scale for media outlets and potentially larger potential reach and impact for advertisers. Given these benefits, I wouldn’t be surprised if we didn’t see more of these types of partnership in the coming months and years.

Damian Radcliffe is writing in a personal capacity.

Other topics in his current hyperlocal slides include Sky’s local pilot in NE England and research into the links between tablet useand local news consumption. As ever, feedback and suggestions for future editions are welcome.

Hyper-local Update: Sept-Oct 2011

View more presentations from Damian Radcliffe

More Dabblings With Local Sentencing Data

In Accessing and Visualising Sentencing Data for Local Courts I posted a couple of quick ways in to playing with Ministry of Justice sentencing data for the period July 2010-June 2011 at the local court level. At the end of the post, I wondered about how to wrangle the data in R so that I could look at percentage-wise comparisons between different factors (Age, gender) and offence type and mentioned that I’d posted a related question to to the Cross Validated/Stats Exchange site (Casting multidimensional data in R into a data frame).

Courtesy of Chase, I have an answer🙂 So let’s see how it plays out…

To start, let’s just load the Isle of Wight court sentencing data into RStudio:

require(ggplot2) require(reshape2) iw = read.csv("http://dl.dropbox.com/u/1156404/wightCrimRecords.csv")

Now we’re going to shape the data so that we can plot the percentage of each offence type by gender (limited to Male and Female options):

iw.m = melt(iw, id.vars = "sex", measure.vars = "Offence_type") iw.sex = ddply(iw.m, "sex", function(x) as.data.frame(prop.table(table(x$value)))) ggplot(subset(iw.sex,sex=='Female'|sex=='Male')) + geom_bar(aes(x=Var1,y=Freq)) + facet_wrap(~sex)+ opts(axis.text.x=theme_text(angle=-90)) + xlab('Offence Type')

Here’s the result:

We can also process the data over a couple of variables. So for example, we can look to see how female recorded sentences break down by offence type and age range, displaying the results as a percentage of how often each offence type on its own was recorded by age:

iw.m2 = melt(iw, id.vars = c("sex","Offence_type" ), measure.vars = "AGE") iw.off=ddply(iw.m2, c("sex","Offence_type"), function(x) as.data.frame(prop.table(table(x$value))))

ggplot(subset(iw.off,sex=='Female')) + geom_bar(aes(x=Var1,y=Freq)) + facet_wrap(~Offence_type) + opts(axis.text.x=theme_text(angle=-90)) + xlab('Age Range (Female)')

Note that this graphic may actually be a little misleading because percentage based reports donlt play well with small numbers…: whilst there are multiple Driving Offences recorded, there are only two Burglaries, so the statistical distribution of convicted female burglars is based over a population of size two… A count would be a better way of showing this

PS I was hoping to be able to just transmute the variables and generate a raft of other charts, but I seem to be getting an error, maybe because some rows are missing? So: anyone know where I’m supposed to post R library bug reports?