Monthly Archives: January 2012

Different Speeches? Digital Skills Aren’t just About Coding…

Secretary of State for Education, Michael Gove, gave a speech yesterday on rethinking the ICT curriculum in UK schools. You can read a copy of the speech variously on the Department for Education website, or, err, on the Guardian website.

Seeing these two copies of what is apparently the same speech, I started wondering:

a) which is the “best” source to reference?
b) how come the Guardian doesn’t add a disclaimer about the provenance of, and a link to, the DfE version? [Note the disclaimer in the DfE version – “Please note that the text below may not always reflect the exact words used by the speaker.”]
c) is the Guardian version an actual transcript, maybe? That is, does the Guardian reprint the “exact words” used by the speaker?

And that made me think I should do a diff… About which, more below…

Before that, however, here’s a quick piece of reflection on how these two things – the reinvention of the IT curriculum, and the provenance of, and value added to, content published on news and tech industry blog sites – collide in my mind…

So for example, I’ve been pondering what the role of journalism is, lately, in part because I’m trying to clarify in my own mind what I think the practice and role of data journalism are (maybe I should apply for a Nieman-Berkman Fellowship in Journalism Innovation to work on this properly?!). It seems to me that “communication” is one important part (raising awareness of particular issues, events, or decisions), and holding governments and companies to account is another. (Actually, I think Paul Bradshaw has called me out on that before, suggesting it was more to do with providing an evidence base through verification and triangulation, as well as comment, against which governments and companies could be held to account (err, I think? As an unjournalist, I don’t have notes or a verbatim quote against which to check that statement, and I’m too lazy to email/DM/phone Paul to clarify what he may or may not have said… The extent of my checking is typically limited to what I can find on the web or in personal archives, which appear to be lacking on this point…))

Another thing I’ve been mulling over recently in a couple of contexts relates to the notion of what are variously referred to as digital or information skills.

The first context is “data journalism”, and the extent to which data journalists need to be able to do programming (in the sense of identifying the steps in a process that can be automated and how they should be sequenced or organised) versus writing code. (I can’t write code for toffee, but I can read it well enough to copy, paste and change bits that other people have written. That is, I can appropriate and reuse other people’s code, but can’t write it from scratch very well… Partly because I can’t ever remember the syntax and low level function names. I can also use tools such as Yahoo Pipes and Google Refine to do coding-like things…) Then there’s the question of what to call things like URL hacking or (search engine) query building.

The second context is geeky computer techie stuff in schools, the sort of thing covered by Michael Gove’s speech at the BETT show on the national ICT curriculum (or lack thereof), and which the educational digerati were all over on Twitter yesterday. Over the weekend, houseclearing my way through various “archives”, I came across all manner of press clippings from 2000-2005 or so about the activities of the OU Robotics Outreach Group, of which I was a co-founder (the web presence has only recently been shut down, in part because of the retirement of the sys admin on whose server the websites resided). This group ran an annual open meeting every November for several years, hosting talks from the educational robotics community in the UK (from primary school to HE level). The group also co-ordinated the RoboCup Junior competition in the UK, ran outreach events, developed various support materials and activities for use with Lego Mindstorms, and led the EPSRC/AHRC Creative Robotics Research Network.

At every robotics event, we’d try to involve kids and/or adults in elements of problem solving, mechanical design, programming (not really coding…) based around some sort of themed challenge: a robot fashion show, for example, or a treasure hunt (both variants on edge following/line following;-) Or a robot rescue mission, as used in a day-long activity in the “Engineering: An Active Introduction” (TXR120) OU residential school, or the 3-hour “Robot Theme Park” team building activity in the Masters level “Team Engineering” (T885) weekend school. [If you’re interested, we may be able to take bookings to run these events at your institution. We can make them work at a variety of difficulty levels from KS3-4 and up;-)]

Given that working at the bits-atoms interface is where a lot of the not-purely-theoretical-or-hardcore-engineering innovation and application development is likely to take place over the next few years, any mandate to drop the “boring” Windows training ICT stuff in favour of programming (which I suspect can be taught not only in a really tedious way, but in a really confusing and badly delivered way too) is probably Not the Best Plan.

Slightly better, and something that I know is currently being mooted for reigniting interest in computing, is the Raspberry Pi, a cheap, self-contained, programmable computer on a board (good for British industry, just like the BBC Micro was…;-) that allows you to work at the interface between the real world of atoms and the virtual world of bits that exists inside the computer. (See also things like the OU Senseboard, as used on the OU course “My Digital Life” (TU100).)

If schools were actually being encouraged to make a financial investment on a par with the level of investment around the introduction of the BBC Micro, back in the day, I’d suggest a 3D printer would have more of the wow factor… (I’ll doodle more on the rationale behind this in another post…) The financial climate may not allow for that (but I bet budget will manage to get spent anyway…) but whatever the case, I think Gove needs to be wary about consigning kids to lessons of coding hell. And maybe take a look at programming in a wider creative context, such as robotics (the word “robotics” is one of the reasons why I think it’s seen as a very specialised, niche subject; we need a better phrase, such as “Creative Technologies”, which could combine elements of robotics, games programming, Photoshop, and, yes, PowerPoint too…). Hmm… thinks… the OU has a couple of courses that have just come to the end of their life that between them provide a couple of hundred hours of content and activity on robotics (T184) and games programming (T151), and that we delivered, in part, to 6th formers under the OU’s Young Applicants in Schools Scheme.

Anyway, that’s all as maybe… Because there are plenty of digital skills that let you do coding-like things without having to write code. Such as finding out whether there are any differences between the text in the DfE copy of Gove’s BETT speech, and the Guardian copy.

First, if you haven’t already got one, find yourself a good text editor – I use TextWrangler on a Mac. Then copy the text from each page into a separate text file and save it. (Actually, I think MS Word may have a diff function too?)
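If you’d rather script the comparison than use an editor’s compare feature, Python’s standard difflib module can do the same sort of job. Here’s a minimal sketch (the filenames are just placeholders for wherever you saved the two copies):

import difflib, sys

# Read in the two saved copies of the speech (filenames are placeholders)
dfe = open('gove_dfe.txt').readlines()
guardian = open('gove_guardian.txt').readlines()

# Emit a unified diff: lines prefixed with '-' or '+' differ between the two copies
sys.stdout.writelines(difflib.unified_diff(dfe, guardian, fromfile='DfE copy', tofile='Guardian copy'))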

Finding diffs between text docs in TextWrangler

The differences all tend to be in the characters used for quotation marks (character encodings are one of the things that can make all sorts of programs fall over, or misbehave; just being aware that they may cause a problem, as well as how and why, would be a great step in improving the baseline level of folk IT understanding). Some of the line breaks don’t quite match up either, but other than that, the text is the same.
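If you wanted a scripted diff to ignore just those quote mark differences, you could normalise the characters before comparing. A quick sketch of the idea in Python, reading each copy in as UTF-8 first (same placeholder filenames as above):

# -*- coding: utf-8 -*-
import codecs

# Map curly quotes to straight ones so encoding-only differences disappear from the diff
def normalise_quotes(text):
    for curly, straight in [(u'\u2018', u"'"), (u'\u2019', u"'"), (u'\u201c', u'"'), (u'\u201d', u'"')]:
        text = text.replace(curly, straight)
    return text

dfe = normalise_quotes(codecs.open('gove_dfe.txt', encoding='utf-8').read())
guardian = normalise_quotes(codecs.open('gove_guardian.txt', encoding='utf-8').read())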

Now, this may be because Gove was a good little minister and read out the words exactly as they had been prepared. Or it may be the case that the Guardian just reprinted the speech without mentioning provenance, or the disclaimer that he may not actually have read the words of that speech (I have vague memories of an episode of Yes, Minister, here…;-)

Whatever the case, if you know: a) that it’s even possible to compare two documents to see if they are different (a handy piece of folk IT knowledge); and b) a tool that does it (or how to find a tool that does it, or a person who may have a tool that can do it), then you can compare the texts for yourself. And along the way, maybe learn that churnalism, in a variety of forms, is endemic in the media. Or maybe just demonstrate to yourself when the media is acting in a purely comms, rather than journalistic, role?

PS other phrases in the area: “computational thinking”. Hear, for example: A conversation with Jeannette Wing about computational thinking

PPS I just remembered – there’s a data journalism hook around this story too… from a tweet exchange last night that I was reminded of by an RT:

josiefraser: RT @grmcall: Of the 28,000 new teachers last year in the UK, 3 had a computer-related degree. Not 3000, just 3.
dlivingstone: @josiefraser Source??? Not found it yet. RT @grmcall: 28000 new UK teachers last year, 3 had a computer-related degree. Not 3000, just 3
josiefraser: That ICT qualification teacher stat RT @grmcall: Source is the Guardian http://www.guardian.co.uk/education/2012/jan/09/computer-studies-in-schools

I did a little digging and found the following document on the General Teaching Council for England website – Annual digest of statistics 2010–11 – Profiles of registered teachers in England [PDF] – that contains demographic stats, amongst others, for UK teachers. But no stats relating to subject areas of degree level qualifications held, which is presumably the data referred to in the tweet. So I’m thinking: this is partly where the role of data journalist comes in… They may not be able to verify the numbers by checking independent sources, but they may be able to shed some light on where the numbers came from and how they were arrived at, and maybe even secure their release (albeit as a single point source?)

The test of data journalism: checking the claims of lobbyists via government

Day 341 - Pull The Wool Over My Eyes - image by Simon James


While the public image of data journalism tends to revolve around big data dumps and headline-grabbing leaks, there is a more important day-to-day application of data skills: scrutinising the claims regularly made in support of spending public money.

I’m blogging about this now because I recently came across a particularly good illustration of politicians being dazzled by numbers from lobbyists (that journalists should be checking) in this article by Simon Jenkins, from which I’ll quote at length:

“This government, so draconian towards spending in public, is proving as casual towards dodgy money in private as were Tony Blair and Gordon Brown. Earlier this month the Olympics boss, Lord Coe, moseyed into Downing Street and said that his opening and closing ceremonies were looking a bit mean at £40m. Could he double it to £81m for more tinsel? Rather than scream and kick him downstairs, David Cameron said: my dear chap, but of course. I wonder what the prime minister would have said if his lordship had been asking for a care home, a library or a clinic.

“Much of the trouble comes down to the inexperience of ingenue ministers, and their susceptibility to the pestilence of lobbying now infecting Westminster. On this occasion the hapless Olympics minister, Hugh Robertson, claimed that the extra £41m was “worth £2-5bn in advertising revenue alone”, a rate of return so fanciful as to suggest a lobbyist’s lunch beyond all imagining. Robertson also claimed to need another £271m for games security (not to mention 10,000 troops, warships and surface-to-air missiles), despite it being “not in response to any specific security threat”. It was just money.

“This was merely the climax of naivety. In their first month in office, ministers were told – and believed – that it would be “more expensive” to cancel two new aircraft carriers than to build them. Ministers were told it would cost £2bn to cancel Labour’s crazy NHS computer rather than dump it in the nearest skip. Chris Huhne, darling of the renewables industry, wants to give it £8bn a year to rescue the planet, one of the quickest ways of transferring money from poor consumer to rich landowner yet found. The chancellor, George Osborne, was told by lobbyists he could save £3bn a year by giving away commercial planning permissions. All this was statistical rubbish.

“If local government behaved as credulously as Whitehall it would be summoned before the audit commission and subject to surcharge.”

And if you want to keep an eye on such claims, try a Google News search like this one.

The problem with defining ‘a journalist’

Cleland Thom writes in Press Gazette today about the list of requirements specified by an Oregon judge before a person could claim protection as a journalist in his court.

  1. Journalism education.
  2. Credentials or proof of any affiliation with any recognized news entity.
  3. Proof of adherence to journalistic standards such as editing, fact-checking, or disclosures of conflicts of interest.
  4. Keeping notes of conversations and interviews conducted.
  5. Mutual understanding or agreement of confidentiality between the defendant and his/her sources.
  6. Creation of an independent product rather than assembling writings and postings of others.
  7. Contacting “the other side” to get both sides of a story.

This seems a reasonable enough list of criteria – I’m interpreting the phrasing of the judge’s opinion as indicating that any single one of these criteria would suffice, rather than all 7 (as is the case in the Reynolds defence mentioned by Thom).

But I think there’s a broader problem (unrelated to the specific case in Oregon, which was about a protection from being sued for libel only afforded to journalists) with trying to certify individuals as journalists when more and more journalism is done collaboratively. If, for example, one person researches the regulations relating to an issue; another FOIs key documents; a third speaks to a victim; a fourth speaks to an expert; a fifth to the person responsible; and a sixth writes it all up into a coherent narrative – which one is the journalist?

20 free ebooks on journalism (for your Xmas Kindle)

For some reason there are two versions of this post on the site – please check the more up to date version here.

20 free ebooks on journalism (for your Xmas Kindle) {updated to 65}

Journalism 2.0 cover

As many readers of this blog will have received a Kindle for Christmas I thought I should share my list of the free ebooks that I recommend stocking up on.

Online journalism and multimedia ebooks

Starting with more general books, Mark Briggs’s book Journalism 2.0 (PDF*) is a few years old but still provides a good overview of online journalism to have by your side. Mindy McAdams’s 42-page Reporter’s Guide to Multimedia Proficiency (PDF) adds some more on that front, and Adam Westbrook’s Ideas on Digital Storytelling and Publishing (PDF) provides a larger focus on narrative, editing and other elements.

After the first version of this post, MA Online Journalism student Franzi Baehrle suggested this free book on DSLR Cinematography, as well as Adam Westbrook on multimedia production (PDF). And Guy Degen recommends the free ebook on news and documentary filmmaking from ImageJunkies.com.

The Participatory Documentary Cookbook [PDF] is another free resource on using social media in documentaries.

A free ebook on blogging can be downloaded from Guardian Students when you register with the site, and Swedish Radio have produced this guide to Social Media for Journalists (in English).

The Traffic Factories is an ebook that explores how a number of prominent US news organisations use metrics, and Chartbeat’s role in that. You can download it in mobi, PDF or epub format here.


Social Interest Positioning – Visualising Facebook Friends’ Likes With Data Grabbed Using Google Refine

What do my Facebook friends have in common in terms of the things they have Liked, or in terms of their music or movie preferences? (And does this say anything about me?!) Here’s a recipe for visualising that data…

After discovering via Martin Hawksey that the recent (December, 2011) 2.5 release of Google Refine allows you to import JSON and XML feeds to bootstrap a new project, I wondered whether it would be able to pull in data from the Facebook API if I was logged in to Facebook (Google Refine does run in the browser after all…)

Looking through the Facebook API documentation whilst logged in to Facebook, it’s easy enough to find exemplar links to things like your friends list (https://graph.facebook.com/me/friends?access_token=A_LONG_JUMBLE_OF_LETTERS) or the list of likes someone has made (https://graph.facebook.com/me/likes?access_token=A_LONG_JUMBLE_OF_LETTERS); replacing me with the Facebook ID of one of your friends should pull down a list of their friends, or likes, etc.

(Note that validity of the access token is time limited, so you can’t grab a copy of the access token and hope to use the same one day after day.)
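As an aside, you don’t have to use Refine for this bit: the same two endpoints can be polled from a short script. Here’s a minimal sketch (the token value is a placeholder for a currently valid access token, and it doesn’t bother paging through long lists):

import json, urllib2, time

ACCESS_TOKEN = 'A_LONG_JUMBLE_OF_LETTERS'  # placeholder - paste in a currently valid token

def graph(path):
    # Fetch a Graph API path and parse the JSON response
    url = 'https://graph.facebook.com/' + path + '?access_token=' + ACCESS_TOKEN
    return json.load(urllib2.urlopen(url))

# List my friends, then each friend's Likes, pausing between calls to be polite to the API
for friend in graph('me/friends')['data']:
    for like in graph(friend['id'] + '/likes').get('data', []):
        print friend['name'], '-', like['name']
    time.sleep(0.5)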

Grabbing the link to your friends on Facebook is simply a case of opening a new project, choosing to get the data from a Web Address, and then pasting in the friends list URL:

Google Refine - import Facebook friends list

Click on next, and Google Refine will download the data, which you can then parse as a JSON file, and from which you can identify individual record types:

Google Refine - import Facebook friends

If you click the highlighted selection, you should see the data that will be used to create your project:

Google Refine - click to view the data

You can now click on Create Project to start working on the data – the first thing I do is tidy up the column names:

Google Refine - rename columns

We can now work some magic – such as pulling in the Likes our friends have made. To do this, we need to create the URL for each friend’s Likes using their Facebook ID, and then pull the data down. We can use Google Refine to harvest this data for us by creating a new column containing the data pulled in from a URL built around the value of each cell in another column:

Google Refine - new column from URL

The Likes URL has the form https://graph.facebook.com/me/likes?access_token=A_LONG_JUMBLE_OF_LETTERS which we’ll tinker with as follows:

Google Refine - crafting URLs for new column creation

The throttle control tells Refine how long to wait between calls. I set this to 500ms (that is, half a second), so it takes a few minutes to pull in my couple of hundred or so friends (I don’t use Facebook a lot;-). I’m not sure what limit the Facebook API is happy with, but if you hit it too fast (i.e. set the throttle time too low), you may find the Facebook API stops returning data to you for a cooling down period…

Having imported the data, you should find a new column:

Google Refine - new data imported

At this point, it is possible to generate a new column from each of the records/Likes in the imported data… in theory (or maybe not…). I found this caused Refine to hang though, so instead I exported the data using the default Templating… export format, which produces some sort of JSON output…

I then used this Python script to generate a two column data file where each row contained a (new) unique identifier for each friend and the name of one of their likes:

# Python 2 script: flatten the Templating... export into a simple two column CSV
# (one row per friend identifier / Like name pair)
import simplejson, csv

# Output file: each row holds a (new) unique id for a friend and one of their Likes
writer = csv.writer(open('fbliketest.csv', 'wb+'), quoting=csv.QUOTE_ALL)

# The JSON file saved from Google Refine's default Templating... export
fn = 'my-fb-friends-likes.txt'

data = simplejson.load(open(fn, 'r'))
id = 0
for d in data['rows']:
    id = id + 1
    # 'interests' is the column name containing the Likes data
    interests = simplejson.loads(d['interests'])
    for i in interests['data']:
        print str(id), i['name'], i['category']
        # Only the id and the Like's name get written to the CSV file
        writer.writerow([str(id), i['name'].encode('ascii', 'ignore')])

[I think this R script, in answer to a related @mhawksey Stack Overflow question, also does the trick: R: Building a list from matching values in a data.frame]

I could then import this data into Gephi and use it to generate a network diagram of what they commonly liked:

Sketching common likes amongst my facebook friends

Rather than returning Likes, I could equally have pulled back lists of the movies, music or books they like, their own friends lists (permissions settings allowing), etc etc, and then generated friends’ interest maps on that basis.

[See also: Getting Started With The Gephi Network Visualisation App – My Facebook Network, Part I and how to visualise Google+ networks]

PS dropping out of Google Refine and into a Python script is a bit clunky, I have to admit. What would be nice would be to be able to do something like a “create new rows with new column from column” pattern that would let you set up an iterator through the contents of each of the cells in the column you want to generate the new column from, and for each pass of the iterator: 1) duplicate the original data row to create a new row; 2) add a new column; 3) populate the cell with the contents of the current iteration state. Or something like that…

PPS Related to the PS request, there is a sort of related feature in the 2.5 release of Google Refine that lets you merge data from across rows with a common key into a newly shaped data set: Key/value Columnize. Seeing this, it got me wondering what a fusion of Google Refine and RStudio might be like (or even just R support within Google Refine?)

PPPS this could be interesting – looks like you can test to see if a friendship exists given two Facebook user IDs.

2011: the UK hyper-local year in review

In this guest post, Damian Radcliffe highlights some topline developments in the hyper-local space during 2011. He also asks for your suggestions of great hyper-local content from 2011. His more detailed slides looking at the previous year are cross-posted at the bottom of this article.

2011 was a busy year across the hyper-local sphere, with a flurry of activity online as well as more traditional platforms such as TV, Radio and newspapers.

The Government’s plans for Local TV have developed considerably, following the Shott Review just over a year ago. We now have a clearer indication of the areas which will be first on the list for these new services and how Ofcom might award these licences. What we don’t know is who will apply for these licences, or what their business models will be. But this should become clear in the second half of the year.

Whilst the Leveson Inquiry hasn’t directly been looking at local media, it has been a part of the debate. Claire Enders outlined some of the challenges facing the regional and local press in a presentation showing declining revenue, jobs and advertising over the past five years. Her research suggests that the impact of “the move to digital” has been greater at a local level than at the nationals.

Across the board, funding remains a challenge for many. But new models are emerging, with Daily Deals starting to form part of the revenue mix alongside money from foundations and franchising.

And on the content front, we saw Jeremy Hunt cite a number of hyper-local examples at the Oxford Media Convention, as well as record coverage for regional press and many hyper-local outlets as a result of the summer riots.

I’ve included more on all of these stories in my personal retrospective for the past year.

One area where I’d really welcome feedback is examples of hyper-local content you produced – or read – in 2011. I’m conscious that a lot of great material may not necessarily reach a wider audience, so do post your suggestions below and hopefully we can begin to redress that.

Mapping the New Year Honours List – Where Did the Honours Go?

When I get a chance, I’ll post a (not totally unsympathetic) response to Milo Yiannopoulos’ post The pitiful cult of ‘data journalism’, but in the meantime, here’s a view over some data that was released a couple of days ago – a map of where the New Year Honours went [link]

New Year Honours map

[Hmm… so WordPress.com doesn’t seem to want to let me embed a Google Fusion Table map iframe, and Google Maps (which are embeddable) just shows an empty folder when I try to view the Fusion Table KML… (the Fusion Table export KML doesn’t seem to include lat/lng data either?) Maybe I need to explore some hosting elsewhere this year…]

Note that I wouldn’t make the claim that this represents an example of data journalism. It’s a sketch map showing which parts of the country the various recipients of honours this time round presumably live in. Just by posting the map, I’m not reporting any particular story. Instead, I’m trying to find a way of looking at the data to see whether or not there may be any interesting stories that are suggested by viewing it in this way.

There was a small element of work involved in generating the map view, though… Working backwards, when I used Google Fusion tables to geocode the locations of the honoured, some of the points were incorrectly located:

Google Fusion Tables - correcting faulty geocoding

(It would be nice to be able to force a locale to the geocoder, maybe telling it to use maps.google.co.uk as the base, rather than (presumably) maps.google.com?)

The approach I took to tidying these was rather clunky, first going into the table view and filtering on the mispositioned locations:

Google Fusion Tables - correcting geocoding errors

Then correcting them:

Google Fusion Table, Correct Geocode errors

What would be really handy would be if Google Fusion Tables let you see a tabular view of data within a particular map view – so for example, if I could zoom in to the US map and then get a tabular view of the records displayed on that particular local map view… (If it does already support this and I just missed it, please let me know via the comments..;-)

So how did I get the data into Google Fusion Tables? The original data was posted as a PDF on the DirectGov website (New Year Honours List 2012 – in detail)…:

New Year Honours data

…so I used Scraperwiki to preview and read through the PDF and extract the honours list data (my scraper is a little clunky and doesn’t pull out 100% of the data, missing the occasional name and contribution details when they’re split over several lines; but I think it does a reasonable enough job for now, particularly as I am currently more interested in focussing on the possible high level process for extracting and manipulating the data, rather than the correctness of it…!;-)

Here’s the scraper (feel free to improve upon it….:-): Scraperwiki: New Year Honours 2012
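For anyone curious about the general shape of such a scraper (rather than the specifics of mine), the usual pattern is: fetch the PDF, convert it to XML with the ScraperWiki library’s pdftoxml helper, then walk through the positioned text fragments and stitch them back into records. A rough sketch of that pattern – not the actual scraper, and the URL is just a placeholder:

# Rough shape of a ScraperWiki PDF scraper (Python 2) - not the actual honours scraper
import urllib2
import lxml.etree
import scraperwiki

PDF_URL = 'http://example.com/new-year-honours-2012.pdf'  # placeholder URL

pdfdata = urllib2.urlopen(PDF_URL).read()
xmldata = scraperwiki.pdftoxml(pdfdata)  # convert the PDF into an XML representation
root = lxml.etree.fromstring(xmldata)

# Each <text> element carries position attributes; a real scraper uses the
# coordinates to work out which fragments belong to the same row/record
for el in root.findall('.//text'):
    print el.attrib.get('top'), el.attrib.get('left'), el.text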

I then did a little bit of tweaking in Google Refine, normalising some of the facets and crudely attempting to separate out each person’s role and the contribution for which the award was made.

For example, in the case of Dr Glenis Carole Basiro DAVEY, given column data of the form “The Open University, Science Faculty and Health Education and Training Programme, Africa. For services to Higher and Health Education.”, we can use the following expressions to generate new sub-columns:

value.match(/.*(For .*)/)[0] to pull out things like “For services to Higher and Health Education.”
value.match(/(.*)For .*/)[0] to pull out things like “The Open University, Science Faculty and Health Education and Training Programme, Africa.”
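For reference, the same split can be sketched outside Refine too – here’s the equivalent of those two expressions in a couple of lines of Python, with the example cell text hard-coded:

import re

cell = "The Open University, Science Faculty and Health Education and Training Programme, Africa. For services to Higher and Health Education."

# Equivalent of value.match(/(.*)For .*/)[0] and value.match(/.*(For .*)/)[0]
role = re.match(r'(.*)For .*', cell).group(1).strip()
contribution = re.match(r'.*(For .*)', cell).group(1)
print role          # "The Open University, ... Africa."
print contribution  # "For services to Higher and Health Education."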

I also ran each person’s record through the Reuters Open Calais service using Google Refine’s ability to augment data with data from a URL (“Add column by fetching URLs”), pulling the data back as JSON. Here’s the URL format I used (polling once every 500ms in order to stay within the maximum of 4 calls per second mandated by the API):

"http://api.opencalais.com/enlighten/rest/?licenseID=<strong>MY_LICENSE_KEY</strong>&content=" + escape(value,'url') + "&paramsXML=%3Cc%3Aparams%20xmlns%3Ac%3D%22http%3A%2F%2Fs.opencalais.com%2F1%2Fpred%2F%22%20xmlns%3Ardf%3D%22http%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%22%3E%20%20%3Cc%3AprocessingDirectives%20c%3AcontentType%3D%22TEXT%2FRAW%22%20c%3AoutputFormat%3D%22Application%2FJSON%22%20%20%3E%20%20%3C%2Fc%3AprocessingDirectives%3E%20%20%3Cc%3AuserDirectives%3E%20%20%3C%2Fc%3AuserDirectives%3E%20%20%3Cc%3AexternalMetadata%3E%20%20%3C%2Fc%3AexternalMetadata%3E%20%20%3C%2Fc%3Aparams%3E"

Unpicking this a little:

licenseID is set to my license key value
content is the URL escaped version of the text I wanted to process (in this case, I created a new column from the name column that also pulled in data from a second column (the contribution column). The GREL formula I used to join the columns took the form: value+', '+cells["contribution"].value)
paramsXML is the URL encoded version of the following parameters, which set the content encoding for the result to be JSON (the default is XML):

<c:params xmlns:c="http://s.opencalais.com/1/pred/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<c:processingDirectives c:contentType="TEXT/RAW" c:outputFormat="Application/JSON"  >
</c:processingDirectives>
<c:userDirectives>
</c:userDirectives>
<c:externalMetadata>
</c:externalMetadata>
</c:params>

So much for process – now where are the stories? That’s left, for now, as an exercise for the reader. An obvious starting point is just to see who received honours in your locale. Remember, Google Fusion Tables lets you generate all sorts of filtered views, so it’s not too hard to map where the MBEs vs OBEs are based, for example, or have a stab at where awards relating to services to Higher Education went. Some awards also have a high correspondence with a particular location, as for example in the case of Enfield…

If you do generate any interesting views from the New Year Honours 2012 Fusion Table, please post a link in the comments. And if you find a problem with/fix for the data or the scraper, please post that info in a comment too:-)