Monthly Archives: April 2011

A First Quick Viz of UK University Fees

Regular readers will know how I do quite like to dabble with visual analysis, so here are a couple of doodles with some of the university fees data that is starting to appear.

The data set I’m using is a partial one, taken from the Guardian Datastore: Tuition fees 2012: what are the universities charging?. (If you know where there’s a full list of UK course fees data by HEI and course, please let me know in a comment below, or even better, via an answer to this Where’s the fees data? question on GetTheData.)

My first thought was to go for a proportional symbol map. (Does anyone know of a javascript library that can generate proportional symbol overlays on a Google Map or similar, even better if it can trivially pull in data from a Google spreadsheet via the Google visualisation? I have an old hack (supermarket catchment areas), but there must be something nicer to use by now, surely? [UPDATE: ah – forgot this: Polymaps])

In the end, I took the easy way out, and opted for Geocommons. I downloaded the data from the Guardian datastore, and tidied it up a little in Google Refine, removing non-numerical entries (including ranges, such as 4,500-6,000) in the Fees column and replacing them with minimum fee values. Sorting the fees column as a numerical type with errors at the top made the entries that needed tweaking easy to find.
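For anyone who would rather script that tidy-up than click through Refine, here is a minimal Python sketch of the same rule: pull every number out of a Fees cell and keep the smallest. The function name `minimum_fee` and the sample cell values are made up for illustration, and this is not a description of what Refine does internally.

```python
import re
from typing import Optional


def minimum_fee(raw: str) -> Optional[int]:
    """Return the lowest whole number found in a fee cell, or None if there is none."""
    # Drop thousands separators so "4,500-6,000" becomes "4500-6000".
    cleaned = raw.replace(",", "")
    numbers = [int(n) for n in re.findall(r"\d+", cleaned)]
    return min(numbers) if numbers else None


# Hypothetical cell values of the kind described above.
cells = ["9000", "4,500-6,000", "TBC"]
tidied = [minimum_fee(cell) for cell in cells]
```

Cells with no number at all come back as `None`, so they can still be sorted to the top and checked by hand.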

The Guardian data included an address column, which I thought Geocommons should be able to cope with. It didn’t seem to work out for me though (I’m sure I checked the UK territory, but only seemed to get US geocodings?) so in the end I used a trick posted to the OnlineJournalism blog to geocode the addresses (Getting full addresses for data from an FOI response (using APIs); rather than use the value.parseJson().results[0].formatted_address construct, I generated a couple of columns from the JSON results column using value.parseJson().results[0].geometry.location.lng and value.parseJson().results[0]
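The same extraction can be sketched outside Refine. The snippet below is a rough Python equivalent, assuming the JSON column holds a response in the Google Geocoding API shape, where `results[0].geometry.location` carries both a `lat` and a `lng` field. The function name `lat_lng` and the sample response (with made-up coordinates) are illustrative only.

```python
import json
from typing import Optional, Tuple


def lat_lng(geocoder_json: str) -> Optional[Tuple[float, float]]:
    """Return (lat, lng) from the first geocoding result, or None if there are no results."""
    results = json.loads(geocoder_json).get("results", [])
    if not results:
        return None
    # Assumed shape: results[0].geometry.location.{lat,lng}
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Illustrative response with made-up coordinates, trimmed to the fields used here.
SAMPLE_RESPONSE = json.dumps({
    "results": [
        {
            "formatted_address": "An example address",
            "geometry": {"location": {"lat": 52.0, "lng": -1.0}},
        }
    ],
    "status": "OK",
})

EMPTY_RESPONSE = json.dumps({"results": [], "status": "ZERO_RESULTS"})
```

Returning `None` for an empty result list mirrors the case where the geocoder could not place an address, so those rows can be picked out and fixed manually.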

Uploading the data to Geocommons and clicking where prompted, it was quite easy to generate this map of the fees to date:

Anyone know if there’s a way of choosing the order of fields in the pop-up info box? And maybe even a way of selecting which ones to display? Or do I have to generate a custom dataset and then create a map over that?

What I had hoped to be able to do was use coloured proportional symbols to generate a two dimensional data plot, e.g. comparing fees with drop out rates, but Geocommons doesn’t seem to support that (yet?). It would also be nice to have an interactive map where the user could select which numerical value(s) are displayed, but again, I missed that option if it’s there…

The second thing I thought I’d try would be an interactive scatterplot on Many Eyes. Here’s one view that I thought might identify what sort of return on value you might get for your course fee… ;-)

Click thru’ to have a play with the chart yourself;-)

PS I can’t not say this, really – you’ve let me down again, @datastore folks…. where’s a university ID column using some sort of standard identifier for each university? I know you have them, because they’re in the Rosetta sheet… although that is lacking a HESA INST-ID column, which might be handy in certain situations… 😉 [UPDATE – apparently, HESA codes are in the spreadsheet…. ;-0]

PPS Hmm… that Rosetta sheet got me thinking – what identifier scheme does the JISC MU API use?

PPPS If you’re looking for a degree, why not give the Course Detective search engine a go? It searches over as many of the UK university online prospectus web pages as we could find and offer up as a sacrifice to a Google Custom search engine 😉

Twitter & DataSift launch live social data services for under £1 (useful)

Journalists with an interest in realtime data should keep an eye on a forthcoming service from DataSift which promises to allow users to access a feed of Twitter tweets filtered along any combination of over 40 qualities.

In addition – and perhaps more interestingly – the service will also offer extra context:

“from services including Klout (influence metrics), PeerIndex (influence), Qwerly (linked social media accounts) and Lexalytics (text and sentiment analysis). Storage, post-processing and historical snapshots will also be available.”

The pricing puts this well within the reach of not only professional journalists but student ones too: for less than 20p per hour (30 cents) you will be able to apply as many as 10,000 keyword filters.

ReadWriteWeb describe a good example of how this may work out journalistically:

“Want a feed of negative Tweets written by C-level execs about any of 10,000 keywords? Trivial! Basic level service, Halstead says! Want just the Tweets that fit those criteria and are from the North Eastern United States? That you’ll have to pay a little extra for.”

Blocking content sites by ‘self-regulation’ – a recipe for easy censorship

At the start of this month I said that journalists were failing to “protect the public sphere”. Well, here’s just one example of this in action that we need to be watching.

Ed Vaizey, Minister for Culture, Communications and Creative Industries, has confirmed to the Open Rights Group “that discussion are ongoing between rights-holders and Internet Service Providers about ‘self-regulatory’ site-blocking measures.”

For journalists any move in this direction should be particularly concerning, as it provides a non-legal avenue (i.e. without due process) for anyone to suppress information they don’t like.

The point is not blocking sites, but the ease with which it might be done. If distribution van drivers ‘self-regulated’ to stop delivering newspapers whenever anyone complained, publishers and journalists would have a problem. An avenue to appeal doesn’t solve it, because by then the editorial moment will likely have passed – not to mention the extra costs it incurs for content producers.

Here are some precedents from elsewhere:

If you want to write to your MP, you can do so here.

Communities of practice: teaching students to learn in networks

One of the problems in teaching online journalism is that what you teach today may be out of date by the time the student graduates.

This is not just a technological problem (current services stop running; new ones emerge that you haven’t taught; new versions of languages and software are released) but also a problem of medium: genres such as audio slideshows, mapping, mashups, infographics and liveblogging have yet to settle down into an established ‘formula’.

In short, I don’t believe it’s wise to simply ‘teach online journalism’. You have to combine basic principles as they are now with an understanding of how to continue to learn the medium as it develops.

This year I set MA Online Journalism students at Birmingham City University an assignment which attempts to do this.

It’s called ‘Communities of Practice’ (the brief is here). The results are in, and they are very encouraging. Here’s what emerged:

The Charlie Sheen Twitter intern hoax – how it could be avoided

Jonny Campbell's Charlie Sheen internship hoax

Image from JonnyCampbell

Various parts of the media were hoaxed this week by Belfast student Jonny Campbell’s claim to have won a Twitter internship with Charlie Sheen. The hoax was well planned, and to be fair to the journalists, they did chase up documentation to confirm it. Where they made mistakes provides a good lesson in online verification.

Where did the journalist go wrong? They asked for the emails confirming the internship, but accepted a screengrab. This turned out to be photoshopped.

They then asked for further emails from earlier in the process, and he forwarded those, which were genuine.

They should have asked the source to forward the original email.

Of course, he could have faked that pretty easily as well (I’m not going to say how here), so you would need to check the IP address of the email against that of the company it was supposed to be from.

An IP address is basically the location of a computer (server). This may be owned by the ISP you are using, or the company which employs you and provides your computer and internet access.

This post explains how to find IP addresses in an email using email clients including Gmail, Yahoo! Mail and Outlook – and then how to track the IP address to a particular location.

This website will find out the IP address for a particular website – the IP address for is, for example. So you’re looking for a match (assuming the same server is used for mail). You could also check other emails from that company to other people, or ideally to yourself (Watch out for fake websites as well, of course).
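As a rough illustration of the first half of that check, here is a Python sketch that pulls the IP addresses out of an email's `Received` headers, given the raw source of the message ("Show original" in Gmail, for example). The function name `received_ips` and the sample message are hypothetical, and the sample uses reserved documentation addresses rather than real ones. Bear in mind that headers can themselves be forged, so a match is a clue rather than proof.

```python
import email
import ipaddress
import re

# Dotted-quad candidates; ipaddress validates each one afterwards.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def received_ips(raw_message: str) -> list:
    """Return valid IPv4 addresses from the Received headers, in header order."""
    msg = email.message_from_string(raw_message)
    found = []
    for header in msg.get_all("Received", []):
        for candidate in IPV4_CANDIDATE.findall(header):
            try:
                ipaddress.ip_address(candidate)
            except ValueError:
                continue  # e.g. 999.1.1.1 looks like an IP but is not one
            if candidate not in found:
                found.append(candidate)
    return found


# Hypothetical raw message using reserved documentation addresses.
SAMPLE_EMAIL = (
    "Received: from mail.example.com (mail.example.com [203.0.113.5])\n"
    "\tby mx.example.org with ESMTP; Fri, 1 Apr 2011 10:00:00 +0000\n"
    "Received: from [198.51.100.7] by mail.example.com; Fri, 1 Apr 2011 09:59:58 +0000\n"
    "From: someone@example.com\n"
    "Subject: Internship\n"
    "\n"
    "Body text\n"
)

hops = received_ips(SAMPLE_EMAIL)
```

Mail servers add `Received` headers to the top of a message, so the addresses come out with the most recent hop first. For the second half of the check, the standard library's `socket.gethostbyname` can resolve the claimed sender's domain so the two can be compared by eye.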

And of course, finally, it’s always worth looking at the content the hoaxer has provided and clues that they may have left in it – as Jonny did (see image, left).

For more on verifying online information see Content, context and code: verifying information online, which I’ll continue to update with examples.