Monthly Archives: May 2012

Now available by distance learning: my MA in Online Journalism

The MA in Online Journalism which I established at Birmingham City University in 2009 is now available via distance learning.

The MA in Online Journalism by distance learning is primarily aimed at people who are already working in a content- or technology-related role.

Students can use their current work as part of their studies, or use their studies to explore ideas and skills that they have been unable to explore as part of their role.

The course requires self-discipline and motivation, and I look for evidence of that in the application process. You will be communicating regularly with me and with other students on both the distance learning and ‘with attendance’ versions of the course, so there will be plenty of support, but as on any Masters-level course you will be expected to learn independently, with guidance, to develop your own areas of expertise.

I’ve actually been teaching the distance learning version of the course since last September, but hadn’t publicised the fact (I wanted to ‘soft-launch’ the first year with a small group first, and use agile principles to continue to develop it).

But now the secret’s out: The Guardian reported on the course last month, and student Robyn Bateman has written about her experience of studying via distance learning in Wannabe Hacks this week.

I’ll be blogging further about how the distance learning course has changed how I teach the MA as a whole, and changes in education more generally, but that’s for another post. In the meantime, I’m particularly welcoming applications from individuals with good experience as a working journalist, or as a web developer, or who are running or considering launching their own journalism enterprise.


Online journalism jobs – from the changing subeditor to the growth of data roles

The Guardian’s Open Door column today describes the changes to the subeditor’s role in a multiplatform age in some detail:

“A subeditor preparing an article for our website will, among other things, be expected to write headlines that are optimised for search engines so the article can be easily seen online, add keywords to make sure it appears in the right places on the website, create packages to direct readers to related articles, embed links, attach pictures, add videos and think about how the article will look when it is accessed on mobile phones and other digital platforms.”

F1 Championship Points as a d3.js Powered Sankey Diagram

d3.js crossed my path a couple of times yesterday: firstly, in the form of an enquiry about whether I’d be interested in writing a book on d3.js (I’m not sure I’m qualified: as I responded, I’m more of a script kiddie who sees things I can reuse, rather than having any understanding at all of how d3.js does what it does…); secondly, via a link to d3.js creator Mike Bostock’s new demo of Sankey diagrams built using d3.js:

Hmm… Sankey diagrams are good for visualising flow, so to see whether I could plug-and-play with the component I needed an appropriate data set. F1-related data is usually my first thought as far as testbed data goes (no confidences to break, the STEM/innovation outreach/tech transfer context, etc.), so what things flow in F1? What quantities are conserved whilst being passed between different classes of entity? How about points… points are awarded on a per-race basis to drivers who are members of teams. It’s also a championship sport, run over several races. The individual Drivers’ Championship is a competition between drivers to accumulate the most points over the course of the season, and the Constructors’ Championship is a battle between teams. Which suggests to me that a Sankey plot of points flowing from races to drivers and then on to constructors might work?

So what do we need to do? First up, look at the source code for the demo using View Source. Here’s the relevant bit:

Data is being pulled in from a relatively addressed file, energy.json. Let’s see what it looks like:
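I haven’t reproduced the file here, but Bostock’s Sankey examples use a node-link structure along these lines (a sketch of the shape, not the actual contents of energy.json):

```json
{
  "nodes": [
    {"name": "Agricultural waste"},
    {"name": "Bio-conversion"},
    {"name": "Liquid"}
  ],
  "links": [
    {"source": 0, "target": 1, "value": 124.729},
    {"source": 1, "target": 2, "value": 0.597}
  ]
}
```

Each link refers to its endpoints by their index in the nodes array, with value giving the size of the flow along that edge.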

Okay – a node list and an edge list. From previous experience, I know that there is a d3.js JSON exporter built into the Python networkx library, so maybe we can generate the data file from a network representation of the data in networkx?

Here we are: node_link_data(G) “[r]eturn data in node-link format that is suitable for JSON serialization and use in Javascript documents.”

Next step – getting the data. I’ve already done a demo of visualising F1 championship points sourced from the Ergast motor racing API as a treemap (but not blogged it? Hmmm…. must fix that) that draws on a JSON data feed constructed from data extracted from the Ergast API so I can clone that code and use it as the basis for constructing a directed graph that represents points allocations: race nodes are linked to driver nodes with edges weighted by points scored in that race, and driver nodes are connected to teams by edges weighted according to the total number of points the driver has earned so far. (Hmm, that gives me an idea for a better way of coding the weight for that edge…)
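The actual code pulls real data from the Ergast API and uses networkx, but the shape of the graph construction can be sketched in plain Python with made-up points values, building the same node-link dict that networkx’s node_link_data() would return:

```python
# Sketch of the race -> driver -> team points graph. Points values are
# illustrative, not real Ergast data, and the node-link dict is built by
# hand here rather than via networkx.node_link_data().

def build_sankey_data(race_points):
    """race_points: list of (race, driver, team, points) tuples."""
    nodes, index = [], {}

    def node(name):
        # Register a node the first time we see it; return its index.
        if name not in index:
            index[name] = len(nodes)
            nodes.append({"name": name})
        return index[name]

    links = []
    driver_totals = {}  # (driver, team) -> points accumulated so far
    for race, driver, team, pts in race_points:
        # race -> driver edge, weighted by points scored in that race
        links.append({"source": node(race), "target": node(driver), "value": pts})
        driver_totals[(driver, team)] = driver_totals.get((driver, team), 0) + pts
    # driver -> team edges, weighted by the driver's running total
    for (driver, team), total in driver_totals.items():
        links.append({"source": node(driver), "target": node(team), "value": total})
    return {"nodes": nodes, "links": links}

data = build_sankey_data([
    ("Australia", "Button", "McLaren", 25),
    ("Australia", "Vettel", "Red Bull", 18),
    ("Malaysia", "Vettel", "Red Bull", 12),
])
```

The resulting dict can be dumped with json.dumps() and fed straight to the Sankey plugin in place of energy.json.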

I don’t have time to blog the how-to of the code right now – train and boat to catch – but will do so later. If you want to look at the code, it’s here: Ergast Championship nodelist. And here’s the result – F1 Championship 2012 Points as a Sankey Diagram:

See what I mean about being a cut and paste script kiddie?! ;-)

Inter-Council Payments and the Google Fusion Tables Network Graph

One of the great things about aggregating local spending data from different councils in the same place – such as on OpenlyLocal – is that you can start to explore structural relations in the way different public bodies of a similar type spend money with each other.

On the local spend with corporates scraper on Scraperwiki, which I set up to scrape how different councils spent money with particular suppliers, I realised I could also use the scraper to search for how councils spent money with other councils, by searching for suppliers containing phrases such as “district council” or “town council”. (We could also generate views to see how councils were spending money with different police authorities, for example.)

(The OpenlyLocal API doesn’t seem to work with the search, so I scraped the search results HTML pages instead. Results are paged, 30 results per page, with what seems like a maximum of 1,500 results (50 pages).)
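The paging logic amounts to a few lines. This is a sketch only: the URL pattern and parameter names below are hypothetical, so check the actual OpenlyLocal search pages before reusing it (the real scraper lives on Scraperwiki):

```python
# With 30 results per page and an apparent cap of 1500 results (50 pages),
# generate the list of search-result page URLs to fetch.
# NOTE: the base URL and query parameters here are made up for illustration.

import math

RESULTS_PER_PAGE = 30
MAX_RESULTS = 1500

def page_urls(base_url, term, total_results):
    pages = min(math.ceil(total_results / RESULTS_PER_PAGE),
                MAX_RESULTS // RESULTS_PER_PAGE)
    return [f"{base_url}?term={term}&page={p}" for p in range(1, pages + 1)]

urls = page_urls("http://openlylocal.com/search", "district+council", 2000)
# Capped at 50 pages, even though 2000 results would need 67.
```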

The publicmesh table on the scraper captures spend going to a range of councils (not parish councils) from other councils. I also uploaded the data to Google Fusion tables (public mesh spending data), and then started to explore it using the new network graph view (via the Experiment menu). So for example, we can get a quick view over how the various county councils make payments to each other:

Hovering over a node highlights the other nodes it’s connected to (though it would be good if the text labels of the connected nodes were highlighted and labels for unconnected nodes were greyed out?).

(I think a Graphviz visualisation would actually be better, eg using Canviz, because it can clearly show edges from A to B as well as B to A…)

As with many exploratory visualisations, this view helps us identify some more specific questions we might want to ask of the data, rather than presenting a “finished product”.

As well as the experimental network graph view, I also noticed there’s a new Experimental View for Google Fusion Tables. As well as the normal tabular view, we also get a record view, and (where geo data is identified?) a map view:

What I’d quite like to see is a merging of map and network graph views…

One thing I noticed whilst playing with Google Fusion Tables is that getting different aggregate views is rather clunky and relies on column order in the table. So for example, here’s an aggregated view of how different county councils supply other councils:

In order to aggregate by supplied council, we need to reorder the columns (the aggregate view aggregates columns as they appear, left to right, in the table view). From the Edit column, Modify Table:

(In my browser, I then had to reload the page for the updated schema to be reflected in the view). Then we can get the count aggregation:

It would be so much easier if the aggregation view allowed you to order the columns there…
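For comparison, the count aggregation Fusion Tables is producing here takes only a few lines in plain Python, and the column order doesn’t matter. The column names below are hypothetical, just to illustrate the grouping:

```python
# Count payments grouped by the supplied council -- the same "count"
# aggregation as the Fusion Tables view, independent of column order.
# Row fields ("supplier", "council") are illustrative names, not the
# actual schema of the public mesh spending table.

from collections import Counter

rows = [
    {"supplier": "Kent CC", "council": "Medway"},
    {"supplier": "Kent CC", "council": "Dover DC"},
    {"supplier": "Essex CC", "council": "Medway"},
]

counts = Counter(row["council"] for row in rows)
# Counter({'Medway': 2, 'Dover DC': 1})
```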

PS no time to blog this properly right now, but there are a couple of new javascript libraries that are worth mentioning in the datawrangling context.

In part coming out of the Guardian stable, Misoproject is “an open source toolkit designed to expedite the creation of high-quality interactive storytelling and data visualisation content”. The initial dataset library provides a set of routines for: loading data into the browser from a variety of sources (CSV, Google spreadsheets, JSON), including regular polling; creating and managing data tables and views of those tables within the browser, including column operations such as grouping, statistical operations (min, max, mean, moving average etc); playing nicely with a variety of client side graphics libraries (eg d3.js, Highcharts, Rickshaw and other JQuery graphics plugins).

Recline.js is a library from Max Ogden and the Open Knowledge Foundation that, if its name is anything to go by, is positioning itself as an alternative (or complement?) to Google Refine. To my mind, though, it’s more akin to a Google Fusion Tables style user interface (“classic” version) wherever you need it, via a Javascript library. The data explorer allows you to import and preview CSV, Excel, Google Spreadsheet and ElasticSearch data from a URL, as well as via file upload (so, for example, you can try it with the public spend mesh data CSV from Scraperwiki). Data can be sorted, filtered and viewed by facet, and there’s a set of integrated graphical tools for previewing and displaying data too. Recline.js views can also be shared and embedded, which makes this an ideal tool for data publishers to embed in their sites as a way of facilitating engagement with data on-site, as I expect we’ll see on the Data Hub before too long.

More reviews of these two libraries later…

PPS These are also worth a look in respect of generating visualisations based on data stored in Google spreadsheets: DataWrapper and Freedive (like my old Guardian Datastore explorer, but done properly… a wizard-led UI that helps you create your own searchable and embeddable database view direct from a Google Spreadsheet).

Journalism Reloaded – What journalists need for the future

In a guest post, Alexandra Stark, Swiss journalist and Head of Studies at MAZ – the Swiss School of Journalism – argues that it’s time for journalists to take action on business models for supporting journalism. Stark proposes a broadened set of skills and a new structure to enable greater involvement from journalists, while also fostering further teaching of such skills.

Ask a journalist if his or her job will remain important in the future: “Of course,” he or she will answer while privately thinking, “What a stupid question!” Try changing this stupid question just a bit, asking: “How will it be possible that you’ll still be able to do a good job in the future?” It’s likely you won’t receive an answer at all.

German social TV project “Rundshow”: merging internet and television

In a guest post for OJB, cross-posted from her blog, Franzi Baehrle reviews a new German TV show which operates across broadcast, web and mobile.

There’s a big experiment going on in German television. And I have to admit that I was slightly surprised that the rather conservative “Bayerischer Rundfunk” (BR, a public service broadcaster in Bavaria) would be the one to start it.

Blogger and journalist Richard Gutjahr was approached by BR to develop a format merging internet and TV. On Monday night the “Rundshow” was aired for the first time at 11pm German time, and will be running Mondays–Thursdays for the next four weeks.

Dispatches’ Watching the Detectives: why journalists should be worried about the Communications Data Bill

Consider these two unrelated events:

  1. A bill is proposed to record every contact (and possibly search) made by every UK citizen, to be available to law enforcement agencies and stored by communication service providers
  2. An inquiry into press standards and a leaked Home Office report both uncover the ease with which private investigators can access personal records through law enforcement and other agencies

I’m worried about 1. because of 2. And tonight’s Dispatches: Watching the Detectives does a particularly good job of illustrating why. It shows “the ease and extent to which the unregulated private investigation industry is willing to acquire personal data for a price” – not just from the police services, but the health services, benefits system, and other bodies, including commercial ones such as communications service providers (for an illustration of the data security of private companies, witness the Information Commissioner’s Office targeting them after a series of data protection breaches).

If you’re a journalist, student journalist or blogger with any interest in protecting your sources, you should be watching the Communications Data Bill closely and understanding how it affects your job.

In the meantime, it’s also worth developing some good habits to protect your stories and your sources against unwanted snooping. More on my Delicious bookmarks under ‘security’.