Author Archives: Paul Bradshaw

VIDEO: Tim Ireland on the importance of networks in SEO

Last month I invited Tim Ireland to take questions from students at City University about his experiences in SEO and related issues. One particular section, when he spoke of the role of networks in the legend of Paul Revere, and the significance of the Daily Mail’s false Amanda Knox report, struck me as particularly interesting, so I’m republishing it here.

The video is Creative Commons licensed – feel free to remix it with other video.

A style guide for collaborative journalism: what I’ve learned from the first weeks of Help Me Investigate: Networks

Collaborative journalism as relay race? Image by Phil Roeder

 

It’s a few weeks into the Help Me Investigate: Networks project and I’ve been learning a lot about how community management and investigative journalism can support each other.

Some of this is building on experiences on the original Help Me Investigate, but one thing in particular is emerging from this project. It’s about how you should write when your intention is to make it easy for others to get involved – a different approach to traditional newswriting, but not too different to good blogging practice.

It’s a style guide, of sorts. And so far I’ve identified 3 key qualities:

1. Write ‘news that I can use’

Pull out the practical aspects of what you’re writing about. Even if your post is just a link to an article elsewhere, pull out the most useful quote: the facts, the legalities, the implications. If you’re writing about an investigation, tell us about the process; link to the full data; translate relevant documents and reports.

Make it useful, because users can build on that to help you in return.

2. End your posts with a baton that others can pick up

If it’s a work-in-progress, outline what questions you need to answer next. This will also help keep your own planning on track. If you’re linking to something else, highlight the gaps that need filling – what information is missing?

Already on Help Me Investigate Welfare, for example, one investigation has moved from one user’s initial blog post, to my setting up the site, to a third person supplying extra contextual data, and a fourth contributor mapping the results. That wouldn’t have been possible if the first person had waited and waited until they felt that they were ‘finished’. Online, it’s the unfinished article that is easier to help with.

3. Create momentum by posting small things, often, as you move towards your target

Rather than waiting for things to be perfect, publish as you go. This provides multiple opportunities for others to discover your work, and multiple ways in which to enter it (one post may talk about documents that someone has expertise on; another may profile a particular individual, and so on).

It also makes it clear that the investigation is going somewhere, and the user may be more inclined to help it get there as a result.

Interestingly, one of the journalists on National Public Radio in the US talks of a similar approach:

“[Rina Palta] became a leading reporter on the story, not by writing one big investigative piece but by filing frequent, incremental updates, [NPR’s Matt] Thompson said. (Even Stephen Colbert cited her work.) Thompson calls it the quest: The body of work makes a bigger impact than any single post.”

So there are editorial benefits too.

This style guide works in tandem with a wider content strategy which I’ll blog about at some point.

Meanwhile, what other points would you add to a style guide for collaborative journalism? (Better still, why not join the project and find out for yourself?)

Tools or Tales?

Christmas gifts image by Michael Wyszomierski


This month’s Carnival of Journalism asks what journalists want for Christmas from programmers, and vice versa. Here’s my take.

Programmers and developers have already given journalists enough presents to last a century of Christmases. Programmers created content management systems and blogging platforms; they wrapped up networks of contacts in social networks, and parcelled up fast-moving updates on Twitter and SMS. They tied media in ribbons of metadata, making it easier to verify. They digitised content, making it possible to mix it with other content.

But I think it’s time for journalists to start giving back.

All of these gifts have made it easier for journalists to report stories. But that’s only part of publishing.

Technology’s place in journalism

Traditionally, journalism’s technology came after the story: sub-editors or designers laid the story out in the way they judged to be the most effective; printers gave it physical form; and distributors made sure it reached people.

Each stage in that process considers the next person. The inverted pyramid, for example, helps subs trim copy to fit available space. Subs talk to printers. Printers work with distributors. Processes are designed to reduce friction. The journalist’s work – whether they realise it or not – is a compromise reached over decades between different parties. An exchange of gifts, if you like.

But when it comes to publishing online, there’s been very little Christmas spirit.

Stories as a vehicle

Stories help us connect with current issues; they act as a vehicle for information that allows us to participate in society, whether that’s politically, socially, or economically.

The job of a journalist is to find stories in current events.

But those stories do not have to be told in one particular way. And if we were to try to tell them in some different ways (adding important metadata; publishing raw data; linking to supporting material; flagging false information), we could be giving a gift much desired by developers.

Here are some things that they could do with that gift – it is, if you like, my own fantasy Christmas list:

They’re just ideas – and will remain so as long as journalists assume they’re only writing for newspapers, and newspaper readers.

The newspaper is a tool: a way for groups of people to exchange information. In the 19th century those groups might have been political activists, or merchants who needed to know the latest trading conditions.

The web is a tool too – a different tool. We can use it to ask information to come to us, or to seek out supplementary information; we can use it to draw connections; and we can act on what we find in the same space. Stories need to adapt to the possibilities of the new tool they sit in.

This year, put a developer on your Christmas list. It’s the gift that keeps on giving.

The rise of local media sales partnerships and 19 other recent hyper-local developments you may have missed

In this guest post Ofcom’s Damian Radcliffe cross-publishes his latest presentation on developments in hyperlocal publishing for September-October, and highlights how partnerships are increasingly important for hyper-local, regional and national media in terms of “making it pay”.

When producing my latest bi-monthly update on hyper-local media, I was struck by the fact that media sales partnerships suddenly seem to be all the rage.

In a challenging economic climate, a number of media providers – both big and small – have recently come together to announce initiatives aimed at maximising economies of scale and potentially reducing overheads.

At a hyperlocal level, the launch on 1st November of the Chicago Independent Advertising Network (CIAN) saw 15 Chicago community news sites come together to offer a single point of contact for advertisers. These sites “collectively serve more than 1 million page views each month.”

This initiative follows in the footsteps of other small scale advertising alliances including the Seattle Indie Ad Network and Boston Blogs.

These moves – bringing together a range of small scale location based websites – can help address concerns that hyper-local sites are not big enough (on their own) to unlock funding from large advertisers.

CIAN also aims to address a further hyper-local concern: that of sales skills. Rather than having a hyperlocal practitioner add media sales to an ever expanding list of duties, funding from the Chicago Community Trust and the Knight Community Information Challenge allows for a full-time salesperson.

Big Media is also getting in on this act.

In early November Microsoft, Yahoo! and AOL agreed to sell each other’s unsold display ads. The move is a response to Google and Facebook’s increasing clout in this space.

Reuters reported that Facebook and Google are expected to increase their share of online display advertising in the United States in 2011 by 9.3% and 16.3% respectively.

In contrast, AOL, Microsoft and Yahoo are forecast to lose share, with Facebook expected to surpass Yahoo for the first time.

Similarly in the UK, DMGT’s Northcliffe Media, home to 113 regional newspapers, recently announced it was forging a joint partnership with Trinity Mirror’s regional sales house, AMRA.

This will create a commercial proposition encompassing over 260 titles, including nine of the UK’s 10 biggest regional paid-for titles. Like the Microsoft, Yahoo! and AOL arrangement, this new partnership comes into effect in 2012.

These examples all offer economies of scale for media outlets, and potentially larger reach and impact for advertisers. Given these benefits, I wouldn’t be surprised to see more of these types of partnership in the coming months and years.

Damian Radcliffe is writing in a personal capacity.

Other topics in his current hyperlocal slides include Sky’s local pilot in NE England and research into the links between tablet use and local news consumption. As ever, feedback and suggestions for future editions are welcome.

 

Magazine Editing – 3rd edition now out (disclosure: I edited it)

Magazine Editing 3rd edition

UPDATE: Readers of this blog can now get a 20% discount off the book by using the code ME1211 when ordering on the Routledge site.

Magazine Editing is one of those books that I’ve used for years in my teaching. Unlike most books in the field, it has a healthy focus on the less glamorous aspects of running magazines, such as managing teams and budgets, editorial strategy, and the significant proportion of the industry – B2B, contract publishing, controlled-circulation, subscription-based – that you don’t see on supermarket shelves.

For the third edition, publishers Routledge approached me to update the book for a multiplatform age. That work is now done – and the new edition is now out.

Although it now has my name on it, the book remains primarily the work of John Morrish, who wrote the first two editions of the book. Editing his work gave me a fresh appreciation of just what a timeless job he has done in identifying the skills needed by magazine editors – as I write in the introduction:

“It is striking how much of the advice in the book is more important than ever. In a period of enormous change it is key to focus on the core skills of magazine editing: clear leadership, effective management, people skills and creative thinking around what exactly it is that your readers are buying into – whether that’s printed on paper, pixels on a screen, or something intangible like a sense of community and belonging.”

So if you can find one of the older editions cheap, you’ll still find it useful.

So what did I add to the new edition of Magazine Editing? It goes without saying that digital magazines (web-only, apps) are now covered. The diversification of revenue models – the increased importance of events, merchandising, data, mobile and apps – is now explored, as well as how online advertising works and how it differs from traditional advertising. I also cover how to use online resources, including web analytics, to better understand your audience and inform your editorial strategy, and how magazine campaigns are changed by the dynamics of the web.

The chapter on leading and managing now includes sections on managing information overload, social bookmarking and social media policies, and there’s a new section on legal guidance on placements and internships. The budgeting sections now include online considerations, and there’s an exploration of the pros and cons of using free or minimal cost third party services against building tools in-house. A passage from the section on ‘Making money online’ is illustrative of the shifts facing the industry:

“Like so much else on the web, it is becoming difficult to see where content ends and commerce begins. The concept of a ‘magazine’ blurs when, online, it can also be a shop, a game, or a tool. It helps to think of how the business model of magazines has traditionally worked: gathering a community of people in the same place (on your pages) where companies can then advertise their products and services. The same principle applies now, but the barriers to selling products and services yourself have been significantly lowered, just as the barriers to publishing content have been significantly lowered for those companies whose advertising used to fund print publishing. Integrity is no less important in this context: users will desert your website if your content is only concerned with selling them your products, just as they will desert if your events are badly organised, your merchandise poor quality, or your service shoddy. Publishers increasingly talk of a ‘brand experience’ of which the content is just one part. In many ways this makes the reader – as they also become a consumer – more powerful, and the advertiser less so. Your insights into what they are talking and reading about may be of increasing interest to those who are searching for new revenue streams.”

The chapter on writing covers considerations in evaluating online sources of information and the debates in online journalism around objectivity versus transparency, and the values of a ‘web-first’ strategy. I also cover online tools for organising diaries and monitoring social media. There’s an exploration of best practice guidelines in writing for the web, and when multimedia is appropriate or preferable.

The chapter on pictures and design now includes advice on dealing with web designers and developers, multiplatform design and branding, sourcing video for the web, copyright and Creative Commons, infographics, and image considerations for online publication. And ‘Managing Production’ covers search engine optimisation, scheduling online production, and online distribution. The penultimate chapter on legal considerations adds data protection, the role of archives in contempt of court, and website terms and conditions.

I end the book with a list of tools that allows the reader to get publishing right now. And aside from the legal developments, the new considerations, roles and stages in the production cycle, this is perhaps the most important change from previous editions: a student reading this book is no longer waiting for their first job in publishing: they should be creating it.

If you have read the book and want to receive updates on developments in the magazine industry, please Like the book’s Facebook page. I’d also welcome any comments on areas you think are well covered – or need to be covered further.

New UK open data moves: following the money and other curiosities

Tim Davies has done a wonderful job of combing through the fine print of the UK government’s Autumn statement open data measures (PDF), highlighting the dynamics that appear to be driving it, and the data conspicuous by its absence.

Here are the passages most relevant for journalists. Firstly, following the money and accountability:

“The [Data Strategy Board] body seeking public data will be reliant upon the profitability of the PDG [Public Data Group] in order to have the funding it needs to secure the release of data that, if properly released in free forms, would likely undermine the current trading revenue model of the PDG. That doesn’t look like the foundation for very independent and effective governance or regulation to open up core reference data!

“Furthermore, whilst the proposed terms for the DSB [Data Strategy Board] state that “Data users from outside the public sector, including representatives of commercial re-users and the Open Data community, will represent at least 30% of the members of DSB”, there are also challenges ahead to ensure data users from civil society interests are represented on the board.”

Secondly, the emphasis on clinical data and issues surrounding privacy and the sale of personal data:

“The first measures in the Cabinet Office’s paper are explicitly not about open data as public data, but are about the restricted sharing of personal medical records with life-science research firms – with the intent of developing this sector of the economy. With a small nod to “identifying specified datasets for open publication and linkage”, the proposals are more centrally concerned with supporting the development of a Clinical Practice Research Datalink (CPRD) which will contain interlinked ‘unidentifiable, individual level’ health records, by which I interpret the ability to identify a particular individual with some set of data points recorded on them in primary and secondary care data, without the identity of the person being revealed.

“The place of this in open data measures raises a number of questions, such as whether the right constituencies have been consulted on these measures and why such a significant shift in how the NHS may be handing citizens personal data is included in proposals unlikely to be heavily scrutinised by patient groups? In the past, open data policies have been very clear that ‘personal data’ is out of scope – and the confusion here raises risks to public confidence in the open data agenda. Leaving this issue aside for the moment, we also need to critically explore the evidence that the release of detailed health data will “reinforce the UK’s position as a global centre for research and analytics and boost UK life sciences”. In theory, if life science data is released digitally and online, then the firms that can exploit it are not only UK firms – but the return on the release of UK citizens personal data could be gained anywhere in the world where the research skills to work with it exist.”

UPDATE: More on that in The Guardian.

Thirdly, it looks like this data will allow journalists to scrutinise welfare and credit (so plenty of material for the tabloids and mid-market press), but not data that scrutinises corporations or governments:

“When we look at the other administrative datasets proposed for release in the Measures the politicisation of open data release is evident: Fit Note Data; Universal Credit Data; and Welfare Data (again discussed for ‘linking’ implying we’re not just talking about aggregate statistics) are all proposed for increased release, with specific proposals to “increase their value to industry”. By contrast, no mention of releasing more details on the tax share paid by corporations, where the UK issues arms export licenses, or which organisations are responsible for the most employment law violations. Although the stated aims of the Measures include increasing “transparency and accountability” it would not be unreasonable to read the detail of the measures as very one-sided on this point: and emphasising industry exploitation of data far more than good governance and citizen rights with respect to data.

“The blurring of the line between ‘personal data’ and ‘open data’, and the state’s assumption of the right to share personal data for industrial gain, should give cause for concern, and highlights the need to build a stronger constituency scrutinising government open data action.”

It’s nice to see a data initiative being greeted with a critical eye rather than Three Cheers for the Numbers.

UPDATE: On a similar note, Access Info Europe highlights problems with the Open Government Partnership, which “must significantly improve its internal access to information policy to meet the standards it is advancing”. Specifically:

“The policy should be reformed to incorporate basic open data principles such as that information will be made available in a machine-readable, electronic format free of restrictions on reuse.”

“A key problem is the lack of detail in the policy, which has the result of leaving important matters to the discretion of the OGP. Other key problems include:
» The failure of the policy to recognise the fundamental human right to information;
» The significantly overbroad and discretionary regime of exceptions;
» The failure of the draft Policy to put in place a system of protections and sanctions.”

FAQ: Niche blogs vs mainstream media outlets

Here’s another collection of questions answered here to avoid duplication. This time from a final year student at UCLAN:

Blogs are often based on niche subject areas and created by individuals from a community. Do you think mainstream media outlets are limited by resources to compete? Or are there signs they are adapting?

I think they are more limited by passion, and by commercial imperatives. Niche blogs tend to be driven by passion initially, and sometimes by the commercial imperative to target those niches, whereas mainstream outlets are built on scale and mass audiences – or affluent audiences who still don’t really qualify as a niche.

They are adapting as the commercial drive changes and advertisers look for measurements of engagement, but it’s hard, as your next question fleshes out…

Communities by nature need conversation, and this is often visible online in forums, blog comments etc. Can it be argued that niche blogs are better at engaging communities and providing a platform for conversation?

…yes, but more because they often build those communities from the ground up, whereas established media platforms are having to start with a mass audience and carve niches out of those. It’s like trying to hold a community meeting in the middle of a busy high street, compared to doing it in a community centre.

… If so, do you think the success of blogs is a result of people wanting conversation instead of a ‘lecture’ from journalists?

Not necessarily – I think blogs succeed (and fail) for all sorts of reasons. One of those is that blogs have made it easier to connect with likeminded people across the platform (in comments, for example, without having to fight through hundreds of comments from idiots), another is the ability for users to input into the journalistic process rather than merely consuming a story, and another is the ability to focus on elements of an issue which may not be accessible enough to justify coverage by a mass audience publication – and I’m sure there are as many other reasons as there are blogs.

Finally, with the emergence of Twitter, along with other methods of contact, are journalists now becoming more involved in conversation with communities of interest or is there still a reluctance from journalists to be involved?

Some recent research in the US suggested that Twitter is still being used overwhelmingly as a broadcast platform by journalists and news brands. But there are also an increasing number of journalists who are using it particularly effectively as a way to talk with users. My own research into blogging suggested a similar effect. So yes, there is reluctance (talking to sources is hard work, after all, whether it’s on Twitter, the phone, or face to face – and for many journalists it’s easier to avoid it) but the culture is changing slowly.

Teaching liveblogging

Liveblogging exercise trending on Twitter


In the final part of a trilogy of articles on liveblogging I wanted to talk about a recent experiment I conducted in teaching liveblogging, where I decided to abandon most of my planned lecture on the topic and stage a live ‘event’ instead.

I’d also like this post to provide a space to share your own experiences of teaching liveblogging and mobile journalism.

One of the biggest problems in teaching liveblogging – and of much of online journalism in fact – is getting students to ‘unlearn’ assumptions about journalism production learned in an analogue context. You can talk about the need to operate across a network, to multitask and to look for where the need lies – but there’s nothing like experience to drill that home.

Casting the panel: image by @mattclinch81

The event

I decided to recreate one of the less interesting events to liveblog: a committee hearing. I could have chosen to recreate a demonstration or a riot, but aside from the obvious potential for things to go horribly wrong, recreating something less ‘eventful’ meant I could communicate some important lessons about those sorts of events – more on which below.

Specifically, I took the transcript from one of the committee hearings into the MPs’ expenses scandal in the UK. In particular, I chose the evidence of a husband and wife, which provided a little extra colour.

Image by @andrewstuart


Precautions

Because the event was going to be tweeted live and in public, I had to make sure that there was no chance of libel. And so the names of all participants were changed to quite obviously false ones: the MP was Alan Fiction (Fiction, Al – see what I did there?) and the various committee members had names that made them sound like Mr Men characters (“Dr Fashionabletrousers”).

Normally hashtags emerge organically but I decided to specify a hashtag up front to make the nature of the event explicit, and so #FAKEevent was born.

With those precautions in place I needed to give the event some dynamics that would show the students the issues they would have to deal with in a live situation. Specifically: multiple sources of information; unexpected events; and incomplete information.

Image by @iamdjcarlo


The roles

The room (over 200 students) was split into 4 main groups: over half made up a group playing the role of journalists. These were asked to move so that they were all sat in the central column of seats. To further mix things up, I gave them different editorial contexts: one quarter was working for a left-leaning broadsheet; another for a right-leaning one; a third quarter was working for a public broadcaster; and a final one for a commercial broadcaster.

20 more students each made up a pro-MP group and an anti-MP group, who occupied the left and right columns of seats respectively. A final group of 10 or so students were ‘bystanders’, occupying the back row.

In addition, a group of 10 or so took the roles of the committee itself, the MP and his ‘wife’.

These groups were now given the following materials:

  • The committee/MP/wife: an edited transcript of the hearing which they were to use as a script. Also: instructions for particular actions that individuals should do at specific times (more below)
  • The journalists: briefing notes: the members of the panel; background on the MP
  • Pro-MP group: instructions that they should try to steer coverage in a positive direction, and details of the website that they could use to do so.
  • Anti-MP group: instructions that they should try to steer coverage in a negative direction, and details of the website that they could use to do so.
  • The bystanders: instructions on who they were, and the roles they would play (more below).

I had also approached 3 students beforehand to play specific roles within those groups: one student each as the ‘editor’ of the pro- and anti-MP websites, who had already been assigned admin access to their particular blog and so could give other students publishing rights; and a third student who would act as the major ‘disruption’ to the event.

And I had told all students ahead of the event to bring either a laptop or mobile phone from which they could publish to the web.

A series of unfortunate events

The transcript formed the backdrop to a number of other events which I wanted to use as a device for demonstrating the skills they would need as livebloggers:

  • One member of the panel would begin to fall asleep after a minute. This was to test how many were only paying attention to the testimony.
  • Another member would shout ‘Snake!’ after 2 minutes, waking the first person up. Again, who would be paying attention? Would they have made a note of who he was?
  • A third member would stare intently at the wife throughout – a small detail; who would notice?
  • After 5 minutes or so, my ‘plant’ would storm into the back of the room and shout a loud accusation at the MP, then be calmly escorted out. Most journalists would not have seen what happened (because it was behind them), and so would have to reconstruct events from the bystanders in the back row, some of whom had their own agendas and some of whom had recorded it.

In all, the exercise took some time to organise (here are my notes): around 20-25 minutes to get everyone into their groups and around 7 minutes for the event itself (actually longer as my interruption held back for some time, waiting for a nod). A livestream of tweets (using Twitterfall) was put up on the projector – if you had a phone set up with Qik or Bambuser you could also stream the video.

image of sleeping panel member by @nicky_henderson

The lessons

Choosing a staged event like a committee hearing that wasn’t particularly eventful meant that the students had to do a number of things over and above reacting to events.

Firstly, they had to concentrate on what was taking place because it was easy to lose concentration when nothing interesting was happening.

Secondly, they had to make things interesting. Many resorted to opinion and wit – entertaining, but not particularly informative, although that was excusable given that the event and the actors were fictional, and there was no background knowledge (other than that in the briefing notes) to draw on.

Still, the point wasn’t what they did but rather what they learned, and the frustrations of needing that background were a useful teaching tool in themselves.

Finally, they had to be proactive: seek out information, find out what had happened.

At the end of the exercise I asked them what they had learned, and pointed out some things I’d noticed myself about how they’d dealt with the challenge:

  • Some noted the difficulties of taking in information from both the event itself and on Twitter. This is a skill that comes from practice – or if you have the resources, partnering up with another journalist.
  • Not a single student got up from their seat and moved – either to hear the proceedings more clearly (at least one tweeted that they couldn’t hear what was being said) or to speak to the bystanders.
  • Only one found out the name of the protestor. None picked up on his hashtagged tweets. None traced his blog, where his accusations were fleshed out.
  • Most journalists did not follow what was being said about the event elsewhere, or put it into context.
  • Few took images or other multimedia.

Once again: the point wasn’t that they do things right; in many ways they were set up to fail, and the discussion at the end was about reflecting on those rather than playing a blame game.

‘Failure’ was used as a teaching tool: instead of telling them what they should do, expecting them to remember it, and then giving them an exercise, I wanted to give them an exercise up front, so they could experience and internalise the desire to do better, and use that as the context for the lessons – connecting them to their own experience of liveblogging rather than to experiences of, for example, live broadcast or print reporting. (It seemed to work – a couple of students took the time to express their thanks for the nature of the lesson.)

So although that left me much less time to pass on a lesson, it did, I hope, leave the students learning more and with a higher motivation to continue learning (the full presentation, by the way, was available for those who wanted to go through it).

On the motivation side, the hashtag for the event also trended not only in the UK but in the US too, which I think the students rather enjoyed.

More Dabblings With Local Sentencing Data

In Accessing and Visualising Sentencing Data for Local Courts I posted a couple of quick ways in to playing with Ministry of Justice sentencing data for the period July 2010–June 2011 at the local court level. At the end of the post, I wondered about how to wrangle the data in R so that I could look at percentage-wise comparisons between different factors (age, gender) and offence type, and mentioned that I’d posted a related question to the Cross Validated/Stats Exchange site (Casting multidimensional data in R into a data frame).

Courtesy of Chase, I have an answer🙂 So let’s see how it plays out…

To start, let’s just load the Isle of Wight court sentencing data into RStudio:

require(ggplot2)
require(reshape2)
require(plyr) # needed for ddply() below
iw = read.csv("http://dl.dropbox.com/u/1156404/wightCrimRecords.csv")

Now we’re going to shape the data so that we can plot the percentage of each offence type by gender (limited to Male and Female options):

iw.m = melt(iw, id.vars = "sex", measure.vars = "Offence_type")
iw.sex = ddply(iw.m, "sex", function(x) as.data.frame(prop.table(table(x$value))))
# stat="identity" plots the pre-computed Freq values; theme()/element_text()
# replace the opts()/theme_text() calls removed in later versions of ggplot2
ggplot(subset(iw.sex, sex=='Female' | sex=='Male')) + geom_bar(aes(x=Var1, y=Freq), stat="identity") + facet_wrap(~sex) + theme(axis.text.x=element_text(angle=-90)) + xlab('Offence Type')

Here’s the result:

Splitting down offences by percentage and gender

We can also process the data over a couple of variables. So, for example, we can look at how recorded sentences for women break down by offence type and age range, displaying the results as a percentage breakdown by age within each offence type:

iw.m2 = melt(iw, id.vars = c("sex","Offence_type"), measure.vars = "AGE")
iw.off = ddply(iw.m2, c("sex","Offence_type"), function(x) as.data.frame(prop.table(table(x$value))))

# As above, stat="identity" and theme()/element_text() keep this working
# with current versions of ggplot2
ggplot(subset(iw.off, sex=='Female')) + geom_bar(aes(x=Var1, y=Freq), stat="identity") + facet_wrap(~Offence_type) + theme(axis.text.x=element_text(angle=-90)) + xlab('Age Range (Female)')

Offence type broken down by age and gender

Note that this graphic may actually be a little misleading, because percentage-based reports don’t play well with small numbers: whilst there are multiple Driving Offences recorded, there are only two Burglaries, so the statistical distribution of convicted female burglars is based on a population of size two. A count would be a better way of showing this.
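As a quick sketch of that count-based alternative (assuming the iw data frame loaded above, with the same AGE, sex and Offence_type column names), geom_bar’s default count statistic can be applied directly to the raw data, so no melt/ddply step is needed:

```r
# Plot raw counts rather than proportions: with no y aesthetic mapped,
# geom_bar counts the rows in each AGE bin, so a rare offence type
# (e.g. the two burglaries) shows up as visibly small rather than
# being scaled up to percentages of a tiny population
require(ggplot2)
ggplot(subset(iw, sex == 'Female')) +
  geom_bar(aes(x = AGE)) +
  facet_wrap(~Offence_type) +
  theme(axis.text.x = element_text(angle = -90)) +
  xlab('Age Range (Female)')
```

The faceting is the same as before, so the two charts can be compared side by side to see where the percentage view exaggerates.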

PS I was hoping to be able to just transmute the variables and generate a raft of other charts, but I seem to be getting an error, maybe because some rows are missing? So: anyone know where I’m supposed to post R library bug reports?