Monthly Archives: September 2010

Time to talk about legal

As a lone blogger, how much legal protection do you have? No more than anyone else when it comes to libel, contempt of court law and so on, except that people are more likely to pay attention to large media organisations.

But there are many instances where bloggers have lost a lot of time and money over legal disputes. Last week, for example, journalist and blogger Dave Osler finally saw an end to a legal battle that consumed three years of his life, after he was sued for libel by the political activist Johanna Kaschke. Despite being refused the right to appeal the strike-out of the Osler case, she is still planning to appeal another High Court decision that ended her libel claim against Alex Hilton and John Gray.

If individual bloggers worried too much about getting into trouble, we’d write much less than we do. Even big scary cases aren’t a deterrent: Dave Osler is still blogging. I was personally surprised by the results of my survey of 71 small online publishers this summer: not because only 27 per cent had been involved in legal disputes (that was about what I expected), but because over half were satisfied with the number of legal resources available.

Personally, the grey areas of law trouble me and I don’t think there could be enough support: I’d like to see more organised structures for legal help, a sort of Citizens Advice Bureau for bloggers, if you like. Informal advice is already spreading via social networks, as lawyers increasingly use Twitter and blogs to join the conversation.

As I reported on my site Meeja Law, one hyperlocal blogger who was accused of breach of copyright asked for legal advice via Twitter: “Two separate media lawyers confirmed (for free) that I’d done nothing wrong. I also contacted [hyperlocal organisation] Talk About Local for advice, and they told me the same.”

Talk About Local has published several media law guides online (e.g. this one on defamation), and the organisation’s founder William Perrin offered some frank legal advice ahead of a legal session at last weekend’s London Local Neighbourhoods Online Unconference:

…just about the best legal advice, which very few follow is to set up a limited company and keep the website inside that. Then you don’t lose your house to a nutter under defamation law….

Another concern of mine is the lack of transparency of courts data, something I’ve discussed at length here. I think bloggers should be able to access more information about cases; at the very least, the Ministry of Justice needs to consider its outmoded contempt of court law that is ill-equipped to deal with the online age.

In the coming months, I’d like to build up the conversation in this area and think about how we might approach some of these issues. If you’d like to be part of this informal online ‘working group’ please consider joining the Help Me Investigate challenge at this link (request membership here), or discussing via the OJB Facebook group.

UPDATE [Paul Bradshaw]: I’ve created a LinkedIn group as a place for people to more openly discuss how to take this forward.

Judith Townend (@jtownend on Twitter) is a PhD research student at City University London and freelance journalist.

Hyperlocal Voices: Julia Larden (Acocks Green Focus Group)

Hyperlocal voices - Acocks Green Focus Group blog

Today’s Hyperlocal Voices interview is with Julia Larden, chair of the Acocks Green Focus Group blog, which campaigns to make Acocks Green a “better place to live, work and shop”. The group was established in 2004 and the blog followed in 2007. “We are less likely to get confused or get our facts slightly muddled” than professional journalists, says Julia. Here’s the full interview:

Who were the people behind the blog, and what were their backgrounds before setting it up?

That’s a bit complicated. Originally the blog was set up, more as a straight website, by a member who has long since left the area. It was not working very well at that time, and the ex-member was also asking for quite a lot of money to carry it on. I don’t think the member had any particular background in IT – he was in education, although he has set up a few small websites of his own. I had done some work for it, written some materials and supplied some photographs. My son, who runs a small software company, agreed to take the whole thing into his care for a bit.

Things lay dormant and then, when my son had time he simply picked the content up and plonked the whole thing into a WordPress blog – one of the slightly posher ones that you have to pay a bit for, but he has some sort of contract and can get quite a few of these blogs, so the group just pays him a very nominal sum each year.

It then sat there for a bit longer with not very much happening except the occasional comment, and then several members pointed out that it was a valuable resource which we were not using properly.

One of the members had web experience (running her own online teaching company) and started to make it into a far more interesting blog, asking for more materials, creating new pages and adding in bits and pieces and an opinion survey of the area – as a launch gimmick. (We have kept that – it still gets a lot of interest – more since I shifted it to another page, for some reason.) Continue reading

"The mass market was a hack": Data and the future of journalism

The following is an unedited version of an article written for the International Press Institute report ‘Brave News Worlds’ (PDF).

For the past two centuries journalists have dealt in the currency of information: we transmuted base metals into narrative gold. But information is changing.

At first, the base metals were eyewitness accounts and interviews. Later we learned to melt down official reports, research papers, and balance sheets. And most recently our alloys have been diluted by statements and press releases.

But now journalists are having to get to grips with a new type of information: data. And this is a very rich seam indeed.

Data: what, how and why

Data is a broad term, so I should define it: I am not talking about statistics or numbers in general, because those are nothing new to journalists. When I talk about data I mean information that can be processed by computers.

This is a crucial distinction: it is one thing for a journalist to look at a balance sheet on paper; it is quite another to be able to dig through those figures on a spreadsheet, or to write a programming script to analyse that data, and match it to other sources of information. We can also more easily analyse new types of data, such as live data, large amounts of text, user behaviour patterns, and network connections.
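To make that distinction concrete, here is a minimal, hypothetical sketch of what “writing a script to match data to other sources” can mean in practice – the council names, figures and field names below are all invented for illustration:

// Hypothetical sketch: join invented spending figures to invented population
// figures so each council can be compared per head rather than by raw total
var spending = [
  { council: "Anytown", spend: 1200000 },
  { council: "Otherville", spend: 800000 }
];
var population = { "Anytown": 60000, "Otherville": 20000 };

var perHead = spending.map(function (row) {
  return { council: row.council, spendPerHead: row.spend / population[row.council] };
});
console.log(perHead);
// Otherville spends more per head despite the smaller total - the kind of
// pattern a script can surface in seconds across thousands of rows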

And that, for me, is hugely important. Indeed, it is potentially transformational. Adding computer processing power to our journalistic arsenal allows us to do more, faster, more accurately, and with others. All of which opens up new opportunities – and new dangers. Things are going to change. Continue reading

Why did you get into data journalism?

In researching my book chapter (UPDATE: now published) I asked a group of journalists who worked with data what led them to do so. Here are their answers:

Jonathon Richards, The Times:

The flood of information online presents an amazing opportunity for journalists, but also a challenge: how on earth does one keep up with it; make sense of it? You could go about it in the traditional way, fossicking in individual sites, but much of the journalistic value in this outpouring, it seems, comes in aggregation: in processing large amounts of data, distilling them, and exploring them for patterns. To do that – unless you’re superhuman, or have a small army of volunteers – you need the help of a computer.

I ‘got into’ data journalism because I find this mix exciting. It appeals to the traditional journalistic instinct, but also calls for a new skill which, once harnessed, dramatically expands the realm of ‘stories I could possibly investigate…’ Continue reading

Hyperlocal Voices: Robin Byles (Sheffieldblog.com and Crosspool.info)

Hyperlocal voices: Sheffield blog

Here’s another hyperlocal voice: Robin Byles set up Sheffieldblog in 2008 when he returned to the city after working for the BBC. The site focuses on “The kind of stuff that may get featured as an aside in the local papers, but actually people are quite interested in and in the context of online, works really well.” More recently he’s also been involved in Crosspool.info. Here’s the full interview:

Who were the people behind the blog, and what were their backgrounds before setting it up?

I set the blog up on my own. I studied Media and Communications at UCE [now Birmingham City University], moved to London where I worked at the BBC for 8 years as a web editor and have now moved back north where I’m a digital editor for the University of Sheffield.

What made you decide to set up the blog?

A mixture of things really. I had seen one or two local blogs and knew that there wasn’t a major one covering my home town of Sheffield, so quite fancied setting something up.

I think living away from the area had given me a yearning for local news but not just the traditional stuff that I could read in the local paper or local news website.

I was also interested in the stories that people were talking about that didn’t always make the normal news outlets. This interesting stuff was out there on the internet and I liked the idea of being able to collate all this content and promote it from one place – a non-automated aggregator, I suppose.

I’m very fond of my home city and the pending move back home seemed like a good excuse to get something up and running.

I was also on the lookout for jobs at the time and knew that the more varied stuff that my CV had on it – in particular a place where I could do a bit of writing – the more it would help me find work. So part of the motivation was also a professional one. Continue reading

Hyperlocal Voices: Philip John (The Lichfield Blog)

Hyperlocal voices - Lichfield Blog

In another Hyperlocal Voices post, Philip John talks about how The Lichfield Blog was launched to address a gap in local news reporting. In less than 2 years it has taken on a less opinionated tone and more “proper reporting”, picking up national recognition and covering its costs along the way.

Who were the people behind Lichfield Blog, and what were their backgrounds before setting it up?

Ross Hawkes founded the blog in January. Ross is a senior lecturer in journalism at Staffs Uni and previously worked at BPM. He started his journalistic career at the now defunct Lichfield Post. There’s also Nick, a semi-professional photographer who helps out with the creative side of things and I look after the techy side of the web site as looking after WordPress is where I specialise. We also have a good group of contributors and a couple of advisors, many of whom are either current or former journalists at local newspapers.

What made you decide to set up the blog?

Ross’ wife heard sirens going past their house one day and was curious as to where they were going. Ross realised no-one was reporting those kind of low-level goings on and that with the beat reporter disappearing there was a gap for community-focused news. Continue reading

My Henry Stewart talk about 'Blogging, Twitter and Journalism'

I’ve recorded a 48-minute presentation covering ‘Blogging, Twitter and Journalism’ for the Henry Stewart series of talks. It’s designed for journalism students and covers:

  • How blogging differs from other journalism platforms;
  • Key developments in the history of journalism blogging;
  • What makes a successful blog;
  • What Twitter is and how it is useful for journalists and publishers;
  • Why RSS is central to blogging and Twitter, and how it works.

The BBC and missed data journalism opportunities

Bar chart: UN progress on eradication of world hunger

I’ve tweeted a couple of times recently about frustrations with BBC stories that are based on data but treat it poorly. As any journalist knows, two occasions of anything in close proximity warrant an overreaction about a “worrying trend”. So here it is.

“One in four council homes fails ‘Decent Homes Standard’”

This is a good piece of newsgathering, but a frustrating piece of online journalism. “Almost 100,000 local authority dwellings have not reached the government’s Decent Homes Standard,” it explained. But according to what? Who? “Government figures seen by BBC London”. Ah, right. Any chance of us seeing those too? No.

The article is scattered with statistics from these figures: “In Havering, east London, 56% of properties do not reach Decent Homes Standard – the highest figure for any local authority in the UK … In Tower Hamlets the figure is 55%.”

It’s a great story – if you live in those two local authorities. But it’s a classic example of narrowing a story to fit the space available. This story-centric approach serves readers in those locations, and readers who may be titillated by the fact that someone must always finish bottom in a chart – but the majority of readers will not live in those areas, and will want to know what the figures are for their own area. The article does nothing to help them do this. There are only 3 links, and none of them are deep links: they go to the homepages for Havering Council, Tower Hamlets Council, and the Department of Communities and Local Government.

In the world of print and broadcast, narrowing a story to fit space was a regrettable limitation of the medium; in the online world, linking to your sources is a fundamental quality of the medium. Not doing so looks either ignorant or arrogant.

“Uneven progress of UN Millennium Development Goals”

An impressive piece of data journalism that deserves credit, this looks at the UN’s goals and how close they are to being achieved, based on a raft of stats, which are presented in bar chart after bar chart (see image above). Each chart gives the source of the data, which is good to see. However, that source is simply given as “UN”: there is no link either on the charts or in the article (there are 2 links at the end of the piece – one to the UN Development Programme and the other to the official UN Millennium Development Goals website).

This lack of a link to the specific source of the data raises a number of questions: did the journalist or journalists (in both of these stories there is no byline) find the data themselves, or was it simply presented to them? What is it based on? What was the methodology?

The real missed opportunity here, however, is around visualisation. The relentless onslaught of bar charts makes this feel like a UN report itself, and leaves a dry subject still looking dry. This needed more thought.

Off the top of my head, one option might have been an overarching visualisation of how funding shortfalls overall differ between different parts of the world (allowing you to see that, for example, South America is coming off worst). This ‘big picture’ would then draw in people to look at the detail behind it (with an opportunity for interactivity).

Had they published a link to the data someone else might have done this – and other visualisations – for them. I would have liked to try it myself, in fact.

UPDATE: After reading this post, a link has now been posted to the report (PDF).

Compare this article, for example, with the Guardian Datablog’s treatment of the coalition agreement: a harder set of goals to measure, and they’ve had to compile the data themselves. But they’re transparent about the methodology (it’s subjective) and the data is there in full for others to play with.

It’s another dry subject matter, but The Guardian have made it a social object.

No excuses

The BBC is not a print outlet, so it does not have the excuse of these stories being written for print (although I will assume they were researched with broadcast as the primary outlet in mind).

It should also, in theory, be well resourced for data journalism. Martin Rosenbaum, for example, is a pioneer in the field, and the team behind the BBC website’s Special Reports section does some world class work. The corporation was one of the first in the world to experiment with open innovation with Backstage, and runs a DataArt blog too. But the core newsgathering operation is missing some basic opportunities for good data journalism practice.

In fact, it’s missing just one basic opportunity: link to your data. It’s as simple as that.

On a related note, the BBC Trust wants your opinions on science reporting. On this subject, David Colquhoun raises many of the same issues: absence of links to sources, and anonymity of reporters. This is clearly more a cultural issue than a technical one.

Of all the UK’s news organisations, the BBC should be at the forefront of transparency and openness in journalism online. Thinking politically, allowing users to access the data they have spent public money to acquire also strengthens their ideological hand in the Big Society bunfight.

UPDATE: Credit where it’s due: the website for tonight’s Panorama on public pay includes a link to the full data.

When crowdsourcing is your only option

Crowdsourced map - the price of weed

PriceOfWeed.com is a great example of when you need to turn to crowdsourcing to obtain data for your journalism. As Paul Kedrosky writes, it’s “Not often that you get to combine economics, illicit substances, map mashups and crowd-sourcing in one post like this.” The resulting picture is surprisingly clear.

And news organisations could learn a lot from the way this has been executed. Although the default map view is of the US, the site detects your location and offers you prices nearest to you. It’s searchable and browsable. Sadly, the raw data isn’t available – although it would be relatively straightforward to scrape it.
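Just as a sketch of what that scraping might look like – hypothetical only: the site doesn’t publish its raw data, and the URL and table selector below are invented, so a real scraper would need to inspect the page’s actual markup first – something along these lines in Node.js would do the basic job:

// Hypothetical sketch: fetch a page and pull location/price pairs out of an
// assumed HTML table. Selectors are invented for illustration.
var cheerio = require("cheerio");   // npm install cheerio

async function scrapePrices(url) {
  var html = await (await fetch(url)).text();   // fetch is built in to Node 18+
  var $ = cheerio.load(html);
  var rows = [];
  $("table.prices tr").each(function (i, tr) {  // assumed table structure
    var cells = $(tr).find("td").map(function (j, td) {
      return $(td).text().trim();
    }).get();
    if (cells.length >= 2) rows.push({ location: cells[0], price: cells[1] });
  });
  return rows;
}

scrapePrices("http://www.priceofweed.com/").then(function (rows) {
  console.log(rows);
});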

As the site expands globally it is also adding extra data on the social context – tolerance and law enforcement. (via)

A First – Not Very Successful – Look at Using Ordnance Survey OpenLayers…

What’s the easiest way of creating a thematic map, that shows regions coloured according to some sort of measure?

Yesterday, I saw a tweet go by from @datastore about Carbon emissions in every local authority in the UK, detailing those emissions for a list of local authorities (whatever they are… I’ll come on to that in a moment…)

Carbon emissions data table

The dataset seemed like a good opportunity to try out the Ordnance Survey’s OpenLayers API, which I’d noticed allows you to make use of OS boundary data and maps in order to create thematic maps for UK data:

OS thematic map demo

So – what’s involved? The first thing was to try and get codes for the authority areas. The ONS make various codes available (download here) and the OpenSpace website also makes available a list of boundary codes that it can render (download here), so I had a poke through the various code files and realised that the Guardian emissions data seemed to identify regions that were coded in different ways? So I stalled there and looked at another part of the jigsaw…

…specifically, OpenLayers. I tried the demo – Creating thematic boundaries – got it to work for the sample data, then tried to put in some other administrative codes to see if I could display boundaries for other area types… hmmm…. No joy:-) A bit of digging identified this bit of code:

boundaryLayer = new OpenSpace.Layer.Boundary("Boundaries", {
    strategies: [new OpenSpace.Strategy.BBOX()],
    area_code: ["EUR"],   // the boundary type being requested - European Regions in the demo
    styleMap: styleMap });

which appears to identify the type of area codes/boundary layer required, in this case “EUR”. So two questions came to mind:

1) does this mean we can’t plot layers that have mixed region types? For example, the emissions data seemed to list names from different authority/administrative area types?
2) what layer types are available?

A bit of digging on the OpenLayers site turned up something relevant on the Technical FAQ page:

OS OpenSpace boundary DESCRIPTION, (AREA_CODE) and feature count (number of boundary areas of this type)

County, (CTY) 27
County Electoral Division, (CED) 1739
District, (DIS) 201
District Ward, (DIW) 4585
European Region, (EUR) 11
Greater London Authority, (GLA) 1
Greater London Authority Assembly Constituency, (LAC) 14
London Borough, (LBO) 33
London Borough Ward, (LBW) 649
Metropolitan District, (MTD) 36
Metropolitan District Ward, (MTW) 815
Scottish Parliament Electoral Region, (SPE) 8
Scottish Parliament Constituency, (SPC) 73
Unitary Authority, (UTA) 110
Unitary Authority Electoral Division, (UTE) 1334
Unitary Authority Ward, (UTW) 1464
Welsh Assembly Electoral Region, (WAE) 5
Welsh Assembly Constituency, (WAC) 40
Westminster Constituency, (WMC) 632

so presumably all those code types can be used as area_code arguments in place of “EUR”?
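If so, a layer for, say, county boundaries would presumably just swap the code in (an untested sketch, reusing the pattern from the snippet above):

// Untested sketch: same constructor as before, but asking for County (CTY)
// boundaries rather than European Regions (EUR)
countyLayer = new OpenSpace.Layer.Boundary("Counties", {
    strategies: [new OpenSpace.Strategy.BBOX()],
    area_code: ["CTY"],   // one of the codes from the FAQ list above
    styleMap: styleMap });
// Whether a single layer will accept a mix of types, e.g. ["CTY", "UTA"],
// is exactly the open question raised in (1) above.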

Back to one of the other pieces of the jigsaw: the OpenLayers API is called using official area codes, but the emissions data just provides the names of areas. So somehow I need to map from the area names to an area code. This requires: a) some sort of lookup table to map from name to code; b) a way of doing that.
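Just to make (a) and (b) concrete, here is a minimal sketch of that lookup in plain JavaScript – the sample rows and codes are invented placeholders, not real ONS/OS codes:

// Hypothetical data: in practice both lists would come from the downloads above
var boundaryCodes = [
  { name: "Birmingham", code: "AREA-CODE-1", type: "MTD" },
  { name: "Havering", code: "AREA-CODE-2", type: "LBO" }
];
var emissions = [
  { name: "Birmingham", perCapitaCO2: 5.5 }   // invented figure
];

// a) the lookup table: an index from (lower-cased) area name to its record
var index = {};
boundaryCodes.forEach(function (b) { index[b.name.toLowerCase()] = b; });

// b) a way of doing the lookup: exact-match joins, with misses left as null
var joined = emissions.map(function (row) {
  var match = index[row.name.toLowerCase()];
  return { name: row.name, perCapitaCO2: row.perCapitaCO2,
           code: match ? match.code : null, type: match ? match.type : null };
});
console.log(joined);  // rows with code === null need manual reconciliation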

Normally, I’d be tempted to use a Google Fusion table to try to join the emissions table with the list of boundary area names/codes supported by OpenSpace, but then I recalled a post by Paul Bradshaw on using the Google spreadsheets VLOOKUP formula (to create a thematic map, as it happens: Playing with heat-mapping UK data on OpenHeatMap), so thought I’d give that a go… no joy:-( For some reason, the VLOOKUP just kept giving rubbish. Maybe it was happy with really crappy best matches, even if I tried to force exact matches. It almost felt like the formula was working on a differently ordered column to the one it should have been; I have no idea. So I gave up trying to make sense of it (something to return to another day maybe; I was in the wrong mood for trying to make sense of it, and now I am just downright suspicious of the VLOOKUP function!)…

…and instead thought I’d give the openheatmap application Paul had mentioned a go. After a few false starts (I thought I’d be able to just throw a spreadsheet at it and then specify the data columns I wanted to bind to the visualisation (cf. Semantic reports), but it turns out you have to use particular column names: value for the data value, plus one of the specified locator labels), I managed to upload some of the data as uk_council data (quite a lot of it was thrown away) and get some sort of map out:

openheatmap demo

You’ll notice there are a few blank areas where council names couldn’t be identified.
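For what it’s worth, the reshaping involved is roughly this – a hypothetical sketch, with invented sample rows; the only thing taken from the description above is that openheatmap wants a value column plus a recognised location column such as uk_council:

// Hypothetical sketch: reshape name/value rows into the columns openheatmap expects
var emissionsByName = [
  { name: "Birmingham", perCapitaCO2: 5.5 },       // invented figures
  { name: "Bristol, City of", perCapitaCO2: 4.8 }
];

var rows = emissionsByName.map(function (r) {
  return { uk_council: r.name, value: r.perCapitaCO2 };
});

// Serialise as CSV for upload; any council names openheatmap can't match
// end up as the blank areas mentioned above
var csv = "uk_council,value\n" + rows.map(function (r) {
  return '"' + r.uk_council + '",' + r.value;
}).join("\n");
console.log(csv);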

So what do we learn? Firstly, the first time you try out a new recipe, it rarely, if ever, “just works”. When you know what you’re doing, and “all you have to do is…”, all is a little word. When you don’t know what you’re doing, all is a realm of infinite possibilities of things to try that may or may not work…

We also learn that I’m not really that much closer to getting my thematic map out… but I do have a clearer list of things I need to learn more about. Firstly, a few hello world examples using the various different OpenLayer layers. Secondly, a better understanding of the differences between the various authority types, and what sorts of mapping there might be between them. Thirdly, I need to find a more reliable way of reconciling data from two tables and in particular looking up area codes from area names (in two ways: code and area type from area name; code from area name and area type). VLOOKUP didn’t work for me this time, so I need to find out if that was my problem, or an “issue”.

Something else that comes to mind is this: the datablog asks: “Can you do something with this data? Please post your visualisations and mash-ups on our Flickr group”. IF the data had included authority codes, I would have been more likely to persist in trying to get them mapped using OpenLayers. But my lack of understanding about how to get from names to codes meant I stumbled at this hurdle. There was too much friction in going from area name to OpenLayer boundary code. (I have no idea, for example, whether the area names relate to one administrative class, or several).

Although I don’t think the following is the case, I do think it is possible to imagine a scenario where the Guardian do have a table that includes the administrative codes as well as names for this data, or an environment/application/tool for rapidly and reliably generating such a table, and that they know this makes the data more valuable because it means they can easily map it, but others can’t. The lack of codes means that work needs to be done in order to create a compelling map from the data that may attract web traffic. If it was that easy to create the map, a “competitor” might make the map and get the traffic for no real effort.

The idea I’m fumbling around here is that there is a spectrum of stuff around a data set that makes it more or less easy to create visualisations. In the current example, we have area name, area code, map. Given an area code, it’s presumably (?) easy enough to map using e.g. OpenLayers because the codes are unambiguous. Given an area name, if we can reliably look up the area code, it’s presumably easy to generate the map from the name via the code. Now, if we want to give the appearance of publishing the data, but make it hard for people to use, we can make it hard for them to map from names to codes, either by messing around with the names, or by using a mix of names that map on to area codes of different types. So we can taint the data to make it hard for folk to use easily whilst still being seen to publish the data.

Now I’m not saying the Guardian do this, but a couple of things follow: firstly, obfuscating or tainting data can help you prevent casual use of it by others whilst at the same time ostensibly “opening it up” (it can also help you track the data; e.g. mapping agencies that put false artefacts in their maps to help reveal plagiarism); secondly, if you are casual with the way you publish data, you can make it hard for people to make effective use of that data.

For a long time, I used to hassle folk into publishing RSS feeds. Some of them did… or at least thought they did. Because as soon as I tried to use their feeds, they turned out to be broken: no-one had ever tried to consume them. Same with data. If you publish your data, try to do something with it. So for example, the emissions data is illustrated with a Many Eyes visualisation of it; it works as data in at least that sense. From the place names, it would be easy enough to vaguely place a marker on a map showing a data value roughly in the area of each council. But for identifying exact administrative areas, the data is lacking.

It might seem as if I’m arguing against the current advice to councils and government departments to just “get their data out there” even if it is a bit scrappy, but I’m not… What I am saying (I think) is that folk should just try to get their data out, but also:

– have a go at trying to use it for something themselves, or at least just demo a way of using it. This can have a payoff in at least three ways I can think of: a) it may help you spot a problem with the way you published the data that you can easily fix, or at least post a caveat about; b) it helps you develop your own data handling skills; c) you might find that you can encourage reuse of the data you have just published in your own institution…

– be open to folk coming to you with suggestions for ways in which you might be able to make the data more valuable/easier to use for them for little effort on your own part, and that in turn may help you publish future data releases in an ever more useful way.

Can you see where this is going? Towards Linked Data… 😉

PS Just by the by, a related post on the Telegraph blogs (one that just happens to mention OUseful.info:-), Open data ‘rights’ require responsibility from the Government, led me to a quick chat with Telegraph data hack @coneee and the realisation that the Telegraph too are starting to explore the release of data via Google spreadsheets. So for example, a post on Councils spending millions on website redesigns as job cuts loom also links to the source data here: Data: Council spending on websites.