Monthly Archives: November 2010

Making magazine awards more user-friendly

Given I’ve already linked to Tony Hirst twice this week I thought I’d make it a hat-trick. Last month Tony wrote two blog posts which I thought were particularly instructive for magazine publishers organising blog awards.

In the first post Tony complained after seeing Computer Weekly’s shortlist:

“Why, oh why, don’t publishers of blog award nomination lists see them as potentially useful collections on a particular subject that can be put to work for the benefit of that community?

“… There are umpteen categories – each category has its own web page – and umpteen nominations per award. To my mind, lists of nominations for an award are lists of items on a related topic. Where the items relate to blogs, presumably with an RSS feed associated with each, the lists should be published as an OPML file, so you can at-a-click subscribe to all the blogs on a list in a reader such as Google Reader, or via a dashboard such as netvibes. Where there are multiple awards, I’d provide an OPML file for each award, and a meta-bundle that collects nominations for all the awards together in a single OPML file, though with each category in its own nested outline element.”
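For anyone wanting to try what Tony describes, here is a minimal Python sketch of the "meta-bundle": one OPML file with a nested outline element per award category. The category names, blog titles and feed URLs are placeholders invented for illustration, and the `build_opml` helper is hypothetical rather than anything Tony or Computer Weekly published.

```python
import xml.etree.ElementTree as ET


def build_opml(title, categories):
    """Return an OPML document as a string.

    `categories` maps an award category name to a list of
    (blog_title, feed_url) pairs.
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for category, blogs in categories.items():
        # One nested outline element per award category
        group = ET.SubElement(body, "outline", text=category, title=category)
        for blog_title, feed_url in blogs:
            # Each shortlisted blog becomes an RSS outline inside its category
            ET.SubElement(
                group,
                "outline",
                type="rss",
                text=blog_title,
                title=blog_title,
                xmlUrl=feed_url,
            )
    return ET.tostring(opml, encoding="unicode")


# Placeholder shortlist: every name and URL here is an assumption
shortlist = {
    "Category A": [
        ("Example Blog One", "http://example.com/one/feed"),
        ("Example Blog Two", "http://example.com/two/feed"),
    ],
    "Category B": [
        ("Example Blog Three", "http://example.com/three/feed"),
    ],
}

opml_text = build_opml("Award shortlist (illustrative)", shortlist)
print(opml_text)
```

Saving the printed text as a `.opml` file should give something most feed readers can import. A per-award file is the same call with a single category in the dictionary.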

I’d suggest something even simpler: an aggregator widget pulling together the RSS feeds for each category, or a new Twitter account, or a Google Reader bundle.

In a second post the following day Tony finds a further way to extract value from the list: use Google Custom Search to create a custom search engine limited to those sites you have shortlisted as award-worthy. His post explains exactly how to do that.

Tony’s approach demonstrates the difference between story-centred and data-centred approaches to journalism. Computer Weekly are approaching the awards as a story (largely because of limitations of platform and skills – see comments), with the ultimate ending ‘Blog publisher wins award’. Tony, however, is looking at the resources being gathered along the way: a list of blogs, each of which has an RSS feed, and each of which will be useful to readers and journalists. Both are valid, but ignoring either is to miss something valuable in your journalism.

Statistics and data journalism: seasonal adjustment for journalists

seasonal adjustment image from Junk Charts

When you start to base journalism around data it’s easy to overlook basic weaknesses in that data – from the type of average that is being used, to distribution, sample size and statistical significance. Last week I wrote about inflation and average wages. A similar factor to consider when looking at any figures is seasonal adjustment.

Kaiser Fung recently wrote a wonderful post on the subject:

“What you see [in the image above] is that almost every line is an inverted U. This means that no matter what year, and what region, housing starts peak during the summer and ebb during the winter.

“So if you compare the June starts with the October starts, it is a given that the October number will be lower than June. So reporting a drop from June to October is meaningless. What is meaningful is whether this year’s drop is unusually large or unusually small; to assess that, we have to know the average historical drop between October and June.

“Statisticians are looking for explanations for why housing starts vary from month to month. Some of the change is due to the persistent seasonal pattern. Some of the change is due to economic factors or other factors. The reason for seasonal adjustments is to get rid of the persistent seasonal pattern, or put differently, to focus attention on other factors deemed more interesting.

“The bottom row of charts above contains the seasonally adjusted data (I have used the monthly rather than annual rates to make it directly comparable to the unadjusted numbers.)  Notice that the inverted U shape has pretty much disappeared everywhere.”

The first point is not to think you’ve got a story because house sales are falling this winter – they might fall every winter. In fact, for all you know they may be falling less dramatically than in previous years.

The second point is to be aware of whether the figures you are looking at have been seasonally adjusted or not.

The final – and hardest – point is to know how to seasonally adjust data if you need to.

For that last point you’ll need to go elsewhere on the web. This page on analysing time series takes you through the steps in Excel nicely. And Catherine Hood’s tipsheet on doing seasonal adjustment on a short time series in Excel (PDF) covers a number of different types of seasonal variation. For more on how and where seasonal adjustment is used in UK government figures check out the results of this search (adapt for your own country’s government domain).
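To give a feel for what the adjustment involves, here is a rough Python sketch of one very simple approach: divide each figure by a seasonal index, where the index is that period's historical average relative to the overall average. It assumes a stable, multiplicative seasonal pattern and makes no attempt to separate out a long-term trend, so treat it as an illustration rather than the method used in the resources above or by official statisticians. The figures are invented, and quarterly rather than monthly to keep the example short.

```python
from statistics import mean


def seasonal_indices(values, period=12):
    """Average for each position in the cycle, relative to the overall average."""
    overall = mean(values)
    return [mean(values[i::period]) / overall for i in range(period)]


def seasonally_adjust(values, period=12):
    """Divide each value by the seasonal index for its position in the cycle."""
    indices = seasonal_indices(values, period)
    return [value / indices[i % period] for i, value in enumerate(values)]


# Invented quarterly figures for two years (not real housing data):
# the same summer peak and winter dip each year, with year two 10% higher.
quarterly_starts = [80, 120, 130, 70, 88, 132, 143, 77]

adjusted = seasonally_adjust(quarterly_starts, period=4)
print([round(value, 1) for value in adjusted])
```

With these invented figures the adjusted series comes out flat at 100 for the first year and 110 for the second: the inverted U disappears and only the year-on-year rise is left, which is the part worth reporting.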

Hyperlocal Voices: lovelevenshulme’s Tim Simmonds

Hyperlocal voices: Love Levenshulme

The latest in the Hyperlocal Voices series looks at love levenshulme. When its founder moved on the site was handed on to two other people – this year the blog won the Manchester Blog Award for ‘Blog of the Year’.

Who were the people behind the blog, and what were their backgrounds?

Lovelevenshulme was started by a gentleman called Matt Clements who I have never met! He wanted to be positive about where he lived and so set up a blog.

What made you decide to set up the blog?

I was a reader of lovelevenshulme and liked the countercultural feel of being positive about a locality. I suppose I thought it was different from the standard moany English mentality.

Matt Clements wrote one day that he was moving out of the area and wanted someone else to take it over. So myself and Helen Power offered to take it over.

When did you set up the blog and how did you go about it?

We took over the blog and decided to carry on with his positive take. We looked around our area and decided to write about the things we love. This can range from kebab houses and poetry nights to film clubs and cafes. We also try and promote any local event or group.

We use Blogger because it is simple and easy.

What other blogs, bloggers or websites influenced you?

Levenshulme Daily Photograph, Inside the M60, Manchester Mule, Manchizzle, Fat Roland, Sounds Good to me Too

How did – and do – you see yourself in relation to a traditional news operation?

I don’t see us as a news operation. We are very biased in our love of Levenshulme and have decided that we won’t write about things that aren’t positive. There is enough of that in the blogosphere already.

What have been the key moments in the blog’s development editorially?

We won blog of the year at Manchester Blog Award 2010. I think that helped us to realize that being hyperlocal and positive is actually quite unusual and powerful.

Linking properly with a Twitter feed and a Facebook fan page has helped us develop the community side of the blog.

What sort of traffic do you get and how has that changed over time?

Our traffic has only been tracked properly since August 1st 2010. We have seen our numbers double every month so far. I think we may now be at (or near) our peak (roughly 1500 hits a month)

How did you find taking on a blog that was already running?

Easy to be honest. The guy who set it up didn’t want to do it anymore and was happy for us to take it in whatever direction we wanted. In fact, he has emailed us since and been very complimentary indeed.

I guess the only problem we have is finding the information or local events but as the blog’s profile has grown people have been sending stuff through to us.

Using Yahoo! Clues to target your headlines by demographic

Yahoo! Search Clues - Emma Watson hair

Tony Hirst draws my attention (again) to Yahoo! Clues, a tool that, like Google’s Insights For Search, allows you to see what search terms are most popular. However, unlike Insights, Yahoo! Clues gives much deeper demographic information about who is searching for particular terms.

Tony’s interest is in how libraries might use it. I’m obviously interested in the publishing side – and search engine optimisation (SEO). And here’s where the tool is really interesting.

Until now SEO has generally taken a broad brush approach. You use tools like Insights to get an idea – based on the subject of your journalism – of what terms people are using, related terms, and rising terms. But what if your publication is specifically aimed at women – or men? Or under-25s? Or over-40s? Or the wealthy?

With Yahoo! Clues, if the search term is popular enough you can drill down to those groups with a bit more accuracy (US-only at the moment, though). Taking “Emma Watson haircut”, for example, you can see that a girls’ magazine and one aimed at boys may take different SEO approaches based on what they find from Yahoo! Clues.

Apart from anything else, it demonstrates just what immature disciplines web writing and SEO are. As more and more user data is available, processed at faster speeds, we should see this area develop considerably in the next decade.

UPDATE: After reading this post, Tony has written a follow-up post on other tools for seeing demographics around search behaviour.

Yahoo! Search Clues - Emma Watson haircut - oops/katie leung

A mob is just a crowd you’re not part of

Mobs have been very much back in the spotlight over the past couple of weeks. The Cooks Source saga was followed up by the lesser Dairy Goat Journal bunfight, while in the physical world students demonstrated in London and the Fitwatch blog providing advice to those students was shut down by police. In each case onlookers conjured up the spectre of “the mob” – a term whose primary definition – “A large disorderly crowd or throng” – belies the array of discourses that underpin it, partly related to its secondary definition – “The mass of common people; the populace”.

In other words, “mob” is a term used to frame a debate in emotional terms, to dismiss what may be a genuine outpouring of anger or resentment as invalid or illegitimate. For those reasons, I flinch when people talk about mobs instead of crowds.

Last week the video above was uploaded to YouTube. It shows a presentation describing the events leading up to a fatal flash mob. The story is fictional but the events it is constructed from are real*.

The prospect of such a series of events happening is terrifying and rightly thought-provoking: I would recommend it as a way of exploring journalism ethics in a networked age.

But the video is a syllogism – it makes an apparently logical (implied) argument that because these events have all happened and they are connected in technical terms, they could, eventually, all happen together.

The obvious flaw here is statistical: the probability of all those events leading to another is of a different order. We can all imagine all possible worlds.

It reminds me of Eric Morecambe’s joke, when pulled up on his piano playing, that he was playing all the right notes “but not necessarily in the right order”.

But the major flaw is logical. We are led from cause to effect, major to minor premise, but the ultimate event really has no connection with its beginnings. When large numbers of people gather in one place, sometimes it turns into a riot and people get killed. Technology has not changed that. Perhaps it makes it easier to do so – perhaps it makes it easier to disperse crowds when things go awry, or to call for assistance. Most likely all of the above. It’s the same technologically determinist mindset that blames Google Maps for terrorist attacks. As Douglas Adams put it in 1999:

“Newsreaders still feel it is worth a special and rather worrying mention if, for instance, a crime was planned by people ‘over the Internet.’ They don’t bother to mention when criminals use the telephone or the M4, or discuss their dastardly plans ‘over a cup of tea,’ though each of these was new and controversial in their day.”

In short: Don’t Panic.

*In the comments on YouTube Danosuke points out: “The one instance they cite as “already happened” was not a riot at all. There was no reported property damage or injuries. This is pointless fear mongering.”

Solving buggy behaviour when scraping data into Google spreadsheets

Tony Hirst has identified some bugs in the way Google spreadsheets ‘scrapes’ tables from other sources, particularly when the original data is of mixed types (e.g. numbers and text). The solution is summed up as follows:

“When using the =QUERY() formula, make sure that you’re importing data of the same datatype in each cell; and when using the =ImportData() formula, cast the type of the columns yourself… (I’m assuming this persists, and doesn’t get reset each time the spreadsheet resynchs the imported data from the original URL?)”
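The same principle, casting each column to a type yourself rather than trusting the import to guess, applies outside Google spreadsheets too. Here is a minimal Python sketch of the idea. It is an analogy and not a reproduction of Tony's spreadsheet formulas: the sample data is invented, and the `to_number` helper is a hypothetical name.

```python
import csv
import io


def to_number(cell):
    """Cast a cell to a float, or return None if it does not hold a number."""
    try:
        # Strip thousands separators before converting
        return float(cell.replace(",", ""))
    except ValueError:
        return None


# Invented data with a mixed-type column: numbers, text and a formatted number
raw = '''name,value
Alpha,10
Beta,n/a
Gamma,"1,200"
'''

rows = list(csv.DictReader(io.StringIO(raw)))

# Cast the column explicitly so every cell ends up as a float or None
values = [to_number(row["value"]) for row in rows]
print(values)
```

Making the cast explicit means the text cell becomes a visible gap in the data instead of silently changing how the whole column is treated.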

Extractiv: crawl webpages and make semantic connections

Extractiv screenshot

Here’s another data analysis tool which is worth keeping an eye on. Extractiv “lets you transform unstructured web content into highly-structured semantic data.” Eyes glazing over? Okay, over to ReadWriteWeb:

“To test Extractiv, I gave the company a collection of more than 500 web domains for the top geolocation blogs online and asked its technology to sort for all appearances of the word “ESRI.” (The name of the leading vendor in the geolocation market.)

“The resulting output included structured cells describing some person, place or thing, some type of relationship it had with the word ESRI and the URL where the words appeared together. It was thus sortable and ready for my analysis.

“The task was partially completed before being rate limited due to my submitting so many links from the same domain. More than 125,000 pages were analyzed, 762 documents were found that included my keyword ESRI and about 400 relations were discovered (including duplicates). What kinds of patterns of relations will I discover by sorting all this data in a spreadsheet or otherwise? I can’t wait to find out.”

What that means in even plainer language is that Extractiv will crawl thousands of webpages to identify relationships and attributes for a particular subject.

This has obvious applications for investigative journalists: give the software a name (of a person or company, for example) and a set of base domains (such as news websites, specialist publications and blogs, industry sites, etc.) and set it going. At the end you’ll have a broad picture of what other organisations and people have been connected with that person or company. Connections you can ask it to identify include relationships, ownership, former names, telephone numbers, companies worked for, worked with, and job positions.

It won’t answer your questions, but it will suggest some avenues of enquiry, and potential sources of information. And all within an hour.

Time and cost

ReadWriteWeb reports that the process above took around an hour “and would have cost me less than $1, after a $99 monthly subscription fee. The next level of subscription would have been performed faster and with more simultaneous processes running at a base rate of $250 per month.”

As they say, the tool represents “commodity level, DIY analysis of bulk data produced by user generated or other content, sortable for pattern detection and soon, Extractiv says, sentiment analysis.”

Which is nice.

New UK site launches to tackle lobbying data

Who's Lobbying treemap

I’ve been waiting for the launch of Who’s Lobbying ever since they stuck up that little Post-It note on a holding page in the run-up to the general election. Well now the site is live – publishing and visualising lobbying data, beginning with information about “ministerial meetings with outside interests, based on the reports released by UK government departments in October.”

This information is presented on the homepage very simply: with 3 leaderboards and a lovely search interface.

Who's Lobbying homepage

There are also a couple of treemaps to explore, for a more visual (and clickable) kick.

These allow you to see more quickly any points of interest in particular areas. The Who’s Lobbying blog notes, for instance, that “the treemap shows about a quarter of the Department of Energy and Climate Change meetings are with power companies. Only a small fraction are with environmental or climate change organisations.”

It also critically notes in another post that

“The Number 10 flickr stream calls [its index to transparency] a “searchable online database of government transparency information”. However it is really just a page of links to department reports. Each report containing slightly different data. The reports are in a mix of PDF, CSV, and DOC formats.

“Unfortunately Number 10 and the Cabinet Office have not mandated a consistent format for publishing ministerial meeting information.

“The Ministry of Defence published data in a copy-protected PDF format, preventing copy and paste from the document.

“DEFRA failed to publish the name of each minister in its CSV formatted report.

“The Department for Transport is the only department transparent enough to publish the date of each meeting.

“All other departments only provided the month of each meeting – was that an instruction given centrally to departments? Because of this it isn’t possible to determine if two ministers were at the same meeting. Our analysis is likely to be double counting meetings with two ministers in attendance.

“Under the previous Labour government, departments had published dates for individual meetings. In this regard, are we seeing less transparency under the Conservative/Lib Dem coalition?”
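The double-counting problem in that quote is easy to see with a few records. The Python sketch below uses invented ministers, organisations and months (none of it is Who’s Lobbying data) to show why month-only reporting leaves the true number of meetings uncertain.

```python
# Invented records in the form (minister, organisation, month)
meetings = [
    ("Minister A", "Org X", "2010-06"),
    ("Minister B", "Org X", "2010-06"),
    ("Minister A", "Org Y", "2010-07"),
]

# Counting one meeting per record gives the most meetings there could be
upper_bound = len(meetings)

# Treating every organisation-and-month pair as a single meeting
# gives the fewest meetings there could be
lower_bound = len({(org, month) for _, org, month in meetings})

print(f"Between {lower_bound} and {upper_bound} distinct meetings")
```

Without exact dates there is no way to tell whether the two Org X records describe one meeting with two ministers present or two separate meetings, so the honest answer is a range.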

When journalists start raising these questions then something will really have been achieved by the open data movement. In the meantime, we can look at Who’s Lobbying as a very welcome addition to a list of sites that feels quite weighty now: MySociety’s family of tools as the grandaddy, and ElectionLeaflets.org (formerly The Straight Choice), OpenlyLocal, Scraperwiki, Where Does My Money Go? and OpenCharities as the new breed (not to mention all the data-driven sites that sprang up around this year’s election). When they find their legs, they could potentially be quite powerful.

Hyperlocal Voices: Hedon Blog (Ray Duffill)

Hyperlocal voices: Hedon Blog

The Hedon Blog covers communities in Hedon, East Yorkshire. Ray Duffill established it at the beginning of last year and has since gone on to launch the HU12 site as well. This post is part of the ongoing Hyperlocal Voices series.

Who were the people behind the blog, and what were their backgrounds?

I set the Hedon Blog up after being made redundant from a career in Community Development.

What made you decide to set up the blog?

The Hedon Blog was set up as a hobby to keep my ‘hand-in’ with new social media tools I’d discovered on the web whilst working in my previous job as a Community Development Manager in Blackpool.

Specifically, I wanted to find out if Hedon had any community and voluntary groups operating in the area. On the surface it seemed that very little community activity was going on in the town. That was my initial impression and a view shared by neighbours and relatives who had lived in the area much longer.

The process of setting up the blog and nurturing its development has enabled me to re-discover my home town. Hedon is no longer just the place I live – it’s a place I’m proud of and love!

When did you set up the blog and how did you go about it?

I set up the blog on WordPress.com. It took me two minutes to set up and think of the highly original “Hedon Blog” title.

The first post was written in February 2009. I pressed ‘publish’ and thought “What next?”. I had no plan and no real objectives or goals to aim towards. This is not a model to follow!

Using my legs, eyes and ears I explored and unearthed the ‘undiscovered country’ of a small but thriving community infrastructure in the town. I reported back on my findings on the blog. And, as the ‘word-of-mouth’ spread, then people began sending me in notices of community events and other activities in the town.

What other blogs, bloggers or websites influenced you?

Whilst working in Blackpool I had found out about Nick Booth’s (Podnosh) ‘Social Media Surgeries’ taking place in Birmingham. Inspired by those, I made an early commitment that I would only use social-media tools that were free, easy to use and share, and that could be easily taught to others.

The internet should be about liberating community news and information. I abide by these ideas with the Hedon Blog. Any community can do what I do – you don’t need shed-loads of dosh in order to obtain an effective online voice. Having financial backing and a friendly geek obviously helps – but they are not essential.

The next major influence was Talk About Local and its first ‘un-conference’ in Stoke. From being an isolated individual I was suddenly part of a major phenomenon that involved people from across the country and the world. We even had a name for what we were doing – hyperlocal!

Adam Westbrook has been the other major influence on the blog’s development. I heard him speak and was inspired by his views on the future of journalism.

Locally, in Hull, digital developer Jon Moss has helped through setting up Hull Digital. Individuals met through this network have offered me enormous encouragement and support.

How did – and do – you see yourself in relation to a traditional news operation?

I obtained Adam Westbrook’s e-book on Newsgathering for Hyperlocal Websites and now run the site as a news gathering operation.

Learning from some of the journalistic methods described in that publication has enabled me to put the blog on a professional footing and achieve a credibility in the eyes of public and private sector organisations (as well as voluntary and community groups) who now regularly supply me with press releases and other material.

In this sense I have ‘borrowed’ from the traditional media those things that can help me promote, inform and help build communities in my town.

What have been the key moments in the blog’s development editorially?

The Hedon Blog now sits as part of a wider website family under the www.hu12.net banner. This means I can concentrate community news via the Hedon Blog but now have an outlet for more contentious and controversial material – and a means to obtain some advertising income.

What sort of traffic do you get and how has that changed over time?

I have grown a local audience largely by word of mouth. In my first month of operation I got 213 visits (WordPress stats) but get those figures and more every day now, with occasional daily spikes of 500 to 800 visits.

I never approached this from a business or journalist point of view – but rather as a civic duty or community activity. The downside of this approach is the obvious: a lack of income to re-invest in the project and to pay for its main motivating force: Me!

This activity has brought me great pleasure but has been draining on time and personal resources.

An exercise in interactive thinking

I’ve just run through an exercise with my class of students from the MA in Television and Interactive Content at Birmingham City University. The exercises are intended to get them to think about the web not just as a repository of content, but as a platform that people use in different ways depending on who they are and what they want to do.

I thought I would share them here – for my own record if nothing else.

What’s your topic – who is your userbase?

Most editorial productions begin with a topic, and an angle on that topic. They also have a particular audience in mind, which dictates the tone that is taken in its production: a documentary aimed at 5-year-olds is going to have a very different tone to one aimed at 25-year-olds, even if the topic is the same.

This gives you the People bit of the POST method I’ve written about previously – the starting point for everything that follows.

From there you can identify:

  • The objectives of those people*.
  • How you might help those people meet those objectives (the strategy and technology)
  • Which of those strategies match your own objectives – or those of the person you’re pitching to

*In the original POST method the objectives are yours, but I would suggest starting with users’ objectives because you need a mutually beneficial outcome.

Here’s how it works in practice: