From journalist to blogger: the experience of The Lichfield Blog’s Ross Hawkes

Although I’ve already published an interview with The Lichfield Blog’s Philip John (as part of the Hyperlocal Voices series) I recently returned to ask the site’s editor, Ross Hawkes, about how his own approach as a professional journalist has been changed through running the site. I thought it worth publishing his response in full – here it is:

My background has been in regional journalism in Staffordshire and the West Midlands. I began at the Lichfield Post as a fresh-faced 16-year-old, so it’s quite ironic that I’ve pretty much gone full circle in the space of 12 or 13 years, yet have never been happier. I started off as a sports reporter, then branched out into page design and edited a weekly paper in Coventry before making the move to the dailies at the Birmingham Mail as a page planner and sub-editor. So I’ve had a fairly varied career even though it hasn’t taken me a million miles from my own doorstep. It also skilled me for The Lichfield Blog because I got to see some patch reporters in the truest sense of the word – people who lived and breathed a community. My integration into the online landscape came after the opportunity arose to take on the paper’s web operation.

My time in this role saw me eventually become Senior Multimedia Editor for the Midlands. I’ve been lucky as a journalist in changing times – I’ve been able to spend time learning about the positives and negatives of online work, what works and what doesn’t etc, while many of my colleagues in the industry have had a timescale imposed on them.

But for a variety of reasons the chance to teach online journalism at Staffordshire University came up and here I am today. One of the things I’m keen to stress to students is that I’m not a geek (I leave that to Phil!) but a journalist who has found practical uses for technology etc. During my time at Trinity Mirror I saw plenty of great things, but in a busy newsroom only so much of it could really be of benefit. So that’s what I try to get across to my guys and girls here.

Anyway, back to journalism. Coming to Staffordshire I was really keen that I didn’t want to become rusty – but at the same time I didn’t want to burden myself with freelance concerns, especially in a market which didn’t offer many opportunities anyway. I was also mindful that there were plenty of out-of-work journalists who needed paid employment more than I did. So I decided that I’d write about what I know – basically, where I live. It astonished me to discover that for a city (albeit a small one) there was nowhere to get a regular taste of life here online. Even the newspapers were struggling to fill the void for anyone interested in ye olde city. Although the early versions of The Lichfield Blog were crap, with nothing more than me trying to provoke a response, I soon found that there was a desire for somewhere to discuss Lichfield. Crucially, there was an audience.

Admission time – I never got the value of Twitter as a full-time journalist. But in wanting to grow an audience for TLB I learned how to use it to my benefit. In effect it has been the driving force behind the site. It was at a Tweetup in the early days that I discovered the appetite for the site. It was also where I was able to hook up with my professional other-half – Phil.

And herein lies the first journalistic lesson I picked up from The Lichfield Blog. I quickly acknowledged that I wasn’t an expert in everything and that other people held the key to the success of TLB. By working with people like Phil I’ve been able to pull ideas and take suggestions and feedback from a non-journalistic source. I suppose it was collaboration in its rawest form. And we’ve worked like that ever since. Phil has been invaluable and anyone thinking of going hyperlocal needs to find a Phil. His expertise on the technical side has allowed me to concentrate on my strengths. So what did Phil get in return? Well, I recommended a good hairdresser once…

So what have I learned from my hyperlocal experience? The Lichfield Blog allows me to enjoy what I do. I’m my own boss and I can try random things; if something doesn’t work I don’t have a news editor kicking my backside. It’s allowed me to be experimental and enjoy the career I’ve got.

I like to think I’ve gone back to the future in terms of how I operate. Yes, it’s a new platform and it’s new media, but the basic skills are more needed than ever. It’s about knowing your patch inside out, it’s about attending community meetings and knowing local decision-makers, it’s about getting away from deadline- and target-driven writing – it’s about being a journalist.

I’ve always loved local journalism deep down, that ability to know what makes a community tick. The Lichfield Blog has allowed me to do that and more. It’s given me the opportunity to see that partnerships are the way forward. I’ve also re-evaluated what I think (and that’s the crucial bit – my thoughts) media should be doing. We try to combine news and info. We try and make advertising affordable to local businesses. We try to do exactly the sort of things local newspapers did once upon a time. It’s perhaps not the formula to get me rich, but I never got into journalism for the money, so why should I change that now?

Government Spending Data Explorer

So… the UK Gov started publishing spending data for at least those transactions over £25,000. Lots and lots of data. So what? My take on it was to find a quick and dirty way to cobble a query interface around the data, so here’s what I spent an hour or so doing in the early hours of last night, and a couple of hours this morning… tinkering with a Gov spending data spreadsheet explorer:

[Image: Guardian/gov datastore explorer]

The app is a minor reworking of my Guardian datastore explorer, which put a simple query front end onto the Guardian Datastore’s Google spreadsheets. Once again, I’m exploiting the work of Simon Rogers and co. at the Guardian Datablog, reusing the departmental spreadsheets they posted last night. I bookmarked the spreadsheets to delicious (here) and use the resulting feed to populate a spreadsheet selector:

[Image: Guardian datastore selector - gov spending data]

When you select a spreadsheet, you can preview the column headings:

[Image: Datastore explorer - preview]

Now you can write queries on that spreadsheet as if it were a database. So, for example, here are Department for Education spends over a hundred million:

[Image: Education spend - over 100 million]
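To make that concrete, here’s a hedged sketch of what such a query can look like. The explorer hands the query to Google Spreadsheets, which supports the SQL-like Google Visualization API query language; the column letters below are illustrative, since the real columns depend on the spreadsheet:

    select A, B, C where C > 100000000 order by C desc

Here column C is assumed to hold the spend amount, so this would return only the transactions over £100m, largest first.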

The query is built up in part by selecting items from lists of options – though you can also enter values directly into the appropriate text boxes:

[Image: Datastore explorer - build a query]

You can bookmark and share queries in the datastore explorer (for example, Education spend over 100 million), and also get URLs that point directly to CSV and HTML versions of the data via Google Spreadsheets.
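If you want to grab those CSV outputs programmatically, here’s a minimal Python sketch; the spreadsheet key is a placeholder and the endpoint reflects the Google Spreadsheets visualization API of the time, so treat it as an illustration rather than gospel:

    # Fetch query results as CSV from a Google spreadsheet
    # via the Visualization API endpoint.
    import urllib.parse
    import urllib.request

    key = "SPREADSHEET_KEY"  # placeholder: the key from the spreadsheet's URL
    query = "select A, B, C where C > 100000000"

    url = ("https://spreadsheets.google.com/tq?"
           + urllib.parse.urlencode({"tqx": "out:csv", "tq": query, "key": key}))

    with urllib.request.urlopen(url) as response:
        csv_data = response.read().decode("utf-8")

    print("\n".join(csv_data.splitlines()[:5]))  # peek at the first few rows

Swapping out:csv for out:html gives you the HTML version instead.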

Several other example queries are given at the bottom of the data explorer page.

For certain queries (e.g. two column ones with a label column and an amount column), you can generate charts – such as Education spends over 250 million:

[Image: Education spend - over 250 million]
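As an aside on how a label column and an amount column can become a chart: the simplest route at the time was the Google Charts image API, which encodes labels and values straight into a URL. A hedged illustration with made-up departments and numbers:

    https://chart.googleapis.com/chart?cht=p&chs=500x220&chd=t:45,30,25&chl=Dept%20A|Dept%20B|Dept%20C

Here cht picks the chart type (p for pie), chs the size, chd the data and chl the labels; the explorer itself may of course drive the interactive Charts API instead.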

Here’s how we construct the query:

[Image: Education - query spend over 250 million]

If you do use this app, and find some interesting queries, please bookmark them and tag them with wdmmg-gde10, or post a link in a comment below, along with a description of what the query is and why it’s interesting. I’ll try to add interesting examples to the app’s list of example queries.

Notes: the datastore explorer is an example of a single web page application, though it draws on several other external services – delicious for the list of spreadsheets, Google spreadsheets for the database and query engine, Google charts for the charts and styled tabular display. The code is really horrible (it evolved as a series of bug fixes on bug fixes;-), but if anyone would like to run with the idea, start coding afresh maybe, and perhaps make a production version of the app, I have a few ideas I could share;-)

Libel advice for bloggers

Sense About Science – along with a whole raft of other organisations* – have published a libel guide for bloggers: ‘So you’ve had a threatening letter. What can you do?’ Below is the animated button they’ve created that practically begs you to click it and download the PDF.

I’m curious why they haven’t published it as a series of webpages to make it easier to find via search, and to link to – maybe I’m missing something. In the meantime, the PDF is well worth a download.

*Index on Censorship, English PEN, the Media Legal Defence Initiative, the Association of British Science Writers and the World Federation of Science Journalists.

Making magazine awards more user-friendly

Given I’ve already linked to Tony Hirst twice this week I thought I’d make it a hat-trick. Last month Tony wrote two blog posts which I thought were particularly instructive for magazine publishers organising blog awards.

In the first post Tony complained after seeing Computer Weekly’s shortlist:

“Why, oh why, don’t publishers of blog award nomination lists see them as potentially useful collections on a particular subject that can be put to work for the benefit of that community?

“… There are umpteen categories – each category has its own web page – and umpteen nominations per award. To my mind, lists of nominations for an award are lists of items on a related topic. Where the items relate to blogs, presumably with an RSS feed associated with each, the lists should be published as an OPML file, so you can at-a-click subscribe to all the blogs on a list in a reader such as Google Reader, or via a dashboard such as netvibes. Where there are multiple awards, I’d provide an OPML file for each award, and a meta-bundle that collects nominations for all the awards together in a single OPML file, though with each category in its own nested outline element.”
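For anyone who hasn’t met OPML, here’s a minimal sketch of the nested-outline structure Tony describes – the titles and feed URLs are placeholders:

    <opml version="2.0">
      <head><title>Blog awards - all nominations</title></head>
      <body>
        <outline text="Best IT blog">
          <outline type="rss" text="Example Blog One"
                   xmlUrl="http://example.com/one/feed" htmlUrl="http://example.com/one"/>
          <outline type="rss" text="Example Blog Two"
                   xmlUrl="http://example.com/two/feed" htmlUrl="http://example.com/two"/>
        </outline>
        <outline text="Best security blog">
          <outline type="rss" text="Example Blog Three"
                   xmlUrl="http://example.com/three/feed" htmlUrl="http://example.com/three"/>
        </outline>
      </body>
    </opml>

Each category sits in its own outline element, so a reader such as Google Reader can subscribe to a single category or import the whole bundle in one go.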

I’d suggest something even simpler: an aggregator widget pulling together the RSS feeds for each category, or a new Twitter account, or a Google Reader bundle.

In a second post the following day Tony finds a further way to extract value from the list: use Google Custom Search to create a custom search engine limited to those sites you have shortlisted as award-worthy. His post explains exactly how to do that.
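As a rough illustration of what that involves: alongside the point-and-click control panel, Google Custom Search can be fed a shortlist as an annotations file, something like the sketch below (the domains are placeholders, and the label name comes from your own search engine’s settings):

    <Annotations>
      <Annotation about="example-blog-one.com/*" score="1">
        <Label name="_cse_yourenginelabel"/>
      </Annotation>
      <Annotation about="example-blog-two.com/*" score="1">
        <Label name="_cse_yourenginelabel"/>
      </Annotation>
    </Annotations>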

Tony’s approach demonstrates the difference between story-centred and data-centred approaches to journalism. Computer Weekly are approaching the awards as a story (largely because of limitations of platform and skills – see comments), with the ultimate ending ‘Blog publisher wins award’. Tony, however, is looking at the resources being gathered along the way: a list of blogs, each of which has an RSS feed, and each of which will be useful to readers and journalists. Both are valid, but ignoring either is to miss something valuable in your journalism.

Statistics and data journalism: seasonal adjustment for journalists

[Image: seasonal adjustment charts from Junk Charts]

When you start to base journalism around data it’s easy to overlook basic weaknesses in that data – from the type of average that is being used, to distribution, sample size and statistical significance. Last week I wrote about inflation and average wages. A similar factor to consider when looking at any figures is seasonal adjustment.

Kaiser Fung recently wrote a wonderful post on the subject:

“What you see [in the image above] is that almost every line is an inverted U. This means that no matter what year, and what region, housing starts peak during the summer and ebb during the winter.

“So if you compare the June starts with the October starts, it is a given that the October number will be lower than June. So reporting a drop from June to October is meaningless. What is meaningful is whether this year’s drop is unusually large or unusually small; to assess that, we have to know the average historical drop between October and June.

“Statisticians are looking for explanations for why housing starts vary from month to month. Some of the change is due to the persistent seasonal pattern. Some of the change is due to economic factors or other factors. The reason for seasonal adjustments is to get rid of the persistent seasonal pattern, or put differently, to focus attention on other factors deemed more interesting.

“The bottom row of charts above contains the seasonally adjusted data (I have used the monthly rather than annual rates to make it directly comparable to the unadjusted numbers.) Notice that the inverted U shape has pretty much disappeared everywhere.”

The first point is not to think you’ve got a story because house sales are falling this winter – they might fall every winter. In fact, for all you know they may be falling less dramatically than in previous years: if sales typically drop 30% going into winter, say, then a 20% drop this year is arguably a story about resilience rather than decline.

The second point is to be aware of whether the figures you are looking at have been seasonally adjusted or not.

The final – and hardest – point is to know how to seasonally adjust data if you need to.

For that last point you’ll need to go elsewhere on the web. This page on analysing time series takes you through the steps in Excel nicely. And Catherine Hood’s tipsheet on doing seasonal adjustment on a short time series in Excel (PDF) covers a number of different types of seasonal variation. For more on how and where seasonal adjustment is used in UK government figures check out the results of this search (adapt it for your own country’s government domain).
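As a rough illustration of what those tutorials walk through, here’s a minimal sketch in Python of the classic ratio-to-moving-average method for monthly data. It assumes you have at least two complete years of figures and that the series starts in January:

    # Seasonal adjustment by the ratio-to-moving-average method.

    def seasonal_indices(series, period=12):
        """Average each month's ratio to the centred 12-month moving average."""
        half = period // 2
        ratios = [[] for _ in range(period)]
        for i in range(half, len(series) - half):
            # Centred moving average: half weight on the two end months.
            window = (series[i - half] / 2
                      + sum(series[i - half + 1:i + half])
                      + series[i + half] / 2) / period
            ratios[i % period].append(series[i] / window)
        indices = [sum(r) / len(r) for r in ratios]
        mean = sum(indices) / period
        return [ix / mean for ix in indices]  # normalise so they average to 1

    def seasonally_adjust(series, period=12):
        """Divide each value by its month's seasonal index."""
        idx = seasonal_indices(series, period)
        return [value / idx[i % period] for i, value in enumerate(series)]

    # Usage: if `starts` is a list of monthly housing starts (January first),
    # seasonally_adjust(starts) strips out the persistent seasonal pattern.

Dividing each month by its seasonal index removes the inverted-U shape – which is exactly what the adjusted charts in Kaiser Fung’s post show.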

Hyperlocal Voices: lovelevenshulme’s Tim Simmonds

[Image: Hyperlocal voices: Love Levenshulme]

The latest in the Hyperlocal Voices series looks at Love Levenshulme. When its founder moved on, the site was handed over to two other people – this year the blog won the Manchester Blog Award for ‘Blog of the Year’.

Who were the people behind the blog, and what were their backgrounds?

Lovelevenshulme was started by a gentleman called Matt Clements, whom I have never met! He wanted to be positive about where he lived and so set up a blog.

What made you decide to set up the blog?

I was a reader of lovelevenshulme and liked the countercultural feel of being positive about a locality. I suppose I thought it was different from the standard moany English mentality.

Matt Clements wrote one day that he was moving out of the area and wanted someone else to take it over. So Helen Power and I offered to take it on.

When did you set up the blog and how did you go about it?

We took over the blog and decided to carry on with his positive take. We looked around our area and decided to write about the things we love. These range from kebab houses and cafes to poetry nights and film clubs. We also try and promote any local event or group.

We use Blogger because it is simple and easy.

What other blogs, bloggers or websites influenced you?

Levenshulme Daily Photograph, Inside the M60, Manchester Mule, Manchizzle, Fat Roland, Sounds Good to me Too

How did – and do – you see yourself in relation to a traditional news operation?

I don’t see us as a news operation. We are very biased in our love of Levenshulme and have decided that we won’t write about things that aren’t positive. There is enough of that in the blogosphere already.

What have been the key moments in the blog’s development editorially?

We won Blog of the Year at the Manchester Blog Awards 2010. I think that helped us to realise that being hyperlocal and positive is actually quite unusual and powerful.

Linking properly with a Twitter feed and a Facebook fan page has helped us develop the community side of the blog.

What sort of traffic do you get and how has that changed over time?

Our traffic has only been tracked properly since August 1st 2010. We have seen our numbers double every month so far. I think we may now be at (or near) our peak (roughly 1,500 hits a month).

How did you find taking on a blog that was already running?

Easy, to be honest. The guy who set it up didn’t want to do it anymore and was happy for us to take it in whatever direction we wanted. In fact, he has emailed us since and been very complimentary indeed.

I guess the only problem we have is finding information on local events, but as the blog’s profile has grown people have been sending stuff through to us.

Using Yahoo! Clues to target your headlines by demographic

[Image: Yahoo! Search Clues - Emma Watson hair]

Tony Hirst points my attention (again) to Yahoo! Clues, a tool that, like Google’s Insights For Search, allows you to see what search terms are most popular. However, unlike Insights, Yahoo! Clues gives much deeper demographic information about who is searching for particular terms.

Tony’s interest is in how libraries might use it. I’m obviously interested in the publishing side – and search engine optimisation (SEO). And here’s where the tool is really interesting.

Until now SEO has generally taken a broad-brush approach. You use tools like Insights to get an idea – based on the subject of your journalism – of what terms people are using, related terms, and rising terms. But what if your publication is specifically aimed at women – or men? Or under-25s? Or over-40s? Or the wealthy?

With Yahoo! Clues, if the search term is popular enough you can drill down to those groups with a bit more accuracy (US-only at the moment, though). Taking “Emma Watson haircut”, for example, you can see that a girls’ magazine and one aimed at boys may take different SEO approaches based on what they find from Yahoo! Clues.

Apart from anything else, it demonstrates just how immature a discipline web writing and SEO still is. As more and more user data becomes available, processed at faster speeds, we should see this area develop considerably in the next decade.

UPDATE: After reading this post, Tony has written a follow-up post on other tools for seeing demographics around search behaviour.

[Image: Yahoo! Search Clues - Emma Watson haircut - oops/Katie Leung]

A mob is just a crowd you’re not part of

Mobs have been very much back in the spotlight over the past couple of weeks. The Cooks Source saga was followed up by the lesser Dairy Goat Journal bunfight, while in the physical world students demonstrated in London, and the Fitwatch blog, which provided advice to those students, was shut down by police. In each case onlookers conjured up the spectre of “the mob” – a term whose primary definition – “A large disorderly crowd or throng” – belies the array of discourses that underpin it, partly related to its secondary definition – “The mass of common people; the populace”.

In other words, “mob” is a term used to frame a debate in emotional terms, to dismiss what may be a genuine outpouring of anger or resentment as invalid or illegitimate. For those reasons, I flinch when people talk about mobs instead of crowds.

Last week the video above was uploaded to YouTube. It shows a presentation describing the events leading up to a fatal flash mob. The story is fictional but the events it is constructed from are real*.

The prospect of such a series of events happening is terrifying and rightly thought-provoking: I would recommend it as a way of exploring journalism ethics in a networked age.

But the video is a syllogism – it makes an apparently logical (implied) argument that because these events have all happened and they are connected in technical terms, they could, eventually, all happen together.

The obvious flaw here is statistical: each event may be plausible on its own, but the probability of the whole chain occurring together is of a different order entirely. We can all imagine all possible worlds.

It reminds me of Eric Morecambe’s joke, when pulled up on his piano playing, that he was playing all the right notes “but not necessarily in the right order”.

But the major flaw is logical. We are led from cause to effect, major to minor premise, but the ultimate event really has no connection with its beginnings. When large numbers of people gather in one place, sometimes it turns into a riot and people get killed. Technology has not changed that. Perhaps it makes it easier to do so – perhaps it makes it easier to disperse crowds when things go awry, or to call for assistance. Most likely all of the above. It’s the same technologically determinist mindset that blames Google Maps for terrorist attacks. As Douglas Adams put it in 1999:

“Newsreaders still feel it is worth a special and rather worrying mention if, for instance, a crime was planned by people ‘over the Internet.’ They don’t bother to mention when criminals use the telephone or the M4, or discuss their dastardly plans ‘over a cup of tea,’ though each of these was new and controversial in their day.”

In short: Don’t Panic.

*In the comments on YouTube Danosuke points out: “The one instance they cite as “already happened” was not a riot at all. There was no reported property damage or injuries. This is pointless fear mongering.”

Solving buggy behaviour when scraping data into Google spreadsheets

Tony Hirst has identified some bugs in the way Google spreadsheets ‘scrapes’ tables from other sources – in particular, when the original data is of mixed types (e.g. numbers and text). The solution is summed up as follows:

“When using the =QUERY() formula, make sure that you’re importing data of the same datatype in each cell; and when using the =ImportData() formula, cast the type of the columns yourself… (I’m assuming this persists, and doesn’t get reset each time the spreadsheet resynchs the imported data from the original URL?)”
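By way of a hedged illustration – the URL and cell ranges are placeholders, not from Tony’s post – that casting fix might look like this in a Google spreadsheet:

    =ImportData("http://example.com/mixed.csv")
    =ARRAYFORMULA(VALUE(B2:B100))

The first formula pulls in the raw CSV; the second, placed in a spare column, converts a column that arrived as text into proper numbers, so that =QUERY() sees a single datatype throughout.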

Extractiv: crawl webpages and make semantic connections

[Image: Extractiv screenshot]

Here’s another data analysis tool which is worth keeping an eye on. Extractiv “lets you transform unstructured web content into highly-structured semantic data.” Eyes glazing over? Okay, over to ReadWriteWeb:

“To test Extractiv, I gave the company a collection of more than 500 web domains for the top geolocation blogs online and asked its technology to sort for all appearances of the word “ESRI.” (The name of the leading vendor in the geolocation market.)

“The resulting output included structured cells describing some person, place or thing, some type of relationship it had with the word ESRI and the URL where the words appeared together. It was thus sortable and ready for my analysis.

“The task was partially completed before being rate limited due to my submitting so many links from the same domain. More than 125,000 pages were analyzed, 762 documents were found that included my keyword ESRI and about 400 relations were discovered (including duplicates). What kinds of patterns of relations will I discover by sorting all this data in a spreadsheet or otherwise? I can’t wait to find out.”

What that means in even plainer language is that Extractiv will crawl thousands of webpages to identify relationships and attributes for a particular subject.
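To picture the output: each row pairs an entity with a relation, the keyword and a source URL. An entirely invented sample might look like this (the column headings and rows are hypothetical; the real export will differ):

    entity           relation       keyword   source
    Jack Dangermond  president of   ESRI      http://example.com/post-1
    ArcGIS           product of     ESRI      http://example.com/post-2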

This has obvious applications for investigative journalists: give the software a name (of a person or company, for example) and a set of base domains (such as news websites, specialist publications and blogs, industry sites, etc.) and set it going. At the end you’ll have a broad picture of what other organisations and people have been connected with that person or company. The relationships you can ask it to identify include personal connections, ownership, former names, telephone numbers, companies worked for and worked with, and job positions.

It won’t answer your questions, but it will suggest some avenues of enquiry, and potential sources of information. And all within an hour.

Time and cost

ReadWriteWeb reports that the process above took around an hour “and would have cost me less than $1, after a $99 monthly subscription fee. The next level of subscription would have been performed faster and with more simultaneous processes running at a base rate of $250 per month.”

As they say, the tool represents “commodity level, DIY analysis of bulk data produced by user generated or other content, sortable for pattern detection and soon, Extractiv says, sentiment analysis.”

Which is nice.