Monthly Archives: November 2010

Data cleaning tool relaunches: Freebase Gridworks becomes Google Refine

When I first saw Freebase Gridworks I was a very happy man. Here was a tool that tackled one of the biggest problems in data journalism: cleaning dirty data (and data is invariably dirty). The tool made it easy to identify variations of a single term and clean them up, to link one set of data to another, and much more besides.

Then Google bought the company that made Gridworks, and now it’s released a new version of the tool under a new name: Google Refine.

It’s notable that Google are explicitly positioning Refine in their video (above) as a “data journalism” tool.

You can download Google Refine here.

Further videos below. The first explains how to take a list on a webpage and convert it into a cleaned-up dataset – a useful alternative to scraping:

The second video explains how to link your data to data from elsewhere, aka “reconciliation” – e.g. extracting latitude and longitude or language.
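To give a feel for what identifying "variations of a single term" involves, here is a minimal Python sketch of key-collision clustering, one common approach to the problem. It is not Refine's own code: the function names and the sample council names are invented for illustration.

```python
import re
from collections import defaultdict


def fingerprint(value):
    """Reduce a string to a normalised key so near-duplicates collide."""
    # Lowercase, trim, and drop punctuation.
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    # Sort the unique words so word order and repeats do not matter.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values that share a fingerprint; keep only groups with variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Hypothetical dirty column, invented for this example.
names = [
    "Birmingham City Council",
    "birmingham city council ",
    "Council, Birmingham City",
    "Birmingham City Council.",
    "Lichfield District Council",
]

for group in cluster(names):
    print(group)
```

The four Birmingham spellings end up in one group, which a person can then merge into a single preferred spelling. The Lichfield entry has no variants, so it is left alone.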

What inflation has to do with the price of fish

Inflation image by Gregor Rohrig

One of the forms of data that journalists frequently have to deal with is prices. And while it’s one thing to say that things are getting more expensive, making a meaningful comparison between what things cost now and what things cost then is a different kettle of fish altogether.

Factoring in inflation can make all the difference between arbitrary comparisons that provide no insight whatsoever, and genuinely meaningful reporting.

Thanks to computing power it’s actually quite easy for journalists to factor inflation into their reporting – by using an inflation calculator. It’s also easier to find historical price data with data-driven search engines like Wolfram Alpha.

But inflation is only half of the calculation you need. The other is earnings.

Professor Ian Stewart illustrates this perfectly in this article in The Telegraph:

“[A] 1991 pint cost around £1.40, which is £1.80 in today’s money. The current price is around £2.80, so beer really is more expensive. On the other hand, the average salary in 1991 was £19,000, and today it is £38,000. Relative to what we earn, a pint costs exactly the same as it did 19 years ago.

“Our house? That would be £125,000 today, so it has gone up by 84 per cent. Relative to average earnings, however, the increase is only 10 per cent.

“The Guardian knows about inflation, and said that the pub pint has increased by 68 per cent in real terms. But this compares the real increase in new money with the original price in old money. If I did the calculation like that for my house it would have gone up by 850 per cent. Calculated sensibly, the rise in the price of beer is about 55 per cent relative to inflation, and zero per cent relative to earnings.”
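The beer figures in that quote can be checked in a few lines. This is a rough sketch using only the rounded numbers Stewart gives; the variable and function names are my own, and because the inputs are rounded the results are approximate.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Rounded figures taken from the quote above.
pint_1991 = 1.40                 # price paid in 1991
pint_1991_in_2010_money = 1.80   # the same price adjusted for inflation
pint_2010 = 2.80                 # current price
salary_1991 = 19_000
salary_2010 = 38_000

# Raw change: compares old money with new money, so not very meaningful.
nominal = pct_change(pint_1991, pint_2010)

# Real change: both prices expressed in today's money.
real = pct_change(pint_1991_in_2010_money, pint_2010)

# Change relative to earnings: share of an average salary one pint takes up.
vs_earnings = pct_change(pint_1991 / salary_1991, pint_2010 / salary_2010)

# The mistake Stewart describes: real increase in new money over old-money price.
mixed_units = (pint_2010 - pint_1991_in_2010_money) / pint_1991 * 100

print(f"nominal: {nominal:.0f}%")
print(f"real (vs inflation): {real:.0f}%")
print(f"relative to earnings: {vs_earnings:.0f}%")
print(f"mixed old/new money: {mixed_units:.0f}%")
```

The real rise comes out at roughly 56 per cent, in line with Stewart's "about 55", and the rise relative to earnings is zero. With these rounded inputs the mixed-units calculation gives about 71 per cent rather than the 68 quoted, so the figures behind that number presumably differed slightly.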

Of course the danger in averages is that they only illustrate aggregate change, and if you’re talking about a purchase that a particular section of the population makes – or you’re only talking to a particular region – then a national average may not be as meaningful a comparison to make.

If the poor are getting poorer and the rich richer then a pint of beer really is more expensive for some – and cheaper for others – than it used to be. Likewise, particular parts of the country might be suffering more from house price increases than others because of local average wages and local house prices.

It’s also worth pointing out that, when talking about financial data, a median is a much more useful measure to take than a mean.
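A quick illustration of why, using Python's standard statistics module. The salaries here are invented for the example; the point is only that one very high earner drags the mean a long way from what most people in the group actually earn.

```python
from statistics import mean, median

# Hypothetical salaries for six people, invented for illustration.
salaries = [18_000, 21_000, 24_000, 26_000, 30_000, 250_000]

average = mean(salaries)    # pulled upwards by the single high earner
middle = median(salaries)   # the midpoint of the group

print(f"mean: £{average:,.0f}")
print(f"median: £{middle:,.0f}")
```

Five of the six people earn £30,000 or less, yet the mean is more than double that. The median stays close to what a typical member of the group takes home.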

Finally, aside from the statistical considerations it’s worth coming back to some of the basics of pricing. Ian again:

“There are two things to remember about prices. One is basic economics: if something gets too expensive for people to buy it, they don’t. So prices and wages have to stay in step, broadly speaking – though with big fluctuations in some commodities, such as housing. The other is inflation. We all know it exists, but we forget that when we start comparing prices. ‘My God! A Ford Anglia cost only £295 in 1940!’ True, but the average salary then was £370. The equivalent price today is £30,000, which will buy you a Jaguar XF.”
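The Ford Anglia comparison is an earnings-based conversion rather than an inflation-based one, and it can be sketched as follows. I am assuming the "today" salary is the £38,000 average quoted earlier in the same article; the variable names are my own.

```python
# Figures from the quote; the 2010 salary is assumed from the earlier passage.
anglia_price_1940 = 295
salary_1940 = 370
salary_2010 = 38_000

# What share of an average annual salary did the car cost in 1940?
share_of_salary = anglia_price_1940 / salary_1940

# The same share of today's average salary.
equivalent_today = share_of_salary * salary_2010

print(f"share of 1940 salary: {share_of_salary:.0%}")
print(f"equivalent price today: £{equivalent_today:,.0f}")
```

On those assumptions the car cost about four fifths of a year's average pay, which works out at a little over £30,000 today and is consistent with the figure Stewart gives.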

Cooks Source anger moves on to Dairy Goat Journal’s Dave Belanger

Cooks Source fake Facebook page discusses Dairy Goat Journal

UPDATE 2 – from Cathy in the comments (Nov 11): Dave Belanger has now paid the fee.

UPDATE – thanks to Vicki in the comments (Nov 11): Dave Belanger has responded to Suzanne, reinstating the image on their website with a credit and link, and offering to pay. However, he has refused to pay the amount requested by Suzanne, and Suzanne is now planning to take the magazine to court. Her reasoning is admirable, and it’s fair to say that commenters’ contributions have helped her to take a well-informed stance:

“Countryside Publications is a five million dollar company. He accused me of being opportunistic by asking for an increased fee for the unauthorized and uncredited use.

“This is not about money. I may never see the $2100. If I do, it will be a long time from now. If I wanted to make a quick buck, I’d take the $500 [offered]. (I could use it.) But if I let him not only steal the photo but pay no penalty for it, there’s no reason for him to not steal again. After all, what did it cost him? He can steal photos all he wants and only pay for them (at a price he sets) if he’s caught. Just who is opportunistic? He published my photo without authorization or credit then says, here, take $500 or NOTHING.”

There’s also some detail about the possible impact on the publishers from Internet users:

“P.S. He mentioned receiving phone calls and emails from my readers and said he was not concerned about it. He admitted there had also been some subscription cancellations, but that people cancelled subscriptions and started subscriptions every day and that he had no reason to believe any subscription cancellations were related to his treatment of my work.”

The original post:

Oh dear. It appears another magazine editor is about to feel the force of a thousand emails following a blogger’s complaint of breach of copyright and – more importantly – said editor’s response to their request for fair payment and acknowledgement of authorship.

The editor in question is Dave Belanger who – apparently – hung up on Suzanne McMinn when she called to ask that her photo – used in Dairy Goat Journal – was properly credited.

With 80 comments already, many of them saying they have called and written to the magazine, and the case also being discussed on the fake Cooks Source Facebook page, you can only hope Dave looks at the Cooks Source case and reacts quickly.

*Everything I can find about this looks credible, but I’m being extra cautious in case it is an opportunistic hoax.

via Ulrike in the comments.

Facebook ‘mentions’ – a ‘talking point’ engine

Facebook mentions mockup by Oliver Chiang

Oliver Chiang reports on a new feature being tested by Facebook that he dubs ‘mentions’. It tells you if more than one of your friends mentions the same thing – a celebrity, for example, or an event, or organisation.

I’ve often said that Facebook is the ultimate news publisher. ‘What my friends are doing’ was powerful enough, and they have since expanded their editorial proposition (with Pages) into ‘What companies and personalities I am interested in are doing’ too. The ‘mentions’ feature appears to extend that concept further, essentially identifying what is a ‘talking point’ in your social circle and circles of interest, and bringing that to your attention.

That is, of course, one of the things that journalists do.

Of additional interest are the obvious commercial applications of this technology. In fact, the focus at the moment on linking these ‘talking points’ to official ‘Pages’ drives that commercial application home rather forcefully. This overly-commercial application may in fact be a weakness – and it will be interesting to see if it is tweaked before being rolled out (Facebook’s history suggests they are more likely to tweak it than withdraw it).

In the meantime, publishers should be watching closely to see how advertisers respond to the potential of ‘mentions’ – and if there is any way they might adapt similar ideas for their own users and advertisers.

Hyperlocal voices: Mike Rawlins, Pits N Pots (Stoke)

The Hyperlocal Voices series continues with a look at Pits n Pots, a site with its own Wikipedia entry. The site – set up in frustration at the lack of an open public forum in the local media – is frequently given as an example of the best of hyperlocal blogging.

Who were the people behind the blog?

Tony Walley & Mike Rawlins, we don’t have interesting ‘job’ titles, we are simply Mike & Tony.

Tony Walley is a company director & broadcaster. Tony has run a successful aluminium stockholding firm in Staffordshire for around 20 years and is politically active. He has worked as a local radio broadcaster mainly covering sport, off and on for around 10 years.

Mike Rawlins is a serial web meddler who had been working in Transport & Logistics for around 18 years. He decided to leave the rat race and to live the dream of being a photographer and a full time serial web meddler in 2008.

We have 3 other casual writers who cover various subjects for us.

What made you decide to set up the blog?

The coverage of local politics in the local media at the time was poor to say the least. Commenting on anything that was published in the local media was subject to a policy of censorship rather than moderation. Tony decided that there needed to be a forum where people could discuss local politics in an adult manner without being censored.

When did you set up the blog and how did you go about it?

Tony Walley and I founded the site in September 2008.

What other blogs, bloggers or websites influenced you?

Pits n Pots was one of the first sites in the political scrutiny forum so it is difficult to name other sites as being influencers.

How did – and do – you see yourself in relation to a traditional news operation?

We like to think we have the same ethos as traditional news operations, wanting to provide news and views to interested parties, but without the need to sex up or sensationalise stories.

We don’t carry any significant advertising and don’t need to sell papers.

We research and write our articles because we feel there is a gap in the market.

How are we different? We run very light and are able to react to stories very quickly where the local press can only really publish once a day even on their website. We are able to be far more experimental with new platforms and technologies than the traditional media.

How are we the same? – We are the same in so far as we have a desire to put news in to the public domain.

What have been the key moments in the blog’s development editorially?

The article about the Polish Spitfire being used by the BNP was quite a big one that made all the national press.

Our coverage of the EDL rally in Stoke-on-Trent was another key moment. This was the first time we really worked closely with the police. Our YouTube channel was the second most viewed news channel for 2 days after the rally.

What sort of traffic do you get and how has that changed over time?

Depending on the stories on the day we can get anywhere between 3000 & 5000 unique visitors each day.

Flow vs stock – and how people consume news online

I’ve only just come across this post by Robin Sloan applying the economic concepts of flow and stock to online news:

There are two kinds of quantities in the world. Stock is a static value: money in the bank, or trees in the forest. Flow is a rate of change: fifteen dollars an hour, or three-thousand toothpicks a day. Easy. Too easy.

But I actually think stock and flow is the master metaphor for media today. Here’s what I mean:

Flow is the feed. It’s the posts and the tweets. It’s the stream of daily and sub-daily updates that remind people that you exist.

Stock is the durable stuff. It’s the content you produce that’s as interesting in two months (or two years) as it is today. It’s what people discover via search. It’s what spreads slowly but surely, building fans over time.

Sloan’s argument is that ‘flow’ – constant updates – is very much the focus of attention at the moment, but that ‘stock’ should not be neglected.

It’s a very appealing metaphor – not least because research suggests that users’ consumption of online news matches it. A previous study by Associated Press backs Sloan’s argument, finding that users were overwhelmed by constant updates but hungered for more depth. It recommended that publishers invest more resources in deeper reporting – in other words, in ‘stock’.

Pablo Boczkowski’s new book, News At Work, explores many of these issues too. In looking at how people consume news at work he identifies how users check the news online:

“The first visit [of the day] to an online news site is undertaken routinely in a double sense: each individual usually conducts this visit during the same part of the day and looks at news sites according to a navigation process that varies little from one day to the next …

“Subsequent visits differ from the initial visit in several dimensions. They take place at nonroutine times and intervals. They do not occur at fixed times of the day … but depend on the availability of downtime …

“These visits are often motivated by the need for a distraction or for more information after learning about an event … Subsequent visits … are not comprehensive, methodical, long, or marked by the strong presence of print[-originated] content. Instead they are limited, disorganised, brief and focused on breaking news or some other form of novel content.”

This is the focus on flow. However, it’s hard to tell whether that user behaviour might change if some of the context surrounding it changed. For example, Boczkowski’s focus is on consumption at work, where users are taking a quick break from what they should be doing. Elsewhere in the same chapter he identifies how many users avoid consuming news online outside of work because the computer reminds them of their work. Would this change with news consumption on more consumer-focused platforms such as mobile phones, tablets, web-enabled televisions and games consoles? Similarly, as the nature of work itself changes and more people work from home, will that change their behaviour? (Boczkowski’s own findings suggest this is indeed the case for those people.)

h/t Noah Brief.

On publishing – and deleting – allegations online

TechCrunch’s Paul Carr has a thoughtful piece on “cyber-vigilantism” where citizens witness or experience a crime and go online to chase it down, name the alleged perpetrators, or pressure the authorities out of complacency:

“[W]hen that naming happens, the case is over before it’s begun: no matter whether the accused is guilty or innocent, they are handed a life sentence. Until the day they die, whenever a potential employer or a new friend Googles their name – up will come the allegation. And, prison terms notwithstanding, that allegation carries the same punishment as guilt – a lifetime as an unemployable, unfriendable, outcast. There’s a reason why the Internet is a great way to ruin someone with false allegations – and it’s the same reason why falsely accused people are just as likely to harm themselves as guilty people.”

The post was written after TechCrunch decided to delete a story about an alleged sexual assault and is a useful read in provoking us as journalists in any medium to reflect on how we treat stories of this type.

There are no hard rules of course, and associated legal issues vary from country to country.

In the Judith Griggs case, for example, was I right to post on the story? My decision was based on a few factors: firstly, I was reporting on the actions of those on her magazine’s Facebook page, rather than the ‘crime’ itself (which was hardly the first time a publisher has lifted content). Secondly, I waited to see if Griggs responded to the allegations before publishing. Thirdly, I evaluated the evidence myself to see the weight of the allegations. Still, I’d be interested in your thoughts.

Cooks Source: What should Judith Griggs have done?

It’s barely 24 hours since the Cooks Source/Judith Griggs saga blew up, but so much has happened in that time that I thought it worth reflecting on how other publishers might handle a similar situation.

Although it’s an extreme example, the story has particular relevance to those publications that rely on Facebook or another web presence to publish material online and communicate with readers, and might at some point face a backlash on that platform.

In the case of Cooks Source, their Facebook page went from 100 ‘likes’ to over 3,000, as people ‘liked’ the page in order to post a critical comment (given the huge numbers of comments it’s fair to say there were many more people who un-‘liked’ the page as soon as their comment was posted). The first question that many publishers looking at this might ask is defensive:

Should you have a Facebook page at all?

It would be easy to take the Cooks Source case as an indication that you shouldn’t have a Facebook page at all – on the basis that it might become hijacked by your critics or enemies. Or that if you do create a page you should do so in a way that does not allow postings to the wall.

The problem with this approach is that it misunderstands the fundamental shift in power between publisher and reader. Just as Monica Gaudio was able to tell the world about Judith’s cavalier attitude to copyright, not having a Facebook page (or blog, etc.) for your publication doesn’t prevent one from being created by someone else.

In fact, if you don’t set up a space where your readers can communicate with you and each other, it’s likely that they’ll set one up themselves – and that introduces further problems.

If you don’t have a presence online, someone else will create a fake one to attack you with

After people heard about the Cooks Source story, it wasn’t long before some took the opportunity to set up fake Twitter accounts and a Facebook user account in Judith’s name. (UPDATE: Someone has registered JudithGriggs.com and pointed it at the Wikipedia entry for ‘public domain’, while a further Cooks Source Facebook page has been set up claiming that the original was “hacked”.)

These were used in various ways: to make information available (the Twitter account biography featured Judith’s phone number and email); to satirise Judith’s actions through mock-updates; and to tease easily-annoyed Facebook posters into angry responses.

Some people’s responses on Facebook to the ‘fake’ Judith suggested they did not realise that she was not the real thing, which leads to the next point.

A passive presence isn’t enough – be active

Judith obviously did have a Facebook account, but it was her slowness to respond to the critics that allowed others to impersonate her.

Indeed, it was several hours before Judith Griggs made any response on the Facebook page, and when she did (assuming it is genuine – see comments below) it was through the page’s welcoming message – in other words, it was a broadcast.

This might be understandable given the unmanageable volume of comments that had been posted by this time – but her message was also therefore easily missed in the depths of the conversation, and it meant that the ‘fake’ Judith was able to continue to impersonate her in responses to those messages.

One way to focus her actions in a meaningful way might have been to do a ‘Find’ on “Griggs” and respond there to clarify that this person was an imposter.

Instead, by being passive Judith created a vacuum. The activity that filled that vacuum led in all directions, including investigating the magazine more broadly and contacting advertisers and stockists.

Climb down quickly and unreservedly

While being passive can create a vacuum, being active can – if not done in a considered way – also simply add fuel to the fire.

The message that Judith eventually posted did just that. “I did apologise to Monica via email, but aparently [sic] it wasnt enough for her,” she wrote, before saying “You did find a way to get your ‘pound of flesh…’”.

This “blaming the victim”, as one wall poster described it, compounded the situation and merely confirmed Judith’s misunderstanding of the anger directed at her.

An apology clearly wasn’t what people wanted – or at least, not this sort of reserved apology.

A quicker, fuller response that demonstrated an understanding of her community would have made an enormous difference in channeling the energy that people poured into what became an increasingly aggressive campaign.

UPDATE (Nov 9): As of a few hours ago Cooks Source appear to have published an official statement which includes a fuller apology. The statement doesn’t help, however, partly because it doesn’t address the key issues raised by critics about where it gets content and images from, partly because its sense of priorities doesn’t match those of its audience (the apology comes quite late in the statement), and partly because it is internally inconsistent. Commenters on the Facebook page and blogs have already picked these apart.

There’s also a wonderful ‘corrected’ version of the statement which does an impeccable job of illustrating how they should have phrased it.

Engage with criticism elsewhere

The Cooks Source Facebook page wasn’t the only place where people were gathering to criticise and investigate the magazine. On Reddit hundreds of users collaborated to find other breaches of copyright, put up contact details for the copyright holders, and list advertisers that people could contact. Someone also created a Wikipedia entry to document Griggs’ instant notoriety.

Even if Judith had shut down the Facebook page (not a good idea – it would have merely added further fuel to the fire), the discussion – which had now become a campaign and investigation – was taking place elsewhere. Engaging in that in a positive way might have helped.

A magazine is not just content

One of the key principles demonstrated by the whole affair is that a magazine is about much more than just the content inside: it is also about the community around it, and that community’s values. This is what advertisers are buying into. When I asked one of Cooks Source’s advertisers why they decided to withdraw their support, this is what they said:

“I would estimate that between the emails, [Facebook] messages, calls, and people following us on Twitter, we’ve been contacted by more than 100 people since I first heard of this about 5 hours ago. That doesn’t include many many people who commented on fb to our posts stating that we had requested to pull our ads from the publication. We are just simply trying to run our small business, which by most standards is still in its infancy, and being associated with publications like this that don’t respect its readers (who are all our potential customers) is unacceptable to us in light of their practices. What angers me even more is the fact that it is being made light of by the Editor herself. It is disrupting our business and linking us to something we do not support.”

Postscript: How it unfolded, piece by piece

Kathy E Gill has a wonderfully detailed timeline of how the story broke and developed which offers further lessons in how a situation like this develops.

Cooks Source magazine gets Facebook backlash for copying material without permission

UPDATE 7: The official Cooks Source webpage now features a rather confusing statement on the saga, apologising to Monica Gaudio and saying they have made the donation asked for. The page claims that their Facebook page was “cancelled” and “since hacked”. It’s not clear what they mean by these terms as the original Facebook page is still up and, clearly, could not be hacked if it had been “cancelled”. They may be referring to the duplicate Facebook page which also claims (falsely) the original was “hacked”. In addition the statement says they have “cancelled” their website – but as the statement is published on their website it may be that by “cancelled” they mean all previous content has been removed. This discussion thread picks out further inconsistencies and omissions.

UPDATE 5: The magazine’s Facebook page has now been updated with a message from editor Judith saying she “did apologise” but “apparently it wasn’t enough for her”, shown below:

Well, here I am with egg on my face! I did apologise to Monica via email, but aparently it wasnt enough for her. To all of you, thank you for your interest in Cooks Source and Again, to Monica, I am sorry -- my bad! You did find a way to get your

UPDATE 2: Reddit users have been digging further into the magazine’s use of copyrighted content. They’ve also identified a planned sister magazine, whose Facebook page has also been the recipient of a few comments.

UPDATE 6: Edward Champion has chased down the copyright holders of both text and images found in Cooks Source which appear to have been used without permission.

UPDATE 4: A list of mainstream media reports on the story is also being maintained on the magazine’s Facebook page.

***ORIGINAL BLOG POST STARTS HERE***

For much of today people have been tweeting and blogging about the magazine editor with 30 years’ experience demonstrating a by now familiar misunderstanding of copyright law and the ‘public domain’.

The blog post on Tweetmeme - shared over 1500 times

Reddit: Website article gets copied without permission by print magazine - website complains - magazine claims website should pay them for the publicity

To the writer whose material they used without permission she apparently responded that “the web is considered ‘public domain’ and you should be happy we just didn’t ‘lift’ your whole article and put someone else’s name on it!”

What makes this of particular interest is how the affair has blown up not just across Twitter and Reddit but on the magazine’s own Facebook page, demonstrating how this sort of mistake can impact very directly on your own readers – and stockists and advertisers:

As an advertiser, we are disappointed in Cook's Source and we are pulling our ads from this publication. Many of us (as is the case with our business) paid several months in advance for advertising and are unlikely to get any compensation back. We ask that you please stop emailing our business, we agree that the publication made a grave error, but the blame should be placed with them. Please do not make small businesses like mine pay for their error in judgment

Facebook comment

Jim Cobb Perhaps someone should obtain a recent copy of the magazine and begin contacting any paid advertisers. Y'know, to clue them in on the business practices of Cooks Source Magazine. They might be interested in hearing about it.

Jon F. Merz If I could draw everyone's attention to the photos down below which contain reprints of magazine pages, that include all of their advertisers. Let's start calling these places up and letting these advertisers know that the money they pay goes to keep a rag like this in business. Hurt 'em where it counts!

Kristine Weil In light of your blatant theft of Monica Gaudio's article and the dismissive response of editor Judith Griggs when called on it by the author, I will be personally speaking to the manager of our local grocery store to encourage them to stop carrying your magazine, and I will continue to speak to them every week until

Meanwhile, others were suggesting investigating the magazine further:

It all adds up to a perfect lesson for magazine editors – not just in copyright, but in PR and community management.

UPDATE 1: It seems that users are going through the latest issue and suggesting where the content may have been taken from.

UPDATE 3: On a separate topics page on the Facebook page the details are being collated.

Open data from the inside: Lichfield Council’s Stuart Harrison

I’m trying to get a feel for what some of the most innovative government departments and local authorities are doing around releasing data. I spoke to Stuart Harrison of Lichfield Council, which is leading the way at a local level.

What has been your involvement with open data so far?

I’ve been interested in open data for a few years now. It all started when I was building a site for food safety inspections in Staffordshire (http://www.ratemyplace.org.uk/), and after seeing the open APIs offered by sites such as Fixmystreet, Theyworkforyou etc, was inspired to add an API (http://www.ratemyplace.org.uk/api). This then got me thinking about all the data we publish on our website, and whether we could publish this in an open format. A trickle quickly turned into a flood and we now have over 50 individual items of open data at http://www.lichfielddc.gov.uk/data.

I think the main thing I’ve learnt is that APIs are great, but they’re not always necessary. My early work was on APIs that link directly into databases, but, as I’ve moved forward, I’ve found that this isn’t always necessary. While an API is nice to have, it’s sometimes much better to just get the data out there in a raw format.
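As a rough illustration of that last point, a raw file needs very little from the person reusing it. The sketch below reads a small CSV with Python's standard library. The column names and rows are invented for the example and are not taken from Lichfield's actual data.

```python
import csv
import io

# Hypothetical raw CSV, invented for illustration only.
RAW_CSV = """name,rating
Example Cafe,5
Sample Takeaway,3
Demo Bakery,4
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(RAW_CSV)

# Pick out the premises rated 4 or above.
top_rated = [row["name"] for row in rows if int(row["rating"]) >= 4]
print(top_rated)
```

In practice the text would come from a downloaded file rather than a string in the script, but the parsing step is the same, and no API is involved.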

What have people done with the data so far?

As we’re quite a small council, we haven’t had a lot of people doing work (that I know of) with much of our data. The biggest user of our data is probably Chris Taggart at Openly Local – I actually built an API (and extended the functionality of our existing councillor and committees system) to make it easier to republish. To be honest, unless I know the person and they actually told me, I doubt I’d actually know what was going on!

What do you plan to do next – and why?

Because of the problems stated before, we’ve got together with ScraperWiki to organise a Hacks and Hackers day on the 11th November, which will hopefully encourage developers and journalists to do something with our data, and also put the wheels in motion for organising a data-based community, which means that once someone does something with our data, we’re more likely to know about it!