A mob is just a crowd you’re not part of

Mobs have been very much back in the spotlight over the past couple of weeks. The Cooks Source saga was followed up by the lesser Dairy Goat Journal bunfight, while in the physical world students demonstrated in London and the Fitwatch blog providing advice to those students was shut down by police. In each case onlookers conjured up the spectre of “the mob” – a term whose primary definition – “A large disorderly crowd or throng” – belies the array of discourses that underpin it, partly related to its secondary definition – “The mass of common people; the populace”.

In other words, “mob” is a term used to frame a debate in emotional terms, to dismiss what may be a genuine outpouring of anger or resentment as invalid or illegitimate. For those reasons, I flinch when people talk about mobs instead of crowds.

Last week the video above was uploaded to YouTube. It shows a presentation describing the events leading up to a fatal flash mob. The story is fictional but the events it is constructed from are real*.

The prospect of such a series of events happening is terrifying and rightly thought-provoking: I would recommend it as a way of exploring journalism ethics in a networked age.

But the video is a syllogism – it makes an apparently logical (implied) argument that because these events have all happened and they are connected in technical terms, they could, eventually, all happen together.

The obvious flaw here is statistical: the probability of all those events leading from one to the next is of a different order to the probability of any one of them happening. We can all imagine all possible worlds.

It reminds me of Eric Morecambe’s joke, when pulled up on his piano playing, that he was playing all the right notes “but not necessarily in the right order”.
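To put rough numbers on the statistical point, here is a minimal sketch. The probabilities are invented purely for illustration (they are not taken from the video), and the calculation assumes each step is independent of the others, which real events may not be.

```python
from math import prod

# Hypothetical chance that each step in the chain follows from the one before.
# These figures are made up for illustration only.
step_probabilities = [0.1, 0.1, 0.1, 0.1, 0.1]


def chain_probability(probabilities):
    """Probability that every step happens, assuming the steps are independent."""
    return prod(probabilities)


print(chain_probability(step_probabilities))
```

Even if each step were a one-in-ten chance, all five happening in sequence would be roughly a one-in-100,000 chance.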

But the major flaw is logical. We are led from cause to effect, major to minor premise, but the ultimate event really has no connection with its beginnings. When large numbers of people gather in one place, sometimes it turns into a riot and people get killed. Technology has not changed that. Perhaps it makes it easier to do so – perhaps it makes it easier to disperse crowds when things go awry, or to call for assistance. Most likely all of the above. It’s the same technologically determinist mindset that blames Google Maps for terrorist attacks. As Douglas Adams put it in 1999:

“Newsreaders still feel it is worth a special and rather worrying mention if, for instance, a crime was planned by people ‘over the Internet.’ They don’t bother to mention when criminals use the telephone or the M4, or discuss their dastardly plans ‘over a cup of tea,’ though each of these was new and controversial in their day.”

In short: Don’t Panic.

*In the comments on YouTube Danosuke points out: “The one instance they cite as “already happened” was not a riot at all. There was no reported property damage or injuries. This is pointless fear mongering.”

Solving buggy behaviour when scraping data into Google spreadsheets

Tony Hirst has identified some bugs in the way Google spreadsheets ‘scrapes’ tables from other sources, in particular when the original data is of mixed types (e.g. numbers and text). The solution is summed up as follows:

“When using the =QUERY() formula, make sure that you’re importing data of the same datatype in each cell; and when using the =ImportData() formula, cast the type of the columns yourself… (I’m assuming this persists, and doesn’t get reset each time the spreadsheet resynchs the imported data from the original URL?)”
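The same principle of casting column types yourself applies outside spreadsheets. The sketch below is in Python rather than spreadsheet formulae, and the data, column names and the `to_number` helper are all hypothetical: it simply shows one way to make a mixed column consistent instead of trusting the importer to guess.

```python
import csv
import io

# Hypothetical imported data: the "visits" column mixes numbers and text.
RAW = """place,visits
Town A,213
Town B,n/a
Town C,800
"""


def to_number(value):
    """Cast a cell to float, returning None when it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def load_rows(text):
    """Read CSV text and cast the 'visits' column explicitly."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["visits"] = to_number(row["visits"])
        rows.append(row)
    return rows


rows = load_rows(RAW)
for row in rows:
    print(row["place"], row["visits"])
```

Cells that cannot be read as numbers become `None`, so every value in the column is either a number or explicitly missing.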

Extractiv: crawl webpages and make semantic connections

Extractiv screenshot

Here’s another data analysis tool which is worth keeping an eye on. Extractiv “lets you transform unstructured web content into highly-structured semantic data.” Eyes glazing over? Okay, over to ReadWriteWeb:

“To test Extractive, I gave the company a collection of more than 500 web domains for the top geolocation blogs online and asked its technology to sort for all appearances of the word “ESRI.” (The name of the leading vendor in the geolocation market.)

“The resulting output included structured cells describing some person, place or thing, some type of relationship it had with the word ESRI and the URL where the words appeared together. It was thus sortable and ready for my analysis.

“The task was partially completed before being rate limited due to my submitting so many links from the same domain. More than 125,000 pages were analyzed, 762 documents were found that included my keyword ESRI and about 400 relations were discovered (including duplicates). What kinds of patterns of relations will I discover by sorting all this data in a spreadsheet or otherwise? I can’t wait to find out.”

What that means in even plainer language is that Extractiv will crawl thousands of webpages to identify relationships and attributes for a particular subject.

This has obvious applications for investigative journalists: give the software a name (of a person or company, for example) and a set of base domains (such as news websites, specialist publications and blogs, industry sites, etc.) and set it going. At the end you’ll have a broad picture of what other organisations and people have been connected with that person or company. Connections you can ask it to identify include relationships, ownership, former names, telephone numbers, companies worked for, worked with, and job positions.

It won’t answer your questions, but it will suggest some avenues of enquiry, and potential sources of information. And all within an hour.
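For a sense of what “sorting all this data” might involve once you have the output, here is a rough sketch. It does not use Extractiv’s API: the rows, field names and URLs are hypothetical, shaped loosely on the ReadWriteWeb description of an entity, its relation to the keyword, and the page where they appeared together.

```python
from collections import Counter

# Hypothetical rows: an entity, its relation to the keyword, and the source URL.
relations = [
    {"entity": "Company A", "relation": "partner", "url": "http://example.com/1"},
    {"entity": "Person B", "relation": "employee", "url": "http://example.com/2"},
    {"entity": "Company A", "relation": "partner", "url": "http://example.com/3"},
]


def count_entities(rows):
    """Count how often each entity appears alongside the keyword."""
    return Counter(row["entity"] for row in rows)


def unique_relations(rows):
    """Drop duplicate (entity, relation) pairs, keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["entity"], row["relation"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


print(count_entities(relations).most_common())
print(len(unique_relations(relations)))
```

Counting shows which names crop up most often; removing duplicates matters because, as the quote notes, the relations found include them.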

Time and cost

ReadWriteWeb reports that the process above took around an hour “and would have cost me less than $1, after a $99 monthly subscription fee. The next level of subscription would have been performed faster and with more simultaneous processes running at a base rate of $250 per month.”

As they say, the tool represents “commodity level, DIY analysis of bulk data produced by user generated or other content, sortable for pattern detection and soon, Extractiv says, sentiment analysis.”

Which is nice.

New UK site launches to tackle lobbying data

Who's Lobbying treemap

I’ve been waiting for the launch of Who’s Lobbying ever since they stuck up that little Post-It note on a holding page in the run-up to the general election. Well now the site is live – publishing and visualising lobbying data, beginning with information about “ministerial meetings with outside interests, based on the reports released by UK government departments in October.”

This information is presented on the homepage very simply: with three leaderboards and a lovely search interface.

Who's Lobbying homepage

There are also a couple of treemaps to explore, for a more visual (and clickable) kick.

These allow you to see more quickly any points of interest in particular areas. The Who’s Lobbying blog notes, for instance, that “the treemap shows about a quarter of the Department of Energy and Climate Change meetings are with power companies. Only a small fraction are with environmental or climate change organisations.”

It also critically notes in another post that

“The Number 10 flickr stream calls [its index to transparency] a “searchable online database of government transparency information”. However it is really just a page of links to department reports. Each report containing slightly different data. The reports are in a mix of PDF, CSV, and DOC formats.

“Unfortunately Number 10 and the Cabinet Office have not mandated a consistent format for publishing ministerial meeting information.

“The Ministry of Defence published data in a copy-protected PDF format, preventing copy and paste from the document.

“DEFRA failed to publish the name of each minister in its CSV formatted report.

“The Department for Transport is the only department transparent enough to publish the date of each meeting.

“All other departments only provided the month of each meeting – was that an instruction given centrally to departments? Because of this it isn’t possible to determine if two ministers were at the same meeting. Our analysis is likely to be double counting meetings with two ministers in attendance.

“Under the previous Labour government, departments had published dates for individual meetings. In this regard, are we seeing less transparency under the Conservative/Lib Dem coalition?”
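The double-counting problem described in that post can be sketched in a few lines. The records below are hypothetical (invented ministers, organisation and dates), and the field names are assumptions for illustration, not the format any department actually publishes.

```python
from typing import NamedTuple, Optional


class Meeting(NamedTuple):
    minister: str
    organisation: str
    month: str
    date: Optional[str]  # None when only the month was published


# Hypothetical records: two ministers met the same organisation in the same month.
month_only = [
    Meeting("Minister A", "Power Co", "2010-06", None),
    Meeting("Minister B", "Power Co", "2010-06", None),
]
with_dates = [
    Meeting("Minister A", "Power Co", "2010-06", "2010-06-03"),
    Meeting("Minister B", "Power Co", "2010-06", "2010-06-17"),
]


def upper_bound(meetings):
    """Treat every row as a separate meeting (may double count)."""
    return len(meetings)


def lower_bound(meetings):
    """Treat rows sharing organisation and month as one meeting (may undercount)."""
    return len({(m.organisation, m.month) for m in meetings})


def distinct_by_date(meetings):
    """With exact dates, count distinct (organisation, date) pairs.

    Assumes at most one meeting per organisation per day.
    """
    return len({(m.organisation, m.date) for m in meetings})


print(upper_bound(month_only), lower_bound(month_only))
print(distinct_by_date(with_dates))
```

With only the month, all you can say is that there were between one and two meetings; with dates, the two rows can be told apart.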

When journalists start raising these questions then something will really have been achieved by the open data movement. In the meantime, we can look at Who’s Lobbying as a very welcome addition to a list of sites that feels quite weighty now: MySociety’s family of tools as the granddaddy, and ElectionLeaflets.org (formerly The Straight Choice), OpenlyLocal, Scraperwiki, Where Does My Money Go? and OpenCharities as the new breed (not to mention all the data-driven sites that sprang up around this year’s election). When they find their legs, they could potentially be quite powerful.

Hyperlocal Voices: Hedon Blog (Ray Duffill)

Hyperlocal voices: Hedon Blog

The Hedon Blog covers communities in Hedon, East Yorkshire. It was established by Ray Duffill at the beginning of last year, and he has since gone on to launch the HU12 site as well. This post is part of the ongoing Hyperlocal Voices series.

Who were the people behind the blog, and what were their backgrounds?

I set the Hedon Blog up after being made redundant from a career in Community Development.

What made you decide to set up the blog?

The Hedon Blog was set up as a hobby to keep my ‘hand-in’ with new social media tools I’d discovered on the web whilst working in my previous job as a Community Development Manager in Blackpool.

Specifically, I wanted to find out if Hedon had any community and voluntary groups operating in the area. On the surface it seemed that very little community activity was going on in the town. That was my initial impression and a view shared by neighbours and relatives who had lived in the area much longer.

The process of setting up the blog and nurturing its development has enabled me to re-discover my home town. Hedon is no longer just the place I live – it’s a place I’m proud of and love!

When did you set up the blog and how did you go about it?

I set up the blog on WordPress.com. It took me two minutes to set up and think of the highly original “Hedon Blog” title.

The first post was written in February 2009. I pressed ‘publish’ and thought “What next?”. I had no plan and no real objectives or goals to aim towards. This is not a model to follow!

Using my legs, eyes and ears I explored and unearthed the ‘undiscovered country’ of a small but thriving community infrastructure in the town. I reported back on my findings on the blog. And, as word of mouth spread, people began sending me notices of community events and other activities in the town.

What other blogs, bloggers or websites influenced you?

Whilst working in Blackpool I had found out about Nick Booth’s (Podnosh) ‘Social Media Surgeries’ taking place in Birmingham. Inspired by those, I made an early commitment that I would only use social-media tools that were free, easy to use and share, and that could be easily taught to others.

The internet should be about liberating community news and information. I abide by these ideas with the Hedon Blog. Any community can do what I do – you don’t need shed-loads of dosh in order to obtain an effective online voice. Having financial backing and a friendly geek obviously helps – but they are not essential.

The next major influence was Talk About Local and its first ‘un-conference’ in Stoke. From being an isolated individual I was suddenly part of a major phenomenon that involved people from across the country and the world. We even had a name for what we were doing – hyperlocal!

Adam Westbrook has been the other major influence on the blog’s development. I heard him speak and was inspired by his views on the future of journalism.

Locally, in Hull, digital developer Jon Moss has helped through setting up Hull Digital. Individuals met through this network have offered me enormous encouragement and support.

How did – and do – you see yourself in relation to a traditional news operation?

I obtained Adam Westbrook’s e-book on Newsgathering for Hyperlocal Websites and now run the site as a news gathering operation.

Learning from some of the journalistic methods described in that publication has enabled me to put the blog on a professional footing and achieve a credibility in the eyes of public and private sector organisations (as well as voluntary and community groups) who now regularly supply me with press releases and other material.

In this sense I have ‘borrowed’ from the traditional media those things that can help me promote, inform and help build communities in my town.

What have been the key moments in the blog’s development editorially?

The Hedon Blog now sits as part of a wider website family under the www.hu12.net banner. This means I can concentrate on community news via the Hedon Blog but now have an outlet for more contentious and controversial material – and a means to obtain some advertising income.

What sort of traffic do you get and how has that changed over time?

I have grown a local audience largely by word of mouth. In my first month of operation I got 213 visits (WordPress stats) but I get those figures and more every day now, with occasional daily spikes of 500 to 800 visits.

I never approached this from a business or journalist point of view – but rather as a civic duty or community activity. The downside of this approach is the obvious one: a lack of income to re-invest in the project and to pay for its main motivating force: Me!

This activity has brought me great pleasure but has been draining on time and personal resources.

An exercise in interactive thinking

I’ve just run through a set of exercises with my class of students from the MA in Television and Interactive Content at Birmingham City University. The exercises are intended to get them to think about the web not just as a repository of content, but as a platform that people use in different ways depending on who they are and what they want to do.

I thought I would share them here – for my own record if nothing else.

What’s your topic – who is your userbase?

Most editorial productions begin with a topic, and an angle on that topic. They also have a particular audience in mind, which dictates the tone that is taken in its production: a documentary aimed at 5-year-olds is going to have a very different tone to one aimed at 25-year-olds, even if the topic is the same.

This gives you the People bit of the POST method I’ve written about previously – the starting point for everything that follows.

From there you can identify:

  • The objectives of those people*.
  • How you might help those people meet those objectives (the strategy and technology)
  • Which of those strategies match your own objectives – or those of the person you’re pitching to

*In the original POST method the objectives are yours, but I would suggest starting with users’ objectives because you need a mutually beneficial outcome.

Data cleaning tool relaunches: Freebase Gridworks becomes Google Refine

When I first saw Freebase Gridworks I was a very happy man. Here was a tool that tackled one of the biggest problems in data journalism: cleaning dirty data (and data is invariably dirty). The tool made it easy to identify variations of a single term, and clean them up, to link one set of data to another – and much more besides.
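To illustrate what identifying “variations of a single term” can mean, here is a minimal sketch of one common approach: reducing each value to a normalised ‘fingerprint’ and grouping values that share one. It is an illustration only and makes no claim about how the tool itself works internally; the function names and sample values are hypothetical.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Normalise a value: trim, lowercase, strip punctuation, sort unique words."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values sharing a fingerprint; return only groups with variations."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Hypothetical dirty data: three spellings of one name, plus an unrelated one.
names = [
    "Birmingham City University",
    "birmingham city university ",
    "University, Birmingham City",
    "Hull Digital",
]

for group in cluster(names):
    print(group)
```

Each group that comes back is a candidate for merging into a single, consistent spelling, which a person should still check before accepting.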

Then Google bought the company that made Gridworks, and now it’s released a new version of the tool under a new name: Google Refine.

It’s notable that Google are explicitly positioning Refine in their video (above) as a “data journalism” tool.

You can download Google Refine here.

Further videos below. The first explains how to take a list on a webpage and convert it into a cleaned-up dataset – a useful alternative to scraping:

The second video explains how to link your data to data from elsewhere, aka “reconciliation” – e.g. extracting latitude and longitude or language.

What inflation has to do with the price of fish

Inflation image by Gregor Rohrig

One of the forms of data that journalists frequently have to deal with is prices. And while it’s one thing to say that things are getting more expensive, making a meaningful comparison between what things cost now and what things cost then is a different kettle of fish altogether.

Factoring in inflation can make all the difference between arbitrary comparisons that provide no insight whatsoever, and genuinely meaningful reporting.

Thanks to computing power it’s actually quite easy for journalists to factor inflation into their reporting – by using an inflation calculator. It’s also easier to find historical price data with data-driven search engines like Wolfram Alpha.

But inflation is only half of the calculation you need. The other is earnings.

Professor Ian Stewart illustrates this perfectly in this article in The Telegraph:

“[A] 1991 pint cost around £1.40, which is £1.80 in today’s money. The current price is around £2.80, so beer really is more expensive. On the other hand, the average salary in 1991 was £19,000, and today it is £38,000. Relative to what we earn, a pint costs exactly the same as it did 19 years ago.

“Our house? That would be £125,000 today, so it has gone up by 84 per cent. Relative to average earnings, however, the increase is only 10 per cent.

“The Guardian knows about inflation, and said that the pub pint has increased by 68 per cent in real terms. But this compares the real increase in new money with the original price in old money. If I did the calculation like that for my house it would have gone up by 850 per cent. Calculated sensibly, the rise in the price of beer is about 55 per cent relative to inflation, and zero per cent relative to earnings.”
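The two comparisons in the beer example can be worked through using only the figures quoted above. The function names are my own, and this is a sketch of the arithmetic rather than a general-purpose inflation calculator.

```python
def real_increase(old_price_in_todays_money, current_price):
    """Price rise relative to inflation, as a fraction (0.5 means 50 per cent)."""
    return current_price / old_price_in_todays_money - 1


def increase_relative_to_earnings(old_price, old_salary, current_price, current_salary):
    """Change in the share of an average salary that the item costs."""
    old_share = old_price / old_salary
    new_share = current_price / current_salary
    return new_share / old_share - 1


# Figures from the quote: a 1991 pint cost 1.40 pounds (1.80 in today's money),
# it now costs 2.80, and average salaries went from 19,000 to 38,000.
pint_real = real_increase(1.80, 2.80)
pint_vs_earnings = increase_relative_to_earnings(1.40, 19_000, 2.80, 38_000)

print(f"{pint_real:.1%}")
print(f"{pint_vs_earnings:.1%}")
```

The first figure comes out at a little over 55 per cent and the second at zero, matching the “about 55 per cent relative to inflation, and zero per cent relative to earnings” in the quote.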

Of course the danger in averages is that they only illustrate aggregate change, and if you’re talking about a purchase that a particular section of the population makes – or you’re only talking to a particular region – then a national average may not be as meaningful a comparison to make.

If the poor are getting poorer and the rich richer then a pint of beer really is more expensive for some – and cheaper for others – than it used to be. Likewise, particular parts of the country might be suffering more from house price increases than others because of local average wages and local house prices.

It’s also worth pointing out that, when talking about financial data, a median is a much more useful measure to take than a mean.
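A quick sketch shows why. The salaries below are invented for illustration: one very high earner is enough to drag the mean well away from what most people in the list actually earn, while the median barely notices.

```python
from statistics import mean, median

# Hypothetical salaries: four modest earners and one very high earner.
salaries = [18_000, 20_000, 22_000, 25_000, 250_000]

print(mean(salaries))    # pulled upwards by the single high earner
print(median(salaries))  # the middle value of the sorted list
```

Here the mean is 67,000 while the median is 22,000, and only the latter resembles a typical salary in the group.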

Finally, aside from the statistical considerations it’s worth coming back to some of the basics of pricing. Ian again:

“There are two things to remember about prices. One is basic economics: if something gets too expensive for people to buy it, they don’t. So prices and wages have to stay in step, broadly speaking – though with big fluctuations in some commodities, such as housing. The other is inflation. We all know it exists, but we forget that when we start comparing prices. ‘My God! A Ford Anglia cost only £295 in 1940!’ True, but the average salary then was £370. The equivalent price today is £30,000, which will buy you a Jaguar XF.”

Cooks Source anger moves on to Dairy Goat Journal’s Dave Belanger

Cooks Source fake Facebook page discusses Dairy Goat Journal

UPDATE 2 – from Cathy in the comments (Nov 11): Dave Belanger has now paid the fee.

UPDATE – thanks to Vicki in the comments (Nov 11): Dave Belanger has responded to Suzanne, reinstating the image on their website with a credit and link, and offering to pay. However, he has refused to pay the amount requested by Suzanne, and Suzanne is now planning to take the magazine to court. Her reasoning is admirable, and it’s fair to say that the contributions of commenters have helped her to take a well-informed stance:

“Countryside Publications is a five million dollar company. He accused me of being opportunistic by asking for an increased fee for the unauthorized and uncredited use.

“This is not about money. I may never see the $2100. If I do, it will be a long time from now. If I wanted to make a quick buck, I’d take the $500 [offered]. (I could use it.) But if I let him not only steal the photo but pay no penalty for it, there’s no reason for him to not steal again. After all, what did it cost him? He can steal photos all he wants and only pay for them (at a price he sets) if he’s caught. Just who is opportunistic? He published my photo without authorization or credit then says, here, take $500 or NOTHING.”

There’s also some detail about the possible impact on the publishers from Internet users:

“P.S. He mentioned receiving phone calls and emails from my readers and said he was not concerned about it. He admitted there had also been some subscription cancellations, but that people cancelled subscriptions and started subscriptions every day and that he had no reason to believe any subscription cancellations were related to his treatment of my work.”

The original post:

Oh dear. It appears another magazine editor is about to feel the force of a thousand emails following a blogger’s complaint of breach of copyright and – more importantly – said editor’s response to their request for fair payment and acknowledgement of authorship.

The editor in question is Dave Belanger who – apparently – hung up on Suzanne McMinn when she called to ask that her photo – used in Dairy Goat Journal – was properly credited.

With 80 comments already – many of them saying they have called and written to the magazine – and the case also being discussed on the fake Cooks Source Facebook page – you can only hope Dave looks at the Cooks Source saga and reacts quickly.

*Everything I can find about this looks credible, but I’m extra cautious about the possibility of it being an opportunistic hoax.

via Ulrike in the comments.

Facebook ‘mentions’ – a ‘talking point’ engine

Facebook mentions mockup by Oliver Chiang

Oliver Chiang reports on a new feature being tested by Facebook that he dubs ‘mentions’. It tells you if more than one of your friends mentions the same thing – a celebrity, for example, or an event, or organisation.
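The basic logic as described (flag anything mentioned by more than one friend) can be sketched as follows. This is not Facebook’s implementation, which has not been published; the names, topics and the `min_friends` threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical (friend, topic) pairs drawn from a news feed.
posts = [
    ("Alice", "Glastonbury"),
    ("Bob", "Glastonbury"),
    ("Alice", "Glastonbury"),  # a repeat by the same friend should not count twice
    ("Carol", "Local election"),
]


def talking_points(posts, min_friends=2):
    """Return topics mentioned by at least min_friends distinct friends."""
    friends_by_topic = defaultdict(set)
    for friend, topic in posts:
        friends_by_topic[topic].add(friend)
    return {
        topic: sorted(friends)
        for topic, friends in friends_by_topic.items()
        if len(friends) >= min_friends
    }


print(talking_points(posts))
```

Counting distinct friends rather than posts is the design choice that matters here: one person posting repeatedly about something does not make it a talking point.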

I’ve often said that Facebook is the ultimate news publisher. ‘What my friends are doing’ was powerful enough, and they have since expanded their editorial proposition (with Pages) into ‘What companies and personalities I am interested in are doing’ too. The ‘mentions’ feature appears to extend that concept further, essentially identifying what is a ‘talking point’ in your social circle and circles of interest, and bringing that to your attention.

That is, of course, one of the things that journalists do.

Of additional interest are the obvious commercial applications of this technology. In fact, the focus at the moment on linking these ‘talking points’ to official ‘Pages’ drives that commercial application home rather forcefully. This overly commercial application may in fact be a weakness – and it will be interesting to see if it is tweaked before being rolled out (Facebook’s history suggests they are more likely to tweak it than withdraw it).

In the meantime, publishers should be watching closely to see how advertisers respond to the potential of ‘mentions’ – and if there is any way they might adapt similar ideas for their own users and advertisers.