Monthly Archives: March 2012

A useful tool for creating a search interface for your data: freeDive

Here’s a solution to a problem that aspiring data journalists have come up against time and time again: how to quickly create a searchable interface to your dataset.

freeDive is quick and – if you can follow the wizard’s instructions – easy too. Continue reading

What you need to know about the laws on harassment, data protection and hate speech {UPDATED: Stalking added}

The following is taken from the law chapter of The Online Journalism Handbook. The book blog and Facebook page contain updates and additions – those specifically on law can be found here.


The Protection from Harassment Act 1997 is occasionally used to prevent journalists from reporting on particular individuals. Specifically, any conduct which amounts to harassment of someone can be considered a criminal act, for which the victim can seek an injunction (followed by arrest if broken) or damages.

One example of a blogger’s experience is illustrative of the way the act can be used with regard to online journalism, even if no case reaches court. Continue reading

6 ways to get started in community management

Following on from my previous post on the network journalist role, and as part of a wider experiment around the 5 roles in an investigations team, I wanted to flesh out what exactly a community editor role means when adopted as part of a journalism project.

First I need to add a disclaimer: the terms “community editor” (CE) and “community manager” (CM) are used to refer to a very wide range of jobs in a number of industries. I’m not sure what distinction there is – if any – but my hunch is that the title ‘community editor’ has been overtaken by its ‘manager’ variation because it rightly places the focus more on the community than its content.

Even within journalism, the role can vary enormously. This is partly because the communities themselves, and the challenges that they represent, differ so much. For example:

  • A mass market – anonymous and diverse, which a CM must try to somehow ‘convert’ into one or more healthy niche communities
  • No community – the CM is asked to ‘build it from scratch’
  • The CM’s website(s) already have active communities. The CM’s role is to maintain, support, and further develop those.
  • Communities exist, but not on the website(s) of the CM’s employer – the CM is asked to engage with those (this is more of a Community Editor role)

This post will be dealing with the last situation, which is the one in which most journalists first find themselves: with neither a platform nor a community.

With that out of the way, here are 6 things I think an individual can do as part of their first foray into community management/editing as a journalist:

1. Know where the communities are

This seems like a no-brainer but it’s all too easy to miss communities because you can’t find any evidence of them on Twitter or Facebook.

Along with specialist social networks (LinkedIn for professionals; MySpace for musicians; even profession-specific networks), there are forums, wikis, mailing lists, and various other places where people gather to share information and support.

The social media prism by Brian Solis (shown below) is one useful tool for checking if you’ve covered every possible angle on this front. For example: have you thought of looking for your community on Flickr? Digg? A locally popular platform? (LiveJournal dominates in eastern Europe, for example, while QQ is China’s answer to Twitter, Mixi is Japan’s answer to Facebook, and Orkut and Hi5 have a healthy userbase in places like Brazil and India.)

[Image: the social media prism]

Look for the communities in every corner of the net – don’t expect them to come pre-labelled. For example, the forums of local football clubs and local newspapers often contain corners devoted to topics other than football and news. And follow people as well as topics: if you find someone in your field, search for their username across the web and see what other places they contribute to.

One other place to look: the physical world. Live events, conferences, meetups and other gatherings are ideal places to build relationships with members of the community – as well as a great opportunity for providing live coverage online that will lead you to others, and others to you.

2. Look for problems to solve

Once you’ve identified and are following the community, try to find your best role within it. Remember that the community is not here to serve you: barging in and asking for case studies will get you the same response as if you did that in any physical social space: blank stares and muttered insults.

The simplest way to find a place in a community is through solving problems.

Listen for questions that people are asking, or complaints that they make. A key skill of a journalist is to find the answers to questions, or get responses to complaints – so that’s likely to be one way you can contribute.

Those answers and responses, of course, also make for good evergreen content (which can help you attract other members of the community), so cross-post them on your blog as well as on the platforms where the community gathers.

You might also see the need for physical meetups or other events – don’t be afraid to get stuck in and organise one.

3. Be interested – listen and ask questions

You will be both a better journalist and community editor if you listen as much as possible, and ask when you want to hear more about something.

It doesn’t have to be newsworthy – in fact, it’s sometimes better when it’s not – because often it’s an understanding of the small details and complex context which makes for better journalism and, by extension, better – and more – relationships with contacts.

4. Create content out of the process of discovery

As you explore a community, a good practice to adopt is to record your research in ways that make it easier for others to engage with the community too. This helps you see what is interesting about a community, as well as creating content which can help contacts find you.

Examples of typical content created from the process of community research include:

  • ‘Top 20 people in [an industry] to follow on Twitter’
  • ‘The best forums for [your field or issue]’
  • ‘The hottest discussions about the [issue] right now’
  • ‘Where do [profession] go for advice on [problem]?’
  • ‘The best blogs about [your field/issue]’
  • ‘Forums roundup: what people are saying about [issue/question]’

You may need to make a choice on where to post this content, especially if the community is not a big user of blogs. Don’t publish in a way that is disconnected from the community that you are supposed to be serving: at the very least cross-publish to the platforms where discussion is healthiest. Don’t spam shared spaces with links to external content.

You might also profile members of the community, or – at a later stage – create something that pulls together profiles, points of view, or experiences. For example: this Times Educational Supplement piece collects excluded pupils’ experiences (incomplete version online); We Are The 99 Percent uses Tumblr to pull together the experiences that inspired a protest movement; and Spitalfields Life seeks to document the places and people of the area, while this Guardian interactive allows you to explore the voices of 100 NHS workers on health reforms.

5. Link, retweet, attribute and comment

Finally, it’s important to link to content from your community as often as possible. This does two things. Firstly, it demonstrates good attribution and shows that you are not looking to take credit which belongs to others. Secondly, it makes other people aware of your work: a link to another blog generates a ‘pingback’ which alerts the author to your piece. Twitter users are notified if their tweet is retweeted by you, Facebook users if you ‘like’ their update, and so on. Comments are an extension of the same principle of acknowledgement.

Linking – or ‘linkblogging’ – is also the simplest way to begin engaging with a field and its communities, and a good habit to adopt if you want to understand an area and keep up to date with it. For more on that, 7 ways to follow a field… is a good guide.

6. Read about community management

As you gain in confidence and reputation, you may find yourself doing more and more in your community. Community management is, to my mind, one of the hardest roles in online journalism to do well, and the more insights you can gather from others, the better prepared you will be.

This list of resources from FeverBee is as good as they come. You should also follow blogs in the field – that list contains a section on those, but if you just want 5 to start with, here’s a bundle to subscribe to.

PS: If you want to see explanations of job descriptions of the CM, and other roles such as social media manager, this post by Blaise Grimes-Viort does a very good job of trying to unpick the subtle differences and links to typical job descriptions. More on traits of community managers at ReadWriteWeb, The Constant Observer and Business2Community.

Looking up Images Trademarked By Companies Using OpenCorporates and Google Refine

Listening to Chris Taggart talking about OpenCorporates at netzwerk recherche conf – data, research, stories, I figured I really should start to have a play…

Looking through the example data available from an opencorporates company ID via the API, I spotted that registered trademark data was available. So here’s a quick roundabout way of previewing trademarked images using OpenCorporates and Google Refine.

First step is to grab the data – the opencorporates API reference docs give an example URL for grabbing a company’s (i.e. a legal entity’s) data:

Google Refine supports the import of JSON from a URL:

(Hmm, it seems as if we could load in data from several URLs in one go… maybe data from different BP companies?)

Having grabbed the JSON, we can say which blocks we want to import as row items:

We can preview the rows to check we’re bringing in what we expect…

We’ll take this data by clicking on Create Project, and then start to work on it. Because the plan is to grab trademark images, we need to grab data back from OpenCorporates relating to each trademark. We can generate the API call URLs from the datum – id column:

The OpenCorporates data item API calls are of the form, which we can generate as follows:

Here’s what we get back:

If we look through the data, there are several fields that may be interesting: the representative_name_lines (the person/group that registered the trademark), the representative_address_lines, the mark_image_type and, most importantly of all, the international_registration_number. Note that some of the trademarks are not images – we’ll end up ignoring those (for the purposes of this post, at least!).

We can pull out these data items into separate columns by creating columns directly from the trademark data column:

The elements are pulled in using expressions of the following form:

Here are the expressions I used (each expression is used to create a new column from the trademark data column that was imported from automatically constructed URLs):

  • value.parseJson().datum.attributes.mark_image_type – the first part of the expression parses the data as JSON, then we navigate using dot notation to the part of the Javascript object we want…
  • value.parseJson().datum.attributes.mark_text
  • value.parseJson().datum.attributes.representative_address_lines
  • value.parseJson().datum.attributes.representative_name_lines
  • value.parseJson().datum.attributes.international_registration_number
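
For readers more at home in Python than GREL, the same parse-then-navigate step can be sketched roughly as follows. This is only an illustration: the field names are the ones listed above, but the sample record and the helper name extract_fields are made up for the example, and the real API response may contain more (or differently shaped) data.

```python
import json

# Field names taken from the expressions above
FIELDS = [
    "mark_image_type",
    "mark_text",
    "representative_address_lines",
    "representative_name_lines",
    "international_registration_number",
]


def extract_fields(cell_text, fields=FIELDS):
    """Parse one cell of JSON text and pull out the named attributes.

    Rough equivalent of value.parseJson().datum.attributes.<field>,
    except that a missing field gives None instead of an error.
    """
    attributes = json.loads(cell_text).get("datum", {}).get("attributes", {})
    return {name: attributes.get(name) for name in fields}


# Hypothetical record: the values are invented for illustration only
sample_cell = json.dumps({
    "datum": {
        "attributes": {
            "mark_image_type": "JPG",
            "mark_text": "EXAMPLE MARK",
            "representative_name_lines": ["EXAMPLE AGENT"],
            "representative_address_lines": ["1 Example Street", "Exampletown"],
            "international_registration_number": "123456",
        }
    }
})

row = extract_fields(sample_cell)
print(row["mark_image_type"])
```

Each key in the returned dictionary corresponds to one of the new columns created in Refine.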

Finding how to get images from international registration numbers was a bit of a faff. In the end, I looked up several records on the WIPO website that displayed trademarked images, then looked at the pattern of their URLs. The ones I checked seemed to have the form:
where typ is gif or jpg and XXYYNN is the international registration number. (This may or may not be a robust convention, but it worked for the examples I tried…)

The following GREL expression generates the appropriate URL from the trademark column:

if(or(value.parseJson().datum.attributes.mark_image_type=='JPG', value.parseJson().datum.attributes.mark_image_type=='GIF'), '' + splitByLengths(value.parseJson().datum.attributes.international_registration_number, 2)[0] + '/' + splitByLengths(value.parseJson().datum.attributes.international_registration_number, 2, 2)[1] + '/' + value.parseJson().datum.attributes.international_registration_number + '.' + toLowercase(value.parseJson().datum.attributes.mark_image_type), '')

The first part checks that we have a GIF or JPG image type identified; if we do, we construct the URL path, casting the filetype to lower case for the suffix; otherwise we return an empty string.
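
The same logic might look something like this in Python. It is a sketch under assumptions: the base of the image URL is not given here, so base_url is a hypothetical parameter, and the function name is invented. The slicing mirrors the two splitByLengths calls (the first two characters, then the next two).

```python
def trademark_image_url(registration_number, image_type, base_url):
    """Build an image URL from a registration number, or return ''.

    base_url is a placeholder for the (unspecified) start of the image
    address; it is an assumption, not a documented endpoint.
    """
    # Only JPG and GIF records are treated as images, as in the GREL version
    if image_type not in ("JPG", "GIF"):
        return ""
    number = str(registration_number)
    first_pair = number[0:2]   # like splitByLengths(number, 2)[0]
    second_pair = number[2:4]  # like splitByLengths(number, 2, 2)[1]
    return f"{base_url}{first_pair}/{second_pair}/{number}.{image_type.lower()}"


print(trademark_image_url("123456", "JPG", "https://example.org/images/"))
# -> https://example.org/images/12/34/123456.jpg
```

Non-image records (or records with no image type at all) come back as an empty string, which is what makes the later filtering step possible.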

Now we can filter the data to only show rows that contain a trademark image URL:

Finally, we can create a template to export a simple HTML file that will let us preview the image:

Here’s a crude template I tried:

The file is exported as a .txt file, but it’s easy enough to change the suffix to .html so that we can view the file in a browser, or I can cut and paste the html into this page…

null null
null null
“[“MURGITROYD & COMPANY”]“ “[“17 Lansdowne Road”,”Croydon, Surrey CRO 2BX”]“
“[“A.C. CHILLINGWORTH”,”GROUP TRADE MARKS”]“ “[“Britannic House,”,”1 Finsbury Circus”,”LONDON EC2M 7BA”]“
“[“A.C. CHILLINGWORTH”,”GROUP TRADE MARKS”]“ “[“Britannic House,”,”1 Finsbury Circus”,”LONDON EC2M 7BA”]“
“[“A.C. CHILLINGWORTH”,”GROUP TRADE MARKS”]“ “[“Britannic House,”,”1 Finsbury Circus”,”LONDON EC2M 7BA”]“
“[“A.C. CHILLINGWORTH”,”GROUP TRADE MARKS”]“ “[“Britannic House,”,”1 Finsbury Circus”,”LONDON EC2M 7BA”]“
“[“BP GROUP TRADE MARKS”]“ “[“20 Canada Square,”,”Canary Wharf”,”London E14 5NJ”]“
“[“Murgitroyd & Company”]“ “[“Scotland House,”,”165-169 Scotland Street”,”Glasgow G5 8PL”]“
“[“BP GROUP TRADE MARKS”]“ “[“20 Canada Square,”,”Canary Wharf”,”London E14 5NJ”]“
“[“BP Group Trade Marks”]“ “[“20 Canada Square, Canary Wharf”,”London E14 5NJ”]“
“[“ROBERT WILLIAM BOAD”,”BP p.l.c. – GROUP TRADE MARKS”]“ “[“Britannic House,”,”1 Finsbury Circus”,”LONDON, EC2M 7BA”]“
“[“ROBERT WILLIAM BOAD”,”BP p.l.c. – GROUP TRADE MARKS”]“ “[“Britannic House,”,”1 Finsbury Circus”,”LONDON, EC2M 7BA”]“
“[“ROBERT WILLIAM BOAD”,”BP p.l.c. – GROUP TRADE MARKS”]“ “[“Britannic House,”,”1 Finsbury Circus”,”LONDON, EC2M 7BA”]“
“[“ROBERT WILLIAM BOAD”,”BP p.l.c. – GROUP TRADE MARKS”]“ “[“Britannic House,”,”1 Finsbury Circus”,”LONDON, EC2M 7BA”]“
“[“MURGITROYD & COMPANY”]“ “[“17 Lansdowne Road”,”Croydon, Surrey CRO 2BX”]“
“[“MURGITROYD & COMPANY”]“ “[“17 Lansdowne Road”,”Croydon, Surrey CRO 2BX”]“
“[“MURGITROYD & COMPANY”]“ “[“17 Lansdowne Road”,”Croydon, Surrey CRO 2BX”]“
“[“MURGITROYD & COMPANY”]“ “[“17 Lansdowne Road”,”Croydon, Surrey CRO 2BX”]“
“[“A.C. CHILLINGWORTH”,”GROUP TRADE MARKS”]“ “[“Britannic House,”,”1 Finsbury Circus”,”LONDON EC2M 7BA”]“
“[“BP Group Trade Marks”]“ “[“20 Canada Square, Canary Wharf”,”London E14 5NJ”]“
“[“ROBERT WILLIAM BOAD”,”GROUP TRADE MARKS”]“ “[“Britannic House,”,”1 Finsbury Circus”,”LONDON, EC2M 7BA”]“
“[“BP GROUP TRADE MARKS”]“ “[“20 Canada Square,”,”Canary Wharf”,”London E14 5NJ”]“

Okay – so maybe I need to tidy up the registration related columns, but as a recipe, it sort of works. (Note that it took way longer to create this blog post than it did to come up with the recipe…)

A couple of things that came to mind: having used Google Refine to sketch out this hack, we could now code it up, maybe in something like Scraperwiki. For example, I only found trademarks registered to one legal entity associated with BP, rather than checking for trademarks held by the myriad legal entities associated with BP. I also wonder whether it would be possible to “compile” what Google Refine is doing (import from URL, select row items, run operations against columns, export templated data) as code so that it could be run elsewhere (so for example, could all the steps be exported as a single Javascript or Python script, maybe calling on a GREL/Google Refine library that provides some sort of abstraction layer or virtual machine for the script to make use of?)
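
As a rough idea of what coding up the last two steps by hand might look like (filtering to rows that have an image URL, then exporting a simple HTML preview), here is a small Python sketch. It is not the template used above: the row structure, the key names image_url, names and address, and the function name are all assumptions made for the example.

```python
from html import escape


def render_preview(rows):
    """Return an HTML table previewing rows that have a non-empty image URL.

    Each row is assumed to be a dict with 'image_url' (string) plus
    'names' and 'address' (lists of strings); these keys are hypothetical.
    """
    # Equivalent of filtering the column to non-blank image URLs
    with_images = [row for row in rows if row.get("image_url")]

    lines = ["<html><body><table>"]
    for row in with_images:
        lines.append(
            '<tr><td><img src="{}"/></td><td>{}</td><td>{}</td></tr>'.format(
                escape(row["image_url"], quote=True),
                # Joining the list items also tidies up the bracketed output
                escape(", ".join(row.get("names") or [])),
                escape(", ".join(row.get("address") or [])),
            )
        )
    lines.append("</table></body></html>")
    return "\n".join(lines)
```

Writing the returned string to a file ending in .html would give something viewable directly in a browser, without the rename step.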

PS What’s next…? The trademark data also identifies one or more areas in which the trademark applies; I need to find some way of pulling out each of the “en” attribute values from the items listed in the value.parseJson().datum.attributes.goods_and_services_classifications.
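
One possible way of doing that outside Refine, sketched in Python. I have not checked the exact shape of the data, so this assumes that goods_and_services_classifications is a list of objects, each of which may carry an "en" key; if the real structure differs, the loop would need adjusting. The function name is invented for the example.

```python
import json


def english_classifications(cell_text):
    """Collect the 'en' values from goods_and_services_classifications.

    Assumes (without confirmation) that the field is a list of objects,
    each optionally holding an 'en' entry. Items without one are skipped.
    """
    attributes = json.loads(cell_text).get("datum", {}).get("attributes", {})
    items = attributes.get("goods_and_services_classifications") or []
    return [
        item["en"]
        for item in items
        if isinstance(item, dict) and "en" in item
    ]
```

The result is a plain list of strings, which could then be joined into a single cell or split out into one row per classification.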

FAQ: Trusting ‘the blogosphere’

Note: for those coming from Poynter’s summary of part of this post, the phrase ‘don’t have to be trained’ has an ambiguity that could be misunderstood. I’ve expanded on the relevant section to clarify.

Another set of answers to another set of questions (FAQs). These are posed by a UK university student:

How would you define the blogosphere?

The blogosphere is, technically, all blogs – but those don’t often have much connection to each other. I think it’s better to talk of many ‘blogospheres’ around different topics, e.g. the political blogosphere and so on. Continue reading