Monthly Archives: April 2011

The law, ethics & effectiveness of PR firms offering bloggers prizes-to-post

A PR firm recently invited me to review their client’s product, saying that if I did review it I would be entered into a prize draw with other ‘qualifying’ bloggers to win an iPad 2.

It was a product I might ordinarily have covered, but this approach made me reluctant.

Here’s reason number 1: I asked myself whether the PR firm would have made the same approach to print journalists. I doubt it. Why? Because it would have raised obvious ethical issues, and called the journalists’ professionalism into question.

So were they assuming that bloggers had different ethics? I doubt they thought that hard – more likely, some bright spark thought that eager, amateur bloggers would jump at the chance to get anything for their hard work.

Here’s reason number 2: other bloggers will have been approached with the same offer. If they saw me review the product they would assume that I had done so in exchange for this prize draw ticket. They would see me as unprofessional, unethical, or both.

In PR terms, then, the approach was counter-productive: it actually made me less likely to give their client coverage.

Data for journalists: JSON for beginners

Following the post earlier this week on XML and RSS for journalists I wanted to look at another important format for journalists working with data: JSON.

JSON is a data format which has been rising in popularity over the past few years. Quite often it is offered alongside – or instead of – XML by various information services, such as Google Maps, the UK Postcodes API and the Facebook Graph API.

Because of this, in practice JSON is more likely to be provided in response to a specific query (“Give me geographical and political data about this location”) than a general file that you access (“Give me all geographical data about everywhere”).

I’ll describe how you supply that query below.

Which blog platform should I use? A blog audit

When people start out blogging they often ask what blogging platform they should use – WordPress or Blogger? Tumblr or Posterous? It’s impossible to give an answer, because the first questions should be: who is going to use it, how, and what and who for?

To illustrate how the answers to those questions can help in choosing the best platform, I decided to go through the 35 or so blogs I have created, and explain why I chose the platforms that they use. As more and more publishing platforms have launched, and new features have been added, some blogs have changed platforms, while new ones have made different choices to older ones.

Guest post: visualising mobile phone data – the data retention app

In a guest post Lorenz Matzat, editor of ZEIT Online’s Open Data Blog, writes about the background to their online app exploring the issues around data retention by mobile phone companies.

It’s not very often that one can follow the direct impact of an article, let alone a piece of data journalism. But the visualization of the cellphone data of Malte Spitz from the Green party in Germany led to visible repercussions in the US.

Following a piece in the New York Times about Spitz and the data app, some days ago two senators wrote to the four main US carriers asking for information about their data retention policies.

After we published the app in German one month ago (and the English version 20 days later), the feedback was overwhelming. We didn’t think that so many people would be so interested in it. But Twitter and Facebook in Germany went wild with it for some days – along with coverage on many major tech websites.

This is probably why data journalism works: it makes visible an abstract notion that everybody knows about, namely that your every position, and every connection your mobile phone makes, is – or could be – logged. Every call, text message and data connection.

The background

Around February 1st, ZEIT Online asked me if I had an idea what to do with the dataset of Malte Spitz (read the background story about the legal action of Spitz to get the data here).

Tech Tips: Making Sense of JSON Strings – Follow the Structure

Reading through the Online Journalism blog post on Getting full addresses for data from an FOI response (using APIs), the following phrase – relating to the composition of some Google Refine code to parse a JSON string from the Google geocoding API – jumped out at me: “This took a bit of trial and error…”

Why? Two reasons… Firstly, because it demonstrates a “have a go” attitude which you absolutely need to have if you’re going to appropriate technology and turn it to your own purposes. Secondly, because it maybe (or maybe not…) hints at a missed trick or two…

So what trick’s missing?

Here’s an example of the sort of thing you get back from the Google Geocoder:

{ "status": "OK", "results": [ { "types": [ "postal_code" ], "formatted_address": "Milton Keynes, Buckinghamshire MK7 6AA, UK", "address_components": [ { "long_name": "MK7 6AA", "short_name": "MK7 6AA", "types": [ "postal_code" ] }, { "long_name": "Milton Keynes", "short_name": "Milton Keynes", "types": [ "locality", "political" ] }, { "long_name": "Buckinghamshire", "short_name": "Buckinghamshire", "types": [ "administrative_area_level_2", "political" ] }, { "long_name": "Milton Keynes", "short_name": "Milton Keynes", "types": [ "administrative_area_level_2", "political" ] }, { "long_name": "United Kingdom", "short_name": "GB", "types": [ "country", "political" ] }, { "long_name": "MK7", "short_name": "MK7", "types": [ "postal_code_prefix", "postal_code" ] } ], "geometry": { "location": { "lat": 52.0249136, "lng": -0.7097474 }, "location_type": "APPROXIMATE", "viewport": { "southwest": { "lat": 52.0193722, "lng": -0.7161451 }, "northeast": { "lat": 52.0300728, "lng": -0.6977000 } }, "bounds": { "southwest": { "lat": 52.0193722, "lng": -0.7161451 }, "northeast": { "lat": 52.0300728, "lng": -0.6977000 } } } } ] }

The data represents a JavaScript object (JSON = JavaScript Object Notation) and as such has a standard, hierarchical form.

Here’s another way of writing the same object code, only this time laid out in a way that reveals the structure of the object:

{
  "status": "OK",
  "results": [ {
    "types": [ "postal_code" ],
    "formatted_address": "Milton Keynes, Buckinghamshire MK7 6AA, UK",
    "address_components": [ {
      "long_name": "MK7 6AA",
      "short_name": "MK7 6AA",
      "types": [ "postal_code" ]
    }, {
      "long_name": "Milton Keynes",
      "short_name": "Milton Keynes",
      "types": [ "locality", "political" ]
    }, {
      "long_name": "Buckinghamshire",
      "short_name": "Buckinghamshire",
      "types": [ "administrative_area_level_2", "political" ]
    }, {
      "long_name": "Milton Keynes",
      "short_name": "Milton Keynes",
      "types": [ "administrative_area_level_2", "political" ]
    }, {
      "long_name": "United Kingdom",
      "short_name": "GB",
      "types": [ "country", "political" ]
    }, {
      "long_name": "MK7",
      "short_name": "MK7",
      "types": [ "postal_code_prefix", "postal_code" ]
    } ],
    "geometry": {
      "location": {
        "lat": 52.0249136,
        "lng": -0.7097474
      },
      "location_type": "APPROXIMATE",
      "viewport": {
        "southwest": {
          "lat": 52.0193722,
          "lng": -0.7161451
        },
        "northeast": {
          "lat": 52.0300728,
          "lng": -0.6977000
        }
      },
      "bounds": {
        "southwest": {
          "lat": 52.0193722,
          "lng": -0.7161451
        },
        "northeast": {
          "lat": 52.0300728,
          "lng": -0.6977000
        }
      }
    }
  } ]
}

Making Sense of the Notation

At its simplest, the structure has the form: {"attribute":"value"}

If we parse this object into a variable called jsonObject, we can access the value of the attribute as jsonObject.attribute or jsonObject["attribute"]. The first style is called dot notation.

We can add more attribute:value pairs into the object by separating them with commas: a={"attr":"val","attr2":"val2"} and address them (that is, refer to them) uniquely: a.attr, for example, or a["attr2"].

Try it out for yourself… Copy and paste the following into your browser address bar (where the URL goes) and hit return (i.e. “go to” that “location”):

javascript:a={"attr":"val","attr2":"val2"}; alert(a.attr);alert(a["attr2"])

(As an aside, what might you learn from this? Firstly, you can “run” JavaScript in the browser via the location bar. Secondly, the JavaScript command alert() pops up an alert box:-)
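Many current browsers strip or ignore a javascript: prefix pasted into the address bar, so if the snippet above does nothing, the same check can be run in the browser’s developer console or in Node.js. The sketch below is one way to do that; it also shows the parsing step mentioned earlier, using the standard JSON.parse function. The variable names and data are illustrative only.

```javascript
// A JSON string, as it might arrive from a web service (illustrative data).
const jsonString = '{"attr":"val","attr2":"val2"}';

// JSON.parse turns the string into a JavaScript object.
const jsonObject = JSON.parse(jsonString);

// Dot notation and square-bracket notation reach values in the same way.
const viaDot = jsonObject.attr;
const viaBrackets = jsonObject["attr2"];

console.log(viaDot);      // val
console.log(viaBrackets); // val2
```

Using console.log rather than alert() means the example behaves the same in a browser console and on the command line.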

Note that the value of an attribute might be another object.

obj={ "attrWithObjectValue": { "childObjAttr":"foo" } }
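As a rough sketch of how such a nested value is reached, each extra level of nesting simply adds another step to the path. The names below are the same made-up ones used above.

```javascript
// An object whose attribute value is itself an object.
const obj = { "attrWithObjectValue": { "childObjAttr": "foo" } };

// One step down gives us the inner object...
const child = obj.attrWithObjectValue;

// ...and a second step gives us the value inside it.
const value = obj.attrWithObjectValue.childObjAttr;

// The square-bracket style chains in the same way.
const sameValue = obj["attrWithObjectValue"]["childObjAttr"];

console.log(value); // foo
```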

Another thing we can see in the Google geocoder JSON code is square brackets. These define an array (one might also think of it as an ordered list). Items in the list are addressed numerically. So for example, given:

arr=[ "item1", "item2", "item3" ]

we can locate "item1" as arr[0] and "item3" as arr[2]. (Note: the index count in the square brackets starts at 0.) Try it in the browser… (for example, javascript:list=["apples","bananas","pears"]; alert( list[1] );).

Arrays can contain objects too:

list=[ "item1", {"innerObjectAttr":"innerObjVal" } ]

Can you guess how to get to the innerObjVal? Try this in the browser location bar:

javascript: list=[ "item1", { "innerObjectAttr":"innerObjVal" } ]; alert( list[1].innerObjectAttr )
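The same rules are enough to pick values out of the geocoder response shown earlier. The following is a minimal sketch using a trimmed copy of that response (only two of the address components are kept, to save space); the variable names are my own, and the last line uses the standard array methods find and includes as one possible way of avoiding a hard-coded index.

```javascript
// A trimmed copy of the geocoder response shown earlier.
const response = JSON.parse(`{
  "status": "OK",
  "results": [ {
    "formatted_address": "Milton Keynes, Buckinghamshire MK7 6AA, UK",
    "address_components": [
      { "long_name": "MK7 6AA", "short_name": "MK7 6AA", "types": [ "postal_code" ] },
      { "long_name": "United Kingdom", "short_name": "GB", "types": [ "country", "political" ] }
    ],
    "geometry": { "location": { "lat": 52.0249136, "lng": -0.7097474 } }
  } ]
}`);

// "results" is an array, so the first result is at index 0.
const first = response.results[0];

// From there, dot notation walks down the object hierarchy.
const address = first.formatted_address;
const lat = first.geometry.location.lat;

// Arrays of objects mix both styles: index first, then attribute.
const countryCode = first.address_components[1].short_name;

// Alternatively, search the array for the component tagged "postal_code".
const postcode = first.address_components
  .find(component => component.types.includes("postal_code")).long_name;

console.log(address, lat, countryCode, postcode);
```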

Making Life Easier

Hopefully, you’ll now have a sense that there’s structure in a JSON object, and that this structure is what we rely on if we want to cut down on the “trial and error” when parsing such things. To make life easier, we can also use “tree widgets” to display the hierarchical JSON object in a way that makes it far easier to see how to construct the dotted path that leads to the data value we want.

A tool I have appropriated for previewing JSON objects is Yahoo Pipes. Rather than necessarily using Pipes to build anything, I simply make use of it as a JSON viewer, loading JSON into the pipe from a URL via the Fetch Data block, and then previewing the result:

Another tool (and one I’ve just discovered) is an Adobe AIR application called JSON-Pad. You can paste in JSON code, or pull it in from a URL, and then preview it, again via a tree widget:

Clicking on one of the results in the tree widget provides a crib to the path…

Summary

Writing addresses into JSON objects is much easier if you have some idea of the structure of the object. Tree viewers make that structure explicit. By walking down the tree to the part of it you want, and “dotting” together* the nodes/attributes you select as you do so, you can quickly and easily construct the path you need.

* If the JSON attributes have spaces or non-alphanumeric characters in them, use the obj["attr"] notation rather than the dotted obj.attr notation…
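To make the idea of walking down the tree concrete, here is a small sketch of a helper that follows a list of path steps one at a time. The function name getPath and the sample data (including the awkward "odd key!" attribute) are hypothetical, invented for illustration; because each step is looked up with square brackets, attribute names containing spaces or punctuation work too.

```javascript
// Follow a list of steps (attribute names or array indexes) down into an object.
// Returns undefined if any step along the way is missing.
function getPath(obj, path) {
  let current = obj;
  for (const key of path) {
    if (current === null || typeof current !== "object" || !(key in current)) {
      return undefined;
    }
    current = current[key]; // square brackets cope with any attribute name
  }
  return current;
}

// Illustrative data loosely modelled on the geocoder response.
const data = {
  "results": [ {
    "geometry": { "location": { "lat": 52.0249136 } },
    "odd key!": "needs brackets"
  } ]
};

const lat = getPath(data, ["results", 0, "geometry", "location", "lat"]);
const odd = getPath(data, ["results", 0, "odd key!"]);
const missing = getPath(data, ["results", 3, "geometry"]);

console.log(lat, odd, missing);
```

Returning undefined for a missing step, rather than throwing an error, is a design choice that suits exploratory work where not every record has every field.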

PS Via my feeds today, though something I had bookmarked already, this Data Converter tool may be helpful in going the other way… (Disclaimer: I haven’t tried using it…)

If you know of any other related tools, please feel free to post a link to them in the comments:-)

Data for journalists: understanding XML and RSS

If you are working with data, chances are that sooner or later you will come across XML – or if you don’t, then, well, you should do. Really.

There are some very useful resources in XML format – and in RSS, which is based on XML – from ongoing feeds and static reference files to XML that is provided in response to a question that you ask. All of that is for future posts – this post attempts to explain how XML is relevant to journalism, and how it is made up.

What is XML?

XML is a language which is used for describing information, which makes it particularly relevant to journalists – especially when it comes to interrogating large sets of data.

If you wanted to know how many doctors were privately educated, or what the most common score was in the Premiership last season, or which documents were authored by a particular civil servant, then XML may be useful to you.

UK Journalists on Twitter

A post on the Guardian Datablog earlier today took a dataset collected by the Tweetminster folk and graphed the sorts of thing that journalists tweet about ( Journalists on Twitter: how do Britain’s news organisations tweet?).

Tweetminster maintains separate lists of tweeting journalists for several different media groups, so it was easy to grab the names on each list, use the Twitter API to pull down the names of people followed by each person on the list, and then graph the friend connections between folk on the lists. The result shows that the hacks follow each other quite closely:

UK Media Twitter echochamber (via tweetminster lists)

Nodes are coloured by media group/Tweetminster list, and sized by PageRank, as calculated over the network using the Gephi PageRank statistic.

The force directed layout shows how folk within individual media groups tend to follow each other more intensely than they do people from other groups, but that said, inter-group following is still high. The major players across the media tweeps as a whole seem to be @arusbridger, @r4today, @skynews, @paulwaugh and @BBCLauraK.
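As a rough sketch of the graph-building step described above, the code below keeps only the “follows” links where both accounts appear on one of the lists, then writes the result out as GDF text. Everything here is an assumption for illustration: the account names and groups are invented, the friend lists are hard-coded rather than fetched from the Twitter API, the function names buildEdges and toGdf are hypothetical, and the GDF column layout is my understanding of the format rather than a copy of the file used for the chart.

```javascript
// Hypothetical input: which media-group list each account sits on...
const listMembership = { alice: "GroupA", bob: "GroupA", carol: "GroupB" };

// ...and who each account follows (in practice fetched from the Twitter API).
const follows = {
  alice: ["bob", "carol", "someoneElse"],
  bob: ["alice"],
  carol: ["alice", "outsider"]
};

// Keep only the links where both ends are on one of the lists.
function buildEdges(membership, friendLists) {
  const members = new Set(Object.keys(membership));
  const edges = [];
  for (const [source, friends] of Object.entries(friendLists)) {
    if (!members.has(source)) continue;
    for (const target of friends) {
      if (members.has(target)) edges.push([source, target]);
    }
  }
  return edges;
}

// Write nodes and edges as GDF text (assumed layout: a node section, then an edge section).
function toGdf(membership, edges) {
  const lines = ["nodedef>name VARCHAR,mediagroup VARCHAR"];
  for (const [name, group] of Object.entries(membership)) {
    lines.push(`${name},${group}`);
  }
  lines.push("edgedef>node1 VARCHAR,node2 VARCHAR,directed BOOLEAN");
  for (const [source, target] of edges) {
    lines.push(`${source},${target},true`);
  }
  return lines.join("\n");
}

const edges = buildEdges(listMembership, follows);
const gdf = toGdf(listMembership, edges);
console.log(gdf);
```

The mediagroup column is what would let a tool such as Gephi colour nodes by list; the PageRank sizing is calculated inside Gephi and is not attempted here.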

I can generate an SVG version of the chart, and post a copy of the raw Gephi GDF data file, if anyone’s interested…

PS if you’re interested in trying out Gephi for yourself, you can download it from gephi.org. One of the easiest ways in is to explore your Facebook network.

PPS for details on how the above was put together, here’s a related approach:
Trying to find useful things to do with emerging technologies in open education
Doodlings Around the Data Driven Journalism Round Table Event Hashtag Community

For a slightly different view over the UK political Twittersphere, see Sketching the Structure of the UK Political Media Twittersphere. And for the House and Senate in the US: Sketching Connections Between US House and Senate Tweeps