Category Archives: online journalism

How to: convert easting/northing into lat/long for an interactive map

A map generated in Google Fusion Tables from a dataset cleaned using these methods

Google Fusion Tables is great for creating interactive maps from a spreadsheet – but it isn’t too keen on easting and northing. That can be a problem as many government and local authority datasets use easting and northing to describe the geographical position of things – for example, speed cameras.

So you’ll need a way to convert easting and northing into something that Fusion Tables does like – such as latitude and longitude.

Here’s how I did it – quickly. Continue reading
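
To give a flavour of what that conversion involves, here’s a minimal sketch in R using the sf package – an illustration with made-up figures, and not necessarily the route the full post takes:

library(sf)
# eastings/northings in UK government data are normally British National Grid (EPSG:27700)
cameras=data.frame(easting=c(406866), northing=c(286780)) # made-up example point
pts=st_as_sf(cameras, coords=c("easting","northing"), crs=27700)
# reproject to WGS84 (EPSG:4326) - the latitude/longitude that Fusion Tables expects
st_coordinates(st_transform(pts, crs=4326)) # X is longitude, Y is latitude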

How a musician and a Sikh TV channel dominated coverage of the Birmingham riots

One image from last night guaranteed not to have made it onto the front page: a man holding a bag of Tesco value rice – via Birmingham Riots 2011

It’s one thing to cover rioting on the doorstep of the national press – it’s quite another when squeezed regional newsrooms have to do the same. And as rioting in the UK spread from London to Birmingham and then other cities, some unlikely suspects showed how to cover a riot online even when you don’t have a newsroom.

Dominating online coverage in Birmingham was not a local newspaper or broadcaster but a Tumblr site – Birmingham Riots 2011 – set up by musician Casey Rain. Over dozens of entries Casey posted countless reports of what was taking place, and a range of photos and video footage which dwarfed the combined coverage of regional press and broadcast.
Continue reading

Host your own crowdsourced investigation with the Help Me Investigate plugin

Help Me Investigate as it looked 2 years ago

When we open-sourced the code for Help Me Investigate the plan was to move from a single site to a decentralised, networked structure. Now, thanks to Andy Dickinson, it has become even easier for anyone to host their own journalism crowdsourcing platform.

Since a conversation a couple of months ago, Andy has been tweaking a WordPress plugin that replicates the functionality of the previous Help Me Investigate site. It’s now ready for use.

The plugin adds an ‘Investigations’ page to your self-hosted WordPress blog which holds ‘sticky’ pages for any investigations you want to pursue, and allows you to break those down into distinct challenges that anyone can contribute to.

You can also add tags and grade progress, and limit access to make an investigation more private. Full functionality and limitations are listed on the plugin page. Continue reading

SFTW: Asking questions of a webpage – and finding out when those answers change

Previously I wrote on how to use the =importXML formula in Google Docs to pull information from an XML page into a conventional spreadsheet. In this Something For The Weekend post I’ll show how to take that formula further to grab information from webpages – and get updates when that information changes.

Animation from Digital Inspiration

Asking questions of a webpage – or finding out when the answer changes

Despite its name, the =importXML formula can be used to grab information from HTML pages as well. This post on SEO Gadget, for example, gives a series of examples ranging from grabbing information on Twitter users to price information and web analytics (it also has some further guidance on using these techniques, and is well worth a read for that).

Asking questions of webpages typically requires more advanced use of XPath than I outlined previously – and more trial and error.

This is because, while XML is a language designed to provide structure around data, HTML – used as it is for a much wider range of purposes – isn’t quite so tidy.

Finding the structure

To illustrate how you can use =importXML to grab data from a webpage, I’m going to grab data from Gorkana, a job ads site.

If you look at their journalist jobs page, you’ll see all sorts of information, from navigation and ads to feeds and policies. This is how you could grab a specific piece of data from that page, and put it into a table structure, to answer any questions you might have:

Make a note of the first word or phrase in the section you want (e.g. “Senior account executive”), then right-click on the page and select View Source – or whatever option allows you to see the HTML code behind the page.

You could scroll through this to try to find the bit you want, but it’s easier to use your browser’s search facility to find that key phrase you noted earlier (e.g. “Senior account executive”).

Searching within HTML

What you’re hoping to find is some sort of div class tag just above that key phrase – and in this case there’s one called div class="jobWrap".

This means that the creator of the webpage has added some structure to it, wrapping all their job ads in that div.
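
Based on the tags described further down this post, a simplified version of one job ad’s markup might look something like this (illustrative, not copied from the live page):

<div class="jobWrap">
  <h4>Senior account executive</h4>
  <ul>
    <li>London</li>
    <li>£25,000 + benefits</li>
  </ul>
  <p>A brief description of the job…</p>
</div>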

We just need to write a formula that is equally specific.

Writing the formula

Open up a spreadsheet in Google Docs and write the following formula in cell B1:

=importXML("http://www.gorkanajobs.co.uk/jobs/journalist/", "//div[starts-with(@class, 'jobWrap')]")

When you press Enter you should see 3 columns filled with values from that particular part of the webpage: the job title; the package; and a brief description. Now that you have this data in a structured format you could, for example, work out average wages of job ads, or the most common job titles.

But how did that formula work? As I’ve explained most of =importXML in the previous post, I’ll just explain the query part here. So:

//div

is looking for every div tag in the page

[starts-with

is specifying that this div must begin in a particular way, and it does so by grabbing one thing and looking for another thing within it:

(@class, 'jobWrap')

is saying that the div’s class attribute should begin with ‘jobWrap’ (bonus points: the @ sign indicates an attribute; class is an attribute of the div; ‘jobWrap’ is the value of the attribute)

]

…finishes off that test.

Even if you don’t understand the code itself, you can adapt it to your own purposes as long as you can find the right div class tag and replace ‘jobWrap’ with whatever the value is in your case.

It doesn’t even have to be a div class – you could replace //div with other tags such as //p for each paragraph fitting a particular criterion.

You can also replace @class with another attribute, such as @id or @title. It depends on the HTML of the page you’re trying to grab information from.
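
For instance, on a hypothetical page that put each story summary in a p tag with an id attribute, you might write something like this (the URL and attribute value are made up):

=importXML("http://example.com/news.html", "//p[starts-with(@id, 'summary')]")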

Where’s the structure come from?

Why has this data been put into 3 columns? The answer is in the HTML again. You’ll see that the job title is between h4 tags. The location and package are within ul tags, and the description is within p tags, before each div is closed with the tag </div>

But if you keep reading that HTML you’ll also see that after that </div> there is some more information within a different div tag: div class="adBody". This contains the name of the recruiter and a link to a page where you can apply. There’s also a third div with a link to an image of the recruiter.

You could adapt your importXML formula to grab these instead – or add a new formula in D2 to add this extra information alongside the others (watch out for mismatches where one div may be missing for some reason).

Finding and cleaning links

You’ll notice that the formula above grabs text, but not the links within each job ad. To do that we need to adapt the formula as follows. Try typing this in cell D2:

=ImportXML("http://www.gorkanajobs.co.uk/jobs/journalist/", "//div[starts-with(@class, 'jobWrap')]//@href")

This is identical to the previous formula, with one addition at the end:

//@href

What this does is grab the value of any link in the HTML. In other words, the bit after a href="

You’ll notice that the results are partial URLs, such as /job/3807/senior-account-executive-account-manager/

These are known as relative URLs, because they are relative to the site they are on, but will not work when placed on another site.

This is easily cleaned up. In cell E2 type the following:

=CONCATENATE("http://www.gorkanajobs.co.uk",D2)

This creates a new URL beginning with http://www.gorkanajobs.co.uk and ending with the contents of cell D2 – the relative URL. Copy the formula down the column so it works for all cells in column D.

Not always so tidy

Not all websites will be so structured. The more structured the webpage – or the data within it – the better. But you may have to dig into the HTML and/or tweak your formula to find that structure. Or you may have to settle for some rough and ready data that you clean later.

The key advantage of =importXML, however, is its ability to pull information from HTML into a table that you can then interrogate, with different columns for different parts of that data.

For more help with these processes you can find explanations of how to write expressions in XPath here but be prepared to use trial and error to get the right expression for the question you’re asking. The Vancouver Data Blog offers some specific examples that can be easily adapted.

Getting updates from your spreadsheet

Finally, this can be useful because Google Docs allows you to receive notifications whenever any changes are made, and to publish your spreadsheet as an RSS feed. This is explained in this blog post, which is also the source of the movie above.

And if you want to see all of this in action I’ve published an example spreadsheet demonstrating all the above techniques here.

Two New Cabinet Office Open Data Consultations: Data Policy and Making Open Data Real

Via the Guardian Datablog, I see that the Cabinet Office has just opened up a couple of consultations around open data:

Consultation on Data Policy for a Public Data Corporation [homepage] [Consultation]

Here are the consultation questions (also available via SurveyMonkey: PDC consultation):

Chapter 4 – Charging for PDC information

  1. How do you think Government should best balance its objectives around increasing access to data and providing more freely available data for re-use year on year within the constraints of affordability? Please provide evidence to support your answer where possible.
  2. Are there particular datasets or information that you believe would create particular economic or social benefits if they were available free for use and re-use? Who would these benefit and how? Please provide evidence to support your answer where possible.
  3. What do you think the impacts of the three options would be for you and/or other groups outlined above? Please provide evidence to support your answer where possible.
  4. A further variation of any of the options could be to encourage PDC and its constituent parts to make better use of the flexibility to develop commercial data products and services outside of their public task. What do you think the impacts of this might be?
  5. Are there any alternative options that might balance Government’s objectives which are not covered here? Please provide details and evidence to support your response where possible.

Chapter 5 – Licensing

  1. To what extent do you agree that there should be greater consistency, clarity and simplicity in the licensing regime adopted by a PDC?
  2. To what extent do you think each of the options set out would address those issues (or any others)? Please provide evidence to support your comments where possible.
  3. What do you think the advantages and disadvantages of each of the options would be? Please provide evidence to support your comments
  4. Will the benefits of changing the models from those in use across Government outweigh the impacts of taking out new or replacement licences?

Chapter 6 – Regulatory oversight

  1. To what extent is the current regulatory environment appropriate to deliver the vision for a PDC?
  2. Are there any additional oversight activities needed to deliver the vision for a PDC and if so what are they?
  3. What would be an appropriate timescale for reviewing a PDC or its constituent parts’ public task(s)?

And the second consultation (which is probably worth reading in the context of the Open Public Services white paper – http://www.cabinetoffice.gov.uk/resource-library/open-public-services-white-paper [white paper PDF, feedback website]?)

Making Open Data Real: A Public Consultation [homepage] [Consultation]

  1. Glossary of key terms [link]
  2. An enhanced right to data: how do we establish stronger rights for individuals, businesses and other actors to obtain, use and re-use data from public service providers? [link]
  3. Setting transparency standards: what would standards that enforce this right to data among public authorities look like? [link]
  4. Corporate and personal responsibility: how would public service providers be held to account for delivering open data through a clear governance and leadership framework at political, organisational and individual level? [link]
  5. Meaningful Open Data: how should we ensure collection and publication of the most useful data, through an approach enabling public service providers to understand the value of the data they hold and helps the public at large know what data is collected? [link]
  6. Government sets the example: in what ways could we make the internal workings of government and the public sector as open as possible? [link]
  7. Innovation with Open Data: to what extent is there a role for government to stimulate enterprise and market making in the use of open data? [link]

I haven’t had a chance to read through the consultation docs yet, but I’ll try and comment somewhere, as well as responding…

The way the consultations are presented

As to the way the consultations themselves are presented, two approaches have been taken:

– the PDC consultation embeds documents at chapter level, hosted on Scribd in a preview widget, with questions made available via a Word document or via SurveyMonkey. There doesn’t appear to be an opportunity to comment on the BIS site that is hosting the PDC consultation, even though it’s a WordPress platform running the Commentariat2 theme. To my mind, the way this consultation has been published, it’s not really of the web, and, to use a technical term, feels a little bit horrible to me… Maybe they don’t want flame wars on the bis.gov.uk domain about “Charging for PDC information”?!;-)

– the Making it Real consultation is hosted on the data.gov.uk site, with HTML text split at “chapter” (section) level, and commenting at that level via a single comment box at the bottom of the page. Where documents take close reading, I think this makes commenting difficult: if you want to refer to specific, detailed points in the consultation document, I’d say it makes sense to be able to comment at the point of reference. That is, the comment box needs to be where you can see the actual bit of text you are commenting on (which is one reason why we often annotate documents with marginalia, rather than on a separate piece of paper). Where the comment box is fixed at the bottom of the page, you need two windows open to have side-by-side commenting and viewing of the actual text you are commenting on.

If we hadn’t decided that things had moved on enough in the way consultations were being handled to close WriteToReply (WriteToReply is closing. Come get your data if you want it), I think there’s a good chance we would have hosted both these consultations… Maybe our thinking that WriteToReply had nudged things far enough was a bit hasty? The digress.it theme is out there, but as yet hasn’t been trialled on a departmental basis, I don’t think, even though we did try to respond to the commissioned accessibility audit. (Are Scribd docs accessible?) Digress.it is running on the JISCPress site, though.

(I’m suddenly fired up again by the thought that consultation docs could be so much more “of the web” as well as easier to engage with… Hmmm, when’s the closing date for these consultations? Maybe there is time for one last WriteToReply outing…?)

PS How did I miss out on subscribing to the Government Digital Service? e.g. Neil Williams on A vision for online consultation and policy engagement

Merging Two Different Datasets Containing a Common Column With R and R-Studio

Another way, for the database challenged (such as myself!), of merging two datasets that share at least one common column…

This recipe uses the cross-platform stats analysis package, R. I use R via the R-Studio client, which provides an IDE wrapper around the R environment.

So for example, here’s how to merge a couple of files sharing elements in a common column…

First, load in your two data files – for example, I’m going to load in separate files that contain qualifying and race stats from the last Grand Prix:

R import data

We can merge the datasets using a command of the form:

m=merge(hun_2011racestats,hun_2011qualistats,by="driverNum")

The by parameter identifies which column we want to merge the tables around. (If the two datasets have different column names, you need to set by.x= and by.y= to specify the column from each dataset that is the focus for merging).
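
So if, say, the qualifying data had called its driver number column carNum instead (a made-up name, purely for illustration), the equivalent command would be:

m=merge(hun_2011racestats,hun_2011qualistats,by.x="driverNum",by.y="carNum")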

So for example, in the simple case where we are merging around two columns of the same name in different tables:

Merging datasets in R

After the merge, where the two tables share a column name, the column from the first table has the suffix .x added, and the one from the second, .y.

We can then export the combined dataset as a CSV file:

write.csv(m, file = "demoMerge.csv")

fragment of csv file

[If you want to extract a subset of the columns, specify the required columns in an R command of the form: m2=m[c("driverNum","name.x","ultimate.x","ultimate.y")]. See also: R: subset]

Simples;-)

PS in the above example, the merge table only contains merged rows. If there are elements in the common column of one table, but not the other, that partial data will not be included in the merged table. To include all rows, set all=TRUE. To include all rows from the first table, but not unmatched rows from the second, set all.x=TRUE; (the cells from columns in the unmatched row of the second table will be set to NA). (all.y=TRUE is also legitimate). From the R merge documentation:

In SQL database terminology, the default value of all = FALSE gives a natural join, a special case of an inner join. Specifying all.x = TRUE gives a left (outer) join, all.y = TRUE a right (outer) join, and both (all = TRUE) a (full) outer join. DBMSes do not match NULL records, equivalent to incomparables = NA in R.
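
Applied to the example tables above, that translates as:

m=merge(hun_2011racestats,hun_2011qualistats,by="driverNum") # inner join (the default): only drivers in both tables
m=merge(hun_2011racestats,hun_2011qualistats,by="driverNum",all.x=TRUE) # left join: every race stats row, NAs where quali data is missing
m=merge(hun_2011racestats,hun_2011qualistats,by="driverNum",all=TRUE) # full outer join: every row from both tables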

For other ways of combining data from two different data sets, see:
Merging Datasets with Common Columns in Google Refine
A Further Look at the Orange Data Playground – Filters and File Merging
Merging CSV data files with Google Fusion Tables

If you know of any other simple ways of joining data files about a common column, please reveal all in the comments:-)

SFTW: How to scrape webpages and ask questions with Google Docs and =importXML

XML puzzle cube
Image by dullhunk on Flickr

Here’s another Something for the Weekend post. Last week I wrote a post on how to use the =importFeed formula in Google Docs spreadsheets to pull an RSS feed (or part of one) into a spreadsheet, and split it into columns. Another formula which performs a similar function more powerfully is =importXML.

There are at least 2 distinct journalistic uses for =importXML:

  1. You have found information that is only available in XML format and need to put it into a standard spreadsheet to interrogate it or combine it with other data.
  2. You want to extract some information from a webpage – perhaps on a regular basis – and put that in a structured format (a spreadsheet) so you can more easily ask questions of it.

The first task is the easiest, so I’ll explain how to do that in this post. I’ll use a separate post to explain the latter.

Converting an XML feed into a table

If you have some information in XML format it helps if you have some understanding of how XML is structured. A backgrounder on how to understand XML is covered in this post explaining XML for journalists.

It also helps if you are using a browser which is good at displaying XML pages: Chrome, for example, not only staggers and indents different pieces of information, but also allows you to expand or collapse parts of that, and colours elements, values and attributes (which we’ll come on to below) differently.

Say, for example, you wanted a spreadsheet of UK council data, including latitude, longitude, CIPFA code, and so on – and you found the data, but it was in XML format at a page like this: http://openlylocal.com/councils/all.xml

To pull that into a neatly structured spreadsheet in Google Docs, type the following into the cell where you want the import to begin (try typing in cell A2, leaving the first row free for you to add column headers):

=ImportXML("http://openlylocal.com/councils/all.xml", "//council")

The formula (or, more accurately, function) needs two pieces of information, which are contained in the parentheses and separated by a comma: a web address (URL), and a query. Or, put another way:

=importXML("theURLinQuotationMarks", "theBitWithinTheURLthatYouWant")

The URL is relatively easy – it is the address of the XML file you are reading (it should end in .xml). The query needs some further explanation.

The query tells Google Docs which bit of the XML you want to pull out. It uses a language called XPath – but don’t worry, you will only need to note down a few queries for most purposes.

Here’s an example of part of that XML file shown in the Chrome browser:

XML from OpenlyLocal

The indentation and triangles indicate the way the data is structured. So, the <councils> tag contains at least one item called <council> (if you scrolled down, or clicked on the triangle to collapse <council> you would see there are a few hundred).

And each <council> contains an <address>, <authority-type>, and many other pieces of information.
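
In simplified form – with a made-up council, as the real file is much longer – that structure looks something like this:

<councils>
  <council>
    <name>Anytown Borough Council</name>
    <address>1 High Street, Anytown</address>
    <authority-type>District</authority-type>
  </council>
</councils>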

If you wanted to grab every <council> from this XML file, then, you use the query “//council” as shown above. Think of the // as a replacement for the < in a tag – you are saying: ‘grab the contents of every item that begins <council>’.

You’ll notice that in your spreadsheet where you have typed the formula above, it gathers the contents (called a value) of each tag within <council>, each tag’s value going into its own column – giving you dozens of columns.

You can continue this logic to look for tags within tags. For example, if you wanted to grab the <name> value from within each <council> tag, you could use:

=ImportXML("http://openlylocal.com/councils/all.xml", "//council//name")

You would then only have one column, containing the names of all the councils – if that’s all you wanted. You could of course adapt the formula again in cell B2 to pull another piece of information. However, you may end up with a mismatch of data where that information is missing – so it’s always better to grab all the XML once, then clean it up on a copy.

If the XML is more complex then you can ask more complex questions – which I’ll cover in the second part of this post. You can also put the URL and/or query in other cells to simplify matters, e.g.

=ImportXML(A1, B1)

Where cell A1 contains http://openlylocal.com/councils/all.xml and B1 contains //council (note the lack of quotation marks). You then only need to change the contents of A1 or B1 to change the results, rather than having to edit the formula directly.

If you’ve any other examples, ideas or corrections, let me know. Meanwhile, I’ve published an example spreadsheet demonstrating all the above techniques here.


The style challenge

Spot the odd one out. Image by Cliff Muller

Time was when a journalist could learn one or two writing styles and stick with them.

They might command enormous respect for being the best at what they did. But sometimes, when that journalist moved to another employer, their style became incongruous.

And they couldn’t change.

This is the style challenge, and it’s one that has become increasingly demanding for journalists in an online age. Because not only must they be able to adapt their style for different types of reporting; not only must they be able to adapt for different brands; not only must they be able to adapt their style within different brands across multiple media; but they must also be able to adapt their style within a single medium, across multiple platforms: Twitter, Facebook, blogs, Flickr, YouTube, or anywhere else that their audiences gather.

Immersion and language

Style is a fundamental skill in journalism. It is difficult to teach, because it relies on an individual immersing themselves in media, and doing so in a way that goes beyond each message to the medium itself.

This is why journalism tutors urge their students so strongly to read as many newspapers as they can; to watch the news and listen to it, obsessively.

Without immersion it is difficult to speak any language.

Now. Some people do immerse themselves and have a handle on current affairs. That’s useful, but not the point. Some do it and gain an understanding of institutions and audiences (that one is left-leaning; this one is conservative with a small c, etc.). This is also useful, but also not the point.

The point is about how each institution addresses each audience, and when.

Despite journalists and editors often having an intuitive understanding of this difference in print or broadcast, over the last decade they’ve repeatedly demonstrated an inability to apply the same principles when it comes to publishing online.

And so we’ve had shovelware: organisations republishing print articles online without any changes.

We’ve had opinion columns published as blogs because ‘blogs are all about opinion’.

And we’ve had journalists treating Twitter as just another newswire to throw out headlines.

This is like a person’s first attempt at a radio broadcast where they begin by addressing “Hey all you out there” as if they’re a Balearic DJ.

Good journalists should know better.

Style serves communication

Among many other things, a good journalism or media degree should teach not just the practical skills of journalism but also an intellectual understanding of communication and, by extension, style.

Because style is, at its base, about communication. It is about register: understanding what tone to adopt based on who you are talking to, what you are talking about, the relationship you seek to engender, and the history behind that.

As communication channels and tools proliferate, we probably need to pay more attention to that.

Journalists are being asked to adapt their skills from print to video; from formal articles to informal blog posts; from Facebook Page updates to tweets.

They are having to learn new styles of liveblogging, audio slideshows, mapping and apps; to operate within the formal restrictions of XML or SEO.

For freelance journalists, commissioning briefs increasingly ask for that flexibility even within the same piece of work, offering extra payments for an online version, a structured version, a podcast, and so on.

These requests are often quite basic – requiring a list of links for an online version, for example – but as content management systems become more sophisticated, those conditions will become more stringent: supplying an XML file with data on a product being reviewed, for example, or a version optimised for search.

What complicates things further is that, for many of these platforms, we are inventing the language as we speak it.

For those new to the platform, it can be intimidating. But for those who invest time in gaining experience, it is an enormous opportunity. Because those who master the style of a blog, or Facebook, or Twitter, or addressing a particular group on Flickr, or a YouTube community, put themselves in an incredible position, building networks that a small magazine publisher would die for.

That’s why style is so important – now more than ever, and in the future more than now.
