“Certain web apps will be given blanket exemptions from charging. Here’s Google: “Maps API applications developed by non-profit organisations, applications deemed by Google to be in the public interest, and applications based in countries where we do not support Google Checkout transactions or offer Maps API Premier are exempt from these usage limits.” So nonprofit news orgs look to be in the clear, and Google could declare other news org maps apps to be “in the public interest” and free to run. (It also notes that nonprofits could be eligible for a free Maps API Premier license, which comes with extra goodies around advertising and more.)”
I was very excited recently to read on the Scraperwiki mailing list that the website was working on making it possible to create an RSS feed from a SQL query.
Yes, that’s the sort of thing that gets me excited these days.
But before you reach for a blunt object to knock some sense into me, allow me to explain…
Scraperwiki has, until now, done very well at trying to make it easier to get hold of hard-to-reach data. It has done this in two ways: firstly by creating an environment which lowers the technical barrier to creating scrapers (these get hold of the data); and secondly by lowering the social barrier to creating scrapers (by hosting a space where journalists can ask developers for help in writing scrapers).
This move, however, does something different.
CableSearch is a neat project by the European Centre for Computer Assisted Research and VVOJ (the Dutch-Flemish association for investigative journalists) which aims to make it easier for journalists to interrogate the Wikileaks cables. Although it’s been around for some time, I’ve only just noticed the site’s API, so I thought I’d show how such an API can be useful as a way to draw on such data sources to complement data of your own.
Google Fusion Tables is great for creating interactive maps from a spreadsheet – but it isn’t too keen on easting and northing. That can be a problem as many government and local authority datasets use easting and northing to describe the geographical position of things – for example, speed cameras.
So you’ll need a way to convert easting and northing into something that Fusion Tables does like – such as latitude and longitude.
Here’s how I did it – quickly.
It’s been over 2 years since I stopped doing the ‘Something for the Weekend’ series. I thought I would revive it with a tutorial on They Work For You and Google Refine…
If you want to add political context to a spreadsheet – say you need to know what political parties a list of constituencies voted for, or the MPs for those constituencies – the They Work For You API can save you hours of fiddling – if you know how to use it.
An API is – for the purposes of journalists – a way of asking questions of reams of data. For example, you can use an API to ask “What constituency is each of these postcodes in?” or “When did these politicians enter office?” or even “Can you show me an image of these people?”
The They Work For You API will give answers to a range of UK political questions on subjects including Lords, MLAs (Members of the Legislative Assembly in Northern Ireland), MPs, MSPs (Members of the Scottish Parliament), select committees, debates, written answers, statements and constituencies.
When you combine that API with Google Refine you can fill a whole spreadsheet with additional political data, allowing you to answer questions you might otherwise not be able to.
I’ve written before on how to use Google Refine to pull data into a spreadsheet from the Google Maps API and the UK Postcodes API, but this post takes things a bit further because the They Work For You API requires something called a ‘key’. This is quite common with APIs so knowing how to use them is – well – key. If you need extra help, try those tutorials first.
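To give a rough idea of what a ‘key’ looks like in practice, here is a minimal sketch in Python (rather than Google Refine) of a request address with a key attached. The base address, the function name getMP, the parameter names and the placeholder key are all assumptions for illustration, so check the They Work For You API documentation for the real details.

```python
from urllib.parse import urlencode

# Assumed base address; check the API documentation before relying on it.
API_BASE = "https://www.theyworkforyou.com/api/"

def build_api_url(function, key, **params):
    """Return a request URL with the API key attached as a query parameter."""
    # 'key' and 'output' are assumed parameter names used for illustration.
    query = {"key": key, "output": "js"}
    query.update(params)
    return API_BASE + function + "?" + urlencode(query)

# 'getMP' and 'YOUR_KEY_HERE' are placeholders, not real credentials.
url = build_api_url("getMP", "YOUR_KEY_HERE", postcode="B1 1AA")
print(url)
```

The key simply travels along with every question you ask, so the service knows who is asking.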
Here’s an example of how APIs can be useful to journalists when they need to combine two sets of data.
I recently spoke to Lincoln investigative journalism student Sean McGrath who had obtained some information via FOI that he needed to combine with other data to answer a question (sorry to be so cryptic).
He had spent 3 days cleaning up the data and manually adding postcodes to it. This seemed a good example of where using an API might cut down the work considerably, and so in this post I explain how you can make a start on the same problem in less than an hour using Excel, Google Refine and the Google Maps API.
Step 1: Get the data in the right format to work with an API
APIs can do all sorts of things, but one of the things they do which is particularly useful for journalists is answer questions.
On Friday I had quite a bit of fun with Churnalism.com, a new site from the Media Standards Trust which allows you to test how much of a particular press release has been reproduced verbatim by media outlets.
The site has an API, which got me thinking whether you might be able to ‘mash’ it with an RSS feed from Google News to check particular types of articles – and what ‘signals’ you might use to choose those articles.
I started with that classic PR trick: the survey. A search on Google News for “a survey * found” (the * is a wildcard, meaning it can be anything) brings some interesting results to start investigating.
Jon Bounds added a favourite of his: “hailed a success”.
And then it continued:
The rather lovely DocumentCloud – a tool that allows journalists to share, annotate, connect and organise documents – has finally emerged from its closet and made itself available to public searches.
If you do end up on this list you’ll find it’s quite a powerful tool, with quick conversion of PDFs into text files, analytic tools and semantic tagging (so you can connect all documents with a particular person, or organisation) among its best features. The site is open source and has an API too.
I asked Program Director Amanda B Hickman what she’s learned on the project so far. Her response suggests that documents have a particular appeal for online readers:
“If we’ve learned anything, it is that people really love documents. It is pretty clear that when there’s something interesting going on in the news, plenty of people want to dig a little deeper. When Arizona Republic posted an annotated version of that state’s new immigration law, it got more traffic than their weekly entertainment round up. WNYC told us that the page listing the indictments in last week’s mob roundup was still getting more traffic than any other single news story even a week later.
“These were big news documents, to be sure, but it still seems pretty clear that people do want to dig deeper and explore the documents behind the news, which is great for us and great for news.”
If you have a spreadsheet containing geographical data such as postcodes you may want to know what constituency they are in, or which local authority they fall under. That was a question that Bill Thompson asked on Twitter this week – and this is how I used Google Refine to do that: adding extra columns to a spreadsheet with geographic information.
You can watch a video tutorial of this here.
1. Find a website that gives information based on a postcode
First, I needed to find an API which would return a page of information on any postcode in JSON…
If that sounds like double-dutch, don’t worry, try this instead.
Both of these will generate a page giving you details about any given postcode. The formatting of these pages is consistent, e.g.
(The first removes the space between the two parts of the postcode, and adds .json; the second replaces the space with %20 – although I’m told by Matthew Somerville that it will work with spaces and postcodes without spaces)
This information will be important when we start to use Google Refine…
2. Create a new column that has text in the same format as the webpages you want to fetch
In Google Refine click on the arrow at the top of your postcode column and follow the instructions here to create a new column which has the same postcode information, but with no spaces. To replace the space with %20 instead you would replace the expression with
Let’s name this column ‘SpacesRemoved’ and click OK.
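If it helps to see the two transformations written out, here is a minimal sketch in Python rather than in Google Refine’s own expression language. The function names are made up for illustration; the logic is only what is described above: either strip the space or swap it for %20.

```python
from urllib.parse import quote

def remove_spaces(postcode):
    """Return the postcode with every space stripped out."""
    return postcode.replace(" ", "")

def encode_spaces(postcode):
    """Return the postcode with spaces percent-encoded as %20."""
    return quote(postcode.strip())

print(remove_spaces("B1 1AA"))
print(encode_spaces("B1 1AA"))
```

Either version gives you a value that can be dropped straight into a web address.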
Now that we’ve got postcodes in the same format as the webpages above, we can start to fetch a bunch of code giving us extra information on those postcodes.
3. Write some code that goes to a webpage and fetches information about each postcode
In Google Refine click on the arrow at the top of your ‘SpacesRemoved’ column and create a new column by selecting ‘Edit column’ > ‘Add column by fetching URLs…’
This time you will type the expression:
That basically creates a URL that inserts ‘value’ (the value in the previous column) where you want it.
Call this column ‘JSON for postcode’ and click OK.
Each cell will now be filled with the results of that webpage. This might take a while.
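For anyone curious about what is going on behind that step, this is a rough Python sketch of the same idea: build one address per postcode, then fetch each page in turn with a pause in between. The address pattern is a placeholder (example.org is not a real postcode service) and the function names are my own, so swap in the details of whichever API you are using.

```python
import time
from urllib.request import urlopen

# Hypothetical address pattern; replace with the one for your chosen service.
URL_TEMPLATE = "https://example.org/postcode/{code}.json"

def build_url(value, template=URL_TEMPLATE):
    """Insert the cell value into the address, as the expression does."""
    return template.format(code=value)

def fetch_page(url):
    """Download one page and return its contents as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")

def fetch_all(values, fetch=fetch_page, delay=1.0):
    """Fetch one page per value, pausing between requests to be polite."""
    results = {}
    for value in values:
        results[value] = fetch(build_url(value))
        time.sleep(delay)
    return results
```

The pause between requests is one reason a long column can take a while to fill.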
4. Write some code that pulls out a specific piece of information from that
In Google Refine click on the arrow at the top of your ‘JSON for postcode’ column and create a new column by selecting ‘Edit column’ > ‘Add column based on this column…’
Write the following expression:
Look at the preview as you type this and you’ll see information become more specific as you add each term in square brackets.
Call this ‘Council’ and click OK.
This column will now be populated with the council names for each postcode. You can repeat this process for other information, adapting the expression for different pieces of information such as constituency, easting and northing, and so on.
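The idea of drilling down one term at a time can be sketched in Python like this. The sample response and its field names are invented for illustration (a real service will use its own), but the principle is the same: each extra key narrows the result.

```python
import json

# Made-up response; real services use their own field names and structure.
sample = (
    '{"postcode": "AB1 2CD", '
    '"administrative": {"council": {"title": "Example City Council"}}}'
)

def pull(text, *keys):
    """Parse the JSON text, then follow each key in turn to a single value."""
    data = json.loads(text)
    for key in keys:
        data = data[key]
    return data

council = pull(sample, "administrative", "council", "title")
print(council)
```

Changing the list of keys is all it takes to pull out a different piece of information from the same page.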
5. Export as a standard spreadsheet
Click Export in the top right corner and save your spreadsheet in the format you prefer. You can then upload this to Google Docs and share it publicly.
Although this post is about postcode data you can use the same principles to add information based on any data that you can find an API for. For example if you had a column of charities you could use the Open Charities API to pull further details (http://opencharities.org/info/about). For local authority data you could pull from the OpenlyLocal API (http://openlylocal.com/info/api).
If you know of other similarly useful APIs let me know.
The following is an unedited version of an article written for the International Press Institute report ‘Brave News Worlds’ (PDF).
For the past two centuries journalists have dealt in the currency of information: we transmuted base metals into narrative gold. But information is changing.
At first, the base metals were eyewitness accounts and interviews. Later we learned to melt down official reports, research papers, and balance sheets. And most recently our alloys have been diluted by statements and press releases.
But now journalists are having to get to grips with a new type of information: data. And this is a very rich seam indeed.
Data: what, how and why
Data is a broad term so I should define it here: I am not talking about statistics or numbers in general, because those are nothing new to journalists. When I talk about data I mean information that can be processed by computers.
This is a crucial distinction: it is one thing for a journalist to look at a balance sheet on paper; it is quite another to be able to dig through those figures on a spreadsheet, or to write a programming script to analyse that data, and match it to other sources of information. We can also more easily analyse new types of data, such as live data, large amounts of text, user behaviour patterns, and network connections.
And that, for me, is hugely important. Indeed, it is potentially transformational. Adding computer processing power to our journalistic arsenal allows us to do more, faster, more accurately, and with others. All of which opens up new opportunities – and new dangers. Things are going to change.