For the first Something For The Weekend of 2012 I want to tackle a common problem when you’re trying to scrape a collection of webpages: they have some sort of structure in their URL like this, where part of the URL refers to the name or code of an entity:
http://www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID=5237521
http://www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID=5237629
http://www.ltscotland.org.uk/scottishschoolsonline/schools/freemealentitlement.asp?iSchoolID=5237823
In this instance, you can see that
Read more…
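As a rough illustration of the URL pattern in the excerpt above, here is a minimal Python sketch that rebuilds those three addresses from a list of school IDs. The base URL and the `iSchoolID` parameter come from the example URLs; the function name `build_school_urls` is made up for this sketch, and it is not necessarily the approach the full post takes.

```python
from urllib.parse import urlencode

# Base address taken from the example URLs; only the query string changes.
BASE_URL = (
    "http://www.ltscotland.org.uk/scottishschoolsonline/"
    "schools/freemealentitlement.asp"
)


def build_school_urls(school_ids):
    """Return one URL per school ID (hypothetical helper for illustration)."""
    return [
        f"{BASE_URL}?{urlencode({'iSchoolID': school_id})}"
        for school_id in school_ids
    ]


# The three IDs that appear in the example URLs.
school_ids = [5237521, 5237629, 5237823]
urls = build_school_urls(school_ids)

for url in urls:
    print(url)
```

With a longer list of IDs, the same loop would produce every page address you need to hand to a scraper.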
CableSearch is a neat project by the European Centre for Computer Assisted Research and VVOJ (the Dutch-Flemish association for investigative journalists) which aims to make it easier for journalists to interrogate the Wikileaks cables. Although it’s been around for some time, I’ve only just noticed the site’s API, so I thought I’d show how such an API can be useful as a
Read more…
It’s been over 2 years since I stopped doing the ‘Something for the Weekend’ series. I thought I would revive it with a tutorial on They Work For You and Google Refine… If you want to add political context to a spreadsheet – say you need to know what political parties a list of constituencies voted for, or the MPs
Read more…
Following the post earlier this week on XML and RSS for journalists I wanted to look at another important format for journalists working with data: JSON. JSON is a data format which has been rising in popularity over the past few years. Quite often it is offered alongside – or instead of – XML by various information services, such as
Read more…
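To give a feel for what working with JSON looks like, here is a small Python sketch using the standard `json` module. The sample data, its field names (`results`, `name`, `pupils`) and the values are invented for illustration; they do not come from any particular information service.

```python
import json

# Invented sample response; a real service would define its own fields.
raw = """
{
  "results": [
    {"name": "Example School", "pupils": 120},
    {"name": "Another School", "pupils": 85}
  ]
}
"""

# json.loads turns the JSON text into ordinary Python dicts and lists.
data = json.loads(raw)

# Pull one field out of every record, and total another.
names = [item["name"] for item in data["results"]]
total_pupils = sum(item["pupils"] for item in data["results"])

print(names)
print(total_pupils)
```

Once the text is parsed, the data behaves like any other Python structure, so it can be filtered, counted or written out to a spreadsheet-friendly format.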
Here’s an example of how APIs can be useful to journalists when they need to combine two sets of data. I recently spoke to Lincoln investigative journalism student Sean McGrath who had obtained some information via FOI that he needed to combine with other data to answer a question (sorry to be so cryptic). He had spent 3 days cleaning
Read more…