Increasingly you might come across an interesting set of interactive charts from a public body, or an interactive map, and you want to grab the data behind it in order to ask further questions. In many cases you don’t need to do any scraping — you just need to know where to look. In this post I explain how to work out where the data is being fetched from… Continue reading
One of the most useful applications of the data cleaning tool Open Refine (formerly Google Refine) is converting XML and JSON files into spreadsheets that you can interrogate in Excel.
Surprisingly, I’ve never blogged about it. Until now. Continue reading
Sylvia Tippmann wasn’t looking for a story. In fact, she was working on a way that Google could improve the way that it handled ‘right to be forgotten’ processes, when she stumbled across some information that she suspected the search giant hadn’t intended to make public.
Two weeks ago The Guardian in the UK and Correct!v in Germany published the story of the leaked data, which was then widely picked up by the business and technology press: Google had accidentally revealed details on hundreds of thousands of ‘right to be forgotten’ requests, providing a rare insight into the controversial law and raising concerns over the corporation’s role in judging requests.
But it was the way that Tippmann stumbled across the story that fascinated me: the combination of tech savvy, a desire to speed up work processes, and a strong nose for news that often characterises data journalists’ reporting. So I wanted to tell it here. Continue reading
“Most of the big web apps provide their API in JSON format (Facebook, Twitter, Instagram) however, as you may know if you’ve ever tried to use these, they often require an OAuth login in order to access the API.”
If you’re looking to get into coding, chances are you’ll stumble across a raft of jargon that can be off-putting, especially in tutorials that are oblivious to your lack of previous programming experience. Here, then, are 10 concepts you’re likely to come across – and what they mean.
A variable is one of the most basic elements of programming. It is, in a nutshell, a way of referring to something so that you can use it in a line of code. To give some examples:
- You might create a variable to store a person’s age and call it ‘age’
- You might create a variable to store the user’s name and call it ‘username’
- You might create a variable to count how many times something has happened and call it ‘counter’
- You might create a variable to store something’s position and call it ‘index’
Variables can be changed, which is their real power. A user’s name will likely be different every time one piece of code runs. An age can be added to at a particular time of year. A counter can increase by one every time something happens. A list of items can have other items added to it, or removed. Continue reading
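As a rough illustration of the variables described above, here is a minimal Python sketch. The values (an age of 34, the names "sam" and "alex", the list of items) are invented purely for the example, and Python is just one language you might meet these ideas in.

```python
# Illustrative values only: none of these come from a real dataset.
age = 34             # a variable storing a person's age
username = "sam"     # a variable storing the user's name
counter = 0          # a variable counting how often something happens
index = 1            # a variable storing a position in a list

# Variables can be changed, which is their real power.
age = age + 1        # a birthday: the age is added to
username = "alex"    # a different user runs the same code

for _ in range(3):
    counter += 1     # increase by one every time something happens

items = ["notebook", "pen"]
items.append("recorder")  # add an item to the list
items.remove("pen")       # remove an item from the list

# Positions start at 0, so index 1 is the second item in the list.
second_item = items[index]
```

After this runs, `age` holds 35, `counter` holds 3, and `second_item` holds "recorder", the item now sitting at position 1.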
CableSearch is a neat project by the European Centre for Computer Assisted Research and VVOJ (the Dutch-Flemish association for investigative journalists) which aims to make it easier for journalists to interrogate the Wikileaks cables. Although it’s been around for some time, I’ve only just noticed the site’s API, so I thought I’d show how such an API can be useful as a way to draw on such data sources to complement data of your own. Continue reading
Following the post earlier this week on XML and RSS for journalists I wanted to look at another important format for journalists working with data: JSON.
JSON is a data format which has been rising in popularity over the past few years. Quite often it is offered alongside – or instead of – XML by various information services, such as Google Maps, the UK Postcodes API and the Facebook Graph API.
Because of this, in practice JSON is more likely to be provided in response to a specific query (“Give me geographical and political data about this location”) than as a general file that you access (“Give me all geographical data about everywhere”).
I’ll describe how you supply that query below. Continue reading
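To give a flavour of what “a specific query” and a JSON response can look like, here is a small Python sketch. Everything in it is an assumption made for illustration: the address `api.example.org`, the parameter names, the placeholder postcode and the sample response are all invented, and no real service is contacted.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, invented for this sketch.
base_url = "https://api.example.org/location"
query = {"postcode": "AB1 2CD", "format": "json"}

# The query is usually supplied as part of the URL, after a question mark.
request_url = base_url + "?" + urlencode(query)

# A made-up response of the kind such a service might send back.
sample_response = """
{
  "postcode": "AB1 2CD",
  "geo": {"lat": 52.5, "lng": -1.9},
  "ward": "Example Ward"
}
"""

# json.loads turns the JSON text into Python dictionaries and lists.
data = json.loads(sample_response)
latitude = data["geo"]["lat"]
ward = data["ward"]
```

Here the space in the placeholder postcode is encoded as a plus sign in `request_url`, and the nested "geo" object becomes a dictionary inside a dictionary, which is why the latitude is reached with two keys.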