api – Online Journalism Blog

Linked data and structured journalism at the BBC

Paul Bradshaw — Mon, 06 Jun 2016 06:56:15 +0000

Last month Basile Simon from BBC News Labs gave a talk at the CSV conference in Berlin: a two-day “community conference for data makers” (notes here). I invited Basile to publish his talk here in a special guest post.

At BBC News Labs, we’ve been pushing for more linked data in news for years now. We built a massive international news aggregator based on linked data, and spent years making it better… but it’s our production and live services who do the core of the job today.

We’re trying to stay relevant and to model our massive dataset of facts, quotes, news and articles. The answer to this may lie in structured journalism.

News Labs was founded in 2012 to play with linked data. The original team, made up of many data architects, strongly believed this was a revolution in the way we approach our journalism.

They were right.

Effortlessly updating the Olympics and elections

The BBC’s 2012 Olympics pages were generated using linked data

The BBC powered its delivery of the 2012 London Olympics with linked data: effortlessly updating hundreds of athlete, sport, and competition pages with all the recent related news.

We use linked data to curate indexes, such as constituency pages during the elections – imagine keeping 650 pages up to date at any given moment – or the massive repository of all our programmes.

We also have a lot of fun, and a fair amount of academic research interest, with our Juicer: a tool we built to ingest news from more than 600 international sources, extract entities and match them against DBpedia. Juicer is a fantastic example of semantic technology.

But now you see, there’s a bit of an issue.

‘Trusting in articles’

We humans are extremely good at extracting knowledge out of information, and at making connections between things in our heads.

This “web of meanings,” as my colleague Paul Rissen calls it, is what makes you think of the Prime Minister when you read “Cameron warns prices would rise if the UK leaves the EU.”

Your brain instinctively tells you that we’re probably not talking about James Cameron, the film director, here. Or about the Scottish clan Cameron, for that matter.

The way we do online journalism is by publishing articles, which remain the basic unit of journalism (leaving aside broadcast journalism, of course).

Besides our live blogs, our interactives, our Snapchats and bots, we still write articles: massive walls of text, ranging from 300 to a couple of thousand words, that we just put online in the hope that, somehow, people will find them and read them.

That means that we probably trust that articles are currently the best way to contain and deliver the knowledge we produce.

Knowledge is association and links

This knowledge is the aforementioned mental graph, the connections made between topics.

It isn’t isolated facts: a random collection of things you know. Knowledge is association and links.

The whole point of writing an article about an event or a person is to convey knowledge: to provide context, insights gained from this knowledge.

We journalists pride ourselves on our unbiased and balanced knowledge, which we deem more valuable than our opinions.

The real issue is the cost of producing these articles. From a newsroom perspective, the research that goes into each of them is made inefficient by the fact that journalists have to go back to old pieces of content and parse them again to find what they’re looking for: a date, a spelling, a piece of information, etc.

But many stories develop. They’re in fact composed of a series of events, and oftentimes news organisations follow these developments and publish new articles as events unfold.

The costs multiply and add up.

Repeated context

Each of these articles follows the inverted pyramid structure: the most recent and newsworthy bits at the top; details, context and background further down.

This background and context information is in fact repeated, if only slightly differently, from article to article.

You probably see what I’m on about here:

There is a fundamental programming principle that comes from Hunt and Thomas’ bible, “The Pragmatic Programmer”:

DRY – don’t repeat yourself.

And when I see how much we repeat ourselves in news, I cringe.

The costs multiply and add up every time you repeat yourself.

Worse, this is not even beneficial to our cherished audiences. The accumulation of articles and pieces about a topic offers a new reader nothing but unstructured chaos to contemplate.

Chaos in reverse chronological order, where articles, comment pieces, live blogs, columns, videos and social media are mixed together.

“Where to start?” asks our reader. “How can I inform myself and get a grasp of this story?”

Starting all over again, again

This is all because, every time we publish something, we let all the knowledge we invested in producing that piece go to waste, and we start all over again the next time.

We need to start saving knowledge.

We’ve seen that linked data has proven to be quite an amazing tool for news publishers in the past. The point I want to make now is that we could – and probably should – go further.

We could produce knowledge and facts only once, in a way that makes them re-usable next time.

We could take events as the basic unit of reporting in such a way.

Journalism is about reporting on events, after all. Events implicate people, places, organisations, or things.

If you think about it, it’s also clear that a piece of journalism contains a mini graph in itself: it is about something, after all, it mentions people and companies… and you see that we’ve got a web, a graph here.

Events and ontologies

Events can be big events or small events, and some ontologies nest these events – as if some of them could contain, or rather be composed of other events.

Take an election, for example. That’s an event itself, but there are also many smaller, other events that participate in the election:

The campaign;
A candidate saying something at a press conference;
The results coming in…

A candidate touring a factory somewhere during their campaign is an event that we humans would see as linked to the bigger one: the election.

But it’s also linked to this factory’s story, and that’s a web again, because there are links everywhere.
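To make the nesting concrete, here is a minimal Python sketch of events that contain other events and link out to other stories. The Event class, its field names and the sample labels are all assumptions made for illustration; this is not the BBC's storyline ontology, just one way such a structure could be held in code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """A hypothetical event record (illustrative only, not a real ontology)."""
    label: str
    sub_events: List["Event"] = field(default_factory=list)
    related_to: List[str] = field(default_factory=list)  # links to other stories


def flatten(event, depth=0):
    """Yield (depth, label) for an event and every event nested inside it."""
    yield depth, event.label
    for child in event.sub_events:
        yield from flatten(child, depth + 1)


# The election example from the text, with the factory visit nested in the campaign.
election = Event("Election", sub_events=[
    Event("Campaign", sub_events=[
        Event("Candidate tours a factory", related_to=["The factory's story"]),
    ]),
    Event("Candidate speaks at a press conference"),
    Event("Results come in"),
])

for depth, label in flatten(election):
    print("  " * depth + label)
```

The factory visit sits inside the campaign, which sits inside the election, while its related_to field points sideways at another story: that combination of nesting and cross-links is what turns a list of events into a web.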

Storylines

At the BBC, we’ve been playing with the concept of storylines for a little while now. A storyline is supported by an ontology, like the one above, and is no more and no less than a way to represent an event-based narrative.

Events and information are entered in a structured way and can then be ordered in a narrative.

Obviously, events are often part of several stories and narratives. They can also have several interpretations.

For example, our candidate’s visit to a factory is, according to this ontology, an event; the story is the editorial perspective on this event: the election itself and how successful this candidate is being so far.

The reporter’s task is then to input facts and events into the database, as well as to connect and explain to the machine the relationships between them and earlier ones.

The storyline is made of components, such as the events we just mentioned, or datasets, or even other storylines!

This all looks tidy and rich, right?

But there’s even more: because the information and the content are now all tidily structured, they can be exposed with a bit of magic through an API.

And in as many formats as we want.

A presentation issue

After all, it’s just a presentation issue now that the content is there and structured. It can be delivered in many languages: you just need to plug a translation pipeline like Google Translate or IBM Watson in the middle.

It can be snack-sized for people on their smartphones, expanded into a fully-fledged feature piece on desktop, and queried by our new fancy Facebook or Slack bot.

BBC R&D are playing around with this idea in the Elastic News project, a way to deliver the same piece of content with variable depth to strengthen our audiences’ understanding of stories.

It’s up to the user to explore and dive into the details of a story should they want to do so.

Similarly, we’ve been looking at tuning up our content with in-line immediate explanations of some concepts about which we’ve got structured information (these are the people, places, organisations, and things I mentioned earlier, if you remember).

Object-Based Broadcasting

There are now lots of things that can be done with content made available through an API. R&D have a large workstream called Object-Based Broadcasting. The idea is that the content is a set of individual assets whose relationships and associations are described through metadata.

Content can be adjusted, re-versioned, made longer or shorter, and even explorable by the audiences.

We’ve separated the content from its delivery and consumption, and structured this ideal content around events.

Not only are the possibilities infinite and appealing, but if you think about it, it kind of makes sense to think this way.

It makes sense because news never stops. It happens all the time, it is a continuous flow and accumulation of events and facts.

As I said, stories develop and evolve, and journalism follows this.

We don’t have to repeat ourselves every time we’ve got to write an article

The only structure that makes sense is the narrative: “a spoken or written account of connected events; a story.” And see: “connected events.”

And we can invest our efforts into the creation and the curation of such narratives, now that we don’t have to repeat ourselves every time we’ve got to write an article.

These narratives make sense to somebody who stumbles upon them, because we think and understand the world in narrative forms.

“Oh, but what led to this event?” asks the reader. A click later, the narrative expanded to include another causal and helpful event.

“Ah, that makes sense, I get it now.” Job done.

Narratives evolve

These narratives are not frozen, they’re constantly evolving as stories develop. They can even represent the different views of the world we have. Chancellor Merkel thought it was only human and decent to open Germany’s doors to Syrian and Iraqi refugees; across the Channel the Daily Mail thinks it’s madness.

Same facts, different narratives.

Behind the scenes, and I quote Jacqui Maher and Paul Rissen’s Manifesto for Structured Journalism:

“Such a database of knowledge – which already exists in the collective knowledge of our newsroom staff – could be used to provide context at scale across all our output. Structured journalism is a way of preserving a reporter’s expertise so that it isn’t lost once aired or published, and instead, is surfaced in related coverage.”

It's a wrap. pic.twitter.com/hId72gldZQ

— CSVConf (@CSVConference) May 4, 2016

Create your own Instagram/Facebook/Twitter API with Google Drive and IFTTT

Paul Bradshaw — Fri, 27 Feb 2015 13:58:32 +0000

My Birmingham City University colleague Nick Moreton has a neat little hack for connecting a JavaScript app to social media accounts by combining the automation tool IFTTT, and Google Drive. As he explains:

“Most of the big web apps provide their API in JSON format (Facebook, Twitter, Instagram) however, as you may know if you’ve ever tried to use these, they often require an OAuth login in order to access the API.”

IFTTT, you see, allows you to add a new row to a Google spreadsheet every time a particular criterion is met on, for example, Twitter (e.g. a particular account tweeting, or a hashtag being used). You only have to authorise IFTTT for the particular service once.

It is, basically, row-by-row scraping.

And any spreadsheet on Google Drive can be published as the JavaScript data format JSON.

Combine the two and you have a regularly updated live JSON feed from a social media service, based on criteria of your choosing.

You can read Nick’s post in full here.

How-to: learn about APIs while making tweetable quotes

Paul Bradshaw — Tue, 10 Feb 2015 06:43:16 +0000

This is the second in a series of tutorials introducing HTML, CSS and APIs. You should probably start with the first one, here.

You can also get all four tutorials in a small ebook.

Sharelines

In the previous post I outlined how to create a ‘Tweet this’ link using HTML to open a new Twitter window containing any text you liked. In this post I’ll outline how to add links, hashtags and @names to that tweet – and along the way find out a bit about APIs.

Stage 2: Adding links, hashtags and @names to a ‘tweet this’ window

You can add a link into your tweet just as you would any other word, for example:

https://twitter.com/intent/tweet?text=hello%20http://bit.ly/12345

But there’s another option: using this:

&url=

Here’s how it looks:

https://twitter.com/intent/tweet?text=hello&url=http://bit.ly/12345

You can also add hashtags using this:

&hashtags=

…like so:

https://twitter.com/intent/tweet?text=hello&url=http://bit.ly/12345&hashtags=journalism

And you can credit a Twitter username using this:

&via=

…like so:

https://twitter.com/intent/tweet?text=hello&url=http://bit.ly/12345&via=paulbradshaw

So we have text=, url=, hashtags=, and via=. Does it matter what order they come in? My answer would be ‘Try changing it, and see’…

…And you’ll find that no, it doesn’t matter. What does matter is that you put an ampersand between each pair, like so:

url=http://bitly.com/hellothere

&text=hello%20there

&via=paulbradshaw

&hashtags=journalism
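If you would rather let a script do the gluing, here is a minimal Python sketch that builds the same kind of URL with the standard urllib.parse module and checks that the order of the pairs makes no difference. The function names tweet_link and as_pairs are made up for this example.

```python
from urllib.parse import quote, urlencode, urlsplit, parse_qs

INTENT_URL = "https://twitter.com/intent/tweet"


def tweet_link(**params):
    """Join parameter=value pairs with ampersands and escape the values."""
    # quote_via=quote turns spaces into %20 rather than +
    return INTENT_URL + "?" + urlencode(params, quote_via=quote)


def as_pairs(link):
    """Read the parameters back out of a link as a dictionary."""
    return parse_qs(urlsplit(link).query)


first = tweet_link(url="http://bitly.com/hellothere", text="hello there",
                   via="paulbradshaw", hashtags="journalism")
second = tweet_link(hashtags="journalism", via="paulbradshaw",
                    text="hello there", url="http://bitly.com/hellothere")

print(first)
print(second)
print(as_pairs(first) == as_pairs(second))  # True: same pairs, different order
```

The two printed URLs look different, but they carry exactly the same pairs, which is why the resulting tweet box is the same.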

What’s happening here? Well, you’re actually using a special part of Twitter called Web Intents. This is sort-of-an API (Application Programming Interface)…

An introduction to APIs

@paulbradshaw No, and maybe yes. It's not a traditional API, and many would argue it's not one at all. But it's arguable (just)

— Michæl Brunton-Spall (@bruntonspall) January 26, 2015

Firstly, what is an API? An API essentially makes it easier for computer scripts to communicate with each other, and automate actions. In this case, tweeting or retweeting text, links, images and other material.

For example, as long as a developer knows the structure of the URL, they can write scripts which automatically generate a ‘tweet this’ or ‘retweet this’ link. That saves a lot of time, yes?

In this case, it has made it easier for you to manually do the same thing: generate a ‘tweet this’ window pre-populated with certain text, links, hashtags and @names.

APIs and documentation

Like most APIs, Web Intents comes with pages of documentation explaining how it can be used.

The main page for this explains, for example, that our URL is for a tweet:

https://twitter.com/intent/tweet

But you can also form a URL for a retweet – https://twitter.com/intent/retweet – or to favourite a tweet: https://twitter.com/intent/favorite. I’m not going to cover that here, but if you want to take this further it’s worth exploring the documentation for those.

Instead, drilling down further into the specific documentation for the ‘tweet’ options you’ll find a section named ‘Query parameters’.

And here, finally, we start to see those words we were putting into our URL: text, via, url, hashtags, plus a couple more: related and in-reply-to.

API parameters

‘Query parameters’ are types of questions you can ask of an API.

In many APIs you can ask a question – form a query – and get information back: for example one query parameter might be ‘postcode=’, and then you supply a value with that. In return, you get information about the postcode you supplied. (This is what the UK-Postcodes API does.)

In the case of Twitter’s Web Intents, what you get ‘back’ is that tweet box populated with the values you’ve supplied.

The query parameter in-reply-to can even add some metadata to the tweet which connects it to a specified other tweet.

Each parameter is followed by an equals sign and the value – as we’ve already seen. So an example of using the text query parameter is text=hello.

And each pair of query parameters (text=) and values (hello) is separated with an ampersand like so:

text=hello&url=http://bit.ly/12345&via=paulbradshaw
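To see those pairs the way a script sees them, here is a small Python sketch that splits the example URL back into its parameters and values, using only the standard library. It only reads the URL; it does not contact Twitter.

```python
from urllib.parse import urlsplit, parse_qsl

link = ("https://twitter.com/intent/tweet"
        "?text=hello&url=http://bit.ly/12345&via=paulbradshaw")

# Everything after the question mark is the query string.
query = urlsplit(link).query

# parse_qsl splits on each ampersand, then on the first equals sign.
pairs = parse_qsl(query)

for parameter, value in pairs:
    print(parameter, "=", value)
```

Each line printed is one parameter and its value: the ampersand marks where one pair ends and the next begins, and the equals sign separates the two halves of a pair.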

You can have a play around with various combinations of queries and see what happens. Again, you’re not going to break the Internet. Now you have a little bit of API experience to build on.

In the third post I take a detour into a little hack I discovered which allows you to embed images in a tweet. Then in the final post I’ll cover how to style your tweetable quotes further, and start to explore CSS.

How-to: learn HTML and CSS by making tweetable quotes

Paul Bradshaw — Mon, 09 Feb 2015 06:17:34 +0000

Sharelines

In the first of a series of tutorials I’m going to introduce you to some basic HTML by showing you a particularly useful application of simple coding skills: making something in your article ‘tweetable’.

I’ll do this in four stages: this first and longest post will introduce HTML basics by showing how to create the ‘tweetable quote’; the second post will add more details on tweeting links, hashtags and @names.

The third post will cover how to make a ‘tweetable image’; and finally, the fourth post will add a little design flair with CSS.

You can also get all four tutorials in a small ebook.

You will need an article already written in order to do this – ideally one with at least one image, and some good quotes.

Stage 1: The tweetable quote

If you know how to create a link then you already know how to create a tweetable quote.

Let me explain. A link has two parts:

The raw text or image which is linked, and
The HTML code which makes it into a link. HTML code is always in triangular brackets, sometimes called chevrons.

Here, for example, is a link with both elements:

<a href="http://onlinejournalismblog.com">PUT LINK TEXT HERE</a>

The text (in capitals in the example above) will go to the link in quotation marks in the HTML, whatever that text is.

Now let me show you the HTML for a link which creates a very simple ‘tweet this’ box with the tweet already filled with the text ‘hello’. The structure is the same:

<a href="https://twitter.com/intent/tweet?text=hello">PUT LINK TEXT HERE</a>

Starting with HTML: opening and closing tags

In WordPress, switch from the normal ‘Visual’ view to the ‘Text’ view, which allows you to see the post text including most of the HTML as well.

For example, if you have any formatting such as bold or italic text, subheadings, bullet or numbered lists and links, then you should be able to see the HTML doing that work. Here are some examples:

<strong> – this makes text bold

<em> – this makes text italic

<h2> – this makes a Heading 2 subheading. Similar tags will create headings at levels 3 down to 6

<ul> – this makes a bullet list…

<li> – …and then the first item in that list

<ol><li> – this makes a numbered list, and then the first item in that list

<blockquote> – this makes an indented quote

<img> – this makes an image

<a> – this makes a link. More on this later.

You should also notice that (almost) every tag has a similar closing tag, with a forward slash before the tag name.

This ‘turns off’ the tag. For instance:

</strong> – this turns off bold text

</em> – this turns off italic text

</h2> – this ends the Heading 2 subheading

</ul> – this marks the end of a bullet list…

</li> – …and this marks the end of the first item in that list

</li></ol> – this ends the last item in a numbered list, and then the numbered list as a whole

</blockquote> – this ends an indented quote

</a> – this marks the end of linked text.

You’ll notice one tag that is only in one of those two lists: <img>. This is because the img tag is one of a very few tags that don’t have a closing tag.

All this is a long way of saying: if you want to create a link, you need to make sure that you close it with the </a> tag.

You may have noticed that the <a> tag in the link code example given earlier also includes other words like href= – why don’t we close those? Well, because those are not tags – they are something else, as I’ll explain next.

As a first exercise, before that, try this:

Make sure you are in the HTML view for your post (click the Text tab if you are in WordPress).
Find a quote (try CTRL+F to find it quickly)
Put <a> immediately before it (on the same line) and </a> immediately after it (also on the same line).
Preview the post after this change. The quote should now be styled like a link, like this. But when you hover over it, you cannot click – why?

Attributes and values

Some HTML tags, like <a> and <img>, need attributes and values to work properly. You can create some HTML which looks like this: <a>A LINK</a> and it will look like a link – but it won’t go anywhere. Why? Because we haven’t specified where we want the link to go.

The destination of the link is just one attribute that an <a> tag can have. That attribute is href (hyperlink reference).

Other common attributes of tags include src (source), width, height, color and border. And when you start to think about those the idea of an attribute makes more sense: if you want to draw a box then of course you need to know its attributes in terms of width and height.

And of course each attribute needs a value: what is the width? What is the height, or the colour? What is the URL of the href of this link, or src of this image?

The value is specified by adding an equals operator after the attribute, and then the value in straight quotation marks.

A typical tag in full, then, with an attribute and its value, looks something like this:

<a href="http://onlinejournalismblog.com">

The href attribute has (=) the value "http://onlinejournalismblog.com"

A single tag can have multiple attributes. An image can have a src attribute, a width and height, a border thickness, alignment, title and alternative description, to name just a few.

But you only close the tag. You do not close the attributes.

To apply these principles to your link, change the HTML so that the <a> tag has an href attribute like so:

<a href="">

Make sure there is still an </a> after the text, to end the link.

Now when you preview to see the effect of the change, your text should not only be styled like a link, but it should be clickable too, like this.

But when you click on the link, it will not go anywhere.

That is because you now have a tag and an attribute, but no value (the URL of the link it should be going to).

So let’s add one. The URL which will create your tweet:

https://twitter.com/intent/tweet?text=hello

If you add this as your href attribute’s value your full link HTML should look like this:

<a href="https://twitter.com/intent/tweet?text=hello">Your link text here</a>

Change your HTML for the link so it uses the same URL. Then preview and test the link (you need to be logged into Twitter by the way, or you’ll be taken to a login page).

The link should open a Twitter box with the word ‘hello’ already entered.

Now ideally we want it to open in a separate window. And there’s an attribute for that: target.

The target attribute specifies whether you want this link to open in the same window, or a new one (among other now largely unused options).

If you don’t use it, the link will by default open in the same window. But if you want to open in a new window, then you need to give the target attribute the value ="_blank". Here’s an example of adding that to the link shown above:

<a href="https://twitter.com/intent/tweet?text=hello" target="_blank">PUT LINK TEXT HERE</a>

Now preview and test the link.

Regular testing is key when playing with any code: it allows you to identify any problems quickly and specifically.

For example, if you make ten changes and then test, the cause of any problem could be any of those ten changes. If for each of those ten changes you test each time, you will only get that problem for the one change that causes it.

Customising the tweeted text – hackable URLs

When you click on that text you’ll notice that the resulting window contains the impressive but ultimately unhelpful text: ‘hello’.

Now we want to change that text to the same text as our quote.

If you look at the URL you should be able to guess how to do that.

At the end of the URL are the words text=hello. This is very similar to the attributes and values that we talked about: text always stays the same (it means the contents of the tweet) but the value can be changed. At the moment that value is 'hello' but… what if we change it?

Well, we can try and see what happens. It’s not going to break the internet.

So, change the value to something else to test our suspicions: is this the part of the URL which populates the text of the tweet?

In your browser address bar, then, copy and edit that URL to this:

https://twitter.com/intent/tweet?text=goodbye

And yes indeed when we go to that URL the text changes to ‘goodbye’.

This is called a ‘hackable URL’. In other words, we can change (‘hack’) the URL to generate different results.

How about a longer phrase? When we try something like this…

https://twitter.com/intent/tweet?text=your quote here

…it works, but look at the final URL: it’s slightly different:

https://twitter.com/intent/tweet?text=your%20quote%20here

The spaces have been replaced by %20 – because URLs cannot have spaces in them (in Firefox it may look like spaces, but if you copy and paste the URL into a text editor you will see %20 instead).

This is called ‘escaping’ special characters which might otherwise cause problems, and your browser automatically does it.

Try it now, then, with the quote you actually want to appear in the tweet. Ideally you should copy the resulting URL with %20 instead of spaces – although if you didn’t the link would probably still work (‘resolve’) anyway.

Now use that in your HTML link instead of the simpler ‘hello’ version so you have something like this:

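<a href="https://twitter.com/intent/tweet?text=How%20to%20create%20a%20tweetable%20quote%20by%20Paul%20Bradshaw" target="_blank">How to create a tweetable quote by Paul Bradshaw</a>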

By the way, speech marks are another special character which needs to be ‘escaped’. In this case, it will be replaced by %22.
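Your browser does this escaping for you, but you can also do it yourself. Here is a minimal Python sketch using the standard urllib.parse module; the function name escape_for_url is made up for this example.

```python
from urllib.parse import quote, unquote


def escape_for_url(text):
    """Replace characters that are unsafe in a URL with %XX codes."""
    # safe="" means every special character is escaped, slashes included
    return quote(text, safe="")


escaped = escape_for_url("your quote here")
print(escaped)               # your%20quote%20here
print(escape_for_url('"'))   # %22
print(unquote(escaped))      # your quote here
```

The last line shows that nothing is lost: unquote turns the escaped text back into the original, which is what Twitter does before putting it in the tweet box.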

When someone clicks on that link it should open a new window containing the text specified.

Have a play with the techniques covered so far until you’re confident. In particular, see if you can add a short link back to the original post.

In the next post I will outline how to add that link, plus other elements such as @names and hashtags.

Before then, let’s cover a bit more on HTML: specifically nested tags.

Changing your linked text to a ‘call to action’

So far we’ve been linking the quote itself, but will the user know what will happen when they click on it? Chances are the user will assume that link takes them to the source of the quote – not to a Twitter box allowing them to share it.

So we need to change that.

First, we need to create the ‘Call to Action’ (CTA) that tells the user to ‘Tweet this!’. Type that after the quote, perhaps in square brackets, like so:

[Tweet this!]

Now we need to link that text instead of the quote itself. You could, for example, cut and paste both parts of the tag (opening and closing) from where they were, to before and after [Tweet this!].

If you want that text to be a bit less obtrusive, you can make it ‘superscript’ (small text hovering slightly above normal text) with the <sup> tag like so:

<sup>[Tweet this!]</sup>

Which tag comes first?

At this point you are dealing with a piece of HTML which uses two tags: <a> and <sup>.

This is a good opportunity to introduce the LIFO rule in HTML: when you are combining more than one tag, they should be closed in the reverse order.

In other words: Last In First Out (LIFO).

If you want text to be bold and italic, for example, you could apply that formatting by combining the tags:

<strong><em>

…then close in the reverse order:

</em></strong>

It does not generally matter which one comes first; it only matters that you reverse the order when closing. So conversely, if you started with

<em><strong>

…then you would close with

</strong></em>

In some cases, however, you don’t have that option.

For example in a bullet or numbered list you have to open a tag for the list as a whole and for each item within that list. You cannot open a list item before you open the list in which it is supposed to sit.

So, you only use <ul> once (because there is only one list) but within it you might use <li> (list item) and </li> as many times as you want bulleted items.

If you get any problems with tags it is worth checking:

Whether you closed them in the reverse order, and
If you change the order, does it help?

For example, if you have problems with <a><sup>, try <sup><a> – but always remember the LIFO rule: whichever tag you open first – <a> or <sup> – should be the one you close last.
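If you want a script to check the LIFO rule for you, here is a minimal Python sketch built on the standard html.parser module. It keeps a stack of open tags and complains when a closing tag does not match the most recently opened one. The class and function names are made up for this example, the list of tags without closing tags is deliberately short, and this is a teaching aid rather than a full HTML validator.

```python
from html.parser import HTMLParser

# A few tags that never take a closing tag (not a complete list).
VOID_TAGS = {"img", "br", "hr"}


class NestingChecker(HTMLParser):
    """Track open tags on a stack: last in, first out."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # tags opened but not yet closed
        self.problems = []    # closing tags that arrived out of order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return  # e.g. <img ... /> written in self-closing style
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append(tag)


def is_well_nested(html):
    """Return True if every tag is closed in reverse order of opening."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return not checker.problems and not checker.open_tags


print(is_well_nested('<a href="#"><sup>[Tweet this!]</sup></a>'))  # True
print(is_well_nested('<strong><em>text</strong></em>'))            # False
```

The second example fails because </strong> arrives while <em> is still the most recently opened tag.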

Adding a Twitter icon

The same principle applies if you want to use a little Twitter bird icon after your quote.

There are a number of these on Twitter’s image resources page including this one:

In this case your HTML looks like so:

This time the <img> tag is nested within the <a> and </a> tags. We can’t reverse this order because <img> does not have a closing tag. We link the image by surrounding it with the opening and closing <a> tags.

The <img> tag also has some attributes: src="" tells us where the image is being loaded from: in this case https://g.twimg.com/dev/documentation/image/Twitter_logo_blue_16.png

And the alt="" attribute tells us an alternative description for the image, in case the user is using screen reading software (because they are partially sighted or blind), or if the image does not load, and also to help search engines understand what the image is.

For more styling options see the final part of this series on using CSS.

Have a play around with making your own ‘tweet this’ links and different URLs. In the next post I’ll cover how to add other elements to the tweet itself.

Journalisme et code : 10 grands principes de programmation expliqués

Paul Bradshaw — Thu, 15 May 2014 10:50:57 +0000

Cedric Motte asked if he could translate Coding for journalists: 10 programming concepts it helps to understand into French. Here’s the result – first published on NewsResources.

If you are thinking of getting into programming, chances are you will stumble over a series of technical terms: jargon that can be particularly off-putting, especially in tutorials, whose authors tend to forget that you are new to programming.

The sections that follow describe ten concepts that you are likely to – no, that you will – come across.

1. Variables

The variable is one of the fundamental building blocks of programming. In short, a variable lets you refer to something so that you can use it in a line of code. Here are some examples:

You can create a variable to store a person’s age and call it ‘age’.
You can create a variable to store a user’s name and call it ‘user’.
You can create a variable to count events and call it ‘counter’.
You can create a variable to store the position of an item and call it ‘index’.

Variables can be changed, and that is where their real power lies.
A username can be different each time a piece of code runs.
An age can be increased at a specific point in the year.
The value of a counter can be incremented by one at each new event. Items can be added to or removed from a list of items.

Variables can also be combined: an age (one variable) can be calculated from a date of birth (another variable).
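To make those ideas concrete, here is a minimal sketch in Python. The names (user, counter, birth_date) and all the values are invented for illustration; they do not come from any real project.

```python
from datetime import date

# A variable is a name that refers to a value.
user = "Jeanne"        # hypothetical username
counter = 0            # nothing counted yet

# Variables can change: add one to the counter for each new event.
for event in ["login", "comment", "logout"]:
    counter = counter + 1

# Variables can be combined: an age worked out from a date of birth.
birth_date = date(1990, 8, 17)   # hypothetical date of birth
today = date(2014, 5, 15)        # fixed date so the result is repeatable
had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
age = today.year - birth_date.year - (0 if had_birthday else 1)

print(user, counter, age)
```

In real code you would probably use date.today() rather than a fixed date; it is fixed here only so that the example always gives the same answer.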

2. Strings, integers and other jargon for describing data types

There are different types of variable, and the type affects what you can do with it. The most common types are:

Numbers: integers or floats (numbers with decimals)
Text: generally called strings (‘string’ in code) and written between straight quotation marks, for example: "17 August".
Lists or arrays (see the explanation below): normally written between square brackets with the values separated by commas, for example: ["Paris", "Bruxelles", "Montréal"]
Dictionaries (or ‘dicts’) (see the explanation below): normally written with curly brackets, colons and commas, for example: {"Age": 23, "Name": "Jeanne"}

This matters because problems can arise when code encounters information in the wrong format. For example, you cannot do calculations with strings (which makes sense, as a string is text) or, sometimes, combine text with numbers.
In such cases, programming often involves telling the code to treat "7" as a number rather than a string, or even to convert "seven" into its numeric equivalent. Computers are excellent at performing tasks repetitively, but they need explicit instructions.
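A short Python sketch of what that conversion can look like. The variable names and the tiny words_to_numbers lookup are assumptions made up for this example.

```python
age_as_text = "7"                    # a string: text that looks like a number
age_as_number = int(age_as_text)     # tell Python to treat it as an integer

print(age_as_number + 1)             # arithmetic now works
print("Age: " + str(age_as_number))  # convert back to text to join with text

# Mixing the two types without converting raises an error.
try:
    result = age_as_text + 1
except TypeError as error:
    result = None
    print("Cannot combine text and numbers:", error)

# Converting a word such as "seven" needs explicit instructions, here a lookup.
words_to_numbers = {"seven": 7, "eight": 8}   # hypothetical, deliberately tiny
seven = words_to_numbers["seven"]
```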

3. Classes, IDs and selectors

HTML uses class= and id= to identify specific types of content so that other code can work with them. For example, the code of a web page might look like this:

There are at least three ways in which these classes and IDs are useful:

Their style can be defined in a CSS file.
They can be changed with JavaScript or other languages.
They can be identified and grabbed (this is called ‘scraping’) with a language such as Python, Ruby or PHP.

Most of the time these possibilities rely on selectors. A selector indicates a class with a full stop and an ID with a hash sign, for example:

.article (for class="article")
#footer (for id="footer")

If you see these symbols, or read documentation about classes and IDs, you now know what they refer to.
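As a rough illustration of how a scraper can pick out content by class or ID, here is a sketch using only Python’s standard library. The HTML fragment and the ClassAndIdFinder helper are invented for this example; real scrapers usually rely on a dedicated library and handle far messier pages (this sketch assumes one class per tag and no nesting).

```python
from html.parser import HTMLParser

# Hypothetical page fragment using one class and one id.
PAGE = '<div class="article">Council cuts budget</div><div id="footer">Contact us</div>'

class ClassAndIdFinder(HTMLParser):
    """Collects the text inside tags, keyed by a CSS-style selector."""

    def __init__(self):
        super().__init__()
        self.current = None   # selector of the tag we are inside, if any
        self.found = {}       # filled in as the page is read

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "class" in attributes:
            self.current = "." + attributes["class"]   # class -> full stop
        elif "id" in attributes:
            self.current = "#" + attributes["id"]      # id -> hash sign

    def handle_data(self, data):
        if self.current is not None:
            self.found[self.current] = data

    def handle_endtag(self, tag):
        self.current = None

finder = ClassAndIdFinder()
finder.feed(PAGE)
finder.close()
print(finder.found)
```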

4. Functions and methods

In general, functions and methods are one-word recipes for performing actions that would otherwise take many lines of code to spell out.

Here are two examples:

len, in some languages, means ‘tell me the length of the specified item’;
split, in some languages, means ‘split this item into one or more items based on a specified criterion’.

To do this, a function needs an ingredient called an argument or parameter (see the explanation below), while a method is attached to an ingredient called an object (again, see the explanation below).

The distinction between methods and functions is subtle and not worth dwelling on here, because their usage differs from one language to another.
One essential point, however, is that you can define your own functions in your code, which is useful for actions that you want to perform more than once.

Other functions are already built into the programming language. Here, for example, is a list of Python’s built-in functions. JavaScript also has these functions and these.

A third type of function or method is only available if you use the appropriate library (see below).
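Here is how those three ideas (a built-in function, a method, and a function you define yourself) might look in Python. The initials function is a made-up example, not part of any library.

```python
# Built-in function: len takes an argument in parentheses.
name_length = len("Paul")

# Method: split is attached to an object (here a string) with a dot.
words = "Online Journalism Blog".split(" ")

# Your own function, defined once and reused as often as needed.
def initials(full_name):
    """Return the first letter of each word in full_name."""
    return "".join(word[0] for word in full_name.split(" "))

print(name_length, words, initials("Online Journalism Blog"))
```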

5. Arguments or parameters

Functions and methods (see the explanation above) need ingredients, called arguments or parameters, in order to work.

Arguments or parameters appear in parentheses after the name of the function, for example:

len("Paul")
len(myname)

The len function, for example, tells you the length of the argument given in parentheses.
In the first example, that is a string (indicated by the quotation marks): "Paul". In this case the result would be 4 (4 characters).

In the second example the argument is a variable, myname. If that variable is a list, the result will be the number of items in that list rather than the number of characters. For other types of data it may not work at all, and an error message will be generated.

The documentation for a function or method should tell you more about what it does and which arguments it needs. In documentation these arguments are generally called parameters, but the two words refer to the same thing:

one in a general sense (‘This function takes one parameter: an object to measure’),

the other in a specific sense (‘Using the argument "Paul"’).

Some functions and methods take several parameters, separated by commas. Some parameters are optional. Sometimes the parentheses are left empty, for example: ready(). Again, this will be described in detail in the documentation.

When you are learning to program, it can be useful to search for the words ‘documentation’ and ‘function’ or ‘method’ along with the action you want to perform or the name of the function/method that is giving you trouble.
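A small Python sketch of required and optional parameters. The describe function and its greeting parameter are hypothetical, written only to show the pattern.

```python
def describe(name, greeting="Hello"):
    """name is required; greeting is optional and defaults to "Hello"."""
    return greeting + ", " + name + " (" + str(len(name)) + " characters)"

myname = "Paul"
print(describe(myname))              # one argument, default greeting used
print(describe(myname, "Bonjour"))   # two arguments, separated by a comma

# The same function, len, counts different things for different data types.
print(len("Paul"))                   # characters in a string
print(len(["Paul", "Cedric"]))       # items in a list
```

Here name and greeting are the parameters in the general sense, while "Paul" and "Bonjour" are the arguments passed in on a particular call.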

Here is an example of a function used in the calendar of journalism and media events:

function generateCalendar (eventData) {
  // Labels used when drawing the calendar
  monthNames = ["Jan", "Fev", "Mars", "Avril", "Mai", "Juin", "Juil", "Aout", "Sept", "Oct", "Nov", "Dec"]
  weekdays   = ["Lundi", "Mardi", "Mercredi", "Jeudi", "Vendredi", "Samedi", "Dimanche"]
  today      = new Date()  // the current date
  months = []
  generateAllTheMonths(eventData)

  // Add every event in eventData to the calendar
  $.each(eventData, function(i, event){
    appendEvent(event)
  })
}

How does that translate into plain language?
. The function is there to build the calendar, and it takes the argument eventData
. The month names will be displayed and written as shown between straight quotation marks
. The names of the days of the week will be displayed and written as shown between straight quotation marks
. today is set to the current date by creating a new Date (in this case it is used to give today’s date a particular style: a small blue circle in the background)
. all the months that have data in the eventData argument are generated (that is, displayed here). This explains why months for which we have not entered any events are not displayed.
. for each event, a loop is created which…
. … fetches the details of the event argument, which is described further down in the code.
. and functions always end with a closing curly bracket

6. Libraries

Libraries are collections of further functions and methods that let you go beyond the basics of the programming language. In other words, they let you use code written by other people: this is one of the most powerful aspects of programming.

Think of a problem that you might face. The chances are that someone has created a library that solves it: drawing a map, extracting information from a series of web pages (‘scraping’), converting a document, presenting data as a chart or in interactive tables, creating animations or effects.

So it is also useful to search for your problem, the language you are using or learning, and the word ‘library’. For example: ‘mapping javascript library’.

Addition from Cédric.
GitHub is the magic place to find libraries. The terminology on GitHub is a little different, but you can consider that what this post calls a library will be a ‘repository’ on GitHub. To get hold of a repo without a GitHub account, you can download the zip containing all the files. With an account, you can ‘fork’ the repo: the files are then copied into your GitHub account. You can then ‘clone’ them to your computer to work on them locally.
GitHub easily deserves a post of its own – one day we’ll get round to it

7. Lists/arrays and dictionaries/dicts

Lists and dictionaries are specific types of information that can be extremely useful in programming, but that can also be confusing for anyone not used to them.

The terminology varies: in some programming languages lists are called arrays, and dictionaries are called dicts. I prefer ‘lists’ and ‘dictionaries’, however, because those terms are easier to understand.

Put very simply, a list or array is a list of items, for example:
["Asia", "Africa", "Europe"]
The contents of a list can be any one or more of the data types covered earlier.
Lists are extremely useful in two cases:
1. storing information (for example, when scraping, or for a user’s answers to a questionnaire);
2. repeating actions (for example, plotting each number or place in a list on a chart or a map). See the section on loops below.

Dictionaries are similar in that they are also structured as a list, with one key difference: they are a list of pairs.
Each pair has a label (called a key) and a value, connected by a colon, for example:
"Age": 24
The word ‘dictionary’ is significant. Think of it as a collection of words with associated definitions. You can also picture it as column headings (age, name, place) and values (18, Sarah, Geneva).

The pairs are then separated by commas and the whole thing is placed in a list, enclosed in curly brackets, for example:
{"Age": 23, "Name": "Jeanne"}
This is particularly useful for storing data that has several labels. For example, you can store a list of ages as a simple list. But if you want to connect each age to a name or a place, you will need a dictionary.
This is exactly the logic behind the JSON data format, which is used by various APIs (see below).
Further reading: Data for journalists: JSON for beginners

Like lists, dictionaries can contain any type of data under each key, including lists and dictionaries (that is, a dictionary can contain a dictionary).
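The sketch below shows a list, a list of dictionaries, and a dictionary that contains both, then converts the result to JSON and back. All the names and values (people, newsroom, the "Place" key) are invented for illustration.

```python
import json

ages = [23, 31, 18]                          # a simple list of values

# A list of dictionaries: each age is now tied to a name and a place.
people = [
    {"Age": 23, "Name": "Jeanne", "Place": "Paris"},
    {"Age": 18, "Name": "Sarah", "Place": "Geneva"},
]

print(people[1]["Name"])                     # look a value up by its key

# A dictionary can hold lists and other dictionaries under its keys.
newsroom = {"Name": "Data desk", "Staff": people, "Location": {"City": "Paris"}}

# JSON text follows the same structure, so it converts both ways.
as_json = json.dumps(newsroom)
round_trip = json.loads(as_json)
print(round_trip["Staff"][0]["Age"])
```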

8. Loops: for, each, while

As mentioned earlier, one of the great things about lists is that they let you repeat actions many times, which is one of the main uses of programming.
To do this you normally use a loop. The loop starts at the first item in a list, applies an action to it, then does the same for the second item, and so on, until the last item.

Here are some examples:

Taking each place in a list (an ‘each’ loop) and positioning it on a map
Taking each number in a list and sizing the bars of a bar chart to match those numbers
Taking each item in a list (for example, ID codes) and adding it to a partial URL to create the full URL
Taking each URL in a list and running code to extract information from it
Running an animation while (a ‘while’ loop) a person’s score is below or above a given value

What can be confusing is the way a loop creates a variable as it goes, and the number of times the value of that variable changes while the loop runs. For example:
numberlist = [1,2,3,4]
for num in numberlist:
    print(num)

In this code, ‘num’ is created as a new variable to hold a value as the loop works through the list. num takes the values 1, 2, 3 and 4 in turn before the loop ends (once all the values in the list have been used).
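Two of the examples listed above, sketched in Python: building full URLs from a list of ID codes with a for loop, and repeating an action with a while loop. The base URL, the ID codes and the scoring rule are all hypothetical.

```python
# for loop: add each ID code to a partial URL to make full URLs.
base_url = "http://example.com/report?id="    # hypothetical partial URL
id_codes = ["A1", "B2", "C3"]                 # hypothetical ID codes

full_urls = []
for code in id_codes:          # code takes the values "A1", "B2", "C3" in turn
    full_urls.append(base_url + code)

# while loop: keep going for as long as a condition holds.
score = 0
rounds = 0
while score < 10:              # stops as soon as score reaches 10 or more
    score = score + 4
    rounds = rounds + 1

print(full_urls)
print(score, rounds)
```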

9. Objects

The word ‘object’ is used regularly in tutorials without any reference to what it means. Without going into too much detail, when people talk about an ‘object’ in programming they generally mean something that can be manipulated or used in some way in code, such as a variable containing an age, a name, a list, and so on.

Frustration peaks when the word ‘object’ is combined with another word. For example, you might hear about a ‘jQuery object’ or an ‘lxml object’.
When objects are described like this, it generally means they can be manipulated or used with code from that library:
jQuery methods can then be used on a ‘jQuery object’ and lxml methods on an ‘lxml object’. How do they become objects of that type? Normally there is a point in the code where a variable is created that turns them into a ‘jQuery object’, ‘lxml object’, and so on (to get a clearer idea, search on the subject or look at tutorials or the comments attached to code).
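The same idea can be seen with objects from Python’s standard library rather than jQuery or lxml: the type of an object decides which methods you can use on it. The variable names and values below are invented for illustration.

```python
from datetime import date

headline = "budget cuts announced"     # a string object
places = ["Paris", "Geneva"]           # a list object
published = date(2014, 5, 15)          # this line creates a date object

# Each type of object comes with its own methods.
shouted = headline.upper()             # upper() is a string method
places.append("Montreal")              # append() is a list method
iso_date = published.isoformat()       # isoformat() is a date method

# A method from one type is not available on another.
lists_have_upper = hasattr(places, "upper")

print(shouted, places, iso_date, lists_have_upper)
```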

10. APIs

An API is an Application Programming Interface. In effect, it is a way of asking questions and getting answers.
APIs are particularly useful in programming because they let you ask a large number of questions and get just as many answers (generally as structured data), often drawing on live data, without a middleman.

Programming often then focuses on presenting the resulting information: for example on a map, a chart or a timeline.
APIs are behind a great many applications. Twitter apps, for example, get answers to the question ‘What are the people I follow tweeting?’ from the Twitter API. The resulting data is presented in different ways, depending on the app and its particular code, but the underlying data (the tweets) is the same.

APIs that are generally useful from a journalistic point of view include:
Social media APIs (what people are talking about/sharing in a particular place/with a particular term)
News APIs (content published by a particular journalist/in a specific category)
Politics APIs (how a particular politician voted, that person’s constituency)
Location APIs (latitude and longitude of a postcode, local authority)
Crime APIs (crimes committed near a location on a given date, the outcomes of those crimes)

If you have a large amount of data, you can use an API to ask the same question of each item (using loops, see above): for example each postcode, politician or search term.
APIs can also be combined: for example, you can use the answers from one API as the basis for questions to another.

A question to an API normally takes the form of a URL. For example, the URL to ask the police API about crimes that took place in a given month at a specific latitude and longitude is: http://data.police.uk/api/crimes-at-location?date=2012-02&lat=52.629729&lng=-1.131592
Note that the date, latitude and longitude all appear in the URL, which is built by following the guidance in this documentation.
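As a sketch of how such a question can be assembled in code, the Python below rebuilds the URL above from its three ingredients. The crimes_url helper and the second pair of coordinates are hypothetical; the code only builds the URLs and does not contact the API.

```python
from urllib.parse import urlencode

API_ENDPOINT = "http://data.police.uk/api/crimes-at-location"

def crimes_url(month, latitude, longitude):
    """Build the question URL for one month and one location."""
    query = urlencode({"date": month, "lat": latitude, "lng": longitude})
    return API_ENDPOINT + "?" + query

# One question...
single_url = crimes_url("2012-02", "52.629729", "-1.131592")
print(single_url)

# ...or the same question for every location in a list, using a loop.
# The second pair of coordinates is made up for the example.
locations = [("52.629729", "-1.131592"), ("52.486243", "-1.890401")]
urls = [crimes_url("2012-02", lat, lng) for lat, lng in locations]
```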

Have I missed a concept? Let me know and I’ll try to add it.

Contributors to the French translation
François Jacques – twitter
Ingrid Pigueron G+ & twitter

Maps “in the public interest” now exempt from Google Maps API charge

Paul Bradshaw — Mon, 28 Nov 2011 12:55:34 +0000

If you thought you couldn’t use the Google Maps API any more as a journalist, this update to the Google Geo Developers Blog should make you reconsider. From Nieman Journalism Lab:

“Certain web apps will be given blanket exemptions from charging. Here’s Google: “Maps API applications developed by non-profit organisations, applications deemed by Google to be in the public interest, and applications based in countries where we do not support Google Checkout transactions or offer Maps API Premier are exempt from these usage limits.” So nonprofit news orgs look to be in the clear, and Google could declare other news org maps apps to be “in the public interest” and free to run. (It also notes that nonprofits could be eligible for a free Maps API Premier license, which comes with extra goodies around advertising and more.)”

Scraperwiki now makes it easier to ask questions of data

Paul Bradshaw — Thu, 22 Sep 2011 20:09:34 +0000

Image from @EatSafeWalsall

I was very excited recently to read on the Scraperwiki mailing list that the website was working on making it possible to create an RSS feed from a SQL query.

Yes, that’s the sort of thing that gets me excited these days.

But before you reach for a blunt object to knock some sense into me, allow me to explain…

Scraperwiki has, until now, done very well at trying to make it easier to get hold of hard-to-reach data. It has done this in two ways: firstly by creating an environment which lowers the technical barrier to creating scrapers (these get hold of the data); and secondly by lowering the social barrier to creating scrapers (by hosting a space where journalists can ask developers for help in writing scrapers).

This move, however, does something different.

It allows you to ask questions – of any dataset on the site. Not only that, but it allows you to receive updates as those answers change. And those updates come in an RSS feed, which opens up all sorts of possibilities around automatically publishing those answers.

The blog post explaining the development already has a couple of examples of this in practice:

Anna, for example, has scraped data on alcohol licence applications. The new feature not only allows her to get a constant update of new applications in her RSS reader – but you could also customise that feed to tell you about licence applications on a particular street, or from a particular applicant, and so on.

You will need to know some SQL, which is widely used in data journalism – particularly in the US – but it’s pretty simple to learn, because as a query language, it is designed to ask questions like ‘Select all the applications from that dataset where the application is of this status and the applicant has this name’.
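To give a feel for what such a query looks like, here is a minimal sketch using Python’s built-in sqlite3 module. The applications table, its columns and its rows are invented for illustration; they are not Scraperwiki’s actual schema or Anna’s data.

```python
import sqlite3

connection = sqlite3.connect(":memory:")   # throwaway in-memory database
connection.execute(
    "CREATE TABLE applications (applicant TEXT, street TEXT, status TEXT)"
)
connection.executemany(
    "INSERT INTO applications VALUES (?, ?, ?)",
    [
        ("J Smith", "High Street", "pending"),
        ("J Smith", "Mill Lane", "granted"),
        ("A Patel", "High Street", "pending"),
    ],
)

# 'Select all the applications where the status is this
# and the applicant has this name'
query = (
    "SELECT applicant, street, status FROM applications "
    "WHERE status = ? AND applicant = ?"
)
rows = connection.execute(query, ("pending", "J Smith")).fetchall()
print(rows)
connection.close()
```

Changing the WHERE clause (for example to filter on street instead) is all it takes to ask a different question of the same data.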

And because RSS is so flexible, Stuart can use the same technology to publish live updates on restaurant inspections to @EatSafeWalsall (it could also feed a widget on a blog or website, or a map, a Facebook page, or an email newsletter).

So you can put that blunt object away. This makes Scraperwiki useful in wholly new ways: asking questions, and publishing and distributing the results, automatically.

How to use the CableSearch API to quickly reference names against Wikileaks cables (SFTW)

Paul Bradshaw — Fri, 09 Sep 2011 12:33:25 +0000

CableSearch is a neat project by the European Centre for Computer Assisted Research and VVOJ (the Dutch-Flemish association for investigative journalists) which aims to make it easier for journalists to interrogate the Wikileaks cables. Although it’s been around for some time, I’ve only just noticed the site’s API, so I thought I’d show how such an API can be useful as a way to draw on such data sources to complement data of your own.

Example question: “How many Swedish party leaders are mentioned in the cables?”

There’s no particular reason why I picked Sweden, but this is an exercise you could do with any list – MPs, cabinet members, organisational heads, etc.

First, you need to grab the list. I did so by using the =importHTML formula on this Wikipedia page. You would obviously need to check that. Alternatively, you could use =importXML on this official Swedish parliament page for a list of ministers.

(I’m not going to repeat these processes as you can read how to do these by clicking through to the links explaining them above)

Here are the results. As often happens with Wikipedia tables, the first row is shifted so the headings don’t quite match the columns below. As we only need a list of names we don’t have to correct that. (For the =importXML scrape, you’ll also encounter a problem with accented characters, but this will still be quicker to correct than if we were manually copying the list across)

Now download that spreadsheet as a CSV file, and open up Google Refine.

Testing with the API

I’ve previously explained how to use Google Refine with the APIs of Google Maps, UK-Postcodes, and They Work For You (UK politics).

The CableSearch API page is pretty straightforward if you’ve followed any of those – but it’s key that you test what results Google Refine provides against what you get from a manual search (and make sure you have a test that provides unusual results – in this case, anything with fewer than 10 results).

In particular, testing reveals that your search term needs to first be formatted in a particular way to avoid you getting the wrong results.

Formatting your data

So in our data we have a list of names – but if we just run them through CableSearch we will get results where those names do not appear together. In other words, a search for John Jones will bring back results where anyone called John and anyone called Jones is mentioned.

The normal solution is to put quotation marks around the search term, to ensure that only results containing that exact phrase are returned, i.e. “John Jones”.

With an API where we are constructing a URL, however, that space can cause problems because a URL cannot contain a space. We need to replace it with a code for a space: %20 (if you do a search for anything containing a space, you will notice that %20 will sometimes appear in the URL for the results in its place; at other times a + sign will replace the space)

So, here’s how to reformat the text accordingly:

Click on the arrow at the top of your column of names, and select Edit Column > Add column based on this column…
In the window that appears type the following code: '"'+value.split(" ").join("%20")+'"'
Give the column a name and click OK.

The start and end may be difficult to see, so here it is with spaces in between:

' " '

You’ll see that it’s a single inverted comma followed by double inverted commas and a further single inverted comma. That adds double inverted commas at the start and end of our new data.

The rest of the code splits the original data wherever there is a space (” “) and joins the resulting fragments together with “%20”.

And so John Jones becomes “John%20Jones” – which will work in the API (one cell has 2 names, however, which you will need to clean up).
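For comparison, the same reformatting can be sketched in Python. The format_search_term function is a hypothetical helper that mirrors the Refine expression; it is not part of CableSearch or Google Refine.

```python
from urllib.parse import quote

def format_search_term(name):
    """Wrap the name in double quotes and replace each space with %20."""
    return '"' + "%20".join(name.split(" ")) + '"'

formatted = format_search_term("John Jones")
print(formatted)

# The standard library can encode the space for you; note that by default
# quote() also encodes the quotation marks themselves, as %22.
fully_encoded = quote('"John Jones"')
print(fully_encoded)
```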

Grabbing from the API

Now that we have properly formatted text we can ask the CableSearch API for the information it has on each name. Here’s how:

Click on the arrow at the top of the newly created column of formatted names, and select Edit Column > Add column by fetching URLs
In the window that appears type the following code: "http://cablesearch.org/cable/api/search?q="+value
Give the column a name and click OK.

It will now go and fetch data for each name, which may take a few minutes (or more, depending how many names you have).

When it’s finished you should have a column of cells containing JSON data. It will be very hard to look at (more on how to read JSON here) but that’s OK because we’re going to create a final column to extract the piece of data we want.

Extracting from the JSON

The process should be familiar by now:

Click on the arrow at the top of the newly created column of JSON data, and select Edit Column > Add column based on this column…
In the window that appears type the following code: value.parseJson().info.items
Give the column a name and click OK.

This will create a new column which just tells you how many results there are for each name. Where it says ’10’ there are probably more (that’s the maximum value – sadly the API doesn’t return any information on total records, although the API page details one way you can continue to cycle through pages of results beyond the first 10).
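The extraction step can be sketched in Python as follows. Only the info.items path comes from the expression above; the rest of the sample response, and the count_results helper, are invented for illustration and may not match what the API really returns.

```python
import json

# Hypothetical response: only the info -> items path is taken from the
# Refine expression; every other field and value here is made up.
response_text = '{"info": {"items": 3}, "results": ["cable A", "cable B", "cable C"]}'

def count_results(json_text):
    """Return the number stored at info -> items in a JSON response."""
    return json.loads(json_text)["info"]["items"]

hits = count_results(response_text)
print(hits)

# Fewer than 10 is a full count; 10 may mean 'at least 10'.
may_have_more = hits >= 10
```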

This enables you to take a list of names and quickly find out which ones are mentioned in the cables at all, and which ones have been mentioned just a few times – saving you lots of searches, and time, and allowing you to narrow the focus of your work.

A more powerful API would allow you to narrow your focus further: by date range, for example, or source, urgency or classification. The broader point is: this is why APIs are useful. Knowing how to use them (and which ones there are) simply gives you another way to do a job better.

How to: convert easting/northing into lat/long for an interactive map

Paul Bradshaw — Fri, 12 Aug 2011 21:01:08 +0000

A map generated in Google Fusion Tables from a dataset cleaned using these methods

Google Fusion Tables is great for creating interactive maps from a spreadsheet – but it isn’t too keen on easting and northing. That can be a problem as many government and local authority datasets use easting and northing to describe the geographical position of things – for example, speed cameras.

So you’ll need a way to convert easting and northing into something that Fusion Tables does like – such as latitude and longitude.

Here’s how I did it – quickly.

Find an API to do the work for you

The first thing I needed was an online tool that will do the conversions. Nearby.org.uk is pretty useful for doing so manually – and there’s an API as well – but I wanted something that would give me a nice JSON feed for Google Refine.

So I asked Twitter.

This is where being a part of communities of practice is important for journalists. (Samuel Johnson once said that there are two types of knowledge: “We know a subject ourselves, or we know where we can find information upon it.” Those communities are an example of the latter).

Stuart Harrison very helpfully said he would adapt his postcodes API to convert easting and northing – and within an hour it was ready.

Using Google Refine to work with the API

The API works by generating information in JSON format based on a URL (I explain JSON in this post).

For example, the following URL generates a page of JSON with the latitude and longitude for easting 492412, northing 329757:

http://www.uk-postcodes.com/eastingnorthing.php?easting=492412&northing=329757

I know that Google Refine will be able to use that JSON to extract the latitude and longitude for dozens of rows with different values and add them to the spreadsheet (here’s a post explaining that in more detail).

So here’s what I do:

Generating the end bit of the URLs

I need a new column in my spreadsheet that fetches information from those URLs – there are a couple of ways of doing this but I’m going to show the simplest way for a beginner (rather than the simplest method programmatically*)

This involves creating a new column which conveniently puts together the end part of the URL that I’ll be calling: in this case easting=492412&northing=329757 (where the numbers change in each cell).

Click on the drop-down arrow at the top of the Easting column and select Edit column > Add column based on this column…
In the window that appears type the following GREL (Google Refine Expression Language): "easting="+cells["Easting"].value+"&northing="+cells["Northing"].value
This assumes that the column with the easting values is called ‘Easting’ (note the capital E) and the northing column is called ‘Northing’. Change these to the names of your columns if they’re different.
Give the new column a name in the box at the top and save it. You should see a new column appear, populated with values like easting=492412&northing=329757 – in each cell the process is simply writing a string of characters that begins with easting=, then adds the value in the cell within the ‘Easting’ column, adds &northing=, then adds the value in the cell within the ‘Northing’ column.

These are the second parts of the URLs we’re going to fetch lat-long values from.
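The same string-building can be sketched outside Refine, in Python. The two-row dataset is hypothetical (its first row reuses the example values above), and the code only assembles the URLs without fetching them.

```python
import csv
import io

# Hypothetical two-row dataset; the first row uses the example values above.
data = io.StringIO("Easting,Northing\n492412,329757\n451200,338100\n")

BASE = "http://www.uk-postcodes.com/eastingnorthing.php?"

urls = []
for row in csv.DictReader(data):
    # Same idea as the GREL: fixed text plus the value from each column.
    ending = "easting=" + row["Easting"] + "&northing=" + row["Northing"]
    urls.append(BASE + ending)

print(urls[0])
```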

Fetching data from those URLs

At the top of this new column, then:

Click on the drop-down arrow of your newest column and select Edit column > Add column by fetching URLs…
In the window that appears type the following GREL (Google Refine Expression Language): "http://www.uk-postcodes.com/eastingnorthing.php?"+value
As you can see, this simply looks at a URL that begins http://www.uk-postcodes.com/eastingnorthing.php? and ends with the value in each cell of the column selected. It will then populate a new column of cells with the JSON returned by each different URL.
Give the new column a name in the box at the top and save it. You should again see a new column appear – but this will take longer, because it is going to that website and gathering information. Make a cup of tea.

Extracting the latitude and longitude into separate cells

Great – now we have the lat-long values for each row. But to visualise this data we need separate columns for latitude and longitude, so this is how we get that out of the JSON. UPDATE: In the 2.0 version of Refine the old GREL (struck through below) no longer seems to work – thanks to Tom in the comments for pointing this out and adding the new code which is shown below.

Click on the drop-down arrow and select Edit column > Add column based on this column…
In the window that appears type the following GREL (Google Refine Expression Language): ~~value.parseJson()lng~~ parseJson(value).get("lng")
This will look at the value of each cell, and pull out the bit after “lng” and populate a new column of cells with each value
Give the new column a name in the box at the top (e.g. longitude) and save it.
Repeat the process for latitude – the GREL you need is ~~value.parseJson()lat~~ parseJson(value).get("lat")
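For readers more used to code than to Refine, the extraction amounts to something like this Python sketch. The keys "lat" and "lng" come from the GREL above, but the sample cell contents and its values are made up.

```python
import json

# Hypothetical cell contents: the keys match the GREL above,
# but the numbers are invented for the example.
cell = '{"lat": 52.9, "lng": -0.6}'

record = json.loads(cell)
latitude = record.get("lat")     # like parseJson(value).get("lat")
longitude = record.get("lng")    # like parseJson(value).get("lng")

print(latitude, longitude)
```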

You should now have a spreadsheet of data that includes latitude and longitude for each row. Click on Export in the upper right corner and select Comma-separated value.

Visualise it in Fusion Tables

Go to Google Fusion Tables and upload that file. Then open it. Click on Visualize and you should have a map option. Once visualised you can embed it elsewhere by clicking on Get embeddable link.

For an example of how that embed code looks on a page, here is one I prepared earlier. (And here is the data it is pulling from).

*The simpler way programmatically is to go straight to ‘Fetching data from URLs’ and use the following GREL code:

"http://www.uk-postcodes.com/eastingnorthing.php?easting="+cells["Easting"].value+"&northing="+cells["Northing"].value

How to: convert easting/northing into lat/long for an interactive map

Paul Bradshaw — Fri, 12 Aug 2011 08:27:07 +0000

A map generated in Google Fusion Tables from a dataset cleaned using these methods

So you’ll need a way to convert easting and northing into something that Fusion Tables does like – such as latitude and longitude.

Here’s how I did it – quickly.

Find an API to do the work for you

So I asked Twitter.

Stuart Harrison very helpfully said he would adapt his postcodes API to convert easting and northing – and within an hour it was ready.

Using Google Refine to work with the API

The API works by generating information in JSON format based on a URL (I explain JSON in this post).

For example, the following URL generates a page of JSON with the latitude and longitude for easting 492412, northing 329757:

http://www.uk-postcodes.com/eastingnorthing.php?easting=492412&northing=329757

So here’s what I do:

Generating the end bit of the URLs

Click on the drop-down arrow at the top of the Easting column and select Edit column > Add column based on this column…
In the window that appears type the following GREL (Google Refine Expression Language): “easting=”+cells[“Easting”].value+”&northing=”+cells[“Northing”].value
This assumes that the column with the easting values is called ‘Easting’ (note the capital E) and the northing column is called ‘Northing’. Change these to the names of your columns if they’re different.
Give the new column a name in the box at the top and save it. You should see a new column appear, populated with values like easting=492412&northing=329757 – in each cell the process is simply writing a string of characters that begins with easting=, then adds the value in the cell within the ‘Easting’ column, adds &northing=, then adds the value in the cell within the ‘Northing’ column.

These are the second parts of the URLs we’re going to fetch lat-long values from.

Fetching data from those URLs

At the top of this new column, then:

Click on the drop-down arrow of your newest column and select Edit column > Add column by fetching URLs…
In the window that appears type the following GREL (Google Refine Expression Language): “http://www.uk-postcodes.com/eastingnorthing.php?”+value
As you can see, this simply looks at a URL that begins http://www.uk-postcodes.com/eastingnorthing.php? and ends with the value in each cell of the column selected. It will then populate a new column of cells with the JSON returned by each different URL.
Give the new column a name in the box at the top and save it. You should again see a new column appear – but this will take longer, because it is going to that website and gathering information. Make a cup of tea.

Extracting the latitude and longitude into separate cells

Click on the drop-down arrow and select Edit column > Add column based on this column…
In the window that appears type the following GREL (Google Refine Expression Language): ~~value.parseJson()lng~~ parseJson(value).get(“lng”)
This will look at the value of each cell, and pull out the bit after “lng” and populate a new column of cells with each value
Give the new column a name in the box at the top (e.g. longitude) and save it.
Repeat the process for latitude – the GREL you need is ~~value.parseJson()lat~~ parseJson(value).get(“lat”)

You should now have a spreadsheet of data that includes latitude and longitude for each row. Click on Export in the upper right corner and select Comma-separated value.

Visualise it in Fusion Tables

For an example of how that embed code looks on a page, here is one I prepared earlier. (And here is the data it is pulling from).

*The simpler way programmatically is to go straight to ‘Fetching data from URLs’ and use the following GREL code:

“http://www.uk-postcodes.com/eastingnorthing.php?easting=”+cells%5B“Easting”%5D.value+”&northing=”+cells%5B“Northing”%5D.value
