Tag Archives: tools

Just add JavaScript: use these 3 tools to get instant interactivity

Maria Crosas Batista highlights ways to get started with adding interactivity to your journalism.

This post is for beginners who are learning HTML, CSS and JavaScript. Below are three useful websites for embedding maps, charts and timelines in your HTML without going crazy.
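
The three sites aren’t named in this excerpt, but most tools of this kind work the same way: you paste an embed snippet or initialise a widget with a few lines of JavaScript. As a minimal sketch of the pattern, assuming Leaflet and OpenStreetMap (stand-ins here, not necessarily among the three tools):

```js
// A minimal interactive map, assuming the page already loads leaflet.js and
// leaflet.css and contains <div id="map" style="height:300px"></div>.
// Leaflet and OpenStreetMap are stand-ins, not necessarily the post's tools.
var map = L.map('map').setView([52.48, -1.9], 13); // centre on Birmingham
L.tileLayer('https://tile.openstreetmap.org/{z}/{x}/{y}.png', {
  attribution: '&copy; OpenStreetMap contributors'
}).addTo(map);
L.marker([52.48, -1.9]).addTo(map).bindPopup('Our story starts here');
```

Chart and timeline services follow the same shape: a container element, a script include, and a small initialisation call.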

Livesheets creator wants to “make all kids into rocket scientists”

Abacus image by Anssi Koskinen on Flickr

“Imagine if you could search for any calculations and then just use them directly without ever having to work it out yourself from scratch.”

This is the vision of developer Daniel Maxwell, the creator of livesheets.com, whose dream is that no one in the world should ever have to perform the same calculation twice.

A case study in following a field online: setting up feeds on CCGs

Over at Help Me Investigate Health I’ve just published a list of 20 places to keep up to date with clinical commissioning. It’s an example of something I’ve written about previously – setting up an online network infrastructure as a journalist. Below, I explain the process behind it:

Following CCGs across local newspapers and blogs

If you’re going to start scrutinising a field, it’s very useful to be kept up to date with developments in that field:

  • Concerns raised in one local newspaper may be checked elsewhere;
  • Specialist magazines may provide guides to jargon or processes that help save you a lot of time;
  • Politicians might raise concerns and get answers;
  • And expert bloggers can provide leads and questions that you might want to follow up.

Rather than checking a list of websites on the off chance that one has been updated, a much more efficient way to keep up to date on what’s happening is to use a free RSS reader.
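
If you’d rather script the monitoring than use a reader app, the same idea fits in a few lines of Node.js. A minimal sketch, assuming Node 18+ and the npm rss-parser package; the feed URL is a placeholder rather than a real CCG feed:

```js
// Minimal sketch: fetch an RSS feed and print the latest headlines.
// Assumes `npm install rss-parser`; the feed URL is a placeholder.
const Parser = require('rss-parser');
const parser = new Parser();

async function latest(feedUrl) {
  const feed = await parser.parseURL(feedUrl);
  for (const item of feed.items.slice(0, 10)) {
    console.log(item.title + ' - ' + item.link);
  }
}

latest('https://example.com/ccg-news/feed');
```

Point it at each feed in your list and you have the beginnings of a DIY monitoring dashboard.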

A new tool for online verification: Google’s ‘Search by Image’

Google have launched a ‘Search by Image’ service which allows you to find images by uploading, dragging over, or pasting the URL of an existing image.

The service should be particularly useful to journalists seeking to verify or debunk images they’re not sure about.

(For examples where it may have been useful, look no further than this week’s Gay Syrian Blogger story, as well as the ‘dead’ Osama Bin Laden images that so many news outlets fell for.)

TinEye, a website and Firefox plugin, does the same thing – but it will be interesting to see whether Google’s service is more or less powerful (let me know how you get on with it).

Extractiv: crawl webpages and make semantic connections

Extractiv screenshot

Here’s another data analysis tool which is worth keeping an eye on. Extractiv “lets you transform unstructured web content into highly-structured semantic data.” Eyes glazing over? Okay, over to ReadWriteWeb:

“To test Extractiv, I gave the company a collection of more than 500 web domains for the top geolocation blogs online and asked its technology to sort for all appearances of the word “ESRI.” (The name of the leading vendor in the geolocation market.)

“The resulting output included structured cells describing some person, place or thing, some type of relationship it had with the word ESRI and the URL where the words appeared together. It was thus sortable and ready for my analysis.

“The task was partially completed before being rate limited due to my submitting so many links from the same domain. More than 125,000 pages were analyzed, 762 documents were found that included my keyword ESRI and about 400 relations were discovered (including duplicates). What kinds of patterns of relations will I discover by sorting all this data in a spreadsheet or otherwise? I can’t wait to find out.”

What that means in even plainer language is that Extractiv will crawl thousands of webpages to identify relationships and attributes for a particular subject.

This has obvious applications for investigative journalists: give the software a name (of a person or company, for example) and a set of base domains (such as news websites, specialist publications and blogs, industry sites, etc.) and set it going. At the end you’ll have a broad picture of what other organisations and people have been connected with that person or company. Relationships you can ask it to identify include ownership, former names, telephone numbers, companies worked for or worked with, and job positions.

It won’t answer your questions, but it will suggest some avenues of enquiry, and potential sources of information. And all within an hour.
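
Extractiv itself is a hosted service, but the loop it automates is easy to picture. Here is a hypothetical miniature of the workflow described above (not Extractiv’s API; the keyword and URLs are placeholders), assuming Node 18+ with its built-in fetch, run as an ES module:

```js
// Hypothetical miniature of the crawl-and-extract workflow - not Extractiv's
// API. Finds every sentence mentioning a keyword and emits spreadsheet-ready
// CSV rows. The keyword and page URLs are placeholders.
const KEYWORD = 'ESRI';
const PAGES = [
  'https://example.com/geo-blog/post-1',
  'https://example.com/geo-blog/post-2',
];

console.log('url,sentence'); // CSV header for spreadsheet import
for (const url of PAGES) {
  const html = await fetch(url).then((r) => r.text());
  const text = html.replace(/<[^>]+>/g, ' '); // crude tag stripping
  for (const sentence of text.split(/(?<=[.!?])\s+/)) {
    if (sentence.includes(KEYWORD)) {
      const cell = sentence.trim().replace(/"/g, '""'); // escape for CSV
      console.log('"' + url + '","' + cell + '"');
    }
  }
}
```

The real service adds the hard parts – entity recognition and relation typing – but the output shape (one row per co-occurrence, sortable in a spreadsheet) is the same.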

Time and cost

ReadWriteWeb reports that the process above took around an hour “and would have cost me less than $1, after a $99 monthly subscription fee. The next level of subscription would have been performed faster and with more simultaneous processes running at a base rate of $250 per month.”

As they say, the tool represents “commodity level, DIY analysis of bulk data produced by user generated or other content, sortable for pattern detection and soon, Extractiv says, sentiment analysis.”

Which is nice.

Data cleaning tool relaunches: Freebase Gridworks becomes Google Refine

When I first saw Freebase Gridworks I was a very happy man. Here was a tool that tackled one of the biggest problems in data journalism: cleaning dirty data (and data is invariably dirty). The tool made it easy to identify variations of a single term, and clean them up, to link one set of data to another – and much more besides.
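
To make “identify variations of a single term” concrete: Refine’s cluster-and-edit feature rests on keying methods such as the fingerprint, which collapses variant spellings onto one value. A rough sketch of the idea in JavaScript (an illustration of the technique, not Refine’s actual code):

```js
// Fingerprint keying, the idea behind Refine's key-collision clustering:
// lowercase, strip punctuation, sort the unique tokens, rejoin.
// An illustration of the technique, not Refine's implementation.
function fingerprint(value) {
  const tokens = value
    .trim()
    .toLowerCase()
    .replace(/[^\w\s]/g, '') // drop punctuation
    .split(/\s+/);
  return [...new Set(tokens)].sort().join(' ');
}

console.log(fingerprint('Smith, John')); // "john smith"
console.log(fingerprint('John  SMITH')); // "john smith" - same cluster
```

Anything that shares a fingerprint is a candidate cluster, which you can then merge into a single canonical spelling.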

Then Google bought the company that made Gridworks, and now it’s released a new version of the tool under a new name: Google Refine.

It’s notable that Google are explicitly positioning Refine as a “data journalism” tool in their launch video.

You can download Google Refine here.

Further videos below. The first explains how to take a list on a webpage and convert it into a cleaned-up dataset – a useful alternative to scraping:

The second video explains how to link your data to data from elsewhere, aka “reconciliation” – e.g. extracting latitude and longitude or language.
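
As a toy illustration of what reconciliation buys you (the ‘canonical’ table below is a made-up stand-in for a linked dataset such as Freebase): once a messy value matches a canonical record, its extra columns come along for free:

```js
// Toy illustration of reconciliation: match messy values to a canonical
// dataset, then copy extra columns (lat/long, language) from the match.
// The canonical table is a made-up stand-in for Freebase.
const canonical = {
  'united kingdom': { lat: 55.4, lng: -3.4, language: 'English' },
  'france':         { lat: 46.2, lng: 2.2,  language: 'French'  },
};

const messyColumn = ['United Kingdom ', 'FRANCE', 'Frnace'];
for (const value of messyColumn) {
  const match = canonical[value.trim().toLowerCase()];
  console.log(value, '->', match ?? 'no match - needs manual review');
}
```

Refine does this matching for you, interactively and at scale, flagging the near-misses for manual review.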