Data journalism pt5: Mashing data (comments wanted)

This is a draft from a book chapter on data journalism (part 1 looks at finding data; part 2 at interrogating data; part 3 at visualisation; and part 4 at visualisation tools). I’d really appreciate any additions or comments you can make – particularly around tips and tools.

UPDATE: It has now been published in The Online Journalism Handbook.

Mashing data

Wikipedia defines a mashup particularly succinctly, as “a web page or application that uses or combines data or functionality from two or many more external sources to create a new service.” Those sources may be online spreadsheets or tables; maps; RSS feeds (which could be anything from Twitter tweets, blog posts or news articles to images, video, audio or search results); or anything else which is structured enough to ‘match’ against another source.

This ‘match’ is typically what makes a mashup. It might be matching a city mentioned in a news article against the same city in a map; or it may be matching the name of an author with that same name in the tags of a photo; or matching the search results for ‘earthquake’ from a number of different sources. The results can be useful to you as a journalist, to the user, or both.
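To make the idea of a ‘match’ concrete, here is a minimal sketch in Python. It assumes two made-up sources: a list of articles that each name a city, and a lookup of city coordinates of the kind a map would need. The headlines, coordinates and the function name match_articles_to_map are all invented for illustration.

```python
# Hypothetical sample data: these articles are invented for illustration.
articles = [
    {"headline": "Flood warnings issued", "city": "Leeds"},
    {"headline": "New tram line approved", "city": "Birmingham"},
    {"headline": "Harbour festival returns", "city": "Bristol"},
]

# A second, separate source: city name -> (latitude, longitude), approximate.
city_coords = {
    "leeds": (53.80, -1.55),
    "birmingham": (52.48, -1.90),
}


def match_articles_to_map(articles, coords):
    """Attach coordinates to each article whose city appears in the map data."""
    matched = []
    for article in articles:
        # Normalise the name so that 'Leeds' matches 'leeds'.
        key = article["city"].strip().lower()
        if key in coords:
            lat, lon = coords[key]
            matched.append({**article, "lat": lat, "lon": lon})
    return matched


pins = match_articles_to_map(articles, city_coords)
for pin in pins:
    print(pin["headline"], pin["lat"], pin["lon"])
```

The Bristol story is dropped because the second source has no entry for it. In practice much of the work in a mashup goes into cases like this, where the two sources spell or structure the shared value differently.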

Why make a mashup?

Mashups can be particularly useful in providing live coverage of a particular event or ongoing issue – mashing images from a protest march, for example, against a map. Creating a mashup online is not too dissimilar from how, in broadcast journalism, you might set up cameras at key points around a physical location in anticipation of an event from which you will later ‘pull’ live feeds: in a mashup you are effectively doing exactly the same thing – only in a virtual space rather than a physical one. So, instead of setting up a feed at the corner of an important junction, you might decide to pull a feed from Flickr of any images that are tagged with the words ‘protest’ and ‘anti-fascist’.

Some web developers have built entire sites that are mashups. Twazzup (twazzup.com) for example, will show you a mix of Twitter tweets, images from Flickr, news updates and websites – all based on the search term you enter. And Friendfeed (friendfeed.com) pulls in data that you and your social circle post to a range of social networking sites, and displays them in one place.

Mashups also provide a different way for users to interact with content – either by choosing how to navigate (for instance by using a map), or by inviting them to input something (for instance, a search term, or selecting a point on a slider). The Super Tuesday YouTube/Google Maps mashup, for instance, provided an at-a-glance overview of what election-related videos were being uploaded where across the US.

Finally, mashups offer an opportunity for juxtaposing different datasets to provide fresh, sometimes ongoing, insights. The MySociety/Channel 4 project Mapumental, for example, combines house price data with travel information and data on the ‘scenicness’ of different locations to provide an interactive map of a location which the user can interrogate based on their individual preferences.

Mashup tools

Like so many aspects of online journalism, the ease with which you can create a mashup has increased significantly in recent years. An increase in the number and power of online tools, combined with the increasing ‘mashability’ of websites and data, means that journalists can now create a basic mashup through the simple procedures of drag-and-drop or copy-and-paste.

A simple RSS mashup, which combines the feeds from a number of different sources into one, for example, can now be created using an RSS aggregator such as xFruits (xfruits.com) or Jumbra (jumbra.com).
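For readers curious about what such an aggregator is doing, the sketch below merges two feeds by hand using only Python's standard library. It is not how xFruits or Jumbra work internally, which the chapter does not describe. The two feeds are tiny invented RSS 2.0 documents, inlined so that the example runs without a network connection, and read_items and merge_feeds are names chosen for illustration.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two tiny hypothetical RSS 2.0 feeds, inlined so the sketch runs offline.
FEED_A = """<rss version="2.0"><channel><title>Blog A</title>
<item><title>Council budget vote</title><link>http://example.com/a1</link>
<pubDate>Tue, 04 May 2010 09:00:00 GMT</pubDate></item>
<item><title>School places row</title><link>http://example.com/a2</link>
<pubDate>Tue, 04 May 2010 12:30:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Blog B</title>
<item><title>Bus strike called off</title><link>http://example.com/b1</link>
<pubDate>Tue, 04 May 2010 10:15:00 GMT</pubDate></item>
</channel></rss>"""


def read_items(rss_text):
    """Return the items of one RSS feed as a list of dictionaries."""
    channel = ET.fromstring(rss_text).find("channel")
    source = channel.findtext("title")
    return [
        {
            "source": source,
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # RSS dates use the email date format, so the email parser reads them.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        }
        for item in channel.findall("item")
    ]


def merge_feeds(*feeds):
    """Combine the items of several feeds into one list, newest first."""
    items = [item for feed in feeds for item in read_items(feed)]
    return sorted(items, key=lambda item: item["published"], reverse=True)


merged = merge_feeds(FEED_A, FEED_B)
for item in merged:
    print(item["published"].isoformat(), item["source"], item["title"])
```

Sorting on the publication date is what turns two separate lists into a single timeline. A real aggregator would also need to fetch the feeds and cope with missing or badly formed dates.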

Likewise, you can mix two maps together using the website MapTube (maptube.org) which also contains a number of maps for you to play with.

And if you want to mix two sources of data into one visualisation the site DataMasher (datamasher.org) will let you do that – although you’ll have to make do with the US data that the site provides. Google Public Data Explorer (google.com/publicdata) is a similar tool which allows you to play with global data.

But perhaps the most useful tool for news mashups is Yahoo! Pipes (pipes.yahoo.com).

Yahoo! Pipes allows you to choose a source of data – it might be an RSS feed, an online spreadsheet or something that the user will input – and do a variety of things with it. Here are just some of the basic things you might do:

  • Add it to other sources
  • Combine it with other sources – for instance, matching images to text
  • Filter it
  • Count it
  • Annotate it
  • Translate it
  • Create a gallery from the results
  • Place results on a map
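Several of these operations can be pictured as small steps chained together. The following sketch expresses filtering, counting and annotating in plain Python. It is only an analogy for the kind of steps listed above, not Yahoo! Pipes itself, and the items, tags and function names are invented for illustration.

```python
from collections import Counter

# Hypothetical items, as they might look after being read from a feed.
items = [
    {"title": "Protest march reaches city centre", "tags": ["protest", "anti-fascist"]},
    {"title": "Marathon route announced", "tags": ["sport"]},
    {"title": "Police statement on protest", "tags": ["protest", "police"]},
]


def filter_by_tag(items, tag):
    """Filter: keep only the items that carry the given tag."""
    return [item for item in items if tag in item["tags"]]


def annotate(items, note):
    """Annotate: return copies of the items with an extra 'note' field."""
    return [{**item, "note": note} for item in items]


def count_tags(items):
    """Count: tally how often each tag appears across the items."""
    return Counter(tag for item in items for tag in item["tags"])


# Chain the steps: the output of one becomes the input of the next.
protest_items = annotate(filter_by_tag(items, "protest"), "live coverage")
tag_counts = count_tags(protest_items)

for item in protest_items:
    print(item["title"], "|", item["note"])
print(tag_counts.most_common())
```

Each function takes a list of items and returns a new list (or a tally), so the steps can be rearranged or extended without changing the others.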

You could write a whole book on how to use Yahoo! Pipes – indeed, people have – so we will not cover the practicalities of using all of those features here. There are also dozens of websites and help files devoted to the site (which you should explore). Below, however, is a short tutorial to introduce you to the website and how it works – this is a good way to understand how basic mashups work, and how easily they can be created.

Mashups and APIs

Although there are a number of easy-to-use mashup creators listed above, really impressive mashups tend to be written by people with knowledge of programming languages, and use APIs. APIs (application programming interfaces) allow websites to interact with other websites. The launch of the Google Maps API in 2005, for example, has been described as a ‘huge tipping point’ in mashup history (Duvander, 2008) as it allowed web developers to ‘mash’ countless other sources of data with maps. Since then it has become commonplace for new websites, particularly in the social media arena, to launch their own APIs in order to allow web developers to do interesting things with their feeds and data – not just mashups, but applications and services too.

If you want to develop a particularly ambitious mashup it is likely that you will need to teach yourself some programming skills, and familiarise yourself with some APIs (the APIs of Twitter, Google Maps and Flickr are good places to start).

Box-out: Anatomy of a feed

The image below from ReadWriteWeb shows the code behind a simple Twitter update. It includes information about the author, their location, whether the update was a reply to someone else, what time and where it was created, and lots more besides. Each of these values can be used by a mashup in various ways – for example, you might match the author of this tweet with the author of a blog or image; you might match its time against other things being published at that moment; or you might use their location to plot this update on a map.

While the code can be intimidating, you do not need to understand programming in order to be able to do things with it. Of course, it will help if you do…
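As a rough illustration of how those values can be pulled out, the sketch below reads a heavily simplified update in Python. The record is invented: its field names (text, created_at, in_reply_to, author, screen_name, location) are assumptions made for this example and are not Twitter's actual format, which is much longer.

```python
import json
from datetime import datetime

# A hypothetical, simplified update. The field names are assumptions made
# for this sketch and do not reproduce the real Twitter feed format.
RAW_UPDATE = """{
  "text": "Crowds gathering outside the town hall",
  "created_at": "2010-05-04T14:05:00+00:00",
  "in_reply_to": null,
  "author": {"screen_name": "example_reporter", "location": "Birmingham, UK"}
}"""

update = json.loads(RAW_UPDATE)

# Each value below is something a mashup could match against another source.
author = update["author"]["screen_name"]          # match against a blog or photo author
location = update["author"]["location"]           # look up to place the update on a map
posted = datetime.fromisoformat(update["created_at"])  # compare with other timestamps
is_reply = update["in_reply_to"] is not None      # True if the update answers someone

print(author, "|", location, "|", posted.isoformat(), "| reply:", is_reply)
```

Once the values are separated out like this, matching the author, time or location against a second source works in the same way as any other mashup.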

Anatomy of a Twitter feed

7 thoughts on “Data journalism pt5: Mashing data (comments wanted)”

  4. ScrapingArts

    Excellent article, which we had the pleasure to tweet.

    But first let me say that the #1 problem facing journalists is not the availability of mashable data, but the availability of the piece of data or dataset they WANT to investigate. For every RSS and XML feed there are a hundred pieces of crucial data, hidden behind legacy systems, clumsy web front-ends, members-only login windows and proprietary formats.

    There is a cottage industry of websites on sex offenders and crime records. Why? Because legislation made it necessary to make that data available. But what about all the important government, business and private sector data that SHOULD be public, but that don’t have the prime-time news appeal to make them so?

    People in our industry (the web scrapers) have been pariahs in the web community. But there are those of us who are dying to collaborate with journalists, academia and the public sector to put our services to positive uses. I spend a few hours of my day on hold on the phone, waiting for “permission to scrape” some public data or another.
