Data journalism pt3: visualising data – charts and graphs (comments wanted)

This is a draft from a book chapter on data journalism (the first section, on gathering data, is here; the section on interrogating data is here). I’d really appreciate any additions or comments you can make – particularly around considerations in visualisation. A further section on visualisation tools can be found here.

UPDATE: It has now been published in The Online Journalism Handbook.

“At their best, graphics are instruments for reasoning about quantitative information. Often the most effective way to describe, explore, and summarize a set of numbers – even a very large set – is to look at pictures of those numbers.” (Edward Tufte, The Visual Display of Quantitative Information, 2001)

Visualisation is the process of giving a graphic form to information which is often otherwise dry or impenetrable. Classic examples of visualisation include turning a table into a bar chart, or a series of percentage values into a pie chart – but the increasing power of both computer analysis and graphic design software has seen the craft of visualisation grow increasingly sophisticated. In larger organisations the data journalist may work with a graphic artist to produce an infographic that visualises their story – but in smaller teams, in the initial stages of a story, or when speed is of the essence, they are likely to need to use visualisation tools themselves to give form to their data.

Broadly speaking there are two typical reasons for visualising data: to find a story; or to tell one. Quite often, it is both.

In the parking tickets story above, for example, it was the process of visualisation that tipped off Adrian Short and Guardian journalist Charles Arthur to the story – and led to further enquiries.

In most cases, however, the story will not be as immediately visible. Sometimes the data will need to be visualised in several different ways before a story becomes clear, and this is where an understanding of the strengths of different types of visualisation is particularly useful.

UPDATE (Dec 7, 2010): Visualisation probably needs to be extended to include humanisation and personalisation. More detail here, and to come.

Types of visualisation

Visualisation can take on a range of forms. The most familiar are those we know from maths and statistics. Pie charts, for example, show how a whole is divided: how a budget is spent, say, or how a population is distributed. They are thought to be particularly useful when the proportions represented are large (for example, above 25%), but less useful when lower percentages are involved, because small slices are hard to perceive accurately and hard to compare with each other.

More useful in those circumstances are bar charts or histograms. Although these look similar there are subtle differences between them: the bars in bar charts represent categories (such as different cities), whereas bars in histograms represent different values on a continuum (for instance: ages, weights or amounts), which is why histograms have no gaps between their bars. You should avoid using 3D or shadow effects in bar charts, as these add nothing to the information or its clarity. The advantage of both types of chart over pie charts is that users can more easily see the difference between one quantity and another. Bar charts also allow you to show change over time.
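To make the bar chart/histogram distinction concrete, here is a minimal plain-text sketch using only the Python standard library. The function names `bar_rows` and `histogram_bins`, the city names and the ages are all hypothetical, chosen for illustration. A bar chart draws one bar per category in whatever order you choose; a histogram first sorts values on a continuum into adjacent, equal-width bins, keeping any empty bins so there are no gaps along the scale.

```python
from __future__ import annotations


def bar_rows(counts: dict[str, float], scale: float = 1.0) -> list[str]:
    """Bar chart: one bar per discrete category (e.g. a city), in the order given."""
    return [f"{label:<12}{'#' * round(value * scale)}" for label, value in counts.items()]


def histogram_bins(values: list[int], bin_width: int) -> dict[tuple[int, int], int]:
    """Histogram: count values on a continuum (e.g. ages) in adjacent bins.

    Each bin covers [start, start + bin_width). Empty bins between the lowest
    and highest value are kept, so the bars sit side by side with no gaps.
    """
    if not values or bin_width <= 0:
        raise ValueError("need at least one value and a positive bin width")
    first = (min(values) // bin_width) * bin_width
    n_bins = (max(values) - first) // bin_width + 1
    counts = [0] * n_bins
    for v in values:
        counts[(v - first) // bin_width] += 1
    return {(first + i * bin_width, first + (i + 1) * bin_width): c
            for i, c in enumerate(counts)}


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    for row in bar_rows({"Leeds": 3, "York": 1, "Hull": 2}):
        print(row)
    ages = [23, 27, 31, 35, 38, 44, 61]
    for (start, end), count in histogram_bins(ages, 10).items():
        print(f"{start}-{end - 1:<9}{'#' * count}")
```

In practice you would draw these with a charting tool; the point of the sketch is the difference in what each bar stands for, including the empty 50–59 age band that a histogram must still show.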

Pictograms are like bar charts but use an icon to represent quantity – so a population of 50,000 might be represented by 5 ‘person’ icons, each standing for 10,000 people. It is not advisable to use pictograms if quantities are close together, as the user will find it harder to discern the differences.
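The icon arithmetic above can be sketched in a few lines of Python. The function name `pictogram`, the default icon and the round-half-up rule are assumptions for illustration; the sketch also shows why close quantities are a problem, since they collapse into the same picture.

```python
def pictogram(value: int, unit: int, icon: str = "👤") -> str:
    """Return one icon per `unit` of `value`, rounding half up to a whole icon."""
    if unit <= 0 or value < 0:
        raise ValueError("unit must be positive and value non-negative")
    return icon * ((value + unit // 2) // unit)


if __name__ == "__main__":
    # 50,000 people at 10,000 per icon gives 5 icons.
    print(pictogram(50_000, 10_000))
    # Close quantities collapse to the same picture: both of these print 5 icons.
    print(pictogram(48_000, 10_000))
    print(pictogram(52_000, 10_000))
```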

Also useful for showing change over time are line graphs. Lines are “suited for showing trend, acceleration or deceleration, and volatility, including sudden peaks or troughs” (Wong, 2010, p51). In addition, overlaying a series of lines can quickly show whether variables change at the same points or at different ones, which may suggest a relationship or a shared cause. It does not prove either, so such patterns should be taken as starting points for further investigation. For clarity, you should also avoid plotting more than four lines on one chart.
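Overlaid lines are a visual cue; a common numerical companion is Pearson's correlation coefficient, swapped in here as an illustration rather than anything the text prescribes. The function names and the monthly figures below are hypothetical. As with the lines themselves, a high coefficient is a prompt to investigate, not evidence of cause; two series that simply both rise over time will correlate strongly, which is why the sketch also offers period-on-period changes to correlate instead.

```python
from __future__ import annotations

from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series, from -1 to 1.

    Near 1: the lines tend to rise and fall together; near -1: one rises as
    the other falls; near 0: no linear relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a flat series has no defined correlation")
    return cov / (spread_x * spread_y)


def changes(series: list[float]) -> list[float]:
    """Period-on-period changes. Correlating these rather than the raw levels
    reduces the chance that two series match only because both trend upwards."""
    return [b - a for a, b in zip(series, series[1:])]


if __name__ == "__main__":
    # Hypothetical monthly figures, for illustration only.
    complaints = [10, 12, 15, 14, 20, 26]
    rainfall = [40, 42, 55, 50, 70, 90]
    print(round(pearson(complaints, rainfall), 2))
    print(round(pearson(changes(complaints), changes(rainfall)), 2))
```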

Line graphs should not be used to show unrelated events. As Seth Godin (2009) puts it: “A graph of IQs of everyone in your kindergarten class should be a series of unrelated points, not a line graph. On the other hand, your weight loss is in fact a continuous function, so each piece of data should be attached.”

Scattergrams are similar to line graphs in that they plot individual elements against two axes, but they can be particularly useful in showing up ‘outliers’: pieces of data which differ noticeably from the rest. These may be of particular journalistic interest when they show, for example, an MP claiming substantially more (or less) in expenses than their peers.
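Spotting outliers by eye works well on a scattergram, but it can help to have a rule of thumb for which points to check first. The sketch below uses Tukey's fences (flag anything more than 1.5 times the interquartile range below the lower quartile or above the upper quartile), a common statistical convention swapped in for illustration; it is not the only definition of an outlier. The function name, MP labels and expenses figures are hypothetical. A flagged figure is a lead to check, not a story in itself.

```python
from __future__ import annotations

from statistics import quantiles


def iqr_outliers(values: dict[str, float], k: float = 1.5) -> dict[str, str]:
    """Flag entries outside Tukey's fences: below Q1 - k*IQR or above Q3 + k*IQR.

    Returns a mapping of name -> "high" or "low" for each flagged entry.
    """
    q1, _, q3 = quantiles(list(values.values()), n=4)  # default 'exclusive' method
    spread = q3 - q1
    low_fence, high_fence = q1 - k * spread, q3 + k * spread
    flagged = {}
    for name, value in values.items():
        if value < low_fence:
            flagged[name] = "low"
        elif value > high_fence:
            flagged[name] = "high"
    return flagged


if __name__ == "__main__":
    # Hypothetical annual expenses claims in £000s; not real MPs or figures.
    claims = {"MP A": 120, "MP B": 135, "MP C": 128, "MP D": 140,
              "MP E": 131, "MP F": 310, "MP G": 125, "MP H": 40}
    print(iqr_outliers(claims))
```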

A number of charts can be displayed together in what are sometimes called ‘small multiples’: a set of pie charts, line graphs or other charts placed alongside each other, allowing the journalist or users to make comparisons, for example, between different populations.

Two increasingly popular forms of visualisation online are treemaps and bubble charts. Unlike other charts which allow you to visualise two aspects of the data (i.e. their place on each axis) bubble charts allow you to visualise three aspects of the data – the third being represented by the size of the bubble itself. A particularly good example of bubble charts in action can be seen in Hans Rosling’s TED talk on debunking third-world myths – a presentation which also demonstrates the potential of other forms of visualisation, and animation, in presenting complex information in an easy-to-understand way.
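One practical detail when mapping the third aspect of the data to bubble size: readers judge a bubble by its area, and area grows with the square of the radius. The sketch below therefore scales the radius by the square root of the value, so that area stays proportional to the number. The function names, field names and records are hypothetical and do not describe how any particular tool, including the one in Rosling's talk, works.

```python
from __future__ import annotations

from math import sqrt


def bubble_radius(value: float, max_value: float, max_radius: float = 40.0) -> float:
    """Radius that makes bubble *area* proportional to `value`.

    Scaling the radius directly would make a value twice as large
    look four times as big.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be non-negative and max_value positive")
    return max_radius * sqrt(value / max_value)


def bubbles(rows: list[dict], x: str, y: str, size: str) -> list[tuple[float, float, float]]:
    """Turn records into (x, y, radius) triples: two aspects on the axes, a third as size."""
    biggest = max(row[size] for row in rows)
    return [(row[x], row[y], bubble_radius(row[size], biggest)) for row in rows]


if __name__ == "__main__":
    # Hypothetical records and field names, for illustration only.
    countries = [
        {"name": "A", "income": 2_000, "life_expectancy": 58, "population": 25},
        {"name": "B", "income": 30_000, "life_expectancy": 81, "population": 100},
    ]
    for point in bubbles(countries, "income", "life_expectancy", "population"):
        print(point)
```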

Finally, treemaps visualise hierarchical data in a way that could be described as rectangular pie charts within pie charts. This is particularly useful for representing different parts of a whole and their relationship to each other: for instance, different budgets within a government.
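The ‘pie charts within pie charts’ idea can be sketched with the simplest treemap layout, usually called slice-and-dice: each level of the hierarchy shares its parent's rectangle in proportion to its total, alternating between splitting across the width and down the height. The `Node` class, function name and budget figures are hypothetical; many tools use more elaborate layouts that keep rectangles closer to square, so treat this as an illustration of the principle only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    value: float = 0.0  # only leaves carry a value; parents sum their children
    children: list[Node] = field(default_factory=list)

    def total(self) -> float:
        if not self.children:
            return self.value
        return sum(child.total() for child in self.children)


def slice_and_dice(node: Node, x: float, y: float, w: float, h: float,
                   depth: int = 0, out: list | None = None) -> list:
    """Lay `node` out in the rectangle (x, y, w, h) and recurse into its children.

    Children share the parent's rectangle in proportion to their totals,
    split left-to-right at even depths and top-to-bottom at odd depths.
    Returns (name, depth, x, y, w, h) tuples, parents before children.
    """
    if out is None:
        out = []
    out.append((node.name, depth, x, y, w, h))
    total = node.total()
    if not node.children or total == 0:
        return out
    done = 0.0
    for child in node.children:
        share = child.total()
        if depth % 2 == 0:  # slice across the width
            slice_and_dice(child, x + w * done / total, y, w * share / total, h, depth + 1, out)
        else:  # dice down the height
            slice_and_dice(child, x, y + h * done / total, w, h * share / total, depth + 1, out)
        done += share
    return out


if __name__ == "__main__":
    # Hypothetical budget, in arbitrary units.
    budget = Node("Budget", children=[
        Node("Health", children=[Node("Hospitals", 30), Node("GPs", 20)]),
        Node("Education", 30),
        Node("Defence", 20),
    ])
    for name, depth, x, y, w, h in slice_and_dice(budget, 0, 0, 100, 100):
        print("  " * depth + f"{name}: x={x:.0f} y={y:.0f} w={w:.0f} h={h:.0f}")
```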

Perhaps the best-known example of a treemap is Newsmap, created in 2004 by Marcos Weskamp. This visualises the amount of coverage given to stories by news organisations based on a feed from Google News. Weskamp explains it as follows:

“Google News automatically groups news stories with similar content and places them based on algorithmic results into clusters. In Newsmap, the size of each cell is determined by the amount of related articles that exist inside each news cluster that the Google News Aggregator presents. In that way users can quickly identify which news stories have been given the most coverage, viewing the map by region, topic or time. Through that process it still accentuates the importance of a given article.” (Weskamp, 2005)

These are just the most common forms of visualisation, but there are dozens more to explore. The Periodic Table of Visualisation is a particularly useful webpage giving an overview of the various forms.

Considerations in visualisation

Charlie Beckett makes a useful distinction between using visualisation for “rational understanding (I now get the figures) and emotional understanding (I now care about the figures and want to do something).” It is worth deciding which of the two you are aiming for.

When visualising data it is also important to ensure that any comparisons are meaningful, or like-for-like. In one visualisation of how many sales a musician needs to make to earn the minimum wage, for example, a comparison is made between sites selling albums, sites selling individual tracks, and those providing music streams. An album sale, a track sale and a single stream are not equivalent units, so the comparison is misleading – and was criticised for being so (Techdirt, 2010).

The Wall Street Journal Guide to Information Graphics (2010) offers a wealth of tips on elements to consider and mistakes to avoid in both visualisation and data research, and is well worth reading for more on this area. Here is just a selection:

  • “Choose the best data series to illustrate your point, e.g. market share vs. total revenue
  • “Filter and simplify the data to deliver the essence of the data to your intended audience
  • “Make numerical adjustments to the raw data to enhance your point, e.g. absolute values vs. percentage change
  • “Choose the appropriate chart settings, e.g. scale, y-axis increments and baseline
  • “If the raw data is insufficient to tell the story, do not add decorative elements. Instead, research additional sources and adjust data to stay on point
  • “Data is only as good as its source. Getting data from reputable and impartial sources is critical. For example, data should be benchmarked against a third party to avoid bias and add credibility
  • “In the research stage, a bigger data set allows more in-depth analysis. In the edit phase, it is important to assess whether all your extra information buries the main point of the story or enhances [it].”
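Two of these tips, choosing between absolute values and percentage change, and choosing the baseline, can be shown with a little arithmetic. In the hypothetical sketch below (function names and council figures are invented for illustration), Council B rises more in absolute terms while Council A rises more in percentage terms, so the series you choose decides which council leads the story. The last function shows how starting the y-axis above zero exaggerates a difference between two bars.

```python
def absolute_change(old: float, new: float) -> float:
    """Raw difference between two values."""
    return new - old


def percentage_change(old: float, new: float) -> float:
    """Change relative to the starting value, as a percentage."""
    if old == 0:
        raise ValueError("percentage change from zero is undefined")
    return (new - old) * 100 / old


def apparent_ratio(smaller: float, larger: float, baseline: float = 0.0) -> float:
    """How many times taller the larger bar looks when the axis starts at `baseline`.

    With a zero baseline this is the true ratio; raising the baseline
    exaggerates the difference.
    """
    if smaller <= baseline:
        raise ValueError("baseline must sit below both values")
    return (larger - baseline) / (smaller - baseline)


if __name__ == "__main__":
    # Hypothetical figures: two councils' parking fines, year on year.
    for name, old, new in [("Council A", 200, 300), ("Council B", 2_000, 2_200)]:
        print(name, absolute_change(old, new), f"{percentage_change(old, new):.0f}%")
    # 100 vs 95 looks almost equal from zero, but twice as big from 90.
    print(apparent_ratio(95, 100), apparent_ratio(95, 100, baseline=90))
```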

Visualising large amounts of text

If you are working with text rather than numbers there are ways to visualise that as well. Word clouds, for instance, show which words are used most often in a particular document (such as a speech, bill, or manifesto) or data stream (such as an RSS feed of what people are saying on Twitter or blogs). This can be particularly useful in drawing out the themes of a politician’s speech, for example, or the reaction from people online to a particular event. They can also be used to draw comparisons – word clouds have been used in the past to compare the inaugural speeches of Barack Obama with those of Bush and Clinton; and to compare the 2010 UK election manifestos of the Labour and Conservative parties. The tag cloud is similar to the word cloud, but typically allows you to click on an individual tag (word or phrase) to see where it has been used.
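Behind a word cloud is a simple word count. The sketch below counts words in a text, ignoring case, punctuation and a small list of very common ‘stopwords’, then scales each word's font size between a minimum and a maximum. The function names, the tiny stopword list and the linear font scaling are assumptions for illustration; real word cloud tools use longer stopword lists and their own sizing rules.

```python
from __future__ import annotations

import re
from collections import Counter

# A deliberately tiny stopword list (an assumption); real tools use much longer ones.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "for", "is", "we", "our", "that"})


def word_frequencies(text: str, top: int = 20,
                     stopwords: frozenset[str] = STOPWORDS) -> list[tuple[str, int]]:
    """Most common words in `text`, ignoring case, punctuation and stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords).most_common(top)


def font_sizes(freqs: list[tuple[str, int]], min_pt: float = 10,
               max_pt: float = 48) -> dict[str, float]:
    """Scale each word's font size linearly between the rarest and commonest word."""
    if not freqs:
        return {}
    counts = [count for _, count in freqs]
    lo, hi = min(counts), max(counts)
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {word: min_pt + (count - lo) / span * (max_pt - min_pt) for word, count in freqs}


if __name__ == "__main__":
    speech = "The economy, the economy and jobs. Jobs for the economy."  # hypothetical text
    freqs = word_frequencies(speech)
    print(freqs)
    print(font_sizes(freqs))
```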

There are other forms of word visualisation too, particularly for showing relationships between words: which words occur together, and how often. The terminology varies: the visualisation tool ManyEyes, for example, calls these word trees and phrase nets, but other tools will have different names.
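The counting behind these relationship views can be sketched as follows. `phrase_net` counts pairs of words joined by a connecting word (the pattern “X and Y”), and `next_words` counts which words follow a chosen root word, roughly the first branch of a word tree. This is not how ManyEyes implements either view; the function names, the default connector and the sample text are assumptions for illustration.

```python
from __future__ import annotations

import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lower-cased words with punctuation stripped."""
    return re.findall(r"[a-z']+", text.lower())


def phrase_net(text: str, connector: str = "and",
               top: int = 20) -> list[tuple[tuple[str, str], int]]:
    """Count word pairs joined by `connector`, i.e. the pattern "X and Y"."""
    words = tokens(text)
    pairs = Counter(
        (left, right)
        for left, middle, right in zip(words, words[1:], words[2:])
        if middle == connector
    )
    return pairs.most_common(top)


def next_words(text: str, root: str) -> Counter:
    """First branch of a word tree: which words follow `root`, and how often."""
    words = tokens(text)
    return Counter(after for word, after in zip(words, words[1:]) if word == root)


if __name__ == "__main__":
    manifesto = "Jobs and growth, jobs and growth, schools and hospitals."  # hypothetical text
    print(phrase_net(manifesto))
    print(next_words(manifesto, "jobs"))
```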

Once again, I’d welcome any comments on areas I may have missed or things journalists should consider. I’ve had to split this section into two, so Part 4 continues to look at visualisation, and focuses on tools and publishing.

7 thoughts on “Data journalism pt3: visualising data – charts and graphs (comments wanted)”

  3. Amber

    I noticed you mentioned using no more than four lines on a line graph and this is good advice. A similar piece of advice I like to keep in mind is to vary the line pattern, as well. This is valuable to do for two reasons: 1) If your graphic is reproduced in black and white, it won’t become confusing, and 2) people who are colorblind will find it easier to identify the separate lines. This advice can apply to graphics other than line graphs, as well.

    One other valuable thing to include in almost any kind of visualization is a clear legend. You would be surprised how often this is forgotten!

    And finally, I wanted to mention that if there is a geospatial component to your topic, mapping the area or showing a distribution based on location can be hugely valuable. Here’s a link to a great article on how a map played a major role in a federal jury’s decision: The Revolution Will Be Mapped.

    I hope these suggestions are helpful!
