Tag Archives: The Pudding

Words as data: how data journalists tell stories about documents and text

Documents and other collections of text can be goldmines for data journalism — if you know how to approach them as data. Here are some techniques and inspiration for your next data project.

From stories about political speech and song lyrics, to street names and social media chatter, data journalists now have a wide range of examples of text-as-data to draw inspiration and guidance from, while tools such as Pinpoint and NotebookLM are making text analysis easier than ever.

I compiled a list of over 200 pieces of data journalism where text or documents were used as sources. Quantification techniques ranged from counting the frequency of a single word and using Google’s ngram viewer, to machine learning and topic modelling.
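As a rough illustration of the simplest technique mentioned above, counting how often a word appears, here is a minimal Python sketch. The sample text and the function names (`tokenize`, `word_frequency`, `top_words`) are made up for illustration and do not come from any of the stories in the list. A real project would also need to handle curly apostrophes, hyphenated words and common "stop words".

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (letters and straight apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequency(text, word):
    """Count how many times a single word appears in the text."""
    return tokenize(text).count(word.lower())


def top_words(text, n=3):
    """Return the n most common words with their counts, most frequent first."""
    return Counter(tokenize(text)).most_common(n)


# Hypothetical sample text, invented for this example
sample = "We will rebuild. We will recover. We will not forget."

print(word_frequency(sample, "will"))  # the 'scale' angle: how often a word is used
print(top_words(sample, 2))            # the 'ranking' angle: the most common words
```

The same counts, gathered for different speakers or different years, are the raw material for the variation and change angles.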

Looking at those articles, it’s clear that once text is quantified, journalists tell the same stories about it as about any other data, using the seven most common angles.

But how those angles are used — and how often — is where it gets interesting…

7 common angles for data stories: text and documents 
Scale: how often words/phrases are used
Change: how language has changed
Ranking: the most/least common words/phrases
Variation: e.g. in relation to gender, ethnicity, ideology etc.
Exploration: journeys through multiple angles; interactives
Relationships: correlations, similarities and connections
Meta: ‘how we quantified text’
Leads: clusters, patterns or themes for further digging

Why discipline is one of the 7 habits of successful journalists

"A nose for news, a plausible manner and an ability to write and deliver concise, accurate copy to deadline" - description of the qualities needed by journalists, from Ethics & Journalism by Karen Sanders

In a previous post I wrote about the central role of creativity in journalism training. In this penultimate post in a series on the seven habits of successful journalists, I explore how discipline is equally important in directing that creativity towards a professional end, and how it can actually help create the conditions for creativity. You can also read the posts on curiosity, scepticism, persistence and empathy.

While many are attracted to journalism because of its opportunities for creative expression, few are attracted by its various constraints. But it is those particular constraints which make journalism distinctive, and separate from other creative work such as art or fiction.

In fact, you might argue that these constraints make journalism more similar to creative fields such as design, where the function and the user of the work must be considered. That similarity has led to increasing cross-pollination between the two (e.g. the rise of design thinking in journalism).

These constraints can be broadly classed as aspects of the work that require self-control, or discipline. For example:

  • We must consider the audience in the selection and treatment of stories
  • We must hit regular deadlines
  • We must write within a particular word count or to particular timings
  • We must remain impartial and objective in our reporting (in most genres)

These aspects of discipline are reflected in some of the most common feedback given to trainee journalists.