Category Archives: data journalism

What are regular expressions — and how to use them in Google Sheets to get data from text

In an extract from a new chapter in the ebook Finding Stories in Spreadsheets, I explain what regular expressions are — and how they can be used to extract information from spreadsheets. The ebook version of this tutorial includes a dataset and exercise to employ these techniques.

The story was an unusual one: the BBC Data Unit had been given access to a dataset on more than 200,000 works of art in galleries across the UK. What patterns could we find in the data that would allow us to tell a story about the nature of the nation’s paintings?

Some of the data was straightforward to work with: the ‘artist’ column was relatively clean, and allowed us to identify the most common male and female artists. It turned out that the most common female artist, the Victorian botanist Marianne North, was relatively unknown. So that was one story we could tell.


But other parts of the data were more problematic. The date column, for example, contained inconsistently formatted data: in the majority of cases a specific year had been entered, but in many others the data contained text such as “18th century” or “1900-1920” or “1800s”.

We also noticed that monarchs featured heavily in the art, but, understandably, there was no column dedicated to classifying them. If we wanted to identify the most-painted monarchs we would have to create new data by extracting their names from the paintings’ titles.

These problems, extracting new data from existing data and particularly from text, are exactly what regular expressions are designed for. In this chapter I will explain what regular expressions are, and how to use them in spreadsheets.
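The chapter itself works in Google Sheets. As a rough sketch of the same idea outside a spreadsheet, the Python below uses the standard `re` module to pull a year or a century out of inconsistently formatted date strings, and a monarch's name out of a painting title. The sample values and the helper names `first_year`, `century` and `monarch` are hypothetical, modelled on the formats described above rather than taken from the BBC dataset, and the patterns are simplified assumptions (they will miss monarchs named without “King” or “Queen”, for example).

```python
import re

# Hypothetical sample values, modelled on the date formats described above
DATES = ["1856", "18th century", "1900-1920", "1800s"]
# Hypothetical painting titles
TITLES = ["Portrait of King Charles I", "Queen Victoria at Osborne", "Landscape with Cattle"]

# Four digits at the start of a word: "1856", "1900" in "1900-1920", "1800" in "1800s"
YEAR = re.compile(r"\b(\d{4})")
# One or two digits, an ordinal suffix, then "century", e.g. "18th century"
CENTURY = re.compile(r"\b(\d{1,2})(?:st|nd|rd|th) century", re.IGNORECASE)
# "King" or "Queen", then a capitalised name and an optional Roman numeral
MONARCH = re.compile(r"\b(?:King|Queen)\s+([A-Z][a-z]+(?:\s+[IVXLC]+\b)?)")


def first_year(text):
    """Return the first four-digit year in text as an int, or None."""
    match = YEAR.search(text)
    return int(match.group(1)) if match else None


def century(text):
    """Return the century number if text names one (e.g. 18), else None."""
    match = CENTURY.search(text)
    return int(match.group(1)) if match else None


def monarch(title):
    """Return the monarch's name (plus numeral, if any) from a title, or None."""
    match = MONARCH.search(title)
    return match.group(1) if match else None


if __name__ == "__main__":
    for value in DATES:
        print(f"{value!r}: year={first_year(value)}, century={century(value)}")
    for title in TITLES:
        print(f"{title!r}: monarch={monarch(title)}")
```

On these samples the sketch reports 1900 for “1900-1920” (only the start of the range) and 1800 for “1800s”, while “18th century” yields no year but a century of 18: the kind of inconsistency a reporter still has to decide how to handle. In Google Sheets, a function such as REGEXEXTRACT can apply similar patterns to a column, though its regular expression dialect (RE2) differs from Python's in some details.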


The angles journalists use most often to tell stories with data

In my data journalism classes and training, I often talk about the most common types of stories that can be found in datasets. So I selected 100 data-driven reports, analysed them, and checked how often each of these angles is used.

I concluded that there are, in fact, seven main angles for data-driven reports and stories. Many stories incorporate other angles as secondary dimensions of the narrative (a change story might go on to talk about the scale of something, for example), but every data journalism story I examined took one of these angles as its central thread.

In this post, I examine how the seven most common angles can help you come up with ideas for stories and reports, as well as the variety of ways they are executed and the main considerations to keep in mind.


“Don’t give me more data — give me a story.” AJ Labs’ Mohammed Haddad on spotlighting human driven data journalism

The Arab Spring: Retweeted

Al Jazeera’s interactive team AJ Labs have a mantra: “human-driven data journalism”. In a guest post for OJB, Hanna Duggal speaks to the team’s lead, Mohammed Haddad, about what this means and how he tackles big data, including a recent story commemorating the Arab Spring.

Mohammed Haddad joined Al Jazeera just as the Egyptian revolution began to unfold in 2011. Since then he has been behind some of Al Jazeera’s most prolific data stories, covering everything from UN General Assembly voting to mapping India and China’s disputed borders.

And while many of the issues Al Jazeera covers are deeply complex, AJ Labs often help to explain them using data journalism.

“Systems would go offline for days just to delay the release of data” – Rodrigo Menegat on Covid-19 data journalism in Brazil

In a guest post for OJB, Rodrigo George Willoughby spoke to data journalist Rodrigo Menegat about reporting on Covid-19 in Brazil, managing uncertainty and how data journalism could help debunk misinformation.

At the height of the first wave of the coronavirus pandemic in March, data on the disease was in high demand. Reporting on it required collaboration, something made more difficult by poor-quality data.

Having spent most of his career covering politics, last year Rodrigo Menegat realised that science data — particularly Covid-19 data — was fast becoming a staple in the newsroom. 

“The first challenge was learning how to cover data which is very different to sport or politics,” he says.

The difficulty was understanding something that, as a country, Brazil was not ready to face.

Striking the balance between graphic design and data journalism: “Design is a conversation”

Beirut blast scrollytell

Reuters’ Graphics Team is renowned for producing a wide range of innovative news stories under tight deadlines, from Covid-19 coverage to mapping the movement of smoke from California wildfires. In a guest post for OJB, Hanna Duggal speaks to the team’s Simon Scarr and Marco Hernandez about pushing the boundaries of visual storytelling in the newsroom and the relationship between data and design.

In a world increasingly saturated with data and geared towards visual content, visualisation gives the newsroom a way both to communicate complex data effectively and to engage audiences.

Data graphics have become more immersive, compelling and revealing and, for Reuters, an integral part of how stories are told.

“I’m incredibly proud of our breaking news work,” says Simon Scarr, Reuters’ Deputy Head of Graphics.

“There are still many questions that are not answered” – Nicolas Kayser-Bril on investigating algorithmic discrimination on Facebook

When deciding who to show an ad to, Facebook relies on gross stereotypes

 

In a special guest post for OJB, Vanessa Fillis speaks to AlgorithmWatch’s Nicolas Kayser-Bril about his work on how online platforms optimise ad delivery, including his recent story on how Facebook draws on gender stereotypes.

Kayser-Bril first became aware of automated discrimination when he read about an experiment conducted by researchers at Northeastern University in the US. Seeing that the analysis could be replicated in Europe, he decided to take a closer look at Facebook’s and Google’s distribution systems.

“Automated systems are supposed to bring relevant content to the users,” says Nicolas. “And I use ‘relevant’ because it’s the adjective that Facebook uses — and there is a sense that relevant content is determined based on the actions of the users themselves.”

But in reality, everything Kayser-Bril knows about large-scale automated systems like Facebook’s news feed suggests that their decisions about what to show a user are based on many other factors.

Tim Harford on telling data stories with audio: “You need to keep simplifying”

Economist and podcaster Tim Harford, author of How To Make The World Add Up, spoke to MA Data Journalism students this month. In a guest post for OJB, Niels de Hoog rounds up Tim’s tips on creating compelling number-driven stories for radio and podcasts.

Orson Welles famously said that there’s nothing an audience won’t understand, as long as you can get them to be interested.

Listening to Tim Harford’s podcasts it is clear that he has taken this message to heart.

“If you’ve got a hook, a personality, or a question people want answered, that will carry people through a certain degree of complexity that they wouldn’t tolerate if it was reported straight.”

Take More or Less, his podcast about statistics for BBC Radio 4. At first glance it doesn’t offer the easiest subject for an engaging audio story, yet somehow the programme is very entertaining to listen to.

Brazilian journalists launch network analysis tool to investigate political relationships

Cruza Grafos

The Brazilian Association of Investigative Journalism (Abraji) has launched an advanced data tool to help journalists research politicians and companies, reports Beatriz Farrugia.

The platform, Cruza Grafos (“Crossing Graphs”), was created through a partnership between Brasil.io and the Google News Initiative.

Cruza Grafos (registration required) is an online visual interface where journalists can research political candidates, and relate candidates to companies and entities with an official registration number in Brazil. 

The tool allows journalists to work with huge datasets without any coding.

According to Reinaldo Chaves, Abraji’s project coordinator, many journalists do not know how to code or even how to open a spreadsheet, a situation that makes some investigative projects impossible.

“We hope Cruza Grafos makes this kind of investigation easier and democratizes access to huge datasets.”


3 more angles most often used to tell data stories: explorers, relationships and bad data stories

The seven angles: Scale (‘This is how big an issue is’); Change/stasis (‘This is going up/down/not improving’); Outliers/ranking (‘The best/worst/where we rank’); Variation (‘Postcode lotteries’ and distributions); Exploration (tools, simulators, analysis and art); Relationships/debunking (‘Things are connected’, or not; networks and flows of power and money); Problems & solutions (‘Concerns over data’, ‘Missing data’, ‘Get the data’).

Yesterday I wrote the first of a two-part series on the seven angles used to tell stories about data. In this second part I complete the list with the three less common angles: stories focusing on relationships; angles that focus on the data itself (its absence, poor quality, or existence); and exploratory stories, which often give readers an opportunity to get to grips with the data directly.

Data angle 5. ‘Explore’: tools, interactivity — and art

How Y’all, Youse and You Guys Talk

This New York Times interactive became one of their most-read stories of all time

Exploratory angles are largely web-native. Their selling point is often a ‘call to action’ such as “explore”, “play” or “Take the quiz”. Alternatively, they might sell the comprehensiveness of the analysis: the way something is “Mapped”, a story that documents “Every X that ever happened”, or one that simply answers the question “Who/how/where”.

Here are the angles journalists use most often to tell the stories in data

7 common angles for data stories: scale, change, ranking, variation, explore, relationships, bad data, leads

In my data journalism teaching and training I often talk about common types of stories that can be found in datasets — so I thought I would take 100 pieces of data journalism and analyse them to see if it was possible to identify how often each of those story angles is used.

I found that there are actually broadly seven core data story angles. Many incorporate other angles as secondary dimensions in the storytelling (a change story might go on to talk about the scale of something, for example), but all the data journalism stories I looked at took one of these as its lead.

In the first of a two-part series I walk through how the four most common angles can help you identify story ideas, the variety of their execution, and the considerations to bear in mind.