Category Archives: online journalism

VIDEO: Where data journalists get data from

Journalists get hold of data using four broad approaches: it might be newly published or issued; it might be leaked; they might request it; or they might seek it out based on an idea or in reaction to a news event.

In this second short video, first made for students on the MA in Data Journalism at Birmingham City University and shared as part of a series of video posts, I go through the different ways that journalists obtain data and the different types of story that those sources can lead to.

VIDEO: What is data journalism — and why is it growing so much?

Data journalism isn’t just about spreadsheets and interactives: in this video from my MA Data Journalism classes at Birmingham City University I look at why the news industry has expanded its focus on data journalism over the past decade, and how thinking about definitions of data journalism can help reporters think more broadly about potential stories and subjects beyond official statistics.

I also look at related terms such as computational journalism, robot journalism and augmented journalism — and what we can learn from those definitions as practitioners.

This is part of a series of videos recorded during the coronavirus pandemic.

VIDEO: How to write for the web (BASIC principles)

The best online journalism has a range of qualities: it tends to be succinct, easy to scan, and it considers how a user might interact with it — whether through links or embedded elements, or more conversational elements like comments and social media hashtags.

One way to remember those qualities is the mnemonic BASIC: Brevity; Adaptability; Scannability; Interactivity; and Community/Conversation. In the video below I talk through those five qualities, and how to put them into practice when writing for the web.

This video was first made for students on the MA in Multiplatform and Mobile Journalism and the MA in Data Journalism at Birmingham City University and is shared as part of a series of video posts. A shorter version can also be found here.

Investigating the World Cup: tips on making FOIA requests to create a data-driven news story

Image by Ambernectar 13

Beatriz Farrugia used Brazil’s freedom of information laws to investigate the country’s hosting of the World Cup. In a special guest post for OJB, the Brazilian journalist and former MA Data Journalism student passes on some of her tips for using FOIA.

I am from Brazil, a country well-known for football and FIFA World Cup titles — and the host of the World Cup in 2014. Being a sceptical journalist, in 2019 I tried to discover the real impacts of that 2014 World Cup on the 213 million residents of Brazil: tracking the 121 infrastructure projects that the Brazilian government carried out for the competition and which were considered the “major social legacy” of the tournament.  

In 2018 the Brazilian government had taken the website and official database on the 2014 FIFA World Cup infrastructure projects offline — so I had to make Freedom of Information (FOIA) requests to get data.

The investigation took three months and more than 230 FOIA requests to 33 different public bodies in Brazil. On August 23, my story was published.

Here is everything that I have learned from making those hundreds of FOIA requests:


Pajama ethics: bear in mind these 4 principles when doing desktop-based reporting

Image by morgaine CC BY-SA 2.0

“Pajama Journalism”—reports you can do in nightclothes on a computer, without going anywhere or talking to anyone—should not define online news, but the practice is widespread. In a special guest post, Michael Bugeja argues that following just four basic principles of reporting can help improve this form of journalism.


Using satellite data for journalism — tips from the experts

For reporters, satellite data offers unique opportunities for original investigations and visual storytelling. But how do you get started? And what should you be looking out for? In a guest post for the Online Journalism Blog, MA Data Journalism student Niels de Hoog speaks to four journalists who regularly work with satellite data about how to start, best practices and, most importantly, mistakes to avoid.


How to use the ‘4 stages of curiosity’ as a framework for investigations

While researching my post on developing curiosity in journalism I came across Terry Heick’s 4 stages of curiosity. It outlines 4 steps that learners go through as they grapple with new knowledge: firstly finding out what they are expected to do (the process); then understanding the content involved; then how to transfer that to particular situations; and finally how it applies to, and changes, them.

But the same model can also be adapted to provide a framework for investigations. Here’s how:


What are regular expressions — and how to use them in Google Sheets to get data from text

In an extract from a new chapter in the ebook Finding Stories in Spreadsheets, I explain what regular expressions are — and how they can be used to extract information from spreadsheets. The ebook version of this tutorial includes a dataset and exercise to employ these techniques.

The story was an unusual one: the BBC Data Unit had been given access to a dataset on more than 200,000 works of art in galleries across the UK. What patterns could we find in the data that would allow us to tell a story about the nature of the nation’s paintings?

Some of the data was straightforward to work with: the ‘artist’ column was relatively clean, and allowed us to identify the most common male and female artists. It turned out that the latter – the Victorian botanist Marianne North – was relatively unknown. So, that was one story we could tell.


But other parts of the data were more problematic. The date column, for example, contained inconsistently formatted data: in the majority of cases a specific year had been entered, but in many others the data contained text such as “18th century” or “1900-1920” or “1800s”.

We also noticed that monarchs featured heavily in the art – but understandably there was no column that was specifically dedicated to classifying those. If we wanted to identify the most-painted monarchs we would have to create new data that somehow extracted those names from the paintings’ titles.

These problems – extracting data from existing data, particularly text data – are what regular expressions are designed for. In this chapter I will explain what regular expressions are, and how to use them in spreadsheets.
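The two problems described above, inconsistent dates and monarchs buried in titles, can be sketched outside a spreadsheet too. The Python below is a minimal illustration, not the method used in the BBC Data Unit story or in the ebook: the sample values, the pattern names and the helper functions `earliest_year` and `monarch_in_title` are all assumptions made for the example, and the monarch pattern only catches titles that include the word "King" or "Queen".

```python
import re
from collections import Counter

# Hypothetical sample values, modelled on the formats described above;
# they are not taken from the actual dataset.
dates = ["1850", "18th century", "1900-1920", "1800s", "date unknown"]
titles = [
    "Portrait of King Charles II",
    "Queen Victoria in Robes",
    "King Charles II on Horseback",
    "A Kingfisher by a Stream",
]

# A standalone four-digit year: "1850", or the "1900" in "1900-1920".
YEAR = re.compile(r"\b(\d{4})\b")
# A year followed by "s", such as "1800s".
PLURAL_YEAR = re.compile(r"\b(\d{4})s\b")
# An ordinal century, such as "18th century".
CENTURY = re.compile(r"\b(\d{1,2})(?:st|nd|rd|th) century\b", re.IGNORECASE)
# "King" or "Queen", then a capitalised name and an optional Roman numeral.
MONARCH = re.compile(r"\b(?:King|Queen)\s+([A-Z][a-z]+(?:\s+[IVX]+\b)?)")


def earliest_year(text):
    """Return the earliest year a date string could refer to, or None."""
    match = YEAR.search(text) or PLURAL_YEAR.search(text)
    if match:
        return int(match.group(1))
    match = CENTURY.search(text)
    if match:
        # Treat "18th century" as starting in 1700 (an approximation).
        return (int(match.group(1)) - 1) * 100
    return None


def monarch_in_title(title):
    """Return the monarch's name found in a title, or None."""
    match = MONARCH.search(title)
    return match.group(1) if match else None


# New "columns" derived from the existing text data.
years = [earliest_year(d) for d in dates]
monarch_counts = Counter(m for m in map(monarch_in_title, titles) if m)
```

In Google Sheets the same kind of pattern would be passed to a function such as `REGEXEXTRACT`, which returns the first match in a cell; the patterns here would need checking against real data before being relied on.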


The angles journalists use most often to tell stories with data

In my data journalism classes and training sessions, I often talk about the most common types of story that can be found in datasets. So I selected 100 data-driven stories, analysed them, and checked how often each of those angles is used.

I came to the conclusion that there are, in fact, seven main angles for data-driven reports and stories. Many stories incorporate other angles as secondary dimensions of the narrative (a story about change might go on to talk about the scale of something, for example), but every data journalism story I examined took one of these angles as its guiding thread.

In this post, I look at how the seven most common angles can help you come up with ideas for stories and reports, as well as the variety of ways they can be executed and the main considerations to bear in mind.


Brazilian government attacks data journalist for reporting app that prescribes ineffective treatments for COVID-19

Mayra Pinheiro speaks to the CPI da Covid (the parliamentary inquiry into COVID-19)

Government says journalist “extracted data improperly” — but the journalist affirms that he only used a browser’s Inspect Element tool, reports Beatriz Farrugia.

Data journalism has been at the centre of a political debate in Brazil for two weeks, after President Jair Bolsonaro’s government accused a data journalist of improperly extracting data from a web app developed by the Brazilian Ministry of Health to prescribe treatments for COVID-19.

The TrateCov app was launched in January 2021 for Brazilian doctors. Professionals were told they would be able to enter a patient’s profile and symptoms into the app, which would then suggest medication. 

However, the data journalist Rodrigo Menegat analyzed the app’s source code and found that, regardless of the patient’s symptoms, age and health conditions, TrateCov indicated the use of chloroquine, hydroxychloroquine and ivermectin — drugs with no scientific evidence supporting their use in the treatment of coronavirus. 

He announced his discovery on 20 January in a series of tweets. “Guys,” he wrote:

“I just put in the TrateCov app that my patient is a one week-old newborn who has a stomach ache and a runny nose. The app recommended chloroquine, ivermectin, azithromycin and everything else. Crime, crime, crime, crime.”

Other journalists and broadcasters tested the app and came to the same conclusion.

CNN Brazil reported that it simulated a query for a baby aged five months, with symptoms of fever and nasal congestion. The treatment recommended by TrateCov was chloroquine, hydroxychloroquine and ivermectin.

Soon after the complaints, the app was removed by the Brazilian Government.

Accused of committing cyber crime

Then on May 25th, during a public session of a parliamentary inquiry, Menegat was accused of having committed cyber crime by an official of the Brazilian Ministry of Health: Mayra Pinheiro.

The parliamentary inquiry, opened late last month, is investigating the Bolsonaro government’s response to the pandemic. More than 461,000 people have died in Brazil so far. 

Approved by Brazil’s Supreme Court, the inquiry is pursuing multiple lines of investigation, such as why the Brazilian government promoted ineffective treatments and why three health ministers were removed over the pandemic. 

Naming the data journalist, Pinheiro said Menegat performed an “improper data extraction”. 

“He was unable to hack,” said Pinheiro. “He did an improper data extraction. Hacking is when you use someone’s password, enter a platform, a system. The term is not hacking. Today we have the official report that classifies it as improper data extraction.

“He did improper simulations. [The system] was taken down for investigation.”

In another testimony session to the parliamentary inquiry the previous week, the former Health Minister, General Eduardo Pazuello, said that the app had been “stolen and hacked by a citizen”.

After the allegations the data journalist explained that he had only used the browser’s Inspect Element tool to analyse the source code. 

“As a data journalist and developer, I only analyzed the source code which was public and available on the website of the TrateCov app, saved on a government server (https://tratecov.saude.gov.br) and accessible to any internet user curious enough to do this verification on their own.”

“The procedure has in no way altered any content on the platform”, he added. 

Since the allegations Menegat has limited his social media accounts to avoid online attacks by government supporters. 

“I am closing my Twitter account for more than obvious reasons, but I will be very pleased to show anyone who wants to know how to use the Element Inspector to access source code from any website in the world,” wrote the journalist.

Other Brazilian data journalists showed support for Menegat and published content explaining the technique used to analyse the app.

“The alleged hacking of the TrateCov application was nothing more than a journalistic investigation technique already used in newsrooms around the world,” said Daniel Trielli, journalist and researcher in media, technology and society, in an article published by the Folha de S.Paulo newspaper.
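The technique at the centre of the dispute, reading markup and scripts that a server has already sent to every visitor, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for the example: the `PAGE` markup is invented and is not the TrateCov source, and the `SourceInspector` class is a hypothetical helper. The sketch only parses a string held locally, so nothing on any server is contacted or changed.

```python
from html.parser import HTMLParser

# Hypothetical page markup, invented for illustration.
# It is NOT the source code of the TrateCov app.
PAGE = """
<html><head><script src="/static/app.js"></script></head>
<body>
<form id="patient"><input name="age"><input name="symptoms"></form>
<script>
function suggest(patient) { return ["drug A", "drug B"]; }
</script>
</body></html>
"""


class SourceInspector(HTMLParser):
    """Collect the scripts that a page delivers to the browser."""

    def __init__(self):
        super().__init__()
        self.external_scripts = []  # values of <script src="...">
        self.inline_scripts = []    # code written directly in the page
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            src = dict(attrs).get("src")
            if src:
                self.external_scripts.append(src)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Keep only non-empty text found between <script> and </script>.
        if self._in_script and data.strip():
            self.inline_scripts.append(data.strip())


inspector = SourceInspector()
inspector.feed(PAGE)
inspector.close()

external_scripts = inspector.external_scripts
inline_code = "\n".join(inspector.inline_scripts)
```

A browser's Inspect Element panel shows the same material interactively: the reader sees what was delivered to their own machine, and the copy on the server stays as it was.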