In an extract from a new chapter in the ebook Finding Stories in Spreadsheets, I explain what regular expressions are, and how they can be used to extract information from spreadsheets. The ebook version of this tutorial includes a dataset and an exercise for practising these techniques.
The story was an unusual one: the BBC Data Unit had been given access to a dataset on more than 200,000 works of art in galleries across the UK. What patterns could we find in the data that would allow us to tell a story about the nature of the nation’s paintings?
Some of the data was straightforward to work with: the ‘artist’ column was relatively clean, and allowed us to identify the most common male and female artists. It turned out that the latter – the Victorian botanist Marianne North – was relatively unknown. So, that was one story we could tell.
But other parts of the data were more problematic. The date column, for example, contained inconsistently formatted data: in the majority of cases a specific year had been entered, but in many others the data contained text such as “18th century” or “1900-1920” or “1800s”.
We also noticed that monarchs featured heavily in the art – but understandably there was no column that was specifically dedicated to classifying those. If we wanted to identify the most-painted monarchs we would have to create new data that somehow extracted those names from the paintings’ titles.
These problems – extracting data from existing data, particularly text data – are what regular expressions are designed for. In this chapter I will explain what regular expressions are, and how to use them in spreadsheets.
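As a rough illustration of the kind of problem regular expressions solve, here is a minimal sketch in Python (the chapter itself works in spreadsheets). The patterns, the function names `parse_date` and `find_monarch`, and the sample values are all hypothetical, modelled on the formats described above rather than taken from the BBC dataset. Reading "1800s" as a whole century and "1850s" as a decade is an assumption, and the monarch pattern only catches names preceded by "King" or "Queen". Similar patterns could be tried in Google Sheets' `REGEXEXTRACT` function, which uses RE2 syntax.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns, one per date format described in the text.
# Treating "1800s" as a century and "1850s" as a decade is an assumption.
RANGE = re.compile(r"(\d{4})\s*-\s*(\d{4})")                      # "1900-1920"
CENTURY = re.compile(r"(\d{1,2})(?:st|nd|rd|th)\s+century", re.I)  # "18th century"
CENTURY_S = re.compile(r"(\d{2})00s")                             # "1800s"
DECADE = re.compile(r"(\d{3})0s")                                 # "1850s"
YEAR = re.compile(r"\b(\d{4})\b")                                 # "1887", "c. 1650"

# Hypothetical pattern: "King" or "Queen", a capitalised name and an optional
# Roman numeral. It misses monarchs named without a title (e.g. "Elizabeth I").
MONARCH = re.compile(r"\b(?:King|Queen)\s+[A-Z][a-z]+(?:\s+[IVX]+\b)?")


def parse_date(text: str) -> Optional[Tuple[int, int]]:
    """Return an approximate (start_year, end_year) range, or None if no pattern fits."""
    text = text.strip()
    if m := RANGE.fullmatch(text):
        return int(m[1]), int(m[2])
    if m := CENTURY.fullmatch(text):
        # Conventional reading: the 18th century spans 1700-1799
        start = (int(m[1]) - 1) * 100
        return start, start + 99
    if m := CENTURY_S.fullmatch(text):
        start = int(m[1]) * 100
        return start, start + 99
    if m := DECADE.fullmatch(text):
        start = int(m[1]) * 10
        return start, start + 9
    if m := YEAR.search(text):
        # Fallback: take the first four-digit number anywhere in the cell
        return int(m[1]), int(m[1])
    return None


def find_monarch(title: str) -> Optional[str]:
    """Return the first 'King/Queen Name [numeral]' phrase in a title, or None."""
    m = MONARCH.search(title)
    return m[0] if m else None


if __name__ == "__main__":
    for cell in ["1887", "18th century", "1900-1920", "1800s", "c. 1650"]:
        print(cell, "->", parse_date(cell))
    for title in ["Portrait of King Charles I", "Queen Victoria in her Coronation Robes"]:
        print(title, "->", find_monarch(title))
```

The patterns are tried from most to least specific, so a range such as "1900-1920" is never reduced to its first year by the fallback. Any cell that returns `None` is one to check by hand, which is usually safer than forcing a guess.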
In my data journalism classes and training, I often talk about the most common types of stories that can be found in datasets. So I selected 100 data-driven stories, analysed them and checked how often each of these angles is used.
I concluded that there are, in fact, seven main angles for data-driven reports and stories. Many stories incorporate other angles as secondary dimensions of the narrative (a story about change might go on to talk about the scale of something, for example), but every data journalism story I examined took one of these angles as its main thread.
In this post, I examine how the seven most common angles can help you come up with ideas for stories and reports, as well as the variety of their execution and the main considerations to keep in mind.
Al Jazeera’s interactive team AJ Labs have a mantra: “human driven data journalism”. In a guest post for OJB, Hanna Duggal speaks to the team’s lead Mohammed Haddad about what this means and how he tackles big data, including a recent story commemorating the Arab Spring.
Reuters’ Graphics Team is renowned for creating a myriad of innovative news stories under tight deadlines, from Covid-19 coverage to mapping the movement of shifting smoke from California wildfires. In a guest post for OJB, Hanna Duggal speaks to the team’s Simon Scarr and Marco Hernandez about pushing the boundaries of visual storytelling in the newsroom and the relationship between data and design.
In a world that has become increasingly data-prolific and hardwired towards visual content, visualisation gives the newsroom a way both to communicate complex data effectively and to engage audiences.
Data graphics have become more immersive, compelling and revealing, and for Reuters they are an integral part of how stories are told.
“I’m incredibly proud of our breaking news work,” says Simon Scarr, Reuters’ Deputy Head of Graphics.
In a special guest post for OJB, Vanessa Fillis speaks to AlgorithmWatch’s Nicolas Kayser-Bril about his work on how online platforms optimise ad delivery, including his recent story on how Facebook draws on gender stereotypes.
“Automated systems are supposed to bring relevant content to the users,” says Nicolas. “And I use ‘relevant’ because it’s the adjective that Facebook uses — and there is a sense that relevant content is determined based on the actions of the users themselves.”
But in reality, everything Kayser-Bril knows about large-scale automated systems such as Facebook’s news feed suggests that their decisions about what to show a user are based on many other factors.
Economist and podcaster Tim Harford, author of How To Make The World Add Up, spoke to MA Data Journalism students this month. In a guest post for OJB, Niels de Hoog rounds up Tim’s tips on creating compelling number-driven stories for radio and podcasts.
Orson Welles famously said that there’s nothing an audience won’t understand, as long as you can get them to be interested.
Listening to Tim Harford’s podcasts it is clear that he has taken this message to heart.
“If you’ve got a hook, a personality, or a question people want answered, that will carry people through a certain degree of complexity that they wouldn’t tolerate if it was reported straight.”
Take More or Less, his podcast about statistics for BBC Radio 4. At first glance it doesn’t offer the easiest subject for an engaging audio story, yet somehow the programme is very entertaining to listen to.
Cruza Grafos (registration required) is an online visual interface where journalists can research political candidates, and relate candidates to companies and entities with an official registration number in Brazil.
The tool allows journalists to work with huge datasets without any coding.
According to Reinaldo Chaves, Abraji’s project coordinator, many journalists do not know how to code or even how to open a spreadsheet, a situation that makes some investigative projects impossible to carry out.
“We hope Cruza Grafos makes this kind of investigation easier and democratises access to huge datasets.”
Yesterday I wrote the first of a two-part series on the 7 angles that are used to tell stories about data. In this second part I finish the list with a look at the three less common angles: stories focusing on relationships; angles that focus on the data itself (its absence, poor quality, or existence); and exploratory stories that often provide an opportunity to get to grips with the data.
Data angle 5. ‘Explore’: tools, interactivity — and art
Exploratory angles are largely web-native. Their selling point is often a ‘call to action’ such as “explore”, “play” or “Take the quiz”. Alternatively, they might sell the comprehensiveness of the analysis: the data is “Mapped”, the piece documents “Every X that ever happened”, or it simply answers the question “Who/how/where”.
In my data journalism teaching and training I often talk about common types of stories that can be found in datasets — so I thought I would take 100 pieces of data journalism and analyse them to see if it was possible to identify how often each of those story angles is used.
I found that there are broadly seven core data story angles. Many stories incorporate other angles as secondary dimensions in the storytelling (a change story might go on to talk about the scale of something, for example), but every data journalism story I looked at took one of these as its lead.
In the first of a two-part series, I walk through how the four most common angles can help you identify story ideas, the variety of their execution, and the considerations to bear in mind.