Tag Archives: AI

VIDEO: How automation played a central role in data journalism — and is now playing it again

Automation was key to the work of data journalism pioneers such as Adrian Holovaty — and it’s becoming increasingly central once again. This video, made for students on the MA in Data Journalism at Birmingham City University, explores the variety of roles that automation plays in data journalism; new concepts such as robot journalism, natural language generation (NLG) and structured journalism; and how data journalists’ editorial role becomes “delegated to the future” through the creation of algorithms.

You can find the video about Poligraft, and the FT on robot journalism at those links.

This video is shared as part of a series of video posts.

This is how I’ll be teaching journalism students ChatGPT (and generative AI) next semester

Robot with books
Image by kjpargeter on Freepik

I’m speaking at the Broadcast Journalism Teaching Council’s summer conference this week about artificial intelligence, specifically generative AI. It’s a deceptively huge area that presents journalism educators with a lot to adapt to in their teaching, so I decided to put those challenges in order of priority.

Each of these priorities could form the basis for part of a class, or a whole module – and you may have a different ranking. But at least you know which one to do first…

Priority 1: Understand how generative AI works

The first challenge in teaching about generative AI is that most people misunderstand what it actually is — so the first priority is to tackle those misunderstandings.

Continue reading

What is dirty data and how do I clean it? A great big guide for data journalists

Image: George Hodan

If you’re working with data as a journalist, it won’t be long before you come across the phrases “dirty data” or “cleaning data”. The phrases cover a wide range of problems, and a variety of techniques for tackling them, so in this post I’m going to break down exactly what makes data “dirty”, and the different cleaning strategies that a journalist might adopt to deal with it.

Four categories of dirty data problem

Look around for definitions of dirty data and the same three words will crop up: inaccurate, incomplete, or inconsistent.

Dirty data problems:
Inaccurate: data stored as the wrong type; misentered data; duplicate data; abbreviations and symbols.
Incomplete: uncategorised data; missing data.
Inconsistent: inconsistent naming of entities; mixed data.
Incompatible: wrong shape.
‘Dirty’ characters (e.g. unescaped HTML).
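‘Dirty’ characters are one of the easier problems to fix in code. The sketch below is a minimal Python illustration, assuming the values were scraped from a web page; the sample strings and the `clean_text` function are hypothetical, and the regular expression only handles simple tags (a proper HTML parser is safer for real pages).

```python
import html
import re

# Hypothetical scraped values still containing HTML tags and entities.
raw_values = ["Smith &amp; Sons", "Caf&eacute; Nero", "<b>Tesco</b>", "5 &lt; 6"]

def clean_text(value):
    """Strip simple HTML tags, then decode entities such as &amp;."""
    # Remove tags first, so an encoded '&lt;' is not mistaken for a tag.
    without_tags = re.sub(r"<[^>]+>", "", value)
    # html.unescape turns '&amp;' into '&', '&eacute;' into 'é', and so on.
    return html.unescape(without_tags).strip()

for value in raw_values:
    print(repr(value), "->", repr(clean_text(value)))
```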

Inaccurate data includes duplicate or misentered information, or data which is stored as the wrong data type.
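As a rough illustration of this category, here is a minimal Python sketch that removes exact duplicate rows and converts amounts stored as text into numbers. The sample rows and function names are hypothetical; in practice you might do the same job in a spreadsheet or with a library such as pandas.

```python
# Hypothetical rows: one record appears twice, and the amounts were
# stored as text (with a currency symbol and a thousands separator).
rows = [
    {"council": "Birmingham", "amount": "£1,200"},
    {"council": "Leeds", "amount": "£950"},
    {"council": "Birmingham", "amount": "£1,200"},
]

def remove_duplicates(records):
    """Keep the first occurrence of each identical record, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

def to_number(text):
    """Convert a text amount like '£1,200' into a float."""
    return float(text.replace("£", "").replace(",", ""))

clean_rows = [
    {"council": r["council"], "amount": to_number(r["amount"])}
    for r in remove_duplicates(rows)
]
print(clean_rows)
```

Identical rows are not always errors: two payments of the same amount to the same body can both be genuine, so it is worth checking with whoever published the data before deleting anything.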

Incomplete data might only cover particular periods of time, specific areas, or categories — or be lacking categorisation entirely.
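One way to spot data that only covers some periods is to generate the full list of periods you would expect and check which are absent. This Python sketch assumes monthly data that should cover the whole of 2023; the counts are invented for illustration and `expected_months` is a hypothetical helper.

```python
# Hypothetical monthly counts with gaps: some months are simply absent.
counts = {"2023-01": 14, "2023-02": 9, "2023-04": 11, "2023-07": 6}

def expected_months(start_year, start_month, end_year, end_month):
    """List every 'YYYY-MM' label from start to end, inclusive."""
    months = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        months.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:  # roll over into the next year
            month = 1
            year += 1
    return months

# Assumption: the dataset is supposed to cover January to December 2023.
missing = [m for m in expected_months(2023, 1, 2023, 12) if m not in counts]
print(missing)
```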

Inconsistent data might name the same entities in different ways or mix different types of data together.
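A common first step with inconsistently named entities is to standardise case and spacing, then map known variants onto a single name. In this hypothetical Python sketch the variant lookup is written by hand, which is an assumption: in real projects that list usually comes from inspecting the data yourself (OpenRefine’s clustering feature can help suggest candidates).

```python
import re
from collections import Counter

# Hypothetical entries naming the same organisations in different ways.
names = [
    "Birmingham City Council",
    "birmingham  city council ",
    "Birmingham City Cncl",
    "B'ham City Council",
    "Leeds City Council",
]

# Hand-built lookup of known variants -> one canonical name.
CANONICAL = {
    "birmingham city cncl": "birmingham city council",
    "b'ham city council": "birmingham city council",
}

def normalise(name):
    """Lower-case, trim and collapse whitespace, then map known variants."""
    key = re.sub(r"\s+", " ", name.strip().lower())
    return CANONICAL.get(key, key)

tallies = Counter(normalise(n) for n in names)
print(tallies)
```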

To those three common terms I would also add a fourth: data that is simply incompatible with the questions we want to ask of it or the visualisations we want to create with it. One of the most common cleaning tasks in data journalism, for example, is ‘reshaping’ data from long to wide, or vice versa, so that we can aggregate or filter along particular dimensions. (More on this later.)
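Reshaping can be sketched in plain Python: ‘long’ data has one row per observation (here, one row per area per year), while ‘wide’ data spreads the years across columns. The area names and figures below are invented, and the function names are hypothetical; with pandas, the equivalent operations are `DataFrame.pivot` and `pandas.melt`.

```python
# Hypothetical "long" data: one row per area per year.
long_rows = [
    ("Birmingham", 2021, 120),
    ("Birmingham", 2022, 135),
    ("Leeds", 2021, 80),
    ("Leeds", 2022, 95),
]

def long_to_wide(rows):
    """Pivot (area, year, value) rows into {area: {year: value}}."""
    wide = {}
    for area, year, value in rows:
        wide.setdefault(area, {})[year] = value
    return wide

def wide_to_long(wide):
    """Reverse the pivot back into (area, year, value) rows."""
    return [
        (area, year, value)
        for area, by_year in wide.items()
        for year, value in by_year.items()
    ]

wide = long_to_wide(long_rows)
print(wide)

# In wide form, change per area is a simple calculation across "columns".
for area, by_year in wide.items():
    print(area, by_year[2022] - by_year[2021])
```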

Continue reading

Here are some great examples of how to use AI and satellite imagery in journalism

False colour image of the Paraná River near its mouth at the Rio de La Plata, Argentina. Image: Copernicus Sentinel data [2022] processed by Sentinel Hub.

In a guest post for OJB, first published on ML Satellites, MA Data Journalism student Federico Acosta Rainis explains what can be learned from some examples of the format.

Satellite imagery is increasingly a key asset for journalists. Looking from above often allows us to put a story into context, take a more interesting perspective or show what the powerful prefer to keep hidden.

But with hundreds of satellites taking thousands of images of the Earth every day, it is difficult to separate the wheat from the chaff. How can we find relevant stories in this ocean of data?

Continue reading

What stories can you tell using AI and satellite imagery? Here are some ideas

In the second of two guest posts for OJB, first published on the ML Satellites blog, MA Data Journalism student Federico Acosta Rainis applies the framework of eight angles used by data journalists to satellite image-driven journalism.

Satellite-driven stories don’t have to use artificial intelligence (AI): many can be told using satellite data alone. The main advantages of AI include quantifying phenomena, identifying patterns, showing changes, or finding a “needle in a haystack” across large territories or different time periods.

AI algorithms can also be used to automate a process: since satellites produce recurring data, you can build, for example, a platform that automatically detects changes in the size of forests.
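Once a model has classified each pixel, the change-detection step itself can be reduced to a simple comparison. This Python sketch assumes two hypothetical forest/non-forest masks of the same area on two dates, and assumes 10 m pixels (the resolution of Sentinel-2’s finest bands), so each pixel covers 0.01 hectares. It counts only forest lost, not forest gained, and leaves out the hard part: producing the masks.

```python
# Hypothetical classified images of the same area on two dates:
# 1 = forest, 0 = not forest. In practice these masks would come from
# a model applied to satellite images, not be typed in by hand.
before = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
]
after = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]

PIXEL_AREA_HA = 0.01  # assumed: one 10 m x 10 m pixel = 0.01 hectares

def forest_loss(before_mask, after_mask, pixel_area):
    """Count pixels that were forest before but not after; return (pixels, area)."""
    lost = sum(
        1
        for row_b, row_a in zip(before_mask, after_mask)
        for b, a in zip(row_b, row_a)
        if b == 1 and a == 0
    )
    return lost, lost * pixel_area

lost_pixels, lost_area = forest_loss(before, after, PIXEL_AREA_HA)
print(lost_pixels, "pixels lost, about", round(lost_area, 4), "ha")
```

Run on a schedule against each new pair of images, a comparison like this is the kind of automated monitoring the paragraph above describes.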

Paul Bradshaw’s framework for data journalism angles recognises eight types of stories: scale, change, ranking, variation, exploration, relationships, stories about data and stories through data. The same framework can be adopted to generate ideas for satellite journalism, too.

Continue reading

Journalism, AI and satellite imagery: how to get started

Satellite image of the Amazon. Tocantins, Brazil. Source: Copernicus Sentinel data [2022] processed by Sentinel Hub, using Highlight Optimized Natural Color.

In the first of two guest posts for OJB, first published on ML Satellites, MA Data Journalism student Federico Acosta Rainis explains how to get started with satellite journalism — and avoid common pitfalls.

Working with satellite imagery and AI models takes time and patience. There is no general rule: you have to find the right model for each case, in a process of trial and error, while crunching large amounts of data.

That is why the advice of Anatoly Bondarenko, data editor of Texty, is crucial:

Continue reading

GEN 2019 round-up: 4 videos to watch on the potential of data and AI

Krishna Bharat

This year’s Global Editors Network (GEN) Summit, in Athens, Greece, had a big focus on verification and automation. BBC News data scientist and PGCert Data Journalism student Alison Benjamin went along to see what was being said about artificial intelligence (AI), data and technology in the news industry. Here are her highlights…
Continue reading

If we are using AI in journalism we need better guidelines on reporting uncertainty

Chart: women speak 27% of the time in Game of Thrones

The BBC’s chart mentions a margin of error

There’s a story out this week on the BBC website about dialogue and gender in Game of Thrones. It uses data generated by artificial intelligence (AI) — specifically, machine learning —  and it’s a good example of some of the challenges that journalists are increasingly going to face as they come to deal with more and more algorithmically-generated data.

Information and decisions generated by AI are qualitatively different from the sort of data you might find in an official report, but journalists may fall back on treating data as inherently factual.

Here, then, are some of the ways the article dealt with that — and what else we can do as journalists to adapt.

Margins of error: journalism doesn’t like vagueness

The story draws on data from an external organisation, Ceretai, which “uses machine learning to analyse diversity in popular culture.” The organisation claims to have created an algorithm which “has learned to identify the difference between male and female voices in video and provides the speaking time lengths in seconds and percentages per gender.”
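To show where the uncertainty enters, here is a hypothetical Python sketch of the last, simple step: turning labelled audio segments into seconds and percentages per gender. It is not Ceretai’s method. It assumes a classifier has already labelled each segment, and that labelling is the part whose accuracy is in question: any mislabelled segment feeds straight into the totals.

```python
# Hypothetical classifier output: (start_seconds, end_seconds, label).
segments = [
    (0.0, 30.0, "male"),
    (30.0, 40.0, "female"),
    (40.0, 70.0, "male"),
    (70.0, 80.0, "female"),
]

def speaking_time(segs):
    """Return {label: (total_seconds, percentage_of_all_speech)}."""
    totals = {}
    for start, end, label in segs:
        totals[label] = totals.get(label, 0.0) + (end - start)
    overall = sum(totals.values())
    return {label: (secs, 100 * secs / overall) for label, secs in totals.items()}

result = speaking_time(segments)
print(result)
```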

Crucially, the piece notes that:

“Like most automatic systems, it doesn’t make the right decision every time. The accuracy of this algorithm is about 85%, so figures could be slightly higher or lower than reported.”

And this is the first problem.

Continue reading

GEN Summit: AI’s breakthrough year in publishing

This week’s GEN Summit marked a breakthrough moment for artificial intelligence (AI) in the media industry. The topic dominated the agenda of the first two days of the conference, from the opening keynote by Facebook’s Antoine Bordes to sessions on voice AI, bots, monetisation and verification, and it dominated my timeline too.

At times it felt like being at a conference in the 1980s discussing how ‘computers’ could be used in the newsroom, or like listening to people talk about the use of mobile phones for journalism in the noughties. In other words, it felt very much like early days. But they were important days nonetheless.

Ludovic Blecher’s slide on the AI-related projects that received Google Digital News Initiative funding illustrated the problem best, with proposals counted in categories as specific as ‘personalisation’ and as vague as ‘hyperlocal’.

Digging deeper, then, here are some of the most concrete points I took away from Lisbon — and what journalists and publishers can take from those.

Continue reading

This is what I learned after teaching chatbots to journalists: 3 takeaways for newsrooms

In a guest post for OJB, Maria Crosas sets out three main takeaways that newsrooms should consider when aiming for a complete chatbot experience.

Over the past year I’ve frequently been invited to share ideas about how bots can help newsrooms deliver news, and to give advice on how to build engaging chatbot experiences. Throughout these classes, I’ve also faced challenging questions about how these technologies are pushing the boundaries of ethics, artificial intelligence and storytelling.

I’ve boiled down these experiences into 3 takeaways for newsrooms that want to begin the chatbot journey. Here they are…

Continue reading