In a guest post for OJB, Barbara Maseda looks at how the media has used text-as-data to cover State of the Union addresses over the last decade.
State of the Union (SOTU) addresses are amply covered by the media — from traditional news reports and full transcripts, to summaries and highlights. But like other events involving speeches, SOTU addresses are also analyzable using natural language processing (NLP) techniques to identify and extract newsworthy patterns.
Every year, a new speech is added to this small collection of texts, which some newsrooms process to add a fresh angle to the avalanche of coverage.
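One of the simplest text-as-data techniques newsrooms apply to speeches like these is word-frequency comparison: counting terms across years to spot which themes appear or disappear. The sketch below illustrates the idea in Python. The two "speeches" are short hypothetical excerpts standing in for full SOTU transcripts, not real quotations, and the tokenizer is deliberately minimal.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase a speech and split it into word tokens (very rough)."""
    return re.findall(r"[a-z']+", text.lower())

# Hypothetical sample excerpts standing in for full transcripts.
speech_2010 = ("Jobs must be our number one focus, and that is why "
               "I am calling for a new jobs bill.")
speech_2019 = ("We are alerting the world to the crisis at our southern "
               "border, and to the need for a wall.")

counts_2010 = Counter(tokenize(speech_2010))
counts_2019 = Counter(tokenize(speech_2019))

# Words used in one year but not the other hint at shifting priorities.
distinctive_2019 = set(counts_2019) - set(counts_2010)
```

In a real analysis you would also strip stopwords ("the", "and", …) and work from official transcripts, but even this bare comparison shows how a "fresh angle" can be computed rather than read for.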
Barbara Maseda is on a John S. Knight Journalism Fellowship project at Stanford University, where she is working on designing text processing solutions for journalists. In a special guest post she explains what she’s found so far — and why she needs your help.
Over the last few months, I have been talking to journalists about their trials and tribulations with textual sources, trying to get as detailed a picture as possible of their processes, namely:
how and in what format they obtain the text,
how they find newsworthy information in the documents,
using what tools,
for what kinds of stories,
…among other details.
What I’ve found so far is fascinating: from tech-savvy reporters who write their own code when they need to analyze a text collection, to old-school investigative journalists convinced that printing and highlighting are the most reliable and effective options — and many shades of approaches in between.
What’s your experience?
If you’ve ever dug a story out of a pile of text, please let me know using this questionnaire. It doesn’t matter whether the tools you used were sophisticated or simple.
Bombings in Barcelona in 1938 (Image by Italian Airforce under CC)
In a guest post for OJB, Carla Pedret looks at a new data journalism project to catalogue what happened during the Spanish Civil War.
125,000 people died, disappeared or were repressed in the Spanish Civil War (1936-1939) and during the Franco dictatorship, according to historians. Many of their families still do not know, 40 years later, what exactly happened to them.
It’s always difficult to get publishers to agree to things like this, so if you have any comments or feedback that I can use to make a similar case to publishers in future, please let me know in the comments.
The latest in my series of FAQ posts comes from a current MA Online Journalism student, who is writing an article for a German publication.
How has the use of user-generated content from social media changed over the last years in the UK?
The use of UGC from social media has changed enormously in the UK in the last decade. Obviously many of the platforms didn’t even exist a decade ago, so we’ve moved from quoting emails to taking screenshots, to a situation now where it’s common to embed live social media content which users can interact with from the article itself – whether that’s to share, like, follow, or respond.
The latest in my series of FAQ posts comes from the National University of Sciences & Technology (NUST) in Pakistan. As always, I’m publishing my answers to their questions here in case it’s of use to anyone else.
Q. What would you say to convince journalists — especially journalists working in developing countries where even the acquisition of public records is often a tedious task — about the importance of data journalism?
If you believe that journalism has a duty to be factual, accurate, and to engage an audience in subjects which have a clear public and civic importance, then data journalism is going to be very important to your work.
Because he sends me an email every December, Nic Newman has a tag all of his own on this blog. So as this year’s email lands in my inbox here’s my annual reply around what I’ve noticed in the last 12 months — along with some inevitably doomed predictions of what might happen in the next year…
Surprising in 2017: horizontal storytelling and Facebook disappointments