In a guest post for OJB, Barbara Maseda looks at how the media has used text-as-data to cover State of the Union addresses over the last decade.
State of the Union (SOTU) addresses are amply covered by the media —from traditional news reports and full transcripts, to summaries and highlights. But like other events involving speeches, SOTU addresses are also analyzable using natural language processing (NLP) techniques to identify and extract newsworthy patterns.
Every year, a new speech is added to this small collection of texts, which some newsrooms process to add a fresh angle to the avalanche of coverage.
Barbara Maseda is on a John S. Knight Journalism Fellowship project at Stanford University, where she is working on designing text processing solutions for journalists. In a special guest post she explains what she’s found so far — and why she needs your help.
Over the last few months, I have been talking to journalists about their trials and tribulations with textual sources, trying to get as detailed a picture as possible of their processes, namely:
how and in what format they obtain the text,
how they find newsworthy information in the documents,
using what tools,
for what kinds of stories,
…among other details.
What I’ve found so far is fascinating: from tech-savvy reporters who write their own code when they need to analyze a text collection, to old-school investigative journalists convinced that printing and highlighting are the most reliable and effective options — and many shades of approaches in between.
What’s your experience?
If you’ve ever dug a story out of a pile of text, please let me know using this questionnaire. It doesn’t matter if you’ve used more or less sophisticated tools to do it.
It’s quite common when working with Google Sheets to have data set to US format (Month-Day-Year) without realising it. This is because Google will format your dates based on what ‘locale’ or language you have set – and the default is US English.
Instructions on how to change that are here – but what if it’s too late? What if you’ve already inputted or imported data which, when updated to a different format, will make it the wrong date?Continue reading →