Barbara Maseda is on a John S. Knight Journalism Fellowship project at Stanford University, where she is working on designing text processing solutions for journalists. In a special guest post she explains what she’s found so far — and why she needs your help.
Over the last few months, I have been talking to journalists about their trials and tribulations with textual sources, trying to get as detailed a picture as possible of their processes, namely:
- how and in what format they obtain the text,
- how they find newsworthy information in the documents,
- what tools they use,
- what kinds of stories they produce,
…among other details.
What I’ve found so far is fascinating: from tech-savvy reporters who write their own code when they need to analyze a text collection, to old-school investigative journalists convinced that printing and highlighting are the most reliable and effective options — and many shades of approaches in between.
What’s your experience?
If you’ve ever dug a story out of a pile of text, please let me know using this questionnaire. It doesn’t matter if you’ve used more or less sophisticated tools to do it.
Here are a few reasons and incentives to contribute:
1. Help create a public database of text-data-driven stories
Pieces based on the analysis of text collections are not as common as their structured-data-driven counterparts, and are harder to find all in one place.
One of the goals of this survey is to create a database of examples. As with the rest of my work at Stanford University, the data will be publicly available for anyone to use.
Concretely, it will include information about the story:
- Media outlet
- Date of publication
And details about the production and sources:
- Type of document(s) used
- Number of sources
- Size of the text collection
- Type of elements considered
- Type of analysis
- Tools/methods used
- Time needed for production
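To make the list above concrete, here is a rough sketch of how one entry in such a database might be represented. It is only an illustration: the class name, field names, and types are assumptions made for this example, not the actual schema of the survey data, and the values used with it should be treated as placeholders.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import Dict, List


@dataclass
class StoryRecord:
    """One text-data-driven story (hypothetical field names)."""

    # Information about the story
    media_outlet: str
    publication_date: date

    # Details about the production and sources
    document_types: List[str] = field(default_factory=list)
    number_of_sources: int = 0
    collection_size: str = ""   # free text, since units vary (pages, files, words)
    element_types: List[str] = field(default_factory=list)
    analysis_types: List[str] = field(default_factory=list)
    tools_methods: List[str] = field(default_factory=list)
    production_time: str = ""   # free text, e.g. a number of days or weeks


def to_row(record: StoryRecord) -> Dict[str, object]:
    """Flatten a record into a dict suitable for one CSV or spreadsheet row."""
    row: Dict[str, object] = {}
    for key, value in asdict(record).items():
        if isinstance(value, date):
            row[key] = value.isoformat()   # dates become YYYY-MM-DD strings
        elif isinstance(value, list):
            row[key] = "; ".join(value)    # multi-valued fields share one cell
        else:
            row[key] = value
    return row
```

A record built with placeholder values, such as `StoryRecord(media_outlet="Example Outlet", publication_date=date(2000, 1, 1))`, can be passed to `to_row` and written out with Python's `csv.DictWriter`. Keeping fields like collection size as free text is a deliberate choice in this sketch, since respondents are likely to describe them in different units.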
2. Make sure your work gets included
I regularly check news websites looking for examples to bookmark, but it’s not always obvious whether a story involved text analysis, and many stories don’t come with an accompanying “how we did it” blog post.
On other occasions, it’s hard to find work from many years ago, or stories that are no longer available online. Please help me find yours.
3. Share your process (and learn from other people’s)
The details about the production of these stories, especially the software and approaches used, could be a valuable reference for beginners wondering which skills and tools are most relevant, and a way for more experienced journalists to compare notes with fellow text-data enthusiasts.
4. Help lower the bar for complicated text processing
One of the goals of my project is to make solutions accessible to reporters who don’t have the time (or desire) to become “text-miner-journalists.”
As I wrote in a previous post, the list of skills for this area of expertise is long, and the training time-consuming. Ultimately, my interest is to find ways to bring the benefits of these techniques to more reporters.
5. No text-driven stories? No problem
Finally, although the questionnaire is only useful if you have an example to share, I’m also interested in hearing about less successful cases.
Did you have to deal with a group of documents that was too complicated to process?
Was there a text or file format that became your nightmare?
I want to hear all about it. Email me, and maybe we can put together a list of interesting challenges.
Disclosure: Barbara is a former student of mine on the MA in Online Journalism (now the MA in Data Journalism). A version of this post first appeared on Text Data Stories.