What do journalists do with large amounts of text?

Barbara Maseda is on a John S. Knight Journalism Fellowship project at Stanford University, where she is working on designing text processing solutions for journalists. In a special guest post she explains what she’s found so far — and why she needs your help.

Over the last few months, I have been talking to journalists about their trials and tribulations with textual sources, trying to get as detailed a picture as possible of their processes, namely:

how and in what format they obtain the text,
how they find newsworthy information in the documents,
using what tools,
for what kinds of stories,

…among other details.

What I’ve found so far is fascinating: from tech-savvy reporters who write their own code when they need to analyze a text collection, to old-school investigative journalists convinced that printing and highlighting are the most reliable and effective options — and many shades of approaches in between.

What’s your experience?

If you’ve ever dug a story out of a pile of text, please let me know using this questionnaire. It doesn’t matter if you’ve used more or less sophisticated tools to do it.

Here are a few reasons and incentives to contribute:

1. Help create a public database of text-data-driven stories

Pieces based on the analysis of text collections are not as common as their structured-data-driven counterparts, and are harder to find all in one place.

One of the goals of this survey is to create a database of examples. As with the rest of my work at Stanford University, the data will be publicly available for anyone to use.

Concretely, it will include information about the story:

Title
Media outlet
Date of publication
Author(s)
Link

And details about the production and sources:

Type of document(s) used
Source(s)
Number of sources
Size of the text collection
Type of elements considered
Type of analysis
Tools/methods used
Time needed for production
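As a rough illustration, a single entry in such a database might look something like the sketch below. The class name, field names, and types are all assumptions made for this example and not the survey's actual schema; the sample values are placeholders.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class TextStoryRecord:
    """One hypothetical entry in a database of text-data-driven stories."""

    # Information about the story
    title: str
    media_outlet: str
    publication_date: str  # assumed to be an ISO 8601 date string
    authors: List[str]
    link: str

    # Details about the production and sources (all optional here)
    document_types: List[str] = field(default_factory=list)
    sources: List[str] = field(default_factory=list)
    number_of_sources: Optional[int] = None
    collection_size: Optional[str] = None   # free text, e.g. a page or file count
    element_types: List[str] = field(default_factory=list)
    analysis_types: List[str] = field(default_factory=list)
    tools_methods: List[str] = field(default_factory=list)
    production_time: Optional[str] = None   # free text, e.g. "3 weeks"


# Placeholder values only; this is not a real story.
example = TextStoryRecord(
    title="Example headline",
    media_outlet="Example outlet",
    publication_date="2018-01-01",
    authors=["Example Author"],
    link="https://example.com/story",
)

# asdict() turns the record into a plain dictionary, ready for CSV or JSON export.
row = asdict(example)
```

Keeping the production details optional means a story can be recorded even when only its basic publication information is known.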

2. Make sure your work gets included

I regularly check news websites looking for examples to bookmark, but it’s not always obvious whether a story involved text analysis, and many stories don’t come with an accompanying “how we did it” blog post.

It can also be hard to find work from many years ago, or stories that are no longer available online. Please help me find yours.

3. Share your process (and learn from other people’s)

The details about the production of these stories, especially the software and approaches used, could be a valuable reference for beginners who want to know which skills and tools are most relevant, and a way for more experienced journalists to compare notes with fellow text-data enthusiasts.

4. Help lower the bar for complicated text processing

One of the goals of my project is to make solutions accessible to reporters who don’t have the time (or desire) to become “text-miner-journalists.”

As I wrote in a previous post, the list of skills for this area of expertise is long, and the training time-consuming. Ultimately, my interest is to find ways to bring the benefits of these techniques to more reporters.

5. No text-driven stories? No problem

Finally, although the questionnaire is only useful if you have an example to share, I’m also interested in hearing about less successful cases.

Did you have to deal with a group of documents that was too complicated to process?

Was there a text or file format that became your nightmare?

I want to hear all about it. Email me, and maybe we can put together a list of interesting challenges.

Disclosure: Barbara is a former student of mine on the MA in Online Journalism (now the MA in Data Journalism). A version of this post first appeared on Text Data Stories.
