Tag Archives: OCR

Making video and audio interviews searchable: how Pinpoint helped with one investigation

Pinpoint ranks the people, organisations and locations in your uploaded documents by the number of times each is mentioned.

MA Data Journalism student Tony Jarne spent eight months investigating exempt accommodation, collecting hundreds of documents, audio and video recordings along the way. To manage all this information, he turned to Google’s free tool Pinpoint. In a special guest post for OJB, he explains how it should be an essential part of any journalist’s toolkit.

The use of exempt accommodation — a type of housing for vulnerable people — has rocketed in recent years.

At the end of December, a select committee was set up in Parliament to look into the issue. The committee set a deadline for submissions, and anyone who wished to could submit written evidence.

Organisations, local authorities and citizens submitted more than 125 pieces of written evidence to be taken into account by the committee. Some are only one page — others are 25 pages long.

In addition to the written evidence, I had various reports, news articles, Land Registry titles and company accounts downloaded from Companies House.

I needed a tool to organise all the documentation. I needed Pinpoint.


Investigations tool DocumentCloud goes public (PS: documents drive traffic)

The rather lovely DocumentCloud – a tool that allows journalists to share, annotate, connect and organise documents – has finally emerged from its closet and made itself available to public searches.

This means that anyone can now search the powerful database (some tips here) of newsworthy documents. If you want to add your own, however, you still need approval.

If you are approved, you’ll find it’s quite a powerful tool, with quick conversion of PDFs into text files, analytic tools and semantic tagging (so you can connect all documents that mention a particular person or organisation) among its best features. The site is open source and has an API too.

I asked Program Director Amanda B Hickman what she’s learned on the project so far. Her response suggests that documents have a particular appeal for online readers:

“If we’ve learned anything, it is that people really love documents. It is pretty clear that when there’s something interesting going on in the news, plenty of people want to dig a little deeper. When Arizona Republic posted an annotated version of that state’s new immigration law, it got more traffic than their weekly entertainment round up. WNYC told us that the page listing the indictments in last week’s mob roundup was still getting more traffic than any other single news story even a week later.

“These were big news documents, to be sure, but it still seems pretty clear that people do want to dig deeper and explore the documents behind the news, which is great for us and great for news.”