All the Chilcot Iraq Inquiry report documents structured by entities and dates

When you’re dealing with documents amounting to 2.6 million words spread across more than 50 PDFs, you need to be able to do more than press the CTRL and F keys together.

And yet political journalists across the country will be relying on just that to report on the Chilcot Report into the UK’s involvement in the Iraq war (also known as the Iraq Inquiry) this week.

I’ve uploaded all the PDFs to the document analysis service DocumentCloud. You can find them on the site here. You’ll need a DocumentCloud account to see them, but if you haven’t got one you can also search all 55 documents at the same time in an embedded search I’ve created over on HelpMeInvestigate.

Entities

One of the advantages of using a service like DocumentCloud is ‘entity analysis’. This goes through the documents and identifies entities such as people, places, organisations and ‘terms’ (for example: ‘chemical warfare’). It treats each type of entity separately and creates a little histogram showing where those entities are mentioned in each document.

To view the documents in this way, you just need to click the ‘Analyze’ button in DocumentCloud and choose the view you want:

Click the Analyze button to see the documents by timeline or entities

‘View Entities’ gives you a view like the one shown below:

In the entity analysis view you can see that the Ministry of Health is mentioned a lot towards the end of this document

If you hover over any of those little bars you should see a popup showing the context within which the entity is mentioned…

Hovering over this bar shows the text surrounding the location identified

And you can click to see the raw text in full:

Selly Oak Hospital in context
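If you want a feel for what those little bars represent, here is a minimal Python sketch of the general idea: split a document into equal slices and count how often a term turns up in each one. This is not how DocumentCloud does it. It uses plain text matching rather than real entity recognition, and the function name `mention_histogram` and the sample text are made up for illustration.

```python
import re


def mention_histogram(text, term, bins=10):
    """Count mentions of `term` in each of `bins` equal-width slices of `text`.

    Plain case-insensitive string matching: an assumption made for this
    sketch, not real entity recognition.
    """
    counts = [0] * bins
    if not text:
        return counts
    for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        # Map the character offset of the match to a slice of the document.
        index = min(match.start() * bins // len(text), bins - 1)
        counts[index] += 1
    return counts


# Invented sample text: filler followed by a cluster of mentions at the end.
sample = "background " * 40 + "Ministry of Health. " * 3
for slice_number, count in enumerate(mention_histogram(sample, "Ministry of Health")):
    # One '#' per mention, so clusters stand out like the bars in the histogram.
    print(f"{slice_number:2d} {'#' * count}")
```

A cluster of marks in the last rows is the text equivalent of the tall bars at the end of a document in the entity view.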

If you choose the Analyze Timeline option, DocumentCloud will show you a timeline of events it has identified in the selected documents. This allows you to spot outliers (such as the earliest events in the narrative) and clusters, or to zoom into a particular key period.

DocumentCloud timeline

You can click and drag to zoom in. Again, by hovering over any point you will see a preview of the context within which a date is mentioned, and you can click on that to see the original text in full.

DocumentCloud timeline, zoomed in
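The timeline idea can be roughed out in a few lines of Python too: find the dates in the text, keep a little of the surrounding context, and sort them so the earliest events come first. This is only a sketch under some loud assumptions. It recognises just one date format (‘20 March 2003’), the function name `extract_dated_mentions` is hypothetical, the sample sentences are invented, and DocumentCloud’s own date detection may work quite differently.

```python
import re
from datetime import datetime

# Assumption: dates are written as "D Month YYYY". Other formats are ignored.
DATE_PATTERN = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4}\b"
)


def extract_dated_mentions(text, context=40):
    """Return (date, snippet) pairs for each date found, earliest first."""
    mentions = []
    for match in DATE_PATTERN.finditer(text):
        try:
            when = datetime.strptime(match.group(0), "%d %B %Y").date()
        except ValueError:
            continue  # skip impossible dates such as "31 February 2003"
        # Keep some text either side of the date, like the hover preview.
        start = max(match.start() - context, 0)
        end = min(match.end() + context, len(text))
        mentions.append((when, text[start:end].strip()))
    return sorted(mentions, key=lambda mention: mention[0])


# Invented sample text, not a quote from the report.
sample = (
    "A meeting was held on 20 March 2003 to review progress. "
    "A note dated 7 September 2002 had been circulated beforehand."
)
for when, snippet in extract_dated_mentions(sample):
    print(when.isoformat(), "|", snippet)
```

Sorting by date is what makes outliers easy to spot: the first item in the list is the earliest event mentioned, wherever it appears in the document.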

Those are just some of the basic ways in which DocumentCloud makes interrogating documents much quicker. You can also use Overview to analyse them in other ways, but that’s another story…

How the Chilcot report looks in Overview
