So Google scans email for dodgy images – should we be worried about scanning for sensitive documents?

You could be forgiven for not having heard of John Henry Skillern. The 41-year-old is facing charges of possession and promotion of child pornography after Google detected images of child abuse in his Gmail account.

Because of his case we now know that Google “proactively scours hundreds of millions of email accounts” for certain images. The technology has raised some privacy concerns which have been largely brushed aside because, well, it’s child pornography.

Sky’s technology correspondent Tom Cheshire, for example, doesn’t think it is an invasion of our privacy for “technical and moral reasons”. But should journalists be worried about the wider applications of the technology, and the precedent being set?

Automated matching

Part of Cheshire’s technical argument against the software representing an invasion of privacy is that it is almost entirely automated. As The Telegraph reported:

“It is understood that the software works by comparing images held in users’ accounts against a vast database of child abuse images which have been collated by child protection agencies around the world.

“Each one of the images is given a unique fingerprint, known as a hash, which is then used to compare with those held in the database.”

When a match is found, humans come into the process: “Trained specialists at organisations examine the image and decide whether to alert the police.”
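To make the matching step concrete, here is a minimal Python sketch of fingerprint comparison. It is an illustration under stated assumptions, not a description of Google's actual system: the reports quoted above do not say which hash function is used, so the sketch assumes an ordinary cryptographic hash (SHA-256), and the names `fingerprint`, `build_database` and `flag_matches` are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a hex fingerprint of a file's bytes (assumption: SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def build_database(known_files):
    """Whoever compiles the database reduces each listed file to a fingerprint."""
    return {fingerprint(data) for data in known_files}


def flag_matches(attachments, database):
    """Return the names of attachments whose fingerprint is in the database.

    `attachments` maps a file name to the file's bytes. The matcher only
    compares fingerprints; it never inspects what a file contains.
    """
    return [
        name
        for name, data in attachments.items()
        if fingerprint(data) in database
    ]


# Hypothetical example: one listed file, two attachments in an account.
database = build_database([b"bytes of a listed file"])
inbox = {
    "holiday.jpg": b"bytes of an unlisted photo",
    "leaked.pdf": b"bytes of a listed file",
}
print(flag_matches(inbox, database))  # ['leaked.pdf']
```

Note what the matcher never needs: the original files. Whoever runs it holds only the set of fingerprints, so it cannot tell what a flagged file depicts, only that it matches an entry someone else put on the list. A SHA-256 fingerprint also changes completely if a single byte of the file changes, so a real image-matching system would presumably need a hash that tolerates resizing or re-encoding; that is beyond this sketch.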

But it’s not too big a leap of the imagination to see the same technology being used to match documents held in users’ accounts against a database of documents the authorities don’t want made public (on the basis of ‘national security’). Or even images the police don’t want distributed.

And if that technology were employed, it is much less likely that its use would be made public in a court case in the same way as it was in Skillern’s.

This ‘feature creep’ has been seen before in both technologies and laws. The Regulation of Investigatory Powers Act (RIPA), for example, was intended to allow surveillance related to terrorism or serious crime, but authorities used it for purposes including “spying on garden centres for selling pot plants; snooping on staff for using work showers or monitoring shops for unlicensed parrots.”

Who controls the database controls what gets flagged

In the description given above, Google is entirely reliant on whoever compiles the database, and whoever they pass the images on to.

However noble the stated purpose, this is state surveillance, with the notable quirk that those conducting the surveillance are ‘blind’.

As Cheshire reports: “No humans are looking at images, which would be illegal. Nor does Google store child abuse images itself, which would also be illegal.”

So if a government whistleblower were trying to share documents, their employer could be notified without anyone else knowing.

If a journalist passed on sensitive documents to a colleague, a ‘red flag’ would be raised in a government office.

Where protestors shared an image of police brutality, that image could be used to identify all of the recipients, including any reporters.

More: Why every journalist should have a threat model (with cats)

Google says it is not looking for other crimes at the moment, but it’s safe to say that any extension of the technology, if introduced, would operate without users’ knowledge for some time.

On that basis journalists should assume that documents and images cannot be safely shared using Gmail – our account or any source’s.

Encryption, suggested by Cheshire, is not going to be a practical option for most sources. At the very least we should switch to a different email service ourselves and recommend that documents are shared using old-fashioned post.

In the meantime, we need to talk about oversight of systems of mass warrantless surveillance and the implications that such systems have for freedom of speech.

Google may be a commercial organisation, but in these situations it is acting as an agent of the state, and should be subject to the same checks and balances.

Comments

@paulbradshaw You should listen to the Daily Tech News Show if you don’t already; @acedtect covered this well in a couple of shows

— Steadman (@iamsteadman) August 7, 2014

@p3k @paulbradshaw or, as i’ve learned in a workshop with @MacLemon only recently, use other ways to communicate, e.g. jabber (?)

— Regula Troxler (@regulatroxler) August 7, 2014

@regulatroxler @paulbradshaw @MacLemon sure, a secure jabber channel should do the job as well.

— piefke 3000 (@p3k) August 7, 2014

@p3k @paulbradshaw @MacLemon as a beginner in the field of encrypting, i found using jabber easier than pgb … 1/2

— Regula Troxler (@regulatroxler) August 7, 2014

@p3k @regulatroxler @paulbradshaw Securing Jabber is much more approachable and less error prone. If you have questions, I’m happy to help.

— MacLemon (@MacLemon) August 7, 2014

@paulbradshaw G must be accessing sensitive data (consent dubious in t&cs) so we should be concerned. They r private NSA with own interests

— Nicola Avery (@NicolaAvery) August 7, 2014

5 thoughts on “So Google scans email for dodgy images – should we be worried about scanning for sensitive documents?”

Tilman August 7, 2014 at 12:13 pm

I guess you can stop using Dropbox then as well. In this article from March 2014, TechCrunch explains how copyrighted stuff is detected when it is passed on via Dropbox “without actually looking at your stuff” – it works the same way though: http://techcrunch.com/2014/03/30/how-dropbox-knows-when-youre-sharing-copyrighted-stuff-without-actually-looking-at-your-stuff/

Paul Bradshaw (post author) August 7, 2014 at 12:17 pm
  
  Yes, not to mention Condoleezza Rice’s position on the board!
  

Online Journalism Blog

Comment, analysis and links covering online journalism and online news, citizen journalism, blogging, vlogging, photoblogging, podcasts, vodcasts, interactive storytelling, publishing, Computer Assisted Reporting, User Generated Content, searching and all things internet.
