Last week, a man in Texas was arrested for emailing child porn to a friend. This isn't something we'd usually report on, except that in this case the arrest came from a direct tip-off from Google, which caught the child porn moving across its Gmail servers and dutifully alerted the police. This raises a rather thorny question: How did Google scan the man's inbox for child porn? And more importantly, does this mean Google is scanning everyone's inbox for child porn and other illegal materials? Is there some gross invasion of privacy going on here?

First, the good news: According to Google, which gave a statement to the AFP after the arrest of the Houston man, “we only use this technology to identify child sexual abuse imagery.” Google even goes on to clarify what it doesn’t scan: “[We don’t scan] other email content that could be associated with criminal activity (for example using email to plot a burglary).”

Furthermore, Google doesn't have some crack team of child porn investigators manually searching through some 400 million Gmail inboxes. Instead, Google employs an automated system that checks the cryptographic hash (think of it as a digital fingerprint) of every attachment that traverses its servers. The exact technical details of Google's system aren't known, but it almost certainly works in the same way as Dropbox's automated copyright/piracy prevention system. Basically, Google maintains a database of known indecent images of children, and then compares the hash of every attachment you send against that database. If there's a match, presumably a human at Google double-checks the result and then notifies the relevant authorities.
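To make the hash-matching idea concrete, here's a minimal sketch in Python. Google hasn't published its design, so this assumes a plain SHA-256 digest as the fingerprint and an in-memory set standing in for the database; the names `fingerprint`, `needs_review`, and `KNOWN_BAD_HASHES` are hypothetical, and the "flagged file" is just placeholder bytes.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of an attachment's raw bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for a database of fingerprints of files that have
# already been flagged. Here it is a small in-memory set built from placeholder
# bytes; a real system would query a large shared index instead.
KNOWN_BAD_HASHES = {fingerprint(b"example flagged file contents")}


def needs_review(attachment: bytes) -> bool:
    """Flag an attachment for human review if its hash is already in the database."""
    return fingerprint(attachment) in KNOWN_BAD_HASHES


if __name__ == "__main__":
    print(needs_review(b"example flagged file contents"))   # True: exact copy of a flagged file
    print(needs_review(b"example flagged file contents!"))  # False: one extra byte changes the hash
    print(needs_review(b"holiday photo"))                   # False: never seen before
```

Note that only the fingerprint is compared, never the message text, and that a file nobody has flagged yet can never match. With a cryptographic hash, even a one-byte change produces a completely different digest, so this naive version only catches exact copies.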

Nor is this approach unusual. Since 2009, Microsoft has been developing a system called PhotoDNA that can automatically identify child porn. Microsoft has since donated PhotoDNA to the National Center for Missing & Exploited Children (NCMEC), and in addition to Microsoft's own OneDrive and Bing, it's also used by Facebook and Twitter. The same kind of digital fingerprinting also exists for video and other forms of media. In all cases, though, these systems can only detect files that have already been tagged as child porn; they won't pick up a new file that hasn't been seen before. These databases, maintained by Google, Microsoft, NCMEC, the authorities, and others, contain upwards of a hundred million examples of known child porn images and videos, but they're no good at preventing people from creating and disseminating new material.

While no one is claiming that automated detection of child porn is a bad thing, it does raise some interesting questions. As Google explicitly points out, other criminal activity via Gmail is ignored. Why should Google prevent the distribution of child porn, but not other crimes? If Google detects email correspondence between two would-be terrorists, should it intervene? What about two thieves discussing plans to burglarize someone’s home? Or two guys planning to rape a girl at next week’s house party?

Google has previously admitted that it does scan your Gmail inbox to display relevant ads, so it’s clearly capable of detecting potentially criminal activity. Likewise, Google (and Microsoft and other big web companies) could easily keep track of people searching for child porn or bomb-making guides. Child porn is probably only the beginning.