The latest Edward Snowden bombshell, that the National Security Agency has been hacking foreign Google and Yahoo data centers, is particularly disturbing. Plenty has been written about it, so I normally wouldn’t comment, except that the general press has, I think, too shallow an understanding of the technology involved. The hack is even more insidious than they know.

The superficial story is in the NSA slide (above) that you’ve probably seen already. The main point is that somehow the NSA, probably through the GCHQ in Britain, is grabbing virtually all Google non-spider web traffic from the Google Front End Servers, because that’s where the SSL encryption is removed and the traffic decrypted.

Yahoo has no such encryption.
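To make the front-end point concrete, here is a toy model, not Google’s or Yahoo’s actual architecture: a request path is a list of hops, each flagged as encrypted or not. It assumes, as the slide suggests, that SSL ends at the front-end server so the hop behind it carries plaintext, and that Yahoo’s path has no SSL at all. The hop names and the `Hop`/`plaintext_hops` helpers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One leg of a request's journey (hop names used below are hypothetical)."""
    name: str
    encrypted: bool


def plaintext_hops(path: list[Hop]) -> list[str]:
    """Return the names of hops where a passive tap would read unencrypted traffic."""
    return [hop.name for hop in path if not hop.encrypted]


# Assumption: SSL is added and removed at the front-end server, so only the
# user-facing leg is encrypted and the leg inside the data center is plaintext.
google_path = [
    Hop("user -> front-end server", encrypted=True),
    Hop("front-end server -> back-end", encrypted=False),
]

# Assumption: no SSL anywhere on the path ("Yahoo has no such encryption").
yahoo_path = [
    Hop("user -> front end", encrypted=False),
    Hop("front end -> back-end", encrypted=False),
]

print(plaintext_hops(google_path))
print(plaintext_hops(yahoo_path))
```

In this model the only place to read Google traffic in the clear is behind the front end, while every Yahoo hop is readable.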

The major point being missed by the general press, I think, is how the Google File System and Yahoo’s Hadoop Distributed File System play into this story. These two Big Data file systems are functionally similar. Google calls its units of data chunks while Hadoop calls them blocks, but the idea is the same: large, flat data stores that are replicated and continuously updated in many locations across the application and across the globe, so that the exact same data can be searched more or less locally from anywhere on Earth, while maintaining at all costs what’s called data coherency.
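The chunk/block idea can be sketched in a few lines. This is a toy model, not GFS or HDFS code: a file is cut into fixed-size chunks and each chunk is copied to several sites. The chunk size, replica count, site names and rotating placement rule are illustrative assumptions; real systems set these by configuration (HDFS, for example, defaults to three replicas per block).

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut a file into fixed-size chunks; the last chunk may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(num_chunks: int, sites: list[str], replicas: int) -> dict[int, list[str]]:
    """Assign each chunk to `replicas` distinct sites, rotating the starting site."""
    if not 0 < replicas <= len(sites):
        raise ValueError("replicas must be between 1 and the number of sites")
    return {
        chunk_id: [sites[(chunk_id + k) % len(sites)] for k in range(replicas)]
        for chunk_id in range(num_chunks)
    }


chunks = split_into_chunks(b"x" * 10, chunk_size=4)  # chunks of 4, 4 and 2 bytes
sites = ["london", "dublin", "iowa", "oregon"]  # hypothetical site names
print(place_replicas(len(chunks), sites, replicas=3))
```

If `replicas` equals the number of sites, every site ends up holding every chunk, which is the everywhere-copy scenario this column is concerned with.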

Data replication, which exists for reasons of both performance and fault tolerance, means that when the GCHQ in London accesses the Google data center there, they have access to all Google data, not just Google’s UK or European data. All Google data, for all users, no matter where they are, is reachable through any Google data center anywhere, thanks to the Google File System.
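Here is a toy illustration of why full replication undermines a location-based filter. It assumes, as the paragraph above does, that every site holds a complete replica; the `Record` layout, user names, home countries and site names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    user: str
    home_country: str  # where the user actually lives, not where the data sits


def replicate_everywhere(records: list[Record], sites: list[str]) -> dict[str, list[Record]]:
    """Full replication: every site receives a copy of every record."""
    return {site: list(records) for site in sites}


def countries_visible_at(site: str, replicas: dict[str, list[Record]]) -> Counter:
    """Count, by home country, the users whose data can be read at one site."""
    return Counter(record.home_country for record in replicas[site])


records = [
    Record("alice", "US"),
    Record("bob", "UK"),
    Record("carol", "US"),
    Record("dieter", "DE"),
]
replicas = replicate_everywhere(records, ["london", "iowa"])
print(countries_visible_at("london", replicas))
```

A tap at the hypothetical London site sees the two US users just as readily as the UK one; where the copy sits says nothing about whose data it is.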

This knocks a huge hole in the legal safe harbor the NSA has been relying on for its use of data acquired overseas, which assumes that such data primarily concerns non-US citizens, who aren’t protected by US privacy laws or the FISA Court. The artifice is that, with the GCHQ grabbing data for the NSA and the NSA presumably grabbing data for the GCHQ, both agencies can comply with domestic laws and technically aren’t spying on their own citizens, when in fact that’s exactly what they have been doing.

Throw Mama from the train.

If Google’s London data center holds not just European information but a complete copy of all Google data, then the legal assumption that foreign origin equals foreign data falls apart, and the NSA can’t legally gather data in this manner, at least if we’re to believe the two FISA Court rulings to this effect that have been released.

This safe harbor I refer to, by the way, isn’t the US-EU safe harbor for commercial data sharing referred to in other stories. That’s a nightmare, too, but I’m strictly writing here about the NSA’s own shaky legal structure:

According to the Foreign Intelligence Surveillance Court of Review: …the Director of National Intelligence (DNI) and the Attorney General (AG) were permitted to authorize, for periods of up to one year, “the acquisition of foreign intelligence information concerning persons reasonably believed to be outside the United States” if they determined that the acquisition met five specified criteria. Id. These criteria included (i) that reasonable procedures were in place to ensure that the targeted person was reasonably believed to be located outside the United States; (ii) that the acquisitions did not constitute electronic surveillance; (iii) that the surveillance would involve the assistance of a communications service provider [I hate to jump in here, but Google says they didn't know about the data being taken, so can this assistance be unknowing or unwilling? -- Bob]; (iv) that a significant purpose of the surveillance was to obtain foreign intelligence information; and (v) that minimization procedures in place met the requirements of 50 U.S.C. § 1801(h).

This is a huge point of law missed by the general news reports -- a point so significant and obvious that it ought to lead to immediate suspension of the program and destruction of all acquired data… but it probably won’t.

That probably won’t happen, because Congress seems hell-bent on quickly passing an intelligence reform bill that not only doesn’t prohibit these illegal activities but seems to give them a legal basis they didn’t have before.

Some kind of reform, eh?

This news also blows a hole in the argument that these agencies are gathering data mainly so they’ll be able to analyze it retrospectively after the next terrorist attack, as was done right after the Boston Marathon bombings. If we already have after-the-fact access to historical data through this hack, why bother gathering it beforehand?

The other under-reported part of this story, I think, is exactly how the GCHQ is gaining access to Google and Yahoo data. A cynical friend of mine guesses it is happening this way:

"The NSA probably has a Hadoop system set up and linked to Google’s. All data that goes onto Google’s network is automatically replicated on the NSA system. Heck that Hadoop system is probably sitting in Google’s data center. You don’t need to move the data. You just need to access a copy of it. It would not surprise me if this is being done with Google, Yahoo, Microsoft, Facebook, Twitter, … The government has probably paid each of them big bucks to set up, support, and manage a replica of their data in their own data centers".

I think my friend is wrong because I can’t see either Google or Yahoo being stupid enough to help such a process occur. The associated revenue isn’t enough to be worth it for either company.

GCHQ could get the data from a network contractor like BT. Or they could do it themselves by physically tapping the fibers. There is a technique where, if you bend individual fibers into a tight loop (tight enough that light strikes the core’s boundary at less than the critical angle for total internal reflection), some light escapes the fiber and can be harvested with a detector and the unencrypted data read. All the network sees is a slight signal attenuation. But given that cable bundles hold at least 148 fibers each, such physical extraction would require a refrigerator-sized unit installed somewhere.
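The bend trick relies on total internal reflection: light stays inside a fiber’s core only while it strikes the core-cladding boundary at more than the critical angle θc = arcsin(n_clad / n_core), measured from the normal. A tight bend makes some rays hit the outer wall at a smaller angle, and those leak out. Below is a minimal ray-optics sketch; the refractive indices are assumed illustrative values in the range of silica fiber, not figures for any particular cable.

```python
import math


def critical_angle_deg(n_core: float, n_clad: float) -> float:
    """Critical angle (degrees from the boundary normal) for total internal reflection."""
    if not n_core > n_clad > 0:
        raise ValueError("guiding requires n_core > n_clad > 0")
    return math.degrees(math.asin(n_clad / n_core))


def leaks(incidence_deg: float, n_core: float, n_clad: float) -> bool:
    """True if a ray hitting the core-cladding boundary at this angle escapes the core."""
    return incidence_deg < critical_angle_deg(n_core, n_clad)


# Illustrative indices only (assumed, not measured from a real cable).
N_CORE, N_CLAD = 1.468, 1.463

theta_c = critical_angle_deg(N_CORE, N_CLAD)
print(f"critical angle: {theta_c:.1f} degrees")
print(leaks(88.0, N_CORE, N_CLAD))  # False: a near-grazing ray is totally reflected
print(leaks(80.0, N_CORE, N_CLAD))  # True: a steeper ray, as after a tight bend, escapes
```

Ray optics is a simplification for single-mode fiber, where bend loss is better described with wave optics, but the qualitative point is the same: bend the fiber tightly enough and some light gets out.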

I doubt that the NSA and GCHQ are grabbing the signals from cables between data centers. Rather, they are probably grabbing the signals from cables within the data centers, which are still unencrypted despite Google’s recently expanded encryption system. I’d bet money on that. These data centers tend to be leased buildings, and I’m sure some royal is the beneficial owner of the UK facility and has access to the physical plant…

But this is all just speculation and will probably have to remain so, as both governments do all they can to rein in public debate.

My concern is also with what happens down the road. A lot of this more aggressive NSA behavior came in with the Patriot Act and has become part of the agency’s DNA, raising the floor for questionable practices. So the question is less what wrong they will do with this capability than in what direction, and how far, they will extend future transgressions.

This GCHQ business also feels to me like it may have come from the Brits and simply fallen into the NSA’s lap. If not, why would the NSA simultaneously be fighting the FISA Court for the same information from domestic sources?