Did The NSA Think The Public Can't Do Math? Attempt To Downplay Data Collection Fails Miserably

from the carry-the-one... dept

Scope and Scale of NSA Collection



According to figures published by a major tech provider, the Internet carries 1,826 Petabytes of information per day. In its foreign intelligence mission, NSA touches about 1.6% of that. However, of the 1.6% of the data, only 0.025% is actually selected for review. The net effect is that NSA analysts look at 0.00004% of the world's traffic in conducting their mission -- that's less than one part in a million. Put another way, if a standard basketball court represented the global communications environment, NSA's total collection would be represented by an area smaller than a dime on that basketball court.
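The percentages in that callout are easy to check by hand. Below is a minimal sketch, assuming "0.025% of the 1.6%" means a straight product of the two shares; the variable names are ours, not the NSA's. On that reading the product is ten times larger than the 0.00004% in the callout, and it is about four parts in a million rather than less than one.

```python
from fractions import Fraction

# Figures exactly as stated in the NSA callout, as shares of all traffic.
touched = Fraction(16, 1000)          # 1.6% of daily traffic is "touched"
selected = Fraction(25, 100_000)      # 0.025% of the touched data is selected
claimed = Fraction(4, 10_000_000)     # 0.00004% of world traffic, as claimed

# Assumption: the two shares simply multiply.
reviewed = touched * selected


def as_percent(share):
    """Convert a share of traffic (0..1) to a percentage."""
    return float(share * 100)


print(f"computed: {as_percent(reviewed):.5f}% of traffic")
print(f"claimed:  {as_percent(claimed):.5f}% of traffic")
print(f"computed / claimed = {reviewed / claimed}")
print(f"computed, in parts per million: {float(reviewed * 1_000_000):g}")
```

Exact fractions are used instead of floats so that the comparison with the claimed figure is not blurred by rounding.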

The dime on the basketball court, as NSA describes it, is still 29.21 petabytes of data a day. That means NSA is "touching" more data than Google processes every day (a mere 20 petabytes).
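The petabyte figure follows directly from the NSA's own two numbers. Here is a rough sketch of the arithmetic, using only figures quoted in this post. The last step, applying the 0.025% "selected for review" share to the touched volume and converting at 1 PB = 1,000 TB, is our own assumption and not something either source states.

```python
# Figures quoted in the NSA callout and the Ars Technica comparison.
TOTAL_PB_PER_DAY = 1826      # internet traffic per day, in petabytes
TOUCHED_SHARE = 0.016        # 1.6% "touched" by the NSA
GOOGLE_PB_PER_DAY = 20       # data Google processes per day, in petabytes
SELECTED_SHARE = 0.00025     # 0.025% of touched data selected for review

touched_pb = TOTAL_PB_PER_DAY * TOUCHED_SHARE
ratio_to_google = touched_pb / GOOGLE_PB_PER_DAY

# Assumption: decimal units, 1 PB = 1,000 TB.
selected_tb = touched_pb * SELECTED_SHARE * 1000

print(f"touched per day:  {touched_pb:.3f} PB")
print(f"relative to Google's daily processing: {ratio_to_google:.2f}x")
print(f"selected for review per day: {selected_tb:.2f} TB")
```

The touched volume comes out at a little over 29.2 petabytes a day, in line with the figure quoted above.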

As a result, if properly tuned, the packet analyzer gear at the front-end of XKeyscore (and other deep packet inspection systems) can pick out a very small fraction of the actual packets sent over the wire while still extracting a great deal of information (or metadata) about who is sending what to who. This leaves disk space for "full log data" on connections of particular interest.

Last week we wrote about the NSA's ridiculous attempt to justify its surveillance efforts, including a really wacky callout (quoted above) designed to show just how "little" data the NSA collects.

This was bizarre on a number of levels, not the least of which is the wacky basketball court-to-dime scale. Next time, maybe we can play "is it bigger than a breadbox" with the NSA. But as for what any of this meant, it hasn't been at all clear. Since the NSA has already redefined basic English words like "collect," "target," "datamine," and "relevant," there's no telling what is meant by "touch." However, some are starting to dig into the numbers, and contrary to the NSA's attempt to suggest that this is "nothing to fear," a bit of analysis certainly suggests they're collecting quite a bit of info.

First up, we have Jeff Jarvis, who highlights a bunch of important comparative datapoints, including that Sandvine claims that only 2.9% of US traffic is communication traffic and 68.8% of all email is spam -- meaning that it's entirely possible that the NSA collects nearly all non-spam email and it would still be within its 1.6% number. He also points out that 62% of traffic on the internet is considered entertainment, and we can assume that the NSA doesn't need to collect every copy of [...] that people are passing around (I'm sure one or two will do the job).
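To see why the non-spam email point holds up, here is a back-of-the-envelope sketch. It leans on two assumptions that neither Jarvis nor the NSA makes explicitly: that the 2.9% US communication share applies to global traffic, and, as a deliberate worst case, that every byte of communication traffic is email. The constant names are ours.

```python
# Percentages cited in the post.
NSA_TOUCHED_PCT = 1.6        # share of internet traffic the NSA "touches"
COMMUNICATION_PCT = 2.9      # share of US traffic that is communication
SPAM_PCT = 68.8              # share of all email that is spam

# Assumption (worst case): all communication traffic is email, and the
# US share holds for global traffic.
non_spam_email_pct = COMMUNICATION_PCT * (100 - SPAM_PCT) / 100
headroom_pct = NSA_TOUCHED_PCT - non_spam_email_pct

print(f"non-spam email: about {non_spam_email_pct:.2f}% of traffic")
print(f"NSA touches:    {NSA_TOUCHED_PCT}% of traffic")
print(f"left over after all non-spam email: {headroom_pct:.2f}% of traffic")
```

Even under that generous upper bound, non-spam email comes to less than 1% of traffic, which fits inside the 1.6% with room to spare.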
He similarly points out that Google itself claims to index only approximately 0.004% of traffic on the internet, suggesting that the NSA may be collecting more info than Google indexes by two orders of magnitude (1.6% is 400 times 0.004%).

Meanwhile, Sean Gallagher, over at Ars Technica, digs a bit deeper into the numbers, suggesting that the NSA's data collection is closer to being on par with Google's, but still greater: by his math, quoted above, the NSA's "dime" works out to 29.21 petabytes a day against Google's 20.

Gallagher also looks much more closely at the recently revealed details of the XKeyscore program, to show how that 1.6% of "touched" internet communications can cover pretty much everything important. His explanation of the packet analyzer gear is also quoted above.

In other words, while the 1.6% number was put forth by the NSA to try to make people think this is no big deal, when you look at what it means, it suggests it's a very big deal indeed. In fact, the NSA may be collecting even more information than people had believed before.

Filed Under: data, data collection, internet traffic, nsa, nsa surveillance, scale