Yet Another Report Showing 'Anonymous' Data Not At All Anonymous

from the what-privacy dept

"The MIT researchers also looked at whether they could preserve anonymity in large data sets by intentionally making the data less precise, in order to examine whether preserving privacy would still enable useful analysis. But the researchers found that even if the data set was characterised as each purchase having taken place in the span of a week at one of the 150 stores in the same general area, four purchases would still be enough to identify more than 70 percent of users."

"We are showing that the privacy we are told that we have isn't real," study co-author Alex "Sandy" Pentland of MIT said in an email...The study shows that when we think we have privacy when our data is collected, it's really just an "illusion", said Eugene Spafford, director of Purdue University's Centre for Education and Research in Information Assurance and Security. Spafford, who wasn't part of the study, said it makes "one wonder what our expectation of privacy should be anymore."

As companies expand the amount of data hoovered up via their subscribers, a common refrain meant to ease public concern is that consumers shouldn't worry because this data is "anonymized." However, time and time again studies have highlighted how it's not particularly difficult to tie these data sets to consumer identities -- usually with only the use of a few additional contextual clues. It doesn't really matter whether we're talking about cellular location data, GPS data, taxi data or NSA metadata; the basic fact is these anonymous data sets aren't really anonymous.

The latest in a long stream of such studies comes from MIT, where researchers explored (the actual study is paywalled) whether they could glean unique identities from "anonymous" user data using a handful of contextual clues. Studying the purportedly anonymous credit card transactions of 1.1 million users at 10,000 retail locations over a period of three months, the researchers found they could identify 90% of the users' names by using four additional data points like the dates and locations of four purchases. Using three clues, including more specific points like the exact price of a purchase, allowed them to identify 94% of the consumers. Intentionally trying to make the data points less precise didn't help protect consumer privacy much.

Note they're not saying they can ascertain your personal identity from this data alone, but they (or a hacker that nabs this data) can identify you if they have just a smattering of other contextual clues as to who you are. In an age when cellular companies track and sell your daily location down to the minute, and your automobile, insurance companies and toll payment systems are all gathering even more precise data, that's not going to be a particularly difficult task.
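To make the idea concrete, here's a rough sketch of how "a few known purchases single you out" can be measured. This is not the researchers' code or their exact method (the study is paywalled). It's a toy illustration under stated assumptions: the `purchases` data, the `REGION` mapping, and the `matches`, `unicity` and `coarsen` helpers are all made up for this example. It scores each user by the share of their k-purchase combinations that match nobody else, then repeats the measurement after blurring days into weeks and shops into regions.

```python
from itertools import combinations

# Hypothetical toy data set: pseudonymous user ID -> set of (day, shop) purchases.
purchases = {
    "u1": {(1, "A"), (2, "B"), (3, "C")},
    "u2": {(1, "A"), (2, "B"), (4, "D")},
    "u3": {(1, "A"), (5, "E"), (6, "F")},
    "u4": {(2, "B"), (3, "C"), (7, "G")},
}

# Hypothetical shop -> region mapping, used only to blur the data.
REGION = {"A": "north", "B": "north", "C": "north",
          "D": "south", "E": "south", "F": "south", "G": "south"}


def matches(traces, known_points):
    """Return every user whose trace contains all of the known points."""
    return [user for user, points in traces.items() if known_points <= points]


def unicity(traces, k):
    """Average, over users, of the share of their k-point subsets that
    match that user and nobody else. Users with fewer than k points are skipped."""
    scores = []
    for user, points in traces.items():
        subsets = [set(combo) for combo in combinations(sorted(points), k)]
        if not subsets:
            continue
        unique = sum(1 for subset in subsets if matches(traces, subset) == [user])
        scores.append(unique / len(subsets))
    return sum(scores) / len(scores) if scores else 0.0


def coarsen(traces):
    """Blur each purchase to (week number, region) instead of (day, shop)."""
    return {
        user: {((day - 1) // 7, REGION[shop]) for day, shop in points}
        for user, points in traces.items()
    }


for k in (1, 2, 3):
    print(k, round(unicity(purchases, k), 3), round(unicity(coarsen(purchases), k), 3))
```

On a data set this tiny, blurring hides everyone almost immediately. The study's point is that on real data, with over a million people and thousands of shops, it doesn't.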
The gist of the study isn't going to be a shock to most of you: privacy in the modern age -- unless you're willing to go to extreme lengths -- is an illusion.

That said, it's very important to remember that we can probably trust that companies rushing head first toward vast new revenue generation opportunities are spending the time and resources necessary to ensure consumer privacy is at the very top of their list of priorities.

Filed Under: anonymous, anonymous data, data