For the past year, Apple has touted a mathematical tool that it describes as a solution to a paradoxical problem: mining user data while simultaneously protecting user privacy. That secret weapon is "differential privacy," a novel field of data science that focuses on carefully adding random noise to an individual user's information before it's uploaded to the cloud. That way, the total dataset held by a company such as Apple reveals meaningful results without any one person's secrets being spilled.

But differential privacy isn't a simple toggle switch between total privacy and no-holds-barred invasiveness; it works more like a dial. And a new study, which delves deeply into how Apple actually implements the technique, suggests the company has ratcheted that dial further toward aggressive data-mining than its public promises imply.

Epsilon, Epsilon

Researchers at the University of Southern California, Indiana University, and China's Tsinghua University have dug into the code of Apple's MacOS and iOS operating systems to reverse-engineer just how the company's devices implement differential privacy in practice. They've examined how Apple's software injects random noise into personal information—ranging from emoji usage to your browsing history to HealthKit data to search queries—before your iPhone or MacBook uploads that data to Apple's servers.

Ideally, that obfuscation helps protect your private data from any hacker or government agency that accesses Apple's databases, advertisers Apple might someday sell it to, or even Apple's own staff. But differential privacy's effectiveness depends on a variable known as the "privacy loss parameter," or "epsilon," which determines just how much specificity a data collector is willing to sacrifice for the sake of protecting its users' secrets. By taking apart Apple's software to determine the epsilon the company chose, the researchers found that MacOS uploads significantly more specific data than the typical differential privacy researcher might consider private. iOS 10 uploads even more. And perhaps most troubling, according to the study's authors, is that Apple keeps both its code and epsilon values secret, allowing the company to potentially change those critical variables and erode users' privacy protections with little oversight.
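To make the role of epsilon concrete, here is a minimal sketch of randomized response, a textbook technique for adding noise on the user's device. It is not Apple's implementation, which remains secret; the function names, the 30 percent "true" rate, and the epsilon values tried are all assumptions chosen for illustration. Each simulated user reports a true yes-or-no answer with probability e^epsilon / (1 + e^epsilon) and the opposite answer otherwise, and the collector then corrects for the known noise to estimate the overall rate.

```python
import math
import random
from typing import Sequence


def truth_probability(epsilon: float) -> float:
    """Chance that a user reports the true bit under binary randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize(bit: int, epsilon: float, rng: random.Random) -> int:
    """Report the true bit with probability p, otherwise report the flipped bit."""
    p = truth_probability(epsilon)
    return bit if rng.random() < p else 1 - bit


def estimate_rate(reports: Sequence[int], epsilon: float) -> float:
    """Debias noisy reports to estimate the true fraction of 1s.

    Expected observed rate = f * p + (1 - f) * (1 - p), solved here for f.
    Requires epsilon > 0; at epsilon == 0 the reports carry no signal.
    """
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical population: roughly 30 percent of users have the sensitive trait.
    true_bits = [1 if rng.random() < 0.3 else 0 for _ in range(100_000)]

    for epsilon in (0.5, 1.0, 4.0, 8.0):
        reports = [randomize(b, epsilon, rng) for b in true_bits]
        print(
            f"epsilon={epsilon}: "
            f"truthful with p={truth_probability(epsilon):.3f}, "
            f"estimated rate={estimate_rate(reports, epsilon):.3f}"
        )
```

In this sketch, a small epsilon leaves each individual report close to a coin flip, so it says little about the person who sent it, while the aggregate estimate still lands near the true rate. As epsilon grows, the probability of a truthful report approaches 1 and an individual upload becomes close to the raw data, which is the trade-off the researchers' concern turns on.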

In response to the study, Apple points out that its data collection is purely opt-in. (Apple prompts users to share "diagnostics and usage" information with the company when its operating systems first load.) And it fundamentally disputes many of the study's findings, including the degree to which Apple can link any specific data to a specific user.

But the study's authors stand by their claims and maintain that Apple oversells its differential privacy protections. "Apple’s privacy loss parameters exceed the levels typically considered acceptable by the differential privacy research community," says USC professor Aleksandra Korolova, a former Google research scientist who worked on Google's own implementation of differential privacy until 2014. She says the dialing down of Apple's privacy protections in iOS in particular represents an "immense increase in risk" compared to the uses most researchers in the field would recommend.

Frank McSherry, one of the inventors of differential privacy and a former Microsoft researcher, puts his interpretation of the study's findings more candidly: "Apple has put some kind of handcuffs on in how they interact with your data," he says. "It just turns out those handcuffs are made out of tissue paper."

'Not a Reassuring Guarantee'

To determine the exact parameters Apple uses to handicap its data mining, the Indiana, USC, and Tsinghua researchers spent more than six months digging through the code of MacOS and iOS 10, identifying the specific files Apple assembles, encrypts, and uploads from iPhones and Macs back to its servers once a day. They used the reverse-engineering tool Hopper to pull the code apart, and debugging tools to watch it run in real time and see how it functions, step by step.