Since November 2013, researchers at Stanford University have been asking: What’s in your metadata?

Specifically, the study encouraged volunteers who also used Facebook to install an app called MetaPhone on their Android phones. The app was designed to act as a sort of slimmed-down version of the National Security Agency by attempting to gather the same metadata collected by telecom firms and, in turn, intelligence agencies. Volunteers who chose to participate gave the researchers access to their calling and texting records, including the date, time, and duration of each call.

Since late last year, the team has been releasing interim results from the 546 people who chose to participate. On Wednesday, the team released its latest and most complete findings, and it was startled by what it found.

“At the outset of this study, we shared the same hypothesis as our computer science colleagues—we thought phone metadata could be very sensitive,” Jonathan Mayer, a graduate student leading the project, wrote on Wednesday.

“We did not anticipate finding much evidence one way or the other, however, since the MetaPhone participant population is small, and participants only provide a few months of phone activity on average. We were wrong. We found that phone metadata is unambiguously sensitive, even in a small population and over a short time window. We were able to infer medical conditions, firearm ownership, and more, using solely phone metadata.”

Mayer explained to Ars by phone that given the small sample size and the study duration of only a few months, the team had originally hypothesized that the information gathered would not be as revealing.

“I think it's very certainly strongly suggestive that a larger pool that spans more time would have remarkably more sensitive information in it,” he added.

The new results provide a strong, research-based counterweight to the government's assertion that collecting metadata is somehow less revealing than capturing the actual content of calls.

A likely abortion?

So what was revealed, precisely? Mayer and his team showed that participants called public numbers of “Alcoholics Anonymous, gun stores, NARAL Pro-Choice, labor unions, divorce lawyers, sexually transmitted disease clinics, a Canadian import pharmacy, strip clubs, and much more.”

The researchers were surprised to find real-world results supporting a classic nightmare scenario feared by many civil libertarians and privacy activists.

Participant A communicated with multiple local neurology groups, a specialty pharmacy, a rare condition management service, and a hotline for a pharmaceutical used solely to treat relapsing multiple sclerosis.

Participant B spoke at length with cardiologists at a major medical center, talked briefly with a medical laboratory, received calls from a pharmacy, and placed short calls to a home reporting hotline for a medical device used to monitor cardiac arrhythmia.

Participant C made a number of calls to a firearm store that specializes in the AR semiautomatic rifle platform. They also spoke at length with customer service for a firearm manufacturer that produces an AR line.

In a span of three weeks, Participant D contacted a home improvement store, locksmiths, a hydroponics dealer, and a head shop.

Participant E had a long, early morning call with her sister. Two days later, she placed a series of calls to the local Planned Parenthood location. She placed brief additional calls two weeks later, and made a final call a month after.

Just as notable was what came next: the privacy researchers decided not to follow up with some of these volunteers.

“We were able to corroborate Participant B’s medical condition and Participant C’s firearm ownership using public information sources,” the team added. “Owing to the sensitivity of these matters, we elected to not contact Participants A, D, or E for confirmation.”

“Metadata surveillance endangers privacy”

Privacy activists and lawyers immediately lauded the Stanford findings.

Jennifer Granick, the director of civil liberties at the Stanford Center for Internet and Society, with which Mayer is affiliated, concluded that this study “adds important empirical evidence to support what is now a growing consensus. Metadata surveillance endangers privacy.”

Meanwhile, Brian Pascal, a non-resident fellow at the Stanford Center for Internet and Society, told Ars he was surprised that even participants who knew they were being monitored did not appear to “skew calling habits towards the bland.”

“However, this does not appear to be the case,” he added. “For example, 2 percent of participants called ‘adult establishments,’ knowing that their calling metadata was being recorded. It’s not difficult to imagine that some users, knowing that MetaPhone gathers this information, might change their calling habits. Without a control group, though, it’s impossible to know just how much MetaPhone (or surveillance in general) changes behavior. Admittedly, MetaPhone focuses more on illustrating just how powerful metadata can be, rather than on the impact of surveillance on personal choice, but it’s an interesting implication nonetheless.”

Others drew a clear line between this work and the NSA’s rationale for collect-it-all.

“This just confirms what everyone's intuition suggested—phone metadata is incredibly revealing. It's great to have some empirical evidence to back up that intuition, and it only reinforces the intrusiveness of the NSA's mass collection of Americans' call records.”

“This is striking,” Fred Cate, a law professor at Indiana University, told Ars by e-mail.

“It highlights three key points. First, that the key part of the NSA’s argument—we weren’t collecting sensitive information so what is the bother?—is factually wrong. Second, that the NSA and the [Foreign Intelligence Surveillance Act] Court failed to think this through; after all, it only takes a little common sense to realize that sweeping up all numbers called will inevitably reveal sensitive information. Of course the record of every call made and received is going to implicate privacy. And third, it lays bare the fallacy of the Supreme Court’s mind-numbingly broad wording of the third-party doctrine in an age of big data: just because I reveal data for one purpose—to make a phone call—does not mean that I have no legitimate interest in that information, especially when combined with other data points about me.”