The mass collection of telephone records by government surveillance programs poses a clear threat to the personal privacy of ordinary citizens, according to US researchers who used basic phone logs to identify people and uncover confidential information about their lives.

Armed with anonymous “metadata” on people’s calls and texts, but not the contents of the communications, two scientists at Stanford University worked out individuals’ names, where they lived and the names of their partners. But that was not all.



The same data led them to uncover potentially sensitive information about some individuals. One man was found to own a rifle, while another had recently been diagnosed with an irregular heartbeat. Other data pointed to a new pregnancy, a person with multiple sclerosis, and an individual who was gearing up to grow cannabis.



The results highlight the extraordinary power of telephone metadata – the number called, when, and for how long – particularly when it is paired with public information available from services such as Google, Yelp and Facebook. The value of the data, which is not subject to the same legal protections as the content of people’s communications, has long been recognised by the security services. As Stewart Baker, the former general counsel at the US National Security Agency, put it in the aftermath of Edward Snowden’s revelations: “Metadata absolutely tells you everything about somebody’s life.”



Patrick Mutchler, a computer security researcher at Stanford, said that while the power of metadata was understood by those gathering the information, the public was largely in the dark because so few published studies have revealed how rich the data are. “That makes it difficult for people with strong opinions about these programs to fight them. Now we have hard evidence we can point to that didn’t exist in the past,” he said.



For the study, the researchers signed up 823 people who agreed to have metadata collected from their phones through an Android app. The app also received information from their Facebook accounts, which the scientists used to check the accuracy of their results. In all, the researchers gathered metadata on more than 250,000 calls and over 1.2m texts.



Analysts who logged into the NSA’s metadata gathering system were initially allowed to examine data up to three hops away from an individual. A call from the target individual’s phone to another number was one hop. From that phone to another was two hops. And so on. The records available to analysts stretched back for five years. The collection window has now been restricted to two hops and 18 months at most.



The Stanford study found that, starting from a single phone number, the NSA program would initially have given analysts access to telephone metadata for tens of millions of people. Once the restrictions came into place, that number fell dramatically, but an analyst with one phone number could still retrieve metadata on 25,000 people.
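The hop rule described above amounts to a bounded breadth-first search over a graph of who called whom. The following is a minimal sketch of that idea only, not the NSA's system or the researchers' code; the function name `numbers_within_hops`, the `call_graph` structure and the toy phone labels are all assumptions made for illustration.

```python
from collections import deque

def numbers_within_hops(call_graph, seed, max_hops):
    """Return {number: hop distance} for numbers reachable from seed
    in at most max_hops calls (breadth-first search)."""
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        # Do not expand past the hop limit.
        if distance[current] == max_hops:
            continue
        for neighbour in call_graph.get(current, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return distance

# Hypothetical toy call graph: each key has exchanged calls with the listed numbers.
call_graph = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}

two_hops = numbers_within_hops(call_graph, "A", 2)    # reaches A, B, C, D
three_hops = numbers_within_hops(call_graph, "A", 3)  # also reaches E
```

Even in this toy graph, each extra hop pulls in numbers the seed never contacted directly. On a real network, where some numbers connect to thousands of others, the reachable set grows far faster.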



Writing in the journal Proceedings of the National Academy of Sciences, Mutchler describes how, on a shoestring budget, he and fellow graduate student Jonathan Mayer uncovered a wealth of personal information, some of it sensitive, about people who took part in the study. Through automatic and manual searches, they identified 82% of people’s names. The same technique gave them the names of businesses the people had called. When these were plotted on a map, they revealed clusters of local businesses, which the scientists speculated surrounded the person’s home address. In this way, they named the city people lived in 57% of the time, and were nearly 90% accurate in placing people within 50 miles of their home. Mutchler believes some of the misses came from people not updating their Facebook page when they moved, for example when they left their parents’ home to go to college.
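One simple way to turn a cluster of called businesses into a location guess is to take the median of their coordinates, which a single far-away call barely shifts. This is an illustrative stand-in, not the method the researchers report; the function `estimate_home` and the coordinates below are hypothetical.

```python
from statistics import median

def estimate_home(business_coords):
    """Guess a home location as the median latitude and longitude
    of the businesses a person has called."""
    lats = [lat for lat, _ in business_coords]
    lons = [lon for _, lon in business_coords]
    return median(lats), median(lons)

# Hypothetical coordinates: four local businesses and one distant outlier.
coords = [
    (37.41, -122.17),
    (37.42, -122.14),
    (37.43, -122.15),
    (37.44, -122.16),
    (40.71, -74.00),
]

home_guess = estimate_home(coords)  # stays inside the local cluster
```

A mean would be dragged thousands of miles by the one distant call, whereas the median stays within the local cluster.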



The scientists next delved into more personal territory. Using a simple computer program to analyse people’s call patterns, they inferred who among the study volunteers was in a relationship. Once they knew the owner of a particular number had a partner, identifying the significant other was trivial, they report.



For the final part of the study, the researchers went further still, to see what sensitive information they could glean from telephone metadata. They gathered details on calls made to and from a list of organisations, including hospitals, pharmacies, religious groups, legal services, firearms retailers and repair firms, marijuana dispensaries, and sex establishments. From these, they pieced together some extraordinary vignettes from people’s lives.



The metadata from one person in the study showed they received a long call from a cardiology centre, spoke briefly with a medical laboratory, answered a number of short calls from a local pharmacy, and then made calls to a hotline for abnormal heart-rate monitoring devices. Another participant made frequent calls to a local gun supplier that specialised in semi-automatic rifles, and later placed a number of long calls to the customer support hotline run by a major gun manufacturer that produced the rifles. Another still placed calls to a hardware store, a locksmith, a hydroponics supplier and a head shop in the space of three weeks. The metadata from two others suggested one had multiple sclerosis and the other had just become pregnant.
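The kind of matching behind these vignettes can be sketched as a lookup of each called number in a directory of categorised organisations, followed by a tally per category. The code below is a minimal illustration under assumed data, not the study's pipeline; the `DIRECTORY` entries, the record fields and the `555` numbers are invented for the example.

```python
from collections import Counter

# Hypothetical directory mapping organisation phone numbers to categories.
DIRECTORY = {
    "555-0101": "cardiology",
    "555-0102": "pharmacy",
    "555-0103": "medical laboratory",
}

def category_counts(call_log, directory):
    """Count calls per organisation category, ignoring numbers
    that do not appear in the directory."""
    counts = Counter()
    for record in call_log:
        category = directory.get(record["other_number"])
        if category is not None:
            counts[category] += 1
    return counts

# Hypothetical metadata for one participant: number and duration only, no content.
call_log = [
    {"other_number": "555-0101", "duration_s": 1260},
    {"other_number": "555-0103", "duration_s": 45},
    {"other_number": "555-0102", "duration_s": 30},
    {"other_number": "555-0102", "duration_s": 25},
    {"other_number": "555-0199", "duration_s": 300},  # not in the directory
]

counts = category_counts(call_log, DIRECTORY)
```

No call content is needed: the pattern of categories alone is what suggests a medical story.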



“All of this should be taken as an indication of what is possible with two graduate students and limited resources,” said Mutchler, who argues that the findings should make policymakers think twice before authorising mass surveillance programs. “Large-scale metadata surveillance programs, like the NSA’s, will necessarily expose highly confidential information about ordinary citizens,” the scientists write, adding: “To strike an appropriate balance between national security and civil liberties, future policymaking must be informed by input from relevant sciences.”



Ross Anderson, professor of security engineering at Cambridge University, said the study provided numbers that discussions can now be based on. “With the right analytics running over nation-scale comms data you can infer huge amounts of sensitive information on everyone. We always suspected that of course, but here’s the data.”