For well over a decade, identity thieves, phishers, and other online scammers have created a black market of stolen and aggregated consumer data that they use to break into people's accounts, steal their money, or impersonate them. In October, dark web researcher Vinny Troia found one such trove sitting exposed and easily accessible on an unsecured server, comprising 4 terabytes of personal information, about 1.2 billion records in all.

While the collection is impressive for its sheer volume, the data doesn't include sensitive information like passwords, credit card numbers, or Social Security numbers. It does, though, contain profiles of hundreds of millions of people that include home and cell phone numbers, associated social media profiles on services like Facebook, Twitter, LinkedIn, and GitHub, and work histories seemingly scraped from LinkedIn. In all, the data includes almost 50 million unique phone numbers and 622 million unique email addresses.

"It’s bad that someone had this whole thing wide open," Troia says. "This is the first time I've seen all these social media profiles collected and merged with user profile information into a single database on this scale. From the perspective of an attacker, if the goal is to impersonate people or hijack their accounts, you have names, phone numbers, and associated account URLs. That's a lot of information in one place to get you started."

"What stands out about this incident is the sheer volume of data that’s been collected." Troy Hunt, HaveIBeenPwned

Troia found the server while looking for exposures with fellow security researcher Bob Diachenko on the web scanning services BinaryEdge and Shodan. The IP address for the server simply traced to Google Cloud Services, so Troia doesn't know who amassed the data stored there. He also has no way of knowing if anyone else found and downloaded the data before he did, but notes that the server was easy to find and access. WIRED checked six people's personal email addresses against the data set; four were there and returned accurate profiles. Troia reported the exposure to contacts at the Federal Bureau of Investigation. Within a few hours, he says, someone pulled the server and the exposed data offline. The FBI declined to comment for this story.

Of Unknown Origin

The data Troia discovered seems to be four data sets cobbled together. Three were labeled, perhaps by the server owner, as coming from a data broker based in San Francisco called People Data Labs. PDL claims on its website to have data on over 1.5 billion people for sale, including almost 260 million in the US. It also touts more than a billion personal email addresses, more than 420 million LinkedIn URLs, more than a billion Facebook URLs and IDs, and more than 400 million phone numbers, including more than 200 million valid US cellphone numbers.

PDL cofounder Sean Thorne says that his company doesn't own the server that hosted the exposed data, an assessment Troia agrees with based on his limited visibility. It's also unclear how the records got there in the first place.

"The owner of this server likely used one of our enrichment products, along with a number of other data-enrichment or licensing services," Thorne says. "Once a customer receives data from us, or any other data providers, the data is on their servers and the security is their responsibility. We perform free security audits, consultations, and workshops with the majority of our customers."

Troia thinks it's unlikely that People Data Labs was breached, since it would be simpler to just buy data from the company. An attacker on a budget could also sign up for a free trial that PDL advertises, offering 1,000 consumer profiles per month. "One thousand profiles to 1,000 burner accounts and you've got pretty much all of it," Troia points out.

One of the other data sets is labeled "OXY," and every record in it also contains an "OXY" tag. Troia speculates that this may refer to Wyoming-based data broker Oxydata, which claims to have 4 TB of data, including 380 million profiles on consumers and employees in 85 industries and 195 countries around the world. Martynas Simanauskas, Oxydata director of business-to-business sales, emphasized that Oxydata hasn't suffered a breach and that it does not label its data with an "OXY" tag.