By now, everyone knows that databases of personal information are widely available for sale on the Dark Web. But now two security researchers – Bob Diachenko and Vinny Troia – have uncovered a massive database covering more than 1.2 billion people exposed on the Dark Web. The size of this personal data leak far exceeds anything that has ever been available before. The exposed data includes 650 million unique email addresses, 420 million LinkedIn URLs, 1 billion Facebook URLs and IDs, and more than 400 million phone numbers. What’s perhaps even more concerning from a security and privacy perspective is that nobody knows how the personal data leak occurred, or why a server with this massive data trove was left unprotected.

Personal data leaks vs. personal data breaches

Anurag Kahol, CTO of Bitglass, highlights the size and scope of this data leak: “This unsecured database is one for the record books. Impacting 1.2 billion records, it is one of the largest leaks we have ever seen. Names, email addresses, and phone numbers, along with other social media profile information, were left public facing. It is currently unknown who owns this database; however, they will surely face significant repercussions from regulatory bodies as well as the general public. There is no excuse for negligent security practices such as leaving databases exposed.”

While the size of this personal data leak is truly epic, it does not necessarily imply that a data breach actually occurred at any company. It might sound incredible, but much of this information is widely available on the public web, without the need for any logins, passwords, or sophisticated credentials. It appears that a hacker (or group of hackers) simply scraped information from as many different public sources as possible, and then combined everything into a database. For example, the personal data leak included Facebook and LinkedIn URLs, Twitter account information and GitHub URLs – and all of this is widely available online for anyone willing to scrape all of these sites.

Since the personal data leak included email addresses, names, phone numbers, work histories, social media profiles and other personally identifiable information (PII), the two security researchers started analyzing the type of data that is scraped by legitimate firms known as “data enrichment” firms. These are basically data brokers that buy and sell data that has been scraped from as many different sources as possible. As best as the security researchers could tell and as Troia noted, the trail appears to lead back to two of the most infamous data enrichment firms – People Data Labs (PDL) and Oxydata.io. When the security researchers cross-checked the information found in these companies’ databases with the information discovered as part of the personal data leak, there appeared to be an almost perfect match with the PDL and Oxydata information.

Yet, here’s where things get even stranger – both firms deny that they have been attacked, or that there has been any data breach of any kind. Moreover, since the firm hosting the server – Google Cloud Services – is under no legal obligation to disclose the identity of the owners of the server, the researchers don’t even know who owned the server, whether anyone accessed the data, or why the data seems to be an exact match with the datasets owned by the data enrichment firms. The best working hypothesis is that a current or former customer of PDL exposed the data as part of the personal data leak. This might have occurred for criminal, nefarious purposes – or it might simply be the case of a misconfigured server that was never “locked down” for security purposes. Needless to say, the two security researchers informed law enforcement, and the server is no longer available or operational.

Javvad Malik, Security Awareness Advocate at KnowBe4, comments on the scope of this personal data leak: “This incident is less of a data leak and more of a full on data tsunami. The biggest challenge when these kinds of repositories are found is that it’s near impossible to accurately identify who the owner is. It could be a company that is legitimately recording data, or a third party tasked with compiling profiles, a researcher, or a criminal.”

Privacy and security implications of the personal data leak

Some might say that this is a lot of hullaballoo about nothing. Or that it is simply a stunt pulled off by two security researchers looking to make a name for themselves. But that would be to overlook the significant privacy and security implications of this new personal data leak.

For one, it raises questions about what any firm is doing with such huge datasets of people. If names, account profiles, and contact information for hundreds of millions of people are widely available for sale online, doesn’t that raise a lot of concern? With all of this enriched profile information, buyers of this data could use it for a range of nefarious purposes, ranging from phishing schemes to identity theft. The information obtained through the personal data leak could be used for business email compromise (BEC) schemes or for credential stuffing attacks. At the end of the day, 1.2 billion people is a huge number, equivalent to more than 1 in 7 people on the planet.

The second concern relates to firms such as PDL or Oxydata. Most people have never heard of these firms, and may not even realize that there is a vast online ecosystem of companies that simply scrape the Internet for as much personal data as they can find – and then turn around and sell it to the highest bidder. This vast “gray” economy is nearly unregulated and unsupervised. If you’re not a security researcher digging around on the Dark Web, you’d never even know where to look for these firms.

And a third major concern involves breach notification. Technically, no “data breach” occurred, yet a lot of personal information was exposed online as part of the personal data leak. Despite the involvement of law enforcement, it looks like nobody is taking responsibility for alerting 1.2 billion people that sensitive personal information about them has been leaked. For obvious reasons, nobody is stepping up and admitting guilt, wrongdoing, or lack of care and precaution – and there is no regulator able to step in and compel companies to do so. For example, will a European data privacy regulator step in and get involved, given that the personal data leak almost assuredly involved personal information of European data subjects?

A dangerous new world of 360-degree profiles

Perhaps the biggest takeaway from this personal data leak is that everyone should realize that “360-degree profiles” of them now exist online, and that these profiles are widely traded, swapped and sold for profit by data brokers. Information from many different sources is being scraped and then stitched together, almost as if a worldwide surveillance network were being put into place to keep watch over them.

Tim Erlin, VP of product management and strategy at Tripwire, comments on the implications of this personal data leak: “We often worry about the exposure of sensitive data, but in this connected world, it’s the connections that matter most. Personal data that isn’t exactly secret, and might even be public, takes on new meaning when collected and connected. Repositories like these are concerning, not only because of the data they contain, but because as an industry we don’t really have a way to measure the impact of this type of exposure.”

Perhaps the only silver lining in this very dark cloud is that no payment card information seems to have been included in the massive dataset of 1.2 billion people and nearly 4 billion records. Going forward, the onus will be on governments around the world to crack down on data enrichment companies, as well as the vast online data ecosystem that they mine daily for their own profit.