By this point, you've hopefully gotten the message that your personal data can end up exposed in all sorts of unexpected internet backwaters. But increased awareness hasn't slowed the problem. In fact, it's only grown bigger—and more confounding.

Last week, security researchers Bob Diachenko and Vinny Troia discovered an unprotected, publicly accessible MongoDB database containing 150 gigabytes of detailed, plaintext marketing data—including 763 million unique email addresses. The pair are going public with their findings today. The trove is not only massive but also unusual; it contains data about individual consumers as well as what appears to be "business intelligence data," like employee and revenue figures from various companies. This diversity may stem from the information's source. The database, owned by the "email validation" firm Verifications.io, was taken offline the same day Diachenko reported it to the company.

While you've likely never heard of them, validators play a crucial role in the email marketing industry. They don't send out marketing emails on their own behalf, or facilitate automated mass email campaigns. Instead, they vet a customer's mailing list to ensure that the email addresses in it are valid and won't bounce back. Some email marketing firms offer this mechanism in-house. But fully verifying that an email address works involves sending a message to the address and confirming that it was delivered—essentially spamming people. That means evading protections of internet service providers and platforms like Gmail. (There are less invasive ways to validate email addresses, but they have a tradeoff of false positives.) Mainstream email marketing firms often outsource this work rather than take on the risk of having their infrastructure blacklisted by spam filters, or lowering their online reputation scores.
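To make that tradeoff concrete, here is a minimal sketch of the least invasive kind of check: one that only tests whether an address is well formed. It is an illustration, not a description of how Verifications.io or any other validator works. The pattern and the helper names `looks_valid` and `split_list` are assumptions made for this example.

```python
import re

# Deliberately simple pattern (an assumption, not a full RFC 5322 parser):
# exactly one "@", no whitespace, and a dot somewhere in the domain part.
SYNTAX_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_valid(address: str) -> bool:
    """Return True if the address is merely well formed."""
    return SYNTAX_PATTERN.match(address.strip()) is not None


def split_list(addresses):
    """Partition a mailing list into well-formed and malformed addresses."""
    passed, rejected = [], []
    for address in addresses:
        (passed if looks_valid(address) else rejected).append(address)
    return passed, rejected


if __name__ == "__main__":
    sample = [
        "reader@example.com",
        "no-such-mailbox@example.invalid",  # well formed, but can never receive mail
        "not an address",
    ]
    passed, rejected = split_list(sample)
    print("passed:", passed)
    print("rejected:", rejected)
```

The second sample address passes even though the reserved `.invalid` domain can never accept mail. That is the kind of false positive a syntax-only check produces, and it is why thorough validation ends up relying on actually sending messages.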

"Companies have email lists and want to start emailing them, but they’re not sure how valid they are," says Troia, who founded the firm Night Lion Security. "So they go to a company that will essentially send out spam." Troia speculates, but has not confirmed, that the database may be so large and varied because it comprises all of Verifications.io's customers' data. WIRED was unable over the course of several days to contact the company or CEO Vlad Strelkov. On Monday, the entire Verifications.io website went offline and has not been restored since.

Record Setter

In general, the 809 million total records in the Verifications.io trove include standard information like names, email addresses, phone numbers, and physical addresses. But many also include things like gender, date of birth, personal mortgage amount, and interest rate; Facebook, LinkedIn, and Instagram accounts associated with email addresses; and characterizations of people's credit scores (like average, above average, and so on). Meanwhile, other records in the collection seem related to generating sales leads at businesses, including company names, annual revenue figures, fax numbers, company websites, and the "SIC" and "NAICS" industry codes used to categorize companies.

The data doesn't contain Social Security numbers or credit card numbers, and the only passwords in the database are for Verifications.io's own infrastructure. Overall, most of the data is publicly available from various sources, but when criminals can get their hands on troves of aggregated data, it makes it much easier for them to run new social engineering scams, or expand their target pool.

"This is just another case where someone has my data, and hundreds of millions of other people’s data, and I’ve absolutely no idea how they got it," says security researcher Troy Hunt.

In the exposed database, the researchers also found some of what appear to be Verifications.io’s own internal tools like test email accounts, hundreds of SMTP (email sending) servers, the text of emails, anti-spam evasion infrastructure, keywords to avoid, and IP addresses to blacklist. Diachenko suggests that in the Verifications.io workflow, customers would upload an Excel spreadsheet listing the email addresses to validate, and then Verifications.io would run its tests and return lists of clean addresses and ones that bounced back. It's possible, given the piecemeal nature of the data and evidence that it was imported from numerous different Excel files, that Verifications.io also retained some or all of the data it received from customers after concluding its email address checks.