When hackers breached companies like Dropbox and LinkedIn in recent years—stealing 71 million and 117 million passwords, respectively—they at least had the decency to exploit those stolen credentials in secret, or sell them for thousands of dollars on the dark web. Now, it seems, someone has cobbled together those breached databases and many more into a gargantuan, unprecedented collection of 2.2 billion unique usernames and associated passwords and is freely distributing them on hacker forums and torrents, throwing out the private data of a significant fraction of humanity like last year's phone book.

Earlier this month, security researcher Troy Hunt identified the first tranche of that mega-dump, named Collection #1 by its anonymous creator, a patched-together set of breached databases Hunt said represented 773 million unique usernames and passwords. Now other researchers have obtained and analyzed an additional vast database called Collections #2–5, which amounts to 845 gigabytes of stolen data and 25 billion records in all. After accounting for duplicates, analysts at the Hasso Plattner Institute in Potsdam, Germany, found that the total haul represents close to three times the Collection #1 batch.

"This is the biggest collection of breaches we’ve ever seen," says Chris Rouland, a cybersecurity researcher and founder of the IoT security firm Phosphorus.io, who pulled Collections #1–5 in recent days from torrented files. He says the collection has already circulated widely among the hacker underground: He could see that the tracker file he downloaded was being "seeded" by more than 130 people who possessed the data dump, and that it had already been downloaded more than 1,000 times. "It's an unprecedented amount of information and credentials that will eventually get out into the public domain," Rouland says.

Size Over Substance

Despite its unthinkable size, which was first reported by the German news site Heise.de, most of the stolen data appears to come from previous thefts, like the breaches of Yahoo, LinkedIn, and Dropbox. WIRED examined a sample of the data and confirmed that the credentials are indeed valid, but mostly represent passwords from years-old leaks.

But the leak is still significant for its quantity of privacy violation, if not its quality. WIRED asked Rouland to search for more than a dozen people's email addresses; all but a couple turned up at least one password they had used for an online service that had been hacked in recent years.


As another measure of the data's importance, Hasso Plattner Institute's researchers found that 750 million of the credentials weren't previously included in their database of leaked usernames and passwords, Info Leak Checker, and that 611 million of the credentials in Collections #2–5 weren't included in the Collection #1 data. Hasso Plattner Institute researcher David Jaeger suggests that some parts of the collection may come from the automated hacking of smaller, obscure websites to steal their password databases, which means that a significant fraction of the passwords are being leaked for the first time.

The sheer size of the collection also means it could offer a powerful tool for unskilled hackers to simply try previously leaked usernames and passwords on any public internet site in the hopes that people have reused passwords—a technique known as credential stuffing. "For the internet as a whole, this is still very impactful," Rouland says.

Rouland notes that he's in the process of reaching out to affected companies, and will also share the data with any chief information security officer who contacts him seeking to protect staff or users.

You can check whether your own username appears in the breach using Hasso Plattner Institute's Info Leak Checker tool, and you should change the password on any breached site it flags if you haven't already. As always, don't reuse passwords, and use a password manager. (Troy Hunt's service HaveIBeenPwned offers another helpful check of whether your passwords have been compromised, though as of this writing it doesn't yet include Collections #2–5.)
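For readers who would rather script that password check, here is a minimal Python sketch. It assumes HaveIBeenPwned's Pwned Passwords "range" endpoint, which is not described in this article: the endpoint accepts only the first five characters of a password's SHA-1 hash and returns matching hash suffixes with counts, so the password itself never leaves your machine. The function names and the User-Agent string are illustrative, and the results reflect only whatever data the service has loaded.

```python
import hashlib
import urllib.request

# Assumed endpoint: HaveIBeenPwned's Pwned Passwords range API.
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password: str) -> tuple[str, str]:
    """Return (first 5 hex chars, remaining 35) of the uppercase SHA-1 digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_response(suffix: str, body: str) -> int:
    """Find our hash suffix in a 'SUFFIX:COUNT' per-line response body."""
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count.strip())
    return 0  # suffix not listed: no known exposure in this data set


def pwned_count(password: str) -> int:
    """Query the range endpoint; only the 5-character prefix is sent."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "pwned-check-sketch"},  # illustrative value
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")
    return count_in_response(suffix, body)
```

Calling `pwned_count("some password")` returns how many times that password appears in the service's data, or 0 if it isn't listed. A nonzero result means the password should be retired everywhere it is used.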

Bargain Bin

Rouland speculates that the data may have been stitched together from older breaches and put up for sale, but then stolen or bought by a hacker who, perhaps to devalue an enemy's product, leaked it more broadly. The torrent tracker file he used to download the collection included a "readme" that requested downloaders "please seed for as long as possible," Rouland notes. "Someone wants this out there," he says. (The "readme" also noted that another dump of data missing from the current torrent collection might be coming soon.)