Reading the comments around the internet about my release of 10 million passwords, I realized that some people don’t quite grasp how bad the situation really is. It’s really bad. The target audience of my original article was IT security professionals and network administrators who see this stuff on a daily basis, but news of the release has reached far beyond that audience, and the comments show that many readers have misunderstood the context in which I released the data. I thought it might help to give people a glimpse of what I see as I collect passwords.

My main source for passwords over the last few years has been Pastebin and similar sites. Pastebin is a web site where you can paste text to share with others, and you can do so anonymously. There are Twitter bots and web sites that monitor new pastes for hackers leaking (or dumping) sensitive data they have stolen. On a typical day I see around a hundred of these leaks, and about half of them contain both usernames and unencrypted passwords, often referred to as combos, which I collect.

Dump Monitor Twitter Bot

Because the data shows up in many different formats, I have to parse it, which I do with a tool I wrote named Hurl. Hurl recognizes many different dump formats and extracts the usernames and passwords. Here is an example of the types of formats it recognizes.
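Hurl itself isn't shown here, but the core idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Hurl's logic: it assumes one combo per line, a handful of delimiters (`:`, `;`, `|`, tab), and no spaces in passwords, and it treats any value that is 32, 40, or 64 hex characters long (the hex digest lengths of MD5, SHA-1, and SHA-256) as a hash rather than a plaintext password. `parse_combos` and `looks_hashed` are hypothetical names.

```python
import re

# Assumed layout: "username<delim>password" on one line, delim in : ; | or tab.
# Passwords may contain colons but not whitespace.
COMBO_RE = re.compile(r"\s*([^\s:;|]+)\s*[:;|\t]\s*(\S+)\s*")


def looks_hashed(value: str) -> bool:
    """Heuristic: hex strings the length of MD5, SHA-1, or SHA-256 digests."""
    return len(value) in (32, 40, 64) and re.fullmatch(r"[0-9a-fA-F]+", value) is not None


def parse_combos(text: str) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Split a dump into (plaintext combos, hashed combos)."""
    plaintext: list[tuple[str, str]] = []
    hashed: list[tuple[str, str]] = []
    for line in text.splitlines():
        if "://" in line:
            continue  # skip URLs, which would otherwise parse as "http" + "//..."
        match = COMBO_RE.fullmatch(line)
        if not match:
            continue  # line doesn't fit any of the assumed layouts
        user, secret = match.groups()
        (hashed if looks_hashed(secret) else plaintext).append((user, secret))
    return plaintext, hashed
```

Lines that don't fit an assumed layout are silently skipped; handling the real variety of dumps takes many more patterns than this.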

What’s interesting about Pastebin is how many scrapers are out there. Less than a minute after I posted this paste, it had 81 views; a few minutes later it had 173, as shown below. Clearly more than a few people are monitoring this stuff. Want to set up your own scraper? The Dumpmon source code is available, although I personally prefer Pystemon.

Pastebin Scraper Views

Pastebin scrapers only catch leaks when the paste contains the data itself. Sometimes a paste is just a link to another file, so it is important to monitor links as well. Further, by checking who links to those pastes, you can find sources, such as Twitter accounts, that announce new leaks. I monitor those as well.
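Following links is easy to automate. As a minimal sketch (not the author's tooling), the hypothetical `extract_links` function below pulls `http`/`https` URLs out of a paste with a deliberately simple regex, trims trailing punctuation, and de-duplicates them while preserving order, so each linked file could then be queued for download and parsing. The regex is an assumption and will miss or mangle some URLs.

```python
import re

# Deliberately simple: stop at whitespace, quotes, angle brackets, ")" and "]".
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_links(paste_text: str) -> list[str]:
    """Return unique http(s) URLs in the order they first appear."""
    seen: set[str] = set()
    links: list[str] = []
    for url in URL_RE.findall(paste_text):
        url = url.rstrip(".,;")  # drop sentence punctuation glued to the URL
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links
```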

After Pastebin there are several sites I keep up with that post leaks and stolen databases. Below is a screenshot of one of these sites.

Database Dumps

I then take the names of those files and set up Google Alerts (and sometimes Pastebin alerts) for them. This often leads me to file collections such as the two below:

A collection of password files

Another collection of password files

I also alert on combos that certain hackers frequently use to create accounts, such as Cucum01:Ber02, zolushka:natasha, and many others. These combos are so common in password lists that searching for them always leads to more passwords. Take a look at this Google search and you’ll see how prevalent they are; the alerts keep my inbox full.
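The same kind of alert can be run locally against any text you collect. The sketch below assumes marker combos usually appear on a line of their own and checks each line, case-insensitively, against a watch list seeded with the two combos named above. `MARKER_COMBOS` and `marker_hits` are hypothetical names, and a real watch list would be much longer.

```python
from typing import Iterable

# Seeded with the two marker combos mentioned in the post; extend as needed.
MARKER_COMBOS = ("Cucum01:Ber02", "zolushka:natasha")


def marker_hits(text: str, markers: Iterable[str] = MARKER_COMBOS) -> list[str]:
    """Return the markers that appear as a whole line in text (case-insensitive)."""
    lines = {line.strip().casefold() for line in text.splitlines()}
    return [m for m in markers if m.casefold() in lines]
```

A non-empty result is a hint that the text came from a combo list worth collecting.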

Furthermore, there are hundreds of forums that share passwords. Perhaps a few screenshots are the best way to see just how many passwords people are sharing:

Of course, this is only a small sampling of forums. There are more than any one person could ever monitor.

There are also hundreds of thousands of web sites that share hacked passwords for gaming, video, porn, and file sharing sites. These don’t always produce the best quality passwords, but I do have scripts to scrape a number of these sites. In a single day those scripts can produce well over a million passwords.

If you were shocked that I released password data, spend an hour exploring the internet and you will see that 10 million passwords really is a drop in the bucket, or even a drop in a thousand buckets. Keep in mind that a big part of the effort in producing my data was getting it all the way down to 10 million in a balanced manner (I couldn’t just remove millions from the end of the file). It took me about three weeks to whittle down and then sanitize the data.
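The post doesn't say exactly how the data was balanced. One standard technique for the underlying problem, drawing a fixed-size sample from a file too large to hold in memory without favoring whatever sits at the start or end, is reservoir sampling (Algorithm R). The sketch below is that swapped-in technique, not the author's method: in a single pass, every item in a stream of n items ends up in the sample with equal probability k/n.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniformly sample k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            # Keep item i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)  # inclusive on both ends: i + 1 choices
            if j < k:
                reservoir[j] = item
    return reservoir
```

To keep a sample balanced across sources rather than merely uniform over all lines, one option is to run it separately per source with k set to the share you want from each.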

What I have shown here is only a small number of the sources available. Most of the forums listed above offer “VIP” access for a monthly fee. If you are willing to spend a little money, you can access tens of millions more passwords than the freebies shared publicly. There are also IRC channels, Usenet groups, torrents, file sharing sites, and of course a number of hidden sources on Tor.

Not all of these passwords are in plaintext, though. Many dumps contain hashed passwords that you have to crack yourself. But that’s no problem: tools such as Hashcat and John the Ripper, along with widely shared wordlists, make this a trivial task.
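To see why hashing offers little protection for weak passwords, here is a minimal Python sketch of a dictionary attack on unsalted hashes, the basic idea behind the wordlist modes of tools like Hashcat and John the Ripper (which are vastly faster and add rules, masks, and many more formats). It assumes unsalted hex digests from an algorithm that `hashlib.new` supports, such as `md5` or `sha1`; `crack_unsalted` is a hypothetical helper.

```python
import hashlib
from typing import Dict, Iterable


def crack_unsalted(hashes: Iterable[str], wordlist: Iterable[str], algorithm: str = "md5") -> Dict[str, str]:
    """Return {hex_digest: plaintext} for every target digest found in the wordlist."""
    targets = {h.lower() for h in hashes}
    found: Dict[str, str] = {}
    for word in wordlist:
        # Unsalted: the digest depends only on the word, so hashing each guess
        # once checks it against every target at the same time.
        digest = hashlib.new(algorithm, word.encode("utf-8")).hexdigest()
        if digest in targets and digest not in found:
            found[digest] = word
            if len(found) == len(targets):
                break  # every target cracked
    return found
```

Per-user salts stop one hashed guess from being checked against every account at once, and deliberately slow hashes such as bcrypt or scrypt make each guess expensive, but a password near the top of a common wordlist will still fall quickly.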

If this isn’t already overwhelming, keep in mind that this is just the stuff that certain hackers have decided to make public. Surely the troves of accounts that have been hacked over the years completely dwarf what has been publicly shared. There could be billions, or even tens of billions, more hacked accounts. If you are worried that the data I released contains your password, you still aren’t worried enough. There is a very good chance your passwords have been hacked; go change them.

So who besides me collects these passwords? Use your imagination.