Frequently I get requests from students and security researchers to get a copy of my password research data. I typically decline to share the passwords but for quite some time I have wanted to provide a clean set of data to share with the world. A carefully-selected set of data provides great insight into user behavior and is valuable for furthering password security. So I built a data set of ten million usernames and passwords that I am releasing to the public domain.

But recent events have made me question the prudence of releasing this information, even for research purposes. The arrest and aggressive prosecution of Barrett Brown had a marked chilling effect on both journalists and security researchers. Suddenly even linking to data was an excuse to get raided by the FBI and potentially face serious charges. Even more concerning is that Brown linked to data that was already public and others had already linked to.

This is completely absurd that I have to write an entire article justifying the release of this data out of fear of prosecution.

In 2011 and 2012 news stories about Anonymous, Wikileaks, LulzSec, and other groups were daily increasing and the FBI was looking more and more incompetent to the public. With these groups becoming more bold and boastful and pressure on the FBI building, it wasn’t too surprising to see Brown arrested. He was close to Anonymous and was in fact their spokesman. The FBI took advantage of him linking to a data dump to initiate charges of identity theft and trafficking of authentication features. Most of us expected that those charges would be dropped and some were, although they still influenced his sentence.

At Brown’s sentencing, Judge Lindsay was quoted as saying “What took place is not going to chill any 1st Amendment expression by Journalists.” But he was so wrong. Brown’s arrest and prosecution had a substantial chilling effect on journalism. Some journalists have simply stopped reporting on hacks from fear of retribution and others who still do are forced to employ extraordinary measures to protect themselves from prosecution.

Which brings me back to these ten million passwords.

Why the FBI Shouldn’t Arrest Me

Although researchers typically only release passwords, I am releasing usernames with the passwords. Analysis of usernames with passwords is an area that has been greatly neglected and can provide as much insight as studying passwords alone. Most researchers are afraid to publish usernames and passwords together because combined they become an authentication feature. If simply linking to already released authentication features in a private IRC channel was considered trafficking, surely the FBI would consider releasing the actual data to the public a crime.

But is it against the law? There are several statutes that the government used against brown as summarized by the Digital Media Law Project:

Count One: Traffic in Stolen Authentication Features, 18 U.S.C. §§ 1028(a)(2), (b)(1)(B), and (c)(3)(A); Aid and Abet, 18 U.S.C. § 2: Transferring the hyperlink to stolen credit card account information from one IRC channel to his own (#ProjectPM), thereby making stolen information available to other persons without Stratfor or the card holders’ knowledge or consent; aiding and abetting in the trafficking of this stolen data.

Count Two: Access Device Fraud, 18 U.S.C. §§ 1029(a)(3) and (c)(1)(A)(i); Aid and Abet, 18 U.S.C. § 2: Aiding and abetting the possession of at least fifteen unauthorized access devices with intent to defraud by possessing card information without the card holders’ knowledge and authorization.

Counts Three Through Twelve: Aggravated Identity Theft, 18 U.S.C. § 1028A(a)(1); Aid and Abet, 18 U.S.C. § 2: Ten counts of aiding and abetting identity theft, for knowingly and without authorization transferring identification documents by transferring and possessing means of identifying ten individuals in Texas, Florida, and Arizona, in the form of their credit card numbers and the corresponding CVVs for authentication as well as personal addresses and other contact information.

While these particular indictments refer to credit card data, the laws do also reference authentication features. Two of the key points here are knowingly and with intent to defraud.

In the case of me releasing usernames and passwords, the intent here is certainly not to defraud, facilitate unauthorized access to a computer system, steal the identity of others, to aid any crime or to harm any individual or entity. The sole intent is to further research with the goal of making authentication more secure and therefore protect from fraud and unauthorized access.

To ensure that these logins cannot be used for illegal purposes, I have:

Limited identifying information by removing the domain portion from email addresses Combined data samples from thousands of global incidents from the last five years with other data mixed in going back an additional ten years so the accounts cannot be tied to any one company. Removed any keywords, such as company names, that might indicate the source of the login information. Manually reviewed much of the data to remove information that might be particularly linked to an individual Removed information that appeared to be a credit card or financial account number. Where possible, removed accounts belonging to employees of any government or military sources [Note: although I can identify government or military logins when they include full email addresses, sometimes these logins get posted without the domains, without mentioning the source, or aggregated on other lists and therefore it is impossible to know if I have removed all references.]

Furthermore, I believe these are primarily dead passwords, which cannot be defined as authentication features because dead passwords will not allow you to authenticate. The likelihood of any authentication information included still being valid is low and therefore this data is largely useless for illegal purposes. To my knowledge, these passwords are dead because:

All data currently is or was at one time generally available to anyone and discoverable via search engines in a plaintext (unhashed and unencrypted) format and therefore already widely available to those with an intent to defraud or gained unauthorized access to computer systems. The data has been publicly available long enough (up to ten years) for companies to reset passwords and notify users. In fact, I would consider any organization to be grossly negligent to be unaware of these leaks and still have not changed user passwords after these being publicly visible for such a long period of time. The data is collected by numerous web sites such as haveibeenpwned or pwnedlist and others where users can check and be notified if their own accounts have been compromised. Many companies, such as Facebook, also monitor public data dumps to identify user accounts in their user base that may have been compromised and proactively notify users. A portion of users, either on their own or required by policy, change their passwords on a regular basis regardless of being aware of compromised login information. Many organizations, particularly in some industries, actively identify unusual login patterns and automatically disable accounts or notify account owners.

Ultimately, to the best of my knowledge these passwords are no longer be valid and I have taken extraordinary measures to make this data ineffective in targeting particular users or organizations. This data is extremely valuable for academic and research purposes and for furthering authentication security and this is why I have released it to the public domain.

Having said all that, I think this is completely absurd that I have to write an entire article justifying the release of this data out of fear of prosecution or legal harassment. I had wanted to write an article about the data itself but I will have to do that later because I had to write this lame thing trying to convince the FBI not to raid me.

I could have released this data anonymously like everyone else does but why should I have to? I clearly have no criminal intent here. It is beyond all reason that any researcher, student, or journalist have to be afraid of law enforcement agencies that are supposed to be protecting us instead of trying to find ways to use the laws against us.

Slippery Slopes

For now the laws are on my side because there has to be intent to commit or facilitate a crime. However, the White House has proposed some disturbing changes to the Computer Fraud and Abuse act that will make things much worse. Of particular note is 18 U.S.C. § 1030. (a)(6):

(6) knowingly and with intent to defraud willfully traffics (as defined in section 1029) in any password or similar information, or any other means of access, knowing or having reason to know that a protected computer would be accessed or damaged without authorization in a manner prohibited by this section as the result of such trafficking;

The key change here is the removal of an intent to defraud and replacing it with willfully; it will be illegal to share this information as long as you have any reason to know someone else might use it for unauthorized computer access.

It is troublesome to consider the unintended consequences resulting from this small change. I wrote about something back in 2007 that I’d like to say again:

…it reminds me of IT security best practices. Based on experience and the lessons we have learned in the history of IT security, we have come up with some basic rules that, when followed, go a long way to preventing serious problems later.

So many of us security professionals have made recommendations to software companies about potential security threats and often the response is that they don’t see why that particular threat is a big deal. For example, a bug might reveal the physical path to a web content directory. The software company might just say “so what?” because they cannot see how that would result in a compromise. Unfortunately, many companies have learned “so what” the hard way.

The fact is that it doesn’t matter if you can see the threat or not, and it doesn’t matter if the flaw ever leads to a vulnerability. You just always follow the core rules and everything else seems to fall into place.

This principle equally applies to the laws of our country; we should never violate basic rights even if the consequences aren’t immediately evident. As serious leaks become more common, surely we can expect tougher laws. But these laws are also making it difficult for those of us who wish to improve security by studying actual data. For years we have fought increasingly restrictive laws but the government’s argument has always been that it would only affect criminals.

The problem is that it is that the laws themselves change the very definition of a criminal and put many innocent professionals at risk.

The Download Link

Again, this is stupid that I have to do this, but:

BY DOWNLOADING THIS AUTHENTICATION DATA YOU AGREE NOT TO USE IT IN ANY MANNER WHICH IS UNLAWFUL, ILLEGAL, FRAUDULENT OR HARMFUL, OR IN CONNECTION WITH ANY UNLAWFUL, ILLEGAL, FRAUDULENT OR HARMFUL PURPOSE OR ACTIVITY INCLUDING BUT NOT LIMITED TO FRAUD, IDENTITY THEFT, OR UNAUTHORIZED COMPUTER SYSTEM ACCESS. THIS DATA IS ONLY MADE AVAILABLE FOR ACADEMIC AND RESEARCH PURPOSES.

Mega download (84.7 mb)

Magnet link:

magnet:?xt=urn:btih:32E50D9656E101F54120ADA3CE73F7A65EC9D5CB

For more information on this data, please see this FAQ.

As a final note, be aware that if your password is not on this list that means nothing. This is a random sampling of thousands of dumps consisting of upwards to a billion passwords. Please see the links in the article for a more thorough check to see if your password has been leaked. Or you could just Google it.