Introduction

I'd like to demonstrate why I think using sha256crypt or sha512crypt on current GNU/Linux operating systems is dangerous, and why I think the developers of GLIBC should move to scrypt or Argon2, or at least bcrypt or PBKDF2.

History and md5crypt

In 1994, Poul-Henning Kamp (PHK) added md5crypt to FreeBSD to address the weaknesses of DES-crypt that was common on the Unix and BSD systems of the early 1990s. DES-Crypt has a core flaw in that, not only DES reversible (which necessarily isn't a problem here), and incredibly fast, but it also limited password length to 8 characters (each of those limited to 7-bit ASCII to create a 56-bit DES key). When PHK created md5crypt, one of the things he made sure to implement as a feature was to support arbitrary-length passwords. In other words, unlike DES-Crypt, a user could have passwords greater than 9 or more characters.

This was "good enough" for 1994, but it had an interesting feature that I don't think PHK thought of at the time- md5crypt execution time is dependent on password length. To prove this, I wrote a simple Python script using passlib to hash passwords with md5crypt. I started with a single "a" character as my password, then increased the password length by appending more "a"s up until the password was 4,096 "a"s total.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

import time

from passlib. hash import md5_crypt



md5_results = [ None ] * 4096



for i in xrange ( 0 , 4096 ) :

print i ,

pw = "a" * ( i+ 1 )

start = time . clock ( )

md5_crypt. hash ( pw )

end = time . clock ( )

md5_results [ i ] = end - start



with open ( "md5crypt.txt" , "w" ) as f:

for i in xrange ( 0 , 4096 ) :

f. write ( "{0} {1}

" . format ( i+ 1 , md5_results [ i ] ) )

Nothing fancy. Start the timer, hash one "a" with md5crypt, stop the timer, and record the results. Start the timer, hash two "a"s with md5crypt, stop the timer, and record the results. Wash, rinse, repeat, until the password is 4,096 "a"s in length.

What do the timing results look like? Below are scatter plots of timing md5crypt for passwords of 1-128, 1-512, and 1-4,096 characters in length:

At first, you wouldn't think this is a big deal; in fact, you may even think you LIKE it (we're supposed to make things get slower, right? That's a good thing, right???). But, upon deeper inspection, this actually is a flaw in the algorithm's design for two reasons:

Long passwords can create a denial-of-service on the CPU (larger concern).

Passive observation of execution times can predict password length (smaller concern).

Now, to be fair, predicting password length based on execution time is ... meh. Let's be honest, the bulk of passwords will be between 7-10 characters. And because these algorithms operate in block sizes of 16, 32, or 64 bytes, an adversary learning "AHA! I know your password is between 1-16 characters" really isn't saying much. But, should this even exist in a cryptographic primitive? Probably not. Still, the larger concern would be users creating a DoS on the CPU, strictly by changing password length.

I know what you're thinking- it's 2018, so there should be no reason why any practical length password cannot be adequately hashed with md5crypt insanely quickly, and you're right. Except, md5crypt was invented in 1994, 24 years ago. According to PHK, he designed it to take about 36 milliseconds on the hardware he was testing, which would mean a speed about 28 per second. So, it doesn't take much to see that by increasing the password's length, you can increase execution time enough to affect a busy authentication server.

The question though, is why? Why is the execution time dependent on password length? This is because md5crypt processes the hash for every 16 bytes in the password. As a result, this creates the stepping behavior you see in the scatter plots above. A good password hashing design would not do this.

PHK eventually sunset md5crypt in 2012 with CVE-2012-3287. Jeremi Gosney, a professional password cracker, demonstrated with Hashcat and 8 clustered Nvidia GTX 1080Ti GPUS, that a password cracker could rip through 128.4 million md5crypt guesses per second.

You should no longer be implementing md5crypt for your password hashing.

sha2crypt and NIH syndrome

In 2007, Ulrich Drepper decided to improve things for GNU/Linux. He recognized the threat that GPU clusters, and even ASICs, posed on fast password cracking with md5crypt. One aspect of md5crypt was the hard-coded 1,000 iterations spent on the CPU, before the password hash was finalized. This cost was not configurable. Also, MD5 was already considered broken, with SHA-1 showing severe weaknesses, so he moved to SHA-2 for the core of his design.

The first thing addressed, was to make the cost configurable, so as hardware improved, you could increase the iteration count, thus keeping the cost for calculating the final hash expensive for password crackers. However, he also made a couple core changes to his design that differed from md5crypt, which ended up having some rather drastic effects on its execution.

Using code similar to above with Python's passlib, but rather using the sha256_crypt() and sha512_crypt() functions, we can create scatter plots of sha256crypt and sha512crypt for passwords up to 128-characters, 512-characters, and 4,096-characters total, just like we did weth md5crypt. How do they fall out? Take a look:

Curious. Not only do we see the same increasing execution time based on password length, but unlike md5crypt, that growth is polynomial. The changes Ulrich Drepper made from md5crypt are subtle, but critical. Essentially, not only do we process the hash for every character in the password per round, like md5crypt, but we process every character in the password three more times. First, we take the binary representation of each bit in the password length, and update the hash based on if we see a "1" or a "0". Second, for every character in the password, we update the hash. Finally, again, for every character in the password, we update the hash.

For those familiar with big-O notation, we end up with an execution run time of O(pw_length2 + pw_length*iterations). Now, while it is true that we want our password hashing functions to be slow, we also want the iterative cost to be the driving factor in that decision, but that isn't the case with md5crypt, and it's not the case with sha256crypt nor sha512crypt. In all three cases, the password length is the driving factor in the execution time, not the iteration count.

Again, why is this a problem? To remind you:

Long passwords can create a denial-of-service on the CPU (larger concern).

Passive observation of execution times can predict password length (smaller concern).

Now, granted, in practice, people aren't carrying around 4 kilobyte passwords. If you are a web service provider, you probably don't want people uploading 5 gigabyte "passwords" to your service, creating a network denial of service. So you would probably be interested in creating an adequate password maximum, such as what NIST recommends at 128 characters, to prevent that from occurring. However, if you have an adequate iterative cost (such as say, 640,000 rounds), then even moderately large passwords from staff, where such limits may not be imposed, could create a CPU denial of service on busy authentication servers.

As with md5crypt, we don't want this.

Now, here's what I find odd about Ulrich Drepper, and his design. In his post, he says about his specification (emphasis mine):

Well, there is a problem. I can already hear everybody complaining that I suffer from the NIH syndrome but this is not the reason. The same people who object to MD5 make their decisions on what to use also based on NIST guidelines. And Blowfish is not on the lists of the NIST. Therefore bcrypt() does not solve the problem. What is on the list is AES and the various SHA hash functions. Both are viable options. The AES variant can be based upon bcrypt(), the SHA variant could be based on the MD5 variant currently implemented. Since I had to solve the problem and I consider both solutions equally secure I went with the one which involves less code. The solution we use is based on SHA. More precisely, on SHA-256 and SHA-512.

PBKDF2 was standardized as an IETF standard in September 2000, a full 7 years before Ulrich Drepper created his password hashing functions. While PBKDF2 as a whole would not be blessed by NIST until 3 years later, in December 2010 in SP 800-132, PBKDF2 can be based on functions that, as he mentioned, were already in the NIST standards. So, just like his special design that is based on SHA-2, PBKDF2 can be based on SHA-2. Where he said "I went with the one which involves less code", he should have gone with PBKDF2, as code had already long since existed in all sorts of cryptographic software, including OpenSSL.

This seems to be a very clear case of NIH syndrome. Sure, I understand not wanting to go with bcrypt, as it's not part of the NIST standards . But don't roll your own crypto either, when algorithms already exist for this very purpose, that ARE based on designs that are part of NIST.

So, how does PBKDF2-HMAC-SHA512 perform? Using similar Python code with the passlib password hashing library, it was trivial to put together:

What this clearly demonstrates, is that the only factor driving execution time, is the number of iterations you apply to the password, before delivering the final password hash. This is what you want to achieve, not giving the opportunity for a user to create a denial-of-service based on password length, nor an adversary learn the length of the user's password based on execution time.

This is the sort of details that a cryptographer or cryptography expert would pay attention to, as opposed to an end-developer.

It's worth pointing out that PBKDF2-HMAC-SHA512 is the default password hashing function for Mac OS X, with a variable cost between 30,000 and 50,000 iterations (typical PBKDF2 default is 1,000).

OpenBSD, USENIX, and bcrypt

Because Ulrich Drepper brought up bcrypt, it's worth mentioning in this post. First off, let's get something straight- bcrypt IS NOT Blowfish. While it's true that bcrypt is based on Blowfish, they are two completely different cryptographic primitives. bcrypt is a one-way cryptographic password hashing function, where as Blowfish is a two-way 64-bit block symmetric cipher.

At the 1999 USENIX conference, Niels Provos and David Mazières, of OpenBSD, introduced bcrypt to the world (it was actually in OpenBSD 2.1, June 1, 1997). They were critical of md5crypt, stating the following (emphasis mine):

MD5 crypt hashes the password and salt in a number of different combinations to slow down the evaluation speed. Some steps in the algorithm make it doubtful that the scheme was designed from a cryptographic point of view--for instance, the binary representation of the password length at some point determines which data is hashed, for every zero bit the first byte of the password and for every set bit the first byte of a previous hash computation.

PHK was slightly offended by their off-handed remark that cryptography was not his core consideration when designing md5crypt. However, Niels Provos was a graduate student in the Computer Science PhD program at the University of Michigan at the time. By August 2003, he had earned his PhD. Since 1997, bcrypt has withstood the test of time, it has been considered "Best Practice" for hashing passwords, and is still well received today, even though better algorithms exist for hashing passwords.

bcrypt limits password input to 72 bytes. One way around the password limit is with pre-hashing. A common approach in pseudocode is to hash the password with SHA-256, encode the digest into base64, then feed the resulting ASCII string into bcrypt. In pseudocode:

pwhash = bcrypt(base64(sha-256(password)))

This results in a 44-byte password (including the "=" padding) that is within the bounds of the 72 byte bcrypt limitation. This prehashing allows users to have any length password, while only ever sending 44 bytes to bcrypt. My implementation in this benchmark uses the passlib.hash.bcrypt_sha256.hash() method. How does bcrypt compare to md5crypt, sha256crypt, and sha512crypt in execution time based on password length?

Now, to be fair, bcrypt is only ever hashing 44 byte passwords in the above results, because of my prehashing. So of course it's running in constant time. So, how does it look with hashing 1 to 72 character passwords without prehashing?

Again, we see consistent execution, driven entirely by iteration cost, not by password length.

Colin Percival, Tarsnap, and scrypt

In May 2009, mathematician Dr. Colin Percival presented to BSDCan'09 about a new adaptive password hashing function called scrypt, that was not only CPU expensive, but RAM expensive as well. The motivation was that even though bcrypt and PBKDF2 are CPU-intensive, FPGAs or ASICs could be built to work through the password hashes much more quickly, due to not requiring much RAM, around 4 KB. By adding a memory cost, in addition to a CPU cost to the password hashing function, we can now require the FPGA and ASIC designers to onboard a specific amount of RAM, thus financially increasing the cost of production. scrypt recommends a default RAM cost of at least 16 MB. I like to think of these expensive functions as "security by obesity".

scrypt was initially created as an expensive KDF for his backup service Tarsnap. Tarsnap generates client-side encryption keys, and encrypts your data on the client, before shipping the encrypted payload off to Tarsnap's servers. If at any event your client is lost or stolen, generating the encryption keys requires knowing the password that created them, and attempting to discover that password, just like typical password hashing functions, should be slow.

It's now been 9 years as of this post, since Dr. Percival introduced scrypt to the world, and like bcrypt, it has withstood the test of time. It has received, and continues to receive extensive cryptanalysis, is not showing any critical flaws or weaknesses, and as such is among the top choices as a recommendation from security professionals for password hashing and key derivation.

How does it fare with its execution time per password length?

I'm seeing a trend here.

The Password Hashing Competition winner Argon2

In 2013, an open public competition, in the spirit of AES and SHA-3, was held to create a password hashing function that approached password security from what we knew with modern cryptography and password security. There were many interesting designs submitted, including a favorite of mine by Dr. Thomas Pornin of StackExchange fame and BearSSL, that used delegation to reduce the work load on the honest, while still making it expensive for the password cracker.

In July 2015, the Argon2 algorithm was chosen as the winner of the competition. It comes with a clean approach of CPU and memory hardness, making the parameters easy to tweak, test, and benchmark. Even though the algorithm is relatively new, it has seen at least 5 years of analysis, as of this writing, and has quickly become the "Gold Standard" for password hashing. I fully recommend it for production use.

Any bets on how it will execution times will be affected by password length? Let's look:

Execution time is not affected by password length. Imagine that. It's as if cryptographers know what they're doing when designing this stuff.

Conclusion

Ulrich Drepper tried creating something more secure than md5crypt, on par with bcrypt, and ended up creating something worse. Don't use sha256crypt or sha512crypt; they're dangerous.

For hashing passwords, in order of preference, use with an appropriate cost:

Argon2 or scrypt (CPU and RAM hard) bcrypt or PBKDF2 (CPU hard only)

Avoid practically everything else:

md5crypt, sha256crypt, and sha512crypt Any generic cryptographic hashing function (MD5, SHA-1, SHA-2, SHA-3, BLAKE2, etc.) Any complex homebrew iterative design (10,000 iterations of salted SHA-256, etc.) Any encryption design (AES, Blowfish (ugh), ChaCha20, etc.)

UPDATE: A note about PBKDF2 that was brought up in a Twitter thread from @solardiz. PBKDF2-HMAC-SHA512 isn't really an upgrade from sha512crypt (nor PBKDF2-HMAC-SHA256 an upgrade from sha256crypt), because PBKDF2 really isn't GPU resistant in the way bcrypt is. However, bcrypt can be implemented cheaply on ASICs with only 4 KB of memory.

If your choice of password hashing in constrained to NIST standards, which includes PBKDF2, then unfortunately, bcrypt, scrypt, and Argon2 are out of the question; just make sure to use it properly, which includes choosing a high iteration count based on your authentication load capacity. At that point, password storage is probably not the worst of your security concerns.

However, if you're not limited to NIST constraints, then use the others.

Acknowledgement

Thanks to Steve Thomas (@Sc00bzT) for our discussions on Twitter for helping me see this quirky behavior with sha256crypt and sha512crypt.