Crypto enthusiast Christian ‘CodesInChaos’ Winnerlein recently tweeted:

plnlrtfpijpuhqylxbgqiiyipieyxvfsavzgxbbcfusqkozwpngsyejqlmjsytrmd and eBkXQTfuBqp'cTcar&g* have the same PBKDF2-HMAC-SHA1 hash.

This intrigued me, so I decided to find out what is going on exactly, and why this happens. If you’re curious too, keep reading.

To confirm these findings, I wrote a Node.js script:

#!/usr/bin/env node

'use strict';

const crypto = require('crypto');
const assert = require('assert');

const salt = 'hunter2'; // can be anything
const iterations = 4; // can be any number
const keyLength = 16; // can be any number

const hash = (passphrase) => {
  // Name the digest explicitly (recent Node.js versions require it) and
  // compare hex strings so that no bytes are lost in a UTF-8 conversion.
  return crypto.pbkdf2Sync(passphrase, salt, iterations, keyLength, 'sha1').toString('hex');
};

const string1 = 'plnlrtfpijpuhqylxbgqiiyipieyxvfsavzgxbbcfusqkozwpngsyejqlmjsytrmd';
const string2 = 'eBkXQTfuBqp\'cTcar&g*';

const hash1 = hash(string1);
const hash2 = hash(string2);

assert(string1 !== string2, 'Passwords should be different');
assert(hash1 === hash2, 'Hashes should be the same (collision)');

Running the script confirms that both strings indeed have the same PBKDF2-HMAC-SHA1 hash.

Explanation

PBKDF2 is a widely used method to derive a key of a given length from a password, a salt, and a number of iterations. In this case it specifically uses HMAC with the SHA-1 hash function, which is the default as per RFC 2898.

HMAC has an interesting property: if a supplied key is longer than the block size of the hash function that’s being used, it uses the hash of the key rather than the key itself.

SHA-1 has a block size of 512 bits, which equals 64 bytes.
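A minimal Python sketch of that key-handling step follows. The helper name `hmac_sha1_manual` is hypothetical and only for illustration: it spells out the HMAC construction (RFC 2104) by hand and is checked against the standard library's `hmac` module.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-1 block size in bytes


def hmac_sha1_manual(key: bytes, message: bytes) -> bytes:
    """Hypothetical helper: HMAC-SHA1 written out step by step (RFC 2104)."""
    # Keys longer than the block size are replaced by their SHA-1 digest.
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha1(key).digest()
    # Keys shorter than the block size are right-padded with zero bytes.
    key = key.ljust(BLOCK_SIZE, b"\x00")
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha1(inner_pad + message).digest()
    return hashlib.sha1(outer_pad + inner).digest()


long_key = b"k" * 65  # one byte over the block size
short_key = hashlib.sha1(long_key).digest()  # 20 bytes
message = b"any message"

# The manual version agrees with the standard library.
assert hmac_sha1_manual(long_key, message) == hmac.new(long_key, message, hashlib.sha1).digest()
# The long key and its SHA-1 digest produce the same MAC.
assert hmac_sha1_manual(long_key, message) == hmac_sha1_manual(short_key, message)
```

Because the long key is swapped for its digest before anything else happens, HMAC cannot tell the two keys apart from that point on.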

So in this case, if the supplied key is longer than 64 bytes, SHA1(key) is used as the key instead. More generally, for any chosen_password longer than 64 bytes, the following holds true (pseudo-code):

PBKDF2_HMAC_SHA1(chosen_password) == PBKDF2_HMAC_SHA1(HEX_TO_STRING(SHA1(chosen_password)))
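The identity above can be checked with Python's standard library. This is a sketch under assumptions: the wrapper `pbkdf2_sha1`, the sample password, and the salt and iteration values are made up for illustration and do not come from the tweet.

```python
import hashlib


def pbkdf2_sha1(password: bytes, salt: bytes, iterations: int, length: int) -> bytes:
    """Thin wrapper around the standard library's PBKDF2 implementation."""
    return hashlib.pbkdf2_hmac("sha1", password, salt, iterations, dklen=length)


# Illustrative inputs, not taken from the article.
chosen_password = b"correct horse battery staple " * 3  # 87 bytes, longer than 64
colliding_password = hashlib.sha1(chosen_password).digest()  # the raw 20-byte digest

for salt, iterations, length in [(b"hunter2", 4, 16), (b"other salt", 1000, 32)]:
    key_a = pbkdf2_sha1(chosen_password, salt, iterations, length)
    key_b = pbkdf2_sha1(colliding_password, salt, iterations, length)
    print(salt, iterations, length, key_a == key_b)
```

The comparison should hold for every salt, iteration count, and key length, since the substitution happens inside HMAC before any of those parameters come into play.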

Note that the shorter of the two passwords is always 20 bytes long, because a SHA-1 hash is always exactly 20 bytes, usually written as 40 hexadecimal digits. Each byte (two hexadecimal digits) becomes one character of the colliding password.

For example, in Bash:

$ printf 'plnlrtfpijpuhqylxbgqiiyipieyxvfsavzgxbbcfusqkozwpngsyejqlmjsytrmd' | sha1sum | xxd -r -p
eBkXQTfuBqp'cTcar&g*

That is why plnlrtfpijpuhqylxbgqiiyipieyxvfsavzgxbbcfusqkozwpngsyejqlmjsytrmd and eBkXQTfuBqp'cTcar&g* have the same PBKDF2-HMAC-SHA1 hash.
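For readers without `sha1sum` and `xxd` at hand, the same conversion can be sketched in Python. The variable names are illustrative; `hexdigest()` plays the role of `sha1sum` and `bytes.fromhex()` the role of `xxd -r -p`.

```python
import hashlib

long_password = b"plnlrtfpijpuhqylxbgqiiyipieyxvfsavzgxbbcfusqkozwpngsyejqlmjsytrmd"

hex_digest = hashlib.sha1(long_password).hexdigest()  # what sha1sum prints: 40 hex digits
raw_digest = bytes.fromhex(hex_digest)  # what xxd -r -p does: 2 hex digits -> 1 byte

print(len(hex_digest), len(raw_digest))  # 40 20
# Going by the Bash example above, this should show the short password as bytes.
print(raw_digest)
```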

Consequences

This effectively means you can come up with as many PBKDF2-HMAC-SHA1 collisions as you like. In fact, as long as PBKDF2 is used in combination with HMAC and any hashing algorithm, the same trick can be applied — the only variable is the hash function’s block size. It’s trivial to find colliding passwords when hashing with PBKDF2-HMAC-anything.
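To illustrate that only the block size changes, here is a small Python sketch. The helper `colliding_password` is a hypothetical name, and the sample password, salt, and parameters are arbitrary. The block size is read from `hashlib` rather than hard-coded.

```python
import hashlib


def colliding_password(password: bytes, hash_name: str) -> bytes:
    """Hypothetical helper: return the short password that collides with
    `password` under PBKDF2-HMAC-<hash_name>."""
    block_size = hashlib.new(hash_name).block_size
    if len(password) <= block_size:
        raise ValueError(f"password must be longer than {block_size} bytes for {hash_name}")
    return hashlib.new(hash_name, password).digest()


password = b"a" * 129  # longer than the largest block size used below
for hash_name in ("sha1", "sha256", "sha512"):
    short = colliding_password(password, hash_name)
    same = (hashlib.pbkdf2_hmac(hash_name, password, b"salt", 4, 16)
            == hashlib.pbkdf2_hmac(hash_name, short, b"salt", 4, 16))
    print(hash_name, hashlib.new(hash_name).block_size, len(short), same)
```

SHA-1 and SHA-256 share a 64-byte block size, while SHA-512 uses 128 bytes, so the long password has to exceed 128 bytes there before the trick applies.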

So why did Chris choose plnlrtfpijpuhqylxbgqiiyipieyxvfsavzgxbbcfusqkozwpngsyejqlmjsytrmd and eBkXQTfuBqp'cTcar&g*, of all possible collisions? Wouldn’t it be more fun to tweet about a collision with, say, lolololololololololololololololololololololololololololololololol?

$ printf 'lolololololololololololololololololololololololololololololololol' | sha1sum
6ff0b2d76dae24f8b58472ba19f918f07359c0c0

$ printf 'lolololololololololololololololololololololololololololololololol' | sha1sum | xxd -r -p
o��m�$�r���sY��

While it’s easy to find a collision for any given string longer than 64 bytes (just run the above Bash command), it gets trickier if you want the colliding password to consist of readable ASCII characters only. A SHA-1 hash can contain any byte value, so converting it into a string is likely to produce at least one character outside the printable ASCII range ([\x20-\x7E]).
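A back-of-the-envelope estimate shows how unlikely a fully printable digest is. The sketch below assumes that the 20 digest bytes behave like uniform, independent random bytes, which is an idealization. Under that assumption, roughly one candidate in 400 million qualifies.

```python
PRINTABLE_BYTES = 0x7E - 0x20 + 1  # 95 values in [\x20-\x7E]
DIGEST_LENGTH = 20  # bytes in a SHA-1 digest

# Probability that all 20 bytes are printable, assuming uniform, independent bytes.
p_printable = (PRINTABLE_BYTES / 256) ** DIGEST_LENGTH
expected_attempts = 1 / p_printable

print(f"probability per candidate: {p_printable:.3g}")
print(f"expected candidates per hit: {expected_attempts:,.0f}")
```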

I wrote a Python script to brute-force PBKDF2-HMAC-SHA1 collisions where the long (> 64 bytes) password has a prefix of choice, and where the colliding password consists of printable ASCII characters only.

#!/usr/bin/env python3

import hashlib
import itertools
import re
import string
import sys

TOTAL_LENGTH = 65
PREFIX = sys.argv[1] if len(sys.argv) > 1 else ''

prefix_length = len(PREFIX)
brute_force_length = TOTAL_LENGTH - prefix_length
# Candidate suffixes: every lowercase string of the remaining length.
passwords = itertools.product(string.ascii_lowercase, repeat=brute_force_length)
# Matches digests that consist of printable ASCII bytes only.
regex_printable = re.compile(rb'[\x20-\x7E]+')
# Hash the fixed prefix once; every candidate starts from a copy of this state.
base_hasher = hashlib.sha1()
base_hasher.update(PREFIX.encode('utf-8'))

for item in map(''.join, passwords):
    hasher = base_hasher.copy()
    hasher.update(item.encode('ascii'))
    sha1_hash = hasher.digest()
    # fullmatch() requires every byte of the digest to be printable.
    if regex_printable.fullmatch(sha1_hash):
        print('%s \U0001F4A5 %s' % (PREFIX + item, sha1_hash.decode('ascii')))

First, I let it run for a few hours without specifying a prefix, and it found the following collisions:

$ ./brute-force.py
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaadzuyfdt 💥 /JRb+z%,6f{$;*|#\LHT
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaebihgje 💥 @3d9ggezHn@iy,vV/#YC
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaagkpicoe 💥 m;Ec4m@1JW)TOSgGl3ZO
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaidwsoeu 💥 Y#pt*^.[}~.6jx!:fu'P
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaitkpvbh 💥 RFvc?%tbygGt(fy7G*+,
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaajhixpyq 💥 @x!iEK2B*N]X`S$u"CEV
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaakidzupe 💥 1t_lP?o}R;YWoJPF7!GY
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaamyrpmpv 💥 &nbSlEfC.X`D0(l)x[tV
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaandkfdci 💥 %U;/> ,3S/4dv!fUku*N
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaowdgicp 💥 b<-'^;Qt7~G[8>\6=wH(
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaqdpodre 💥 EmjaaG|_\Eq;+Wgl%<@)
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaqsmqyjo 💥 oZD49:*Cd)PFCubU[^)_
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaasbvoipq 💥 Woyw!itp af;uJo'Z-x#
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaslsvwra 💥 +ME:wn{F[<f_Zw%yWN\j
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaatljisbf 💥 w<0k!([95gEP%G^?&tP*
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaavdramcj 💥 ?!`e6 m]e/JJubY`|ZM1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaavutrypa 💥 d63mH`L=IW3Ucwb.FRec
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaawdxljmb 💥 h?2O+Pm5^x|^`du`A:@^
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaxdovzru 💥 ks*A]XD!U1I4[`!:+@s)
…

Brute-forcing collisions with chosen prefixes is much more fun, though: