On November 5th, 2018 I was hanging out in the #cryptography-dev channel on Freenode IRC when the bot reported a new GitHub issue with a tantalizing — dare I say salacious sounding — title.

The Holy Grail.

The Philosopher’s Stone.

Had the bug reporter discovered a method to generate SHA256 collisions on demand? Was the internet broken? Cryptocurrencies?

Not quite.

But First, What Are Collisions?

SHA256 is a member of the SHA-2 (Secure Hash Algorithm 2) family of cryptographic hash functions. Its job is to take incoming data of arbitrary size and return a random-seeming, fixed-size chunk of data.

We say random-seeming because hash algorithms are deterministic: if you put in the same input, you get the same output. It’s just really mixed up.

What makes cryptographic hash algorithms special, though, is that if you change even one bit of the incoming data the resulting hash is massively different.

Example:

“My name is Tim” 🠒 98DD8CA3645BB710E23B956A66613C132C514A4A7A478410E16327865107A7F2

“My name is Jim” 🠒 9DD0BAF8C65F92D13AAE725C038825C4DEFEBB5B7BB14152E5436132EA8A2FC1
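If you want to try this yourself, here is a minimal sketch using Python's standard hashlib module. The helper names sha256_hex and differing_bits are made up for illustration, and the sketch assumes the strings are UTF-8 encoded with no surrounding quotes or trailing newline, so the printed values may not match the ones above if those were produced differently.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 encoded string as uppercase hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest().upper()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


tim = sha256_hex("My name is Tim")
jim = sha256_hex("My name is Jim")

print(tim)
print(jim)
print(differing_bits(tim, jim), "of 256 bits differ")
```

For a well-behaved hash you would expect roughly half of the 256 output bits to flip, even though the two inputs differ by a single letter.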

Lots of things rely on hashes, such as account authentication (don’t store passwords; store the hashes of passwords to compare login attempts against!) and cryptocurrency mining.

A collision is a pair of different inputs, A and A’, that generate the same hash.

Being able to do so in a predictable manner would be the equivalent of printing your own digital gold: a mathematical Philosopher’s Stone.
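To get a feel for what a collision looks like without breaking anything, here is a toy sketch that weakens SHA256 on purpose by keeping only the first two bytes of each digest. With only 65,536 possible outputs, a brute-force search has to find two different inputs that share one within 65,537 tries. The names truncated_sha256 and find_toy_collision are hypothetical helpers, and none of this says anything about collisions in the full 256-bit hash.

```python
import hashlib
from itertools import count


def truncated_sha256(data: bytes, n_bytes: int = 2) -> bytes:
    """Return only the first n_bytes of the SHA-256 digest (a toy, weakened hash)."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_toy_collision(n_bytes: int = 2) -> tuple[bytes, bytes]:
    """Try inputs b"0", b"1", b"2", ... until two share a truncated digest."""
    seen: dict[bytes, bytes] = {}  # truncated digest -> first input that produced it
    for i in count():
        candidate = str(i).encode("ascii")
        tag = truncated_sha256(candidate, n_bytes)
        if tag in seen:
            return seen[tag], candidate
        seen[tag] = candidate


a, b = find_toy_collision()
print(a, b, truncated_sha256(a).hex())
```

The two inputs collide on the two-byte toy hash, but their full SHA256 digests are still different. Each extra byte you keep multiplies the number of possible outputs by 256, which is why the same brute-force approach is hopeless against all 32 bytes.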

A Python Demonstration?

First, before we go any further: the bug reporter submitted the issue with a sense of humility. They asked for a review of their claim rather than boldly asserting it, and, as we all should know, you should always seek to invalidate new assumptions.

The reproduction code is as follows:

    from cryptography.hazmat.backends import default_backend
    from cryptography.hazmat.primitives import hashes

    digest = hashes.Hash(hashes.SHA256(), backend=default_backend())
    digest.update(bytes(241))
    digest.update(bytes(151))
    digest.update(bytes(7))
    print(digest.finalize())

    digest1 = hashes.Hash(hashes.SHA256(), backend=default_backend())
    digest1.update(bytes(19))
    digest1.update(bytes(151))
    digest1.update(bytes(229))
    print(digest1.finalize())

And the results of both digest and digest1 are the same:

b'\x815\x8eX\xee\xcd:6\xc8\x8f@\x07\x172\xdb\x0b\xe7\x9c\xe4\xa7\x10\xe2\x9c\xca \xb2\xf7i\xbb\xaa\xc7k'

By the time I had finished copying the code and running it on my own box with the Python interpreter, the issue had already been closed by Alex Gaynor.

11 Minutes Later

Alex had reviewed the code and pointed out the fatal logic flaw in the reporter’s implementation.

Following his short (and, mind you, quite polite) explanation, it was easy to see that the bug reporter had misunderstood the bytes() function, its arguments, and its return value.

The Python Built-in Functions doc explains that bytes() takes the same constructor arguments as bytearray() and, when supplied with only an integer n, returns a bytes object of length n filled with null bytes.
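A quick check in the interpreter makes the difference concrete. The second half of this sketch assumes the intent was a single byte with the value 241; that is a guess for illustration, not something the bug report states.

```python
# An integer argument is a length, not a byte value.
zeros = bytes(3)
print(zeros)            # b'\x00\x00\x00'
print(len(bytes(241)))  # 241

# Two ways to build a single byte whose value is 241 (0xF1).
single_from_list = bytes([241])
single_from_int = (241).to_bytes(1, "big")
print(single_from_list)  # b'\xf1'
```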

digest then must be a digest of 241+151+7 = 399 null bytes

digest1 then must be a digest of 19+151+229 = 399 null bytes

In other words, despite splitting the updates to digest and digest1 into differently sized chunks, the bug reporter had actually fed the exact same input to both.

Naturally, the resulting hashes were identical.
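You can confirm this with a small sketch. It swaps in the standard library's hashlib for the cryptography package so it runs without extra dependencies; hashlib documents that repeated update() calls are equivalent to a single call with the concatenated data. The helper digest_of_chunks is a made-up name for illustration.

```python
import hashlib


def digest_of_chunks(*sizes: int) -> bytes:
    """Feed one run of null bytes per size into a single SHA-256 hash object."""
    h = hashlib.sha256()
    for size in sizes:
        h.update(bytes(size))  # bytes(size) is `size` null bytes
    return h.digest()


first = digest_of_chunks(241, 151, 7)
second = digest_of_chunks(19, 151, 229)
one_shot = hashlib.sha256(bytes(399)).digest()

print(first == second == one_shot)  # True
```

How the 399 null bytes are sliced up makes no difference: the hash only ever sees the concatenated stream.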

Thankfully for us and the rest of the cryptographic world, no major bug in OpenSSL or Python’s cryptography package was unveiled: just a good anecdote about double-checking assumptions.