Hash functions, one-way functions and P = NP

Take a function \(f\) that maps from some bit string of length \(N\) to another bit string also of length \(N\).

Suppose that this function is a 'hash function' with some properties we want:

1) Given an output value, it takes on average \(2^{N-1}\) evaluations of different inputs passed to \(f\) before we get the given output value. Note that if we try all \(2^N\) different inputs, we will definitely find the input that gives the output we are searching for, if it exists. So we want our hash function to require half than number of evaluations on average before we find it, e.g. \({2^N}/2 = 2^{N-1}\).

2) \(f\) runs in polynomial time on the length of the input \(N\). All practical hash functions have this property.

So let's consider a search problem — call it \(S\). Instances of \(S\) consist of an output string \(y\) (of \(N\) bits length), and the problem is to find the input \(x\) such that \(f(x) = y\).

The first property we are assuming of \(f\) implies that any algorithm that solves this problem runs in time \(2^{N-1}\), which is exponential in \(N\), so not polynomial in \(N\). So \(S\) is not an element of \(P\), the set of problems that are solvable in polynomial time.

Now, let's suppose that we have some proposed input value \(x\). We can verify that \(f(x) = y\) in polynomial time, since we have assumed that \(f\) runs in polynomial time. If \(f(x)

eq y\), the proposed solution will be rejected in polynomial time. This means that we have an algorithm (the verification algorithm) that verifies a proposed solution in polynomial time.

But we know that the complexity class \(NP\) (non-deterministic polynomial) is the class of problems that can be verified in polynomial time. Therefore \(S\) is in \(NP\).

Since we already proved (given our assumptions about \(f\)) that \(S\) is not in \(P\), we therefore have \(P

eq NP\).

So we have the following situation:

The existence of a hash function with the properties we want implies \(P

eq NP\).

Since no-one knows if \(P = NP\), we can deduce that no-one knows if hash functions with the properties we want exist.

Fixed length outputs

Note that this proof only applies for hash functions from strings of N bits to strings of N bits. Some cryptographic hash functions in the real world are defined like this, such as the Skein hash function. However, most cryptographic functions produce a fixed length output, such as SHA-256 that has a 256 bit output length.

Regardless, I think the same general principle applies. As far as I know, no hash function with a fixed length output has been proved to have the analogous properties as listed above. In other words, no hash function with fixed length output N has been proved to take exponential time on N to find an input that maps to the given output.

One-way functions

This property of hash functions we are thinking about is also called the 'one-way' function property. A hash function with this property is one-way in the following sense: we can compute the output \(f(x)\), from the input \(x\), easily and efficiently. However going in the 'other direction', e.g. computing the inverse, \(f^{-1}(y)\), is not computable efficiently.

Remarkably we don't know if such one-way functions exist.

Implications

The implication of the above proof and comments is that no-one knows if cryptographic hash functions like the SHA hash functions really have the secure properties that people hope they do.

Many aspects of modern software depend on such hash functions, such as message authentication codes, code signing of executables, and even systems such as bitcoin, which uses the SHA-256 hash function in its proof-of-work scheme.

That these hash functions are not proved to be cryptographically secure is somewhat well known, but it's not that uncommon to hear people assuming that the security or properties of such systems have actually been proved.

This is quite similar to the situation with factoring of large numbers which is used in public key cryptography, which is suspected, or commonly treated as if it is not solvable in polynomial time, even though it has not been proved.

More reading:

'One way functions' on Wikipedia

Introduction to Algorithms, Third Edition

Edit: Discussion on Reddit