In a recent post, Adam Langley complains that “SHA-3 is slow”. Similar comments appear from time to time on the web (see also David Wong's post). But what does it mean precisely? Let us dig into it.

Hardware and software

There are a couple of ambiguities in this statement. Let's start with the first one: where would it be “slow”?

Keccak, the winner of the SHA-3 competition, is blazing fast when implemented on dedicated (ASIC) or programmable (FPGA) hardware. Its throughput for a given circuit area is an order of magnitude higher than SHA-2 or any of the SHA-3 finalists. And if you care beyond plain speed, note that it also consumes much less energy per bit. In this sense, Keccak is a green cryptographic primitive.

Keccak has other implementation advantages, like efficient protections against side-channel attacks, but let's go to the point: what seems to be at stake is the speed in software.

Did you say “SHA-3”?

How come then, that SHA3-512 is slower than SHA-512 on modern processors? This brings up to the second ambiguity of the statement: what is “SHA-3”?

“SHA-3” initially refers to the target of the competition that NIST organized from 2008 to 2012, namely a new hash standard that would complement SHA-2 in case it is broken. Hence, the initially-intended outcome of the competition is a set of four functions called SHA3-224, SHA3-256, SHA3-384, and SHA3-512.

If “SHA-3” means these four functions, then indeed SHA3-512 is unnecessarily slowed down by an excessive security parameter. Due to an absurd rule in the competition, followed by a fierce controversy in 2013, the parameters of SHA3-512 are stuck at offering 512 bits of pre-image security, a nonsense for anyone who can compute powers of two.

It turned out that Keccak has more to offer than just the drop-in replacements for SHA-{224, 256, 384, 512}. In FIPS 202 (“the SHA-3 standard”), NIST also approves two extendable-output functions (XOFs) called SHAKE128 and SHAKE256. These generalize the traditional hashing paradigm by allowing any arbitrary output length and by being parameterized by their security strength, instead of their output length.

If the term “SHA-3” embraces the aforementioned XOFs, then SHAKE{128, 256} are on par with SHA-2 on common processors. But could “SHA-3” actually be super fast in software?

Parallelism

To answer this last question, let us go further down the standardization road. Recently, NIST released the SP 800-185 standard (“SHA-3 derived functions”), which proposes a framework for customizable functions, called cSHAKE, that generalize SHAKE{128, 256}. And, as an application of this framework, it approves the ParallelHash functions. These functions internally use a parallel mode of operation, which an implementation can exploit to speed-up the processing.

With these standard functions included, “SHA-3” can actually outperform SHA-2 and even SHA-1 (broken but usually considered fast) for long messages on modern processors.

Security and performance hand-in-hand

Of course, a cryptographic functions should carefully balance speed and security. Keccak makes use of sound design properties, like fast linear and differential propagation or purposely weak alignment, and it clearly stays away from the ARX (modular additions, rotations and exclusive or's) approach. What we understand of Keccak now made us wonder: aren't 24 rounds too much?

KangarooTwelve is a recently proposed variant of Keccak, in which the number of rounds has been reduced to 12—yet, with exactly the same round function, no tweak! It may seem like a drastic reduction, but it is in line with our proposed solution for authenticated encryption, Keyak, submitted to the CAESAR competition. And, more importantly, this wouldn't have been possible if Keccak hadn't had the chance of getting such a rather extensive scrutiny from cryptanalysis experts throughout the years—and to resist to them.

KangarooTwelve is defined on the “SHA-3” components from FIPS 202 and may please the aficionados of extreme software speed. But beware: with such speeds, the hard drive or the network connection has long become the bottleneck for most applications.