[noise] ChaCha speed update

A year ago I wrote:

> Concretely, several generations of Intel chips have run 12-round
> ChaCha12-256 at practically the same speed as 12-round AES-192 (with a
> similar security margin), even though AES-192 has "hardware support", a
> smaller key, a smaller block size, and smaller data limits. For example:
>
> * Both ciphers are ~1.7 cycles/byte on Westmere (introduced 2010).
> * Both ciphers are ~1.5 cycles/byte on Ivy Bridge (introduced 2012).
> * Both ciphers are ~0.8 cycles/byte on Skylake (introduced 2015).
>
> ChaCha20-256 is slower than ChaCha12-256 but this is entirely because it
> has a much larger security margin. For reasons explained below, I
> wouldn't be surprised to see ChaCha20-256 running _faster_ than AES-256
> on future Intel chips.

Romain Dolbeau has now submitted benchmarks for an Intel Skylake with
AVX-512:

   * 0.48 cycles/byte: 12-round ChaCha12-256.
   * 0.48 cycles/byte: 12-round Salsa20/12-256.
   * 0.69 cycles/byte: 20-round ChaCha20-256.
   * 0.69 cycles/byte: 20-round Salsa20-256.
   * 0.87 cycles/byte: 14-round AES-256.

   https://bench.cr.yp.to/results-stream.html#amd64-manny1024

Thanks to Intel for giving up on AES and joining the monoculture! :-)

The code is C code from Dolbeau, using _mm512_rol_epi32() etc.

Meanwhile there's a burst of papers this month from people struggling
with the security limitations of AES:

   https://eprint.iacr.org/2017/697
   https://eprint.iacr.org/2017/702
   https://eprint.iacr.org/2017/708

Admittedly, the performance comparison is currently the other way around
on AMD Ryzen, which is basically a 128-bit machine (256-bit instructions
take two operations) with two AES units. But the gap will close. Mixing
integer operations with vector operations will speed up Salsa and ChaCha
on these chips, as on NEON. More importantly, subsequent AMD chips, just
like Intel chips, will be pressured to improve performance of
general-purpose vector instructions.

---Dan

P.S. In news that's not unrelated, the soon-to-be-online "NTRU Prime"
software includes int32-array-sorting software that, on modern Intel
CPUs, solidly beats Intel's "performance" library.
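For readers wondering why a vector rotate instruction such as _mm512_rol_epi32() matters here: ChaCha's core is a quarter-round built only from 32-bit additions, XORs, and left rotations by 16, 12, 8, and 7 bits (RFC 7539, section 2.1), and ChaCha12 and ChaCha20 differ only in running 6 versus 10 double-rounds of eight quarter-rounds each. Without a native vector rotate, each rotation typically costs two shifts and an OR (or a byte shuffle for the 16- and 8-bit cases); _mm512_rol_epi32() performs one rotation on sixteen 32-bit lanes in a single instruction. The following is a minimal scalar sketch of the quarter-round, not Dolbeau's vectorized code; the function names are illustrative, and the self-test uses the quarter-round test vector from RFC 7539, section 2.1.1.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (valid for 0 < n < 32).
 * Scalar code builds this from two shifts and an OR; AVX-512's
 * _mm512_rol_epi32() does the same for sixteen lanes at once. */
static inline uint32_t rotl32(uint32_t x, int n) {
    return (x << n) | (x >> (32 - n));
}

/* One ChaCha quarter-round (RFC 7539, section 2.1):
 * 32-bit additions, XORs, and fixed-distance left rotations. */
static void chacha_quarter_round(uint32_t *a, uint32_t *b,
                                 uint32_t *c, uint32_t *d) {
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/* Check against the quarter-round test vector in RFC 7539, 2.1.1. */
static void chacha_quarter_round_selftest(void) {
    uint32_t a = 0x11111111u, b = 0x01020304u;
    uint32_t c = 0x9b8d6f43u, d = 0x01234567u;
    chacha_quarter_round(&a, &b, &c, &d);
    assert(a == 0xea2a92f4u);
    assert(b == 0xcb1cf8ceu);
    assert(c == 0x4581472eu);
    assert(d == 0x5881c4bbu);
}
```

A vectorized version can apply the same sequence to sixteen independent blocks at once, one block per 32-bit lane, which is one common way to organize AVX-512 ChaCha code; each rotl32() call then becomes a single vector rotate.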