[curves] new 25519 measurements of formally verified implementations

Hello, For use in WireGuard [0], I've been evaluating hacl-star [1] and fiat-crypto [2], which both have formally verified implementations of 25519. The former has a 64-bit implementation that has made its way into NSS and the latter has a 32-bit implementation that has made its way into BoringSSL. In trying to benchmark these, I found ordinary techniques much too noisy, so I've devised a way of loading this code into the kernel and disabling all interrupts. This results in very little variation between trials. If you're running this on your workstation, you'll encounter a complete lockup of all interactivity while the tests are in progress. You can find the tool here: https://git.zx2c4.com/kbench9000/about/ Loading in a few implementations and running them on my big laptop and on my mini laptop yielded these results: Xeon E3-1505M v5 [3] -- 14nm Skylake Laptop donna64: 121794 cycles per call hacl64: 109783 cycles per call sandy2x: 103071 cycles per call amd64: 108580 cycles per call fiat32: 232710 cycles per call donna32: 411588 cycles per call i5-5200U [4] -- 14 nm Broadwell Laptop donna64: 137200 cycles per call hacl64: 127001 cycles per call sandy2x: 126201 cycles per call amd64: 123694 cycles per call fiat32: 253234 cycles per call donna32: 438816 cycles per call So it seems that the formally verified implementations are now quite competitive. The speed-up with fiat also comes with the advantage of using significantly less stack space, which is a big help (to me at least) on smaller systems. Regards, Jason [0] https://www.wireguard.com [1] https://github.com/mitls/hacl-star [2] https://github.com/mit-plv/fiat-crypto [3] https://ark.intel.com/products/89608/Intel-Xeon-Processor-E3-1505M-v5-8M-Cache-2_80-GHz [4] https://ark.intel.com/products/85212/Intel-Core-i5-5200U-Processor-3M-Cache-up-to-2_70-GHz