Speed Benchmarks

Monocypher ships with a couple of benchmarks. Run them on your platform if you're not sure Monocypher is fast enough. There are also benchmarks for Libsodium, TweetNaCl, and Libhydrogen, so you can compare.
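If you want a feel for how such a throughput figure can be obtained, here is a minimal sketch in C. It is not Monocypher's actual benchmark code: `dummy_workload`, `best_megabytes_per_second`, and `sink` are hypothetical names, and the workload is a stand-in that you would replace with the primitive you want to measure. It uses the standard `clock()` function for portability, which measures processor time rather than wall-clock time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Written to after every run so the compiler cannot discard the work. */
static volatile uint32_t sink;

/* Hypothetical stand-in workload: XORs the buffer in place and
 * returns a checksum. Replace with the primitive under test. */
uint32_t dummy_workload(uint8_t *buf, size_t size)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < size; i++) {
        buf[i] ^= (uint8_t)i;
        sum += buf[i];
    }
    return sum;
}

/* Runs the workload `runs` times over `size` bytes and returns the
 * best observed throughput, in megabytes per second. Keeping the
 * best run reduces the influence of scheduling noise. */
double best_megabytes_per_second(uint8_t *buf, size_t size, int runs)
{
    assert(buf != NULL && size > 0 && runs > 0);
    double best = 0.0;
    for (int i = 0; i < runs; i++) {
        clock_t start = clock();
        sink = dummy_workload(buf, size);
        clock_t end = clock();
        double elapsed = (double)(end - start) / CLOCKS_PER_SEC;
        if (elapsed > 0.0) {
            double speed = (double)size / elapsed / 1e6;
            if (speed > best) {
                best = speed;
            }
        }
    }
    return best; /* 0.0 means every run was too short to time */
}
```

Use a buffer large enough (a few megabytes) that each run takes noticeably longer than the clock resolution, otherwise the function returns 0.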

All results are presented in megabytes per second, or in operations per second ("13.5K" means 13500 operations per second). To avoid a false sense of accuracy, most reported numbers are rounded to two significant digits.
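The rounding itself is simple arithmetic. The sketch below shows one way to round a positive measurement to two significant digits; `round_two_significant` is a hypothetical helper written for this illustration, not part of Monocypher or its benchmarks.

```c
#include <assert.h>

/* Rounds a positive value to two significant digits.
 * The value is scaled into [10, 100), rounded to the nearest
 * integer (halves round up), then scaled back. */
double round_two_significant(double x)
{
    assert(x > 0.0);
    double scale = 1.0;
    while (x / scale >= 100.0) {
        scale *= 10.0;
    }
    while (x / scale < 10.0) {
        scale /= 10.0;
    }
    double mantissa = x / scale;          /* now in [10, 100) */
    long rounded = (long)(mantissa + 0.5);
    return (double)rounded * scale;
}
```

With this helper, a raw result of 14418 signatures per second becomes 14000, which the tables write as "14K".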

Overview

The following tables show the speed of Monocypher, Libsodium, TweetNaCl, Libhydrogen, and c25519 on my 64-bit Intel Core i5 (Skylake) CPU, running Ubuntu 18.04. Everything is compiled with Ubuntu's GCC 7.4.0. Libsodium is compiled with the default options, as recommended by its installation page. Everything else uses -O3 -march=native.

              +------+------+------+------+------+-------+
     x86      | AEAD | Hash |  Pw  | Key  | Sig  | Check |
     64       |      |      | hash | exch |      |       |
+-------------+------+------+------+------+------+-------+
| Monocypher  |  307 |  683 |  511 | 8100 |  14K |  6000 |
| Libsodium   | 1000 |  870 |  701 |  21K |  33K |   13K |
| TweetNaCl   |   51 |   40 |      | 1800 |  650 |   330 |
| Libhydrogen |   94 |  162 |      |      | 9200 |  5500 |
+-------------+------+------+------+------+------+-------+

The same speeds, relative to Monocypher:

              +------+------+------+------+------+-------+
     x86      | AEAD | Hash |  Pw  | Key  | Sig  | Check |
     64       |      |      | hash | exch |      |       |
+-------------+------+------+------+------+------+-------+
| Monocypher  |  307 |  683 |  511 | 8100 |  14K |  6000 |
| Libsodium   | ×3.4 | ×1.3 | ×1.4 | ×2.5 | ×2.3 |  ×2.2 |
| TweetNaCl   | ÷6.0 |  ÷17 |      | ÷4.5 |  ÷22 |   ÷19 |
| Libhydrogen | ÷3.3 | ÷4.2 |      |      | ÷1.6 |  ÷1.1 |
+-------------+------+------+------+------+------+-------+
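The relative figures are plain ratios of the raw throughputs: "×" means that many times faster than Monocypher, "÷" that many times slower. As a sketch, assuming throughput numbers where bigger is better, the computation could look like this. The `relative_speed` type and `compare_to_baseline` function are hypothetical names for this illustration, and ASCII 'x' and '/' stand in for "×" and "÷".

```c
#include <assert.h>

typedef struct {
    char   symbol;  /* 'x': faster than the baseline, '/': slower */
    double factor;  /* always >= 1 */
} relative_speed;

/* Expresses `speed` relative to `baseline`. Both are throughputs
 * (megabytes or operations per second), so bigger is better. */
relative_speed compare_to_baseline(double speed, double baseline)
{
    assert(speed > 0.0 && baseline > 0.0);
    relative_speed result;
    if (speed >= baseline) {
        result.symbol = 'x';
        result.factor = speed / baseline;
    } else {
        result.symbol = '/';
        result.factor = baseline / speed;
    }
    return result;
}
```

For instance, Libsodium's 1048 megabytes per second of authenticated encryption against Monocypher's 307 gives a factor of about 3.4, reported as ×3.4. TweetNaCl's 51 against the same 307 gives about 6.0, reported as ÷6.0.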

Unsurprisingly, Libsodium is the fastest of them all, thanks to its use of vector instructions and 128-bit arithmetic. If you want speed on desktops and servers, this is the one.

Despite restricting itself to portable C code, Monocypher does not lag too far behind. Authenticated encryption can't keep up with Libsodium's excellent vector implementation (from Dolbeau), but the rest is more tolerable.

TweetNaCl is almost exclusively optimised for source code size. Performance wasn't a consideration, so its slow speed is not surprising. Note that part of the poor performance of hashing comes from using SHA-512, which is slower than Blake2b.

Libhydrogen is mostly meant for constrained environments, but has been included here anyway, because its incompatibility with Libsodium means that using it on IoT likely means using it on the server as well. Its relatively poor performance on symmetric crypto is mostly explained by the choice of the Gimli permutation, which is slower than ARX designs like Chacha20 and Blake2b when implemented in software (hardware implementations are more efficient).

Effect of compilation options

The -O3 -march=native flags are the most aggressive. Not everyone approves of these. Here's how changing flags affects Monocypher:

                      +------+------+------+------+------+-------+
         x86          | AEAD | Hash |  Pw  | Key  | Sig  | Check |
         64           |      |      | hash | exch |      |       |
+---------------------+------+------+------+------+------+-------+
| -O3 -march=native   |  307 |  683 |  511 | 8100 |  14K |  6000 |
| -O3                 |  98% |  95% |  87% |  99% |  99% |   98% |
| -O2                 |  90% |  84% |  72% |  94% |  85% |   92% |
| -Os                 |  76% |  67% |  66% |  93% |  81% |   91% |
+---------------------+------+------+------+------+------+-------+

(Note: if -Os is used with -DBLAKE2_NO_UNROLLING to reduce Blake2 code size even further, Blake2 performance drops to 57%)

Sticking to portable instructions has almost no effect, and optimising for size is mostly tolerable. Be careful about password hashing though: if it runs slower, people will lower its security to compensate.

R-Pi overview (based on old benchmarks)

This comparison uses Monocypher 2.0.0 and Libsodium 1.0.16. These benchmarks should be redone.

             +--------+--------+--------+-------+--------+
    R-pi     |  AEAD  |  Hash  |   Pw   |  Key  |  Sig   |
             |        |        |  hash  | exch  |        |
+------------+--------+--------+--------+-------+--------+
| Monocypher | 32MB/s | 26MB/s | 19MB/s | 680/s | 1310/s |
| Libsodium  |   156% |   100% |   100% |  101% |   130% |
| TweetNaCl  |    22% |    42% |        |   11% |     3% |
+------------+--------+--------+--------+-------+--------+

Third party benchmarks

The following paper by Koen Zandberg et al. compares various cryptographic libraries in a constrained environment. They concentrate on firmware updates, for which signature verification is often a bottleneck.

They report the following:

The time it takes to verify a signature (seconds),

The stack size (kilobytes),

and the binary size on the flash (kilobytes).

They evaluated version 2.0.5 of Monocypher (version 2.0.6 performs the same, but uses less stack). Numbers are rounded for readability; see the paper for the raw data. Smaller is better.

                        +------------+-------+--------+
      Cortex M0+        | Signature  | Stack | Binary |
                        | check time | size  | size   |
+---------+-------------+------------+-------+--------+
|         | Monocypher  |       .53s | 5.2kb |   13kb |
|         | HACL*       |       7.1s | 3.2kb |   17kb |
|         | TweetNaCl   |       8.0s | 3.8kb |  5.6kb |
| Ed25519 | uNaCl       |       8.1s | 3.8kb |  5.6kb |
|         | C25519      |       4.2s | .98kb |  4.6kb |
|         | WolfSSL     |       3.7s | 1.3kb |  5.7kb |
+---------+-------------+------------+-------+--------+
| P256r1  | TinyCrypt   |       1.1s | .60kb |  5.0kb |
|         | Mbed TLS    |       1.6s | .79kb |   17kb |
+---------+-------------+------------+-------+--------+
| Others  | qDSA        |      0.13s | .49kb |   15kb |
|         | Libhydrogen |       1.1s | .49kb |  2.2kb |
+---------+-------------+------------+-------+--------+

                        +------------+-------+--------+
      Cortex M3         | Signature  | Stack | Binary |
                        | check time | size  | size   |
+---------+-------------+------------+-------+--------+
|         | Monocypher  |      .072s | 5.1kb |   10kb |
|         | HACL*       |       1.5s | 3.3kb |   19kb |
|         | TweetNaCl   |       2.0s | 3.8kb |  5.6kb |
| Ed25519 | uNaCl       |       1.8s | 3.8kb |  5.5kb |
|         | C25519      |       3.3s | 1.0kb |  4.8kb |
|         | WolfSSL     |       2.7s | 1.3kb |  5.9kb |
+---------+-------------+------------+-------+--------+
| P256r1  | TinyCrypt   |       .44s | .68kb |  4.9kb |
|         | Mbed TLS    |       1.1s | .80kb |   15kb |
+---------+-------------+------------+-------+--------+
| Others  | qDSA        |       1.9s | .79kb |   12kb |
|         | Libhydrogen |       .22s | .47kb |  2.2kb |
+---------+-------------+------------+-------+--------+

                        +------------+-------+--------+
      Cortex M4         | Signature  | Stack | Binary |
                        | check time | size  | size   |
+---------+-------------+------------+-------+--------+
|         | Monocypher  |      .045s | 5.1kb |   10kb |
|         | HACL*       |       1.3s | 3.3kb |   19kb |
|         | TweetNaCl   |       1.5s | 3.8kb |  5.6kb |
| Ed25519 | uNaCl       |       1.5s | 3.8kb |  5.5kb |
|         | C25519      |       1.9s | 1.0kb |  4.8kb |
|         | WolfSSL     |       1.7s | 1.3kb |  5.9kb |
+---------+-------------+------------+-------+--------+
| P256r1  | TinyCrypt   |       .35s | .66kb |  4.9kb |
|         | Mbed TLS    |       .84s | .80kb |   15kb |
+---------+-------------+------------+-------+--------+
| Others  | qDSA        |       1.3s | .97kb |   12kb |
|         | Libhydrogen |       .24s | .44kb |  2.2kb |
+---------+-------------+------------+-------+--------+

A relative comparison gives a better sense of scale:

                        +------------+-------+--------+
      Cortex M0+        | Signature  | Stack | Binary |
                        | check time | size  | size   |
+---------+-------------+------------+-------+--------+
|         | Monocypher  |      530ms |  5200 |  13000 |
|         | HACL*       |        ×13 |  ÷1.6 |   ×1.3 |
|         | TweetNaCl   |        ×15 |  ÷1.4 |   ÷2.3 |
| Ed25519 | uNaCl       |        ×15 |  ÷1.4 |   ÷2.3 |
|         | C25519      |       ×7.9 |  ÷5.3 |   ÷2.7 |
|         | WolfSSL     |       ×6.9 |  ÷4.0 |   ÷2.2 |
+---------+-------------+------------+-------+--------+
| P256r1  | TinyCrypt   |       ×2.2 |  ÷8.6 |   ÷2.5 |
|         | Mbed TLS    |       ×2.9 |  ÷6.6 |   ×1.3 |
+---------+-------------+------------+-------+--------+
| Others  | qDSA        |       ÷3.9 |   ÷11 |   ×1.2 |
|         | Libhydrogen |       ×2.0 |   ÷11 |   ÷5.7 |
+---------+-------------+------------+-------+--------+

                        +------------+-------+--------+
      Cortex M3         | Signature  | Stack | Binary |
                        | check time | size  | size   |
+---------+-------------+------------+-------+--------+
|         | Monocypher  |       72ms |  5088 |  10334 |
|         | HACL*       |        ×21 |  ÷1.6 |   ×1.8 |
|         | TweetNaCl   |        ×27 |  ÷1.3 |   ÷1.9 |
| Ed25519 | uNaCl       |        ×25 |  ÷1.3 |   ÷1.9 |
|         | C25519      |        ×46 |  ÷4.9 |   ÷2.1 |
|         | WolfSSL     |        ×37 |  ÷3.8 |   ÷1.7 |
+---------+-------------+------------+-------+--------+
| P256r1  | TinyCrypt   |       ×6.1 |  ÷7.5 |   ÷2.1 |
|         | Mbed TLS    |        ×16 |  ÷6.4 |   ×1.5 |
+---------+-------------+------------+-------+--------+
| Others  | qDSA        |        ×27 |  ÷6.4 |   ×1.2 |
|         | Libhydrogen |       ×3.0 |   ÷11 |   ÷4.7 |
+---------+-------------+------------+-------+--------+

                        +------------+-------+--------+
      Cortex M4         | Signature  | Stack | Binary |
                        | check time | size  | size   |
+---------+-------------+------------+-------+--------+
|         | Monocypher  |       45ms |  5088 |  10358 |
|         | HACL*       |        ×28 |  ÷1.6 |   ×1.8 |
|         | TweetNaCl   |        ×32 |  ÷1.4 |   ÷1.9 |
| Ed25519 | uNaCl       |        ×33 |  ÷1.4 |   ÷1.9 |
|         | C25519      |        ×43 |  ÷5.0 |   ÷2.1 |
|         | WolfSSL     |        ×38 |  ÷3.8 |   ÷1.8 |
+---------+-------------+------------+-------+--------+
| P256r1  | TinyCrypt   |       ×7.7 |  ÷7.7 |   ÷2.1 |
|         | Mbed TLS    |        ×19 |  ÷6.4 |   ×1.5 |
+---------+-------------+------------+-------+--------+
| Others  | qDSA        |        ×29 |  ÷5.2 |   ×1.2 |
|         | Libhydrogen |       ×5.3 |   ÷12 |   ÷4.8 |
+---------+-------------+------------+-------+--------+

Monocypher is fast. Among all tested libraries, the only thing that outperforms it is qDSA on Cortex M0+, because it uses hand-optimised assembly. And if we limit ourselves to Ed25519 (so we can use Libsodium on the server side), Monocypher blows everything out of the water.

On the other hand, Monocypher is also a bit bloated. The binary tends to lean towards the bigger side, and its 5KB stack is the tallest of them all. This problem was partially addressed in version 2.0.6, which reduced stack usage down to about 3KB, without losing any performance.

Monocypher won't fit on every embedded platform¹. But when it does, it's a speed demon. And it can talk to Libsodium, which is even faster on the server.

(1) Use -DBLAKE2_NO_UNROLLING to reduce code size. It may even run faster on small processors.

Raw data

Monocypher 2.0.6 (core i5 Skylake, Ubuntu 16.04)

Compiled with -O3 -march=native

Chacha20         :   410 megabytes per second
Poly1305         :  1218 megabytes per second
Auth'd encryption:   307 megabytes per second
Blake2b          :   683 megabytes per second
Sha512           :   302 megabytes per second
Argon2i, 3 passes:   511 megabytes per second
x25519           :  8124 exchanges per second
EdDSA(sign)      : 14418 signatures per second
EdDSA(check)     :  6091 checks per second

Compiled with -O3

Chacha20         :   402 megabytes per second
Poly1305         :  1202 megabytes per second
Auth'd encryption:   301 megabytes per second
Blake2b          :   651 megabytes per second
Sha512           :   248 megabytes per second
Argon2i, 3 passes:   445 megabytes per second
x25519           :  8008 exchanges per second
EdDSA(sign)      : 14292 signatures per second
EdDSA(check)     :  5964 checks per second

Compiled with -O2

Chacha20         :   372 megabytes per second
Poly1305         :  1089 megabytes per second
Auth'd encryption:   277 megabytes per second
Blake2b          :   579 megabytes per second
Sha512           :   240 megabytes per second
Argon2i, 3 passes:   368 megabytes per second
x25519           :  7642 exchanges per second
EdDSA(sign)      : 12249 signatures per second
EdDSA(check)     :  5616 checks per second

Compiled with -Os

Chacha20         :   317 megabytes per second
Poly1305         :   915 megabytes per second
Auth'd encryption:   235 megabytes per second
Blake2b          :   462 megabytes per second
Sha512           :   245 megabytes per second
Argon2i, 3 passes:   337 megabytes per second
x25519           :  7589 exchanges per second
EdDSA(sign)      : 11648 signatures per second
EdDSA(check)     :  5528 checks per second

Libsodium 1.0.18 (core i5 Skylake, Ubuntu 18.04, gcc7.4.0)

Compiled with default options:

$ ./configure
$ make && make check
$ sudo make install

Chacha20         :  1900 megabytes per second
Poly1305         :  2337 megabytes per second
Auth'd encryption:  1048 megabytes per second
Blake2b          :   870 megabytes per second
Sha512           :   296 megabytes per second
Argon2i, 3 passes:   701 megabytes per second
x25519           : 20688 exchanges per second
EdDSA(sign)      : 32899 signatures per second
EdDSA(check)     : 13208 checks per second

TweetNaCl (core i5 Skylake, Ubuntu 18.04, gcc7.4.0)

Compiled with -O3 -march=native

Salsa20          :   202 megabytes per second
Poly1305         :    69 megabytes per second
Auth'd encryption:    51 megabytes per second
Sha512           :    40 megabytes per second
x25519           :  1797 exchanges per second
EdDSA(sign)      :   648 signatures per second
EdDSA(check)     :   325 checks per second

Libhydrogen (core i5 Skylake, Ubuntu 18.04, gcc7.4.0)

No packaged release as of 2019/10. Used git commit f1f061d 2019-10-02.

Compiled with -O3 -march=native (the default is -Os -march=native).

Random           :   200 megabytes per second
Auth'd encryption:    94 megabytes per second
Hash             :   162 megabytes per second
sign             :  9233 signatures per second
check            :  5513 checks per second

Monocypher 2.0.0 (Raspberry-Pi, model 3B)

(Note: EdDSA performance roughly doubled between 2.0.0 and 2.0.6.)

Compiled with -O3 -march=native

Chacha20         :    63 megabytes per second
Poly1305         :    67 megabytes per second
Auth'd encryption:    32 megabytes per second
Blake2b          :    26 megabytes per second
SHA-512          :    13 megabytes per second
Argon2i, 3 passes:    19 megabytes per second
x25519           :   679 exchanges per second
EdDSA(sign)      :  1311 signatures per second
EdDSA(check)     :   514 checks per second

Libsodium 1.0.16 (Raspberry-Pi, model 3B)

Compiled with default flags.

Chacha20         :    72 megabytes per second
Poly1305         :   166 megabytes per second
Auth'd encryption:    50 megabytes per second
Blake2b          :    26 megabytes per second
SHA-512          :    11 megabytes per second
Argon2i, 3 passes:    19 megabytes per second
x25519           :   686 exchanges per second
EdDSA(sign)      :  1702 signatures per second
EdDSA(check)     :   618 checks per second

TweetNaCl (Raspberry-Pi, model 3B)

Compiled with -O3 -march=native