The new Zstandard 1.0 compression algorithm, recently open sourced by Facebook, is one of the few compression algorithms that are both faster and more efficient than zlib, the current “reigning standard”, write Facebook engineers Yann Collet and Chip Turner. Zstandard builds on previous work by Collet, also the author of LZ4, who released an initial version of the algorithm in 2015.

According to Facebook benchmarks, Zstandard outperforms zlib for any combination of compression ratio and bandwidth.

In particular, Zstandard showed outstanding performance against zlib on the Silesia corpus, a standard lossless compression benchmark:

it was ~3–5x faster at the same compression ratio

it produced 10–15% smaller files at the same compression speed

it decompressed 2x faster regardless of compression ratio

it scaled to much higher compression ratios (~4x vs. ~3.15).

Zstandard uses Finite State Entropy, based on Jarek Duda’s work on Asymmetric Numeral Systems (ANS), for entropy coding. ANS aims to “end the trade-off between speed and rate” and can be used both for precise coding and for very fast encoding, with support for data encryption. But at the root of Zstandard’s better performance are a number of other design and implementation choices:

while zlib is limited to a 32KB window, Zstandard imposes no inherent limit and leverages the much greater amounts of memory available in modern environments, including mobile and embedded ones

a new Huffman decoder, Huff0, decodes symbols in parallel across multiple ALUs by reducing the data dependencies between arithmetic operations

Zstandard attempts to be as branchless as possible, thus minimizing the highly expensive pipeline flushes due to incorrect branch predictions. For example, this is how a while loop can be rewritten without using branches:

/* classic version */
while (nbBitsUsed >= 8) {  /* each while test is a branch */
    accumulator <<= 8;
    accumulator += *byte++;
    nbBitsUsed  -= 8;
}

/* branch-less version */
nbBytesUsed  = nbBitsUsed >> 3;
nbBitsUsed  &= 7;
ptr         += nbBytesUsed;
accumulator  = read64(ptr);

repcode modeling greatly improves the compression of sequences that differ only by a few bytes

Zstandard is both a command line tool and a library, both written in C. It provides more than 20 compression levels that allow users to fine-tune its behavior for the available hardware, the data to compress, and the bottlenecks to optimize. Facebook recommends starting with the default level 3, which is suitable for most cases, then testing higher levels up to level 9 for a reasonable trade-off between speed and space, or higher still for better compression ratios, reserving levels 20+ for cases where compression speed does not matter.

Collet and Turner also hinted at what future versions of Zstandard will bring, including support for multi-threading and new compression levels enabling both faster compression and higher ratios.

Zstandard follows on Apple’s LZFSE and Google’s Brotli, both open source, each trying to optimize for a specific use case: Brotli seems to be tuned to ensure high compression ratios for Web assets and Android APKs, while LZFSE aims to be faster than zlib at the same compression ratio but with lower power consumption.