AV1 encoding is slow. At least, that was the status quo. But with both rav1e (backed by Xiph, Mozilla and Vimeo) and SVT-AV1 (backed by Netflix and Intel) in heavy development, this notion is changing fast.

Today, I’m going to demonstrate that SVT-AV1 can be faster and deliver higher quality at identical bitrate, both at the same time. Together with fast AV1 decoding through dav1d, this makes the AV1 codec ready for broad adoption.

Part 1: Quality

The whole promise of AV1, aside from being free and open-source, was delivering higher quality at the same bitrate. The rav1e developers have built a great tool for comparing video quality, called AreWeCompressedYet?. I submitted runs for the libvpx, x265 and SVT-AV1 encoders, with the first two being the status quo for high-quality encoding.

On the X-axis the bitrate is displayed in bits per pixel. So 0.02 means that on average 0.02 bits are spent per pixel, and 0.1 means 0.1 bits per pixel. For example, for 1080p at 30 fps that works out to roughly 1.25 Mb/s and 6.25 Mb/s respectively.
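The conversion from bits per pixel to bitrate is simple arithmetic, sketched below. The function name is only illustrative, and Mb/s is assumed to mean 10^6 bits per second.

```python
def bitrate_mbps(bits_per_pixel, width, height, fps):
    """Convert an average bits-per-pixel figure to a bitrate in Mb/s (10^6 bits/s)."""
    pixels_per_second = width * height * fps
    return bits_per_pixel * pixels_per_second / 1_000_000


# 1080p at 30 fps, for the two bits-per-pixel values mentioned above
for bpp in (0.02, 0.1):
    print(f"{bpp} bpp = {bitrate_mbps(bpp, 1920, 1080, 30):.2f} Mb/s")
# prints 1.24 Mb/s and 6.22 Mb/s
```

The exact values are slightly below the rounded figures quoted in the text.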

For x265 and libvpx their highest quality modes were used, veryslow and cpu-used 0 respectively (x265 placebo resulted in worse quality on this test set). SVT-AV1 uses Enc-mode 4 and 6 in the graphs below. SVT-AV1 has both faster and slower modes, ranging all the way from mode 8 (fastest) to 0 (slowest).

First are PSNR and MS-SSIM, two objective metrics that both calculate the mathematical error between the input and output video stream. These values are displayed on the Y-axis; a higher value means higher quality.
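To make "mathematical error" concrete, here is a minimal sketch of PSNR for a single 8-bit plane, using the textbook definition based on mean squared error. How AreWeCompressedYet? aggregates the metric over planes and frames is not covered here, so treat this as an illustration rather than the tool's exact method. MS-SSIM is more involved and is left out.

```python
import math


def psnr(reference, distorted, max_value=255):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between source and decoded pixels
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no error at all
    return 10 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by one, so the MSE is 1
print(round(psnr([100, 100, 100, 100], [101, 99, 101, 99]), 2))  # prints 48.13
```

A smaller error gives a higher PSNR, which is why higher is better on the Y-axis.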

Both SVT-AV1 mode 6 (green) and mode 4 (yellow) provide better objective quality than x265 (red) and libvpx (blue), on both the PSNR and MS-SSIM metrics.

Then we have a subjective metric, which should better represent how the user experiences video quality. VMAF was developed by Netflix to better assess the quality perceived by its users.

SVT-AV1’s subjective quality is a little worse than its objective quality. Enc-mode 4 still trumps libvpx and x265, but Enc-mode 6 is a little worse.

Part 2: Speed

To compare speed I spun up some Google Cloud instances to get a fair comparison. Both instances use 16 vCPUs (8 cores, 16 threads) on the Cascade Lake platform with 64 GB DDR4 (SVT-AV1 ran fine on 16 GB btw). This setup should be comparable to a high-end desktop PC with a Ryzen 7 3700X or Core i9-9900K.

All encoders were compiled with GCC 8.3.0 with Release configuration. I benchmarked two files for two scenarios: 1250 frames of a 1080p 8-bit 4:2:0 clip representing regular 1080p content, and 250 frames of a 2160p 10-bit 4:2:0 clip representing high-end HDR movie content. Each encoder was run twice and the fastest run was used. Below are the results in frames per second:

As we can see, SVT-AV1 enc-modes 5, 6 and 7 are clearly faster than both libvpx and x265. Also, libaom is very slow, even at a fast preset (cpu-used=5).
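For anyone who wants to reproduce this kind of measurement, the frames-per-second figure can be derived roughly as follows. This is a sketch, not the script used for this post: `time_command` and `best_fps` are hypothetical helpers, the encoder command line is left to the caller, and the timings in the example are made up.

```python
import subprocess
import time


def time_command(cmd):
    """Run a command (for example an encoder invocation) and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def best_fps(frame_count, run_seconds):
    """Frames per second based on the fastest of several timed runs."""
    if not run_seconds:
        raise ValueError("need at least one timed run")
    return frame_count / min(run_seconds)


# Made-up timings for two runs of the 1250-frame clip
print(best_fps(1250, [62.5, 64.0]))  # prints 20.0
```

Taking the fastest run filters out one-off slowdowns from other activity on the machine.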

When normalized, the differences become even clearer.

On 8-bit content (Sintel), enc-mode 4 is 32% faster than libvpx and 4% slower than x265. enc-mode 7 . On 10-bit content (Foodmarket) the differences are even larger.
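The normalization is just a ratio of frame rates against a baseline encoder. The sketch below assumes that "X% faster" means the fps ratio minus one, which is the usual reading but is not spelled out in the text. The fps values are invented to show the arithmetic.

```python
def percent_faster(fps, baseline_fps):
    """Speed difference versus a baseline in percent; negative means slower."""
    return (fps / baseline_fps - 1) * 100


def speedup(fps, baseline_fps):
    """Speed as a multiple of the baseline, e.g. 2.0 means twice as fast."""
    return fps / baseline_fps


# Invented numbers: a baseline at 10 fps, compared with 13.2 fps and 9.6 fps
print(round(percent_faster(13.2, 10.0), 1))  # prints 32.0
print(round(percent_faster(9.6, 10.0), 1))   # prints -4.0
```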

Part 3: Diving in deep

So, we now know that, overall, SVT-AV1 can be faster than both libvpx (VP9) and x265 (H.265) while delivering higher quality at the same bitrate. In this section I compare different SVT-AV1 encoder modes a little more in depth. We will mainly be looking at MS-SSIM for objective quality and VMAF for subjective quality.

Enc-mode 7

In mode 7, SVT-AV1 still needs 3.6% to 9.2% more bits than libvpx to reach similar MS-SSIM (objective) quality. For similar VMAF (subjective) quality this is even higher at 9.5% to 23.4%, depending on resolution.

Compared to x265 the results are a little better: it ranges from using 1.6% more to 10.9% fewer bits to reach similar MS-SSIM quality, and from 10.4% more to 1.4% fewer for VMAF.

Meanwhile, it’s 6.42 times faster than libvpx and 4.68 times faster than x265.
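Percentages of the form "X% more bits for similar quality" are typically obtained by comparing two rate-quality curves over the quality range they share. I am assuming the figures above are averages of that kind, in the style of a Bjøntegaard-delta rate. The sketch below is a simplified variant that interpolates log-bitrate linearly between measured points instead of fitting a curve, so its output will not match a real BD-rate tool exactly. The sample points are made up.

```python
import math


def _log_rate_at(curve, quality):
    """Linearly interpolate log(bitrate) at a given quality on one curve."""
    for (q0, r0), (q1, r1) in zip(curve, curve[1:]):
        if q0 <= quality <= q1:
            t = (quality - q0) / (q1 - q0)
            return r0 + t * (r1 - r0)
    raise ValueError("quality outside the curve")


def average_rate_difference(test, reference, steps=1000):
    """Average bitrate difference (%) of `test` versus `reference` at equal quality.

    Each curve is a list of (bitrate, quality) points with distinct quality values.
    A positive result means `test` needs more bits for the same quality.
    """
    # Sort by quality and work in the log-bitrate domain
    test_c = sorted((q, math.log(r)) for r, q in test)
    ref_c = sorted((q, math.log(r)) for r, q in reference)
    lo = max(test_c[0][0], ref_c[0][0])
    hi = min(test_c[-1][0], ref_c[-1][0])
    if lo >= hi:
        raise ValueError("curves do not overlap in quality")
    # Midpoint rule over the shared quality range
    total = 0.0
    for i in range(steps):
        q = lo + (i + 0.5) * (hi - lo) / steps
        total += _log_rate_at(test_c, q) - _log_rate_at(ref_c, q)
    return (math.exp(total / steps) - 1) * 100


# Made-up curves: the test encoder always needs 10% more bits
reference = [(1000, 30.0), (2000, 35.0), (4000, 40.0)]
test = [(1100, 30.0), (2200, 35.0), (4400, 40.0)]
print(round(average_rate_difference(test, reference), 2))  # prints 10.0
```

Averaging in the log domain keeps the high-bitrate end of the curve from dominating the result.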