Three months after dav1d 0.1.0 “Gazelle” was released, version 0.2.0 has just been tagged. Under the code name “Antelope”, huge improvements were made to the AV1 decoder for older PCs and mobile devices on 8-bit content. Hand-written SSSE3 and NEON assembly sped up most of the C functions by factors ranging from 2 to 20, resulting in much higher frame rates.

This blog post provides an overview of dav1d’s performance, compared to both the 0.1.0 release and the AV1 reference decoder, aomdec.

PC: SSSE3 for older x86 CPUs

Where dav1d 0.1.0 was all about performance with AVX2, an extended instruction set used by newer processors (Intel Haswell / AMD Zen and newer), 0.2.0 focuses on speeding up SSSE3 performance for older and lower-end processors. According to the Steam Hardware Survey (Feb. 2019), 97.23% of its user base supports SSSE3, while only about two-thirds support AVX2.

Since different videos use different functions of the AV1 codec in different proportions, some saw larger increases than others. Below are the results for three 1080p videos, comparing dav1d at the 0.1.0 release to the current head.

In both single-threaded (ST) and multi-threaded (MT) decoding the improvements are huge, averaging around 2.25x for ST and 2.5x for MT. Looking at raw frame rates, this means 1080p at 30fps is playable without a hitch on almost any device with SSSE3, while high-frequency quad-core processors should also be able to handle up to 1440p at 60fps and 2160p at 30fps.

The following results were obtained on an Intel Core i5-4590 (Haswell, 4c/4t, 3.5 GHz) using only SSSE3 instructions:

If we normalize the values to the 0.1.0 results, we can examine the gains more closely, averaging around 2.23x:

On average, dav1d 0.2.0 is 2.23x faster on 8-bit content than 0.1.0.
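
The normalization step can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the clip names and frame rates are hypothetical placeholders, not the measured results, and the use of a plain arithmetic mean of the per-clip ratios is an assumption about how the average was taken.

```python
from statistics import mean

# Hypothetical frame rates (fps), for illustration only.
# These are NOT the measured dav1d results from this post.
fps_old = {"clip_a": 20.0, "clip_b": 30.0, "clip_c": 40.0}  # baseline release
fps_new = {"clip_a": 40.0, "clip_b": 75.0, "clip_c": 90.0}  # newer release


def normalize(baseline, candidate):
    """Express each candidate frame rate as a multiple of its baseline."""
    return {name: candidate[name] / baseline[name] for name in baseline}


speedups = normalize(fps_old, fps_new)

# Assumption: the average is the arithmetic mean of the per-clip ratios.
average_speedup = mean(speedups.values())

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
print(f"average: {average_speedup:.2f}x")
```

Normalizing per clip before averaging keeps a single high-frame-rate video from dominating the result, which is why the ratios rather than the raw frame rates are averaged here.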

x86 performance compared to aomdec

The target to beat for dav1d is aomdec, the AV1 reference decoder. At the release of dav1d 0.1.0, performance on AVX2 CPUs was already spectacular, but older and lower-end processors without AVX2 support still performed better with aomdec at the time. With the release of dav1d 0.2.0, that changes.

All the numbers below are for 8-bit color depth with 4:2:0 chroma subsampling. For the multi-threaded runs, aomdec used 4 threads, while dav1d used 8 frame threads and 4 tile threads. Both settings give optimal performance on a quad-core CPU.
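
As a rough sketch of what these thread settings look like on the command line, the Python snippet below builds the two invocations. The exact commands used for the benchmarks are not given in this post, so the input file name, the discarded output, and the choice of flags are assumptions; check them against `dav1d --help` and `aomdec --help` for the version you are running.

```python
import shlex

# Hypothetical input file; the benchmark clips are not named in this post.
INPUT = "sample_1080p.ivf"


def dav1d_command(path, frame_threads=8, tile_threads=4):
    """Build a dav1d invocation with explicit frame and tile thread counts."""
    return [
        "dav1d",
        "-i", path,
        "--muxer", "null",   # assumption: discard decoded frames
        "-o", "/dev/null",
        "--framethreads", str(frame_threads),
        "--tilethreads", str(tile_threads),
    ]


def aomdec_command(path, threads=4):
    """Build an aomdec invocation with an explicit thread count."""
    return ["aomdec", f"--threads={threads}", "-o", "/dev/null", path]


dav1d_argv = dav1d_command(INPUT)
aomdec_argv = aomdec_command(INPUT)

# Print shell-ready command lines (shlex.join needs Python 3.8+).
print(shlex.join(dav1d_argv))
print(shlex.join(aomdec_argv))
```

Passing the resulting lists to `subprocess.run` and timing the call would give a simple frames-per-second comparison, provided both decoders are installed.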

Comparing SSSE3 performance, dav1d and aomdec perform about the same with a single thread. Multi-threaded, dav1d is 2.5 to 3 times faster.