Good News: AV1 Encoding Times Drop to Near-Reasonable Levels

When I first tested AV1 encoding back in August 2018, encoding times were glacial and seriously detracted from the codec’s potential usability. Table 1 from that story tells the tale. Unless otherwise indicated, all encoding times are from my HP ZBook notebook, powered by a single 2.8 GHz Intel Xeon E3-1505M v5 CPU. In addition, LibVPx refers to the libvpx VP9 encoder as used in FFmpeg, and all references to AV1 refer to the libaom-av1 encoder available in FFmpeg.

Table 1. Encoding times for the first release of AV1

In late 2018, I wrote that researchers had reported AV1 encoding times as low as 10x LibVPx encoding times, so when I started a recent codec evaluation project, I was eager to see if I could match those times. I just finished that project, and Table 2 shows where things stand now. Since I know you’re wondering, the VMAF score for the clip compressed last year was 96.18, while the score for the clip in Table 2 is 95.55. Since it takes 6 VMAF points to make a Just Noticeable Difference, even the sharpest-eyed viewer wouldn’t notice this 0.63-point difference.

Table 2. Current optimized encoding times for AV1

Based on the August 2018 encoding times for the other codecs, AV1 is now down to about 3x the encoding times of x265 and LibVPx. As you’ll read below, the comparison isn’t exactly apples to apples, and I’m not being entirely fair to the other codecs, but that notwithstanding, it’s a pretty impressive speedup, wouldn’t you agree?

If you’re in a TL;DR kind of mood, you can jump down to Table 6 for a comparison that is fair to the other codecs. If you want to learn how I got there, let’s break down the components.

Encoder Speed Improvements

The command string used back in our initial tests was this:

ffmpeg -y -i input.mp4 -c:v libaom-av1 -strict -2 -b:v 3000K -maxrate 6000K -cpu-used 8 -pass 1 -f matroska NUL

ffmpeg -i input.mp4 -c:v libaom-av1 -strict -2 -b:v 3000K -maxrate 6000K -cpu-used 0 -pass 2 output_AV1.mkv

If you use this exact string with the current version of FFmpeg (I tested version N-93083-g8522d219ce), encoding time drops from 226,080 seconds (about 45,000 times real-time) to 18,196 seconds (about 3,639 times real-time), a speedup of roughly 12x. That’s still about 63 times slower than x265 and 80 times slower than LibVPx, but it’s a huge improvement, and it takes us to the results shown in Table 3. The VMAF score for the AV1 file in Table 3 was 95.91, a very small and imperceptible drop from last year’s 96.18.
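The real-time multiples and the speedup above are simple ratios. The sketch below recomputes them in Python, assuming a five-second test clip (inferred from 226,080 seconds being roughly 45,000 times real-time, and from the five-second clip mentioned later); CLIP_SECONDS and realtime_multiple are illustrative names, not part of any tool.

```python
CLIP_SECONDS = 5  # assumption: the five-second test clip used throughout

def realtime_multiple(encode_seconds: float, clip_seconds: float = CLIP_SECONDS) -> float:
    """How many times slower than real-time an encode ran."""
    return encode_seconds / clip_seconds

old_seconds = 226_080  # August 2018 build, original command string
new_seconds = 18_196   # FFmpeg N-93083-g8522d219ce, same command string

print(f"old: {realtime_multiple(old_seconds):,.0f}x real-time")  # old: 45,216x real-time
print(f"new: {realtime_multiple(new_seconds):,.0f}x real-time")  # new: 3,639x real-time
print(f"speedup: {old_seconds / new_seconds:.1f}x")             # speedup: 12.4x
```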

Table 3. Using original command line with current code (AOM’s improvements)

Table 3 is an apples-to-apples comparison with our initial tests. All further decreases in encoding time come from changes to the command string.

Finding AV1’s Optimal Speed/Quality Tradeoff

Let’s get practical. Most codecs have presets that let you trade off encoding time against quality. For example, x264 and x265 presets have names like slow, veryslow, fast, veryfast, and placebo. With AV1, the presets are controlled via the cpu-used switch, and you can see in the batch above that I used cpu-used 8 in pass 1 and cpu-used 0 in pass 2.

If you load the AV1 help notes in FFmpeg (ffmpeg -h encoder=libaom-av1), you’ll see the following:

-cpu-used <int> Quality/Speed ratio modifier (from 0 to 8) (default 1)

With LibVPx and AV1, the first-pass quality setting doesn’t affect the quality of the second pass, so you typically run the first pass at the fastest, lowest-quality setting. At Google’s direction for the August First Look, I ran the second pass at the highest-quality setting, cpu-used 0. Encoding times were so slow that I didn’t take the time to experiment with these settings as I have before for x264, x265, and LibVPx.
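To build a quality/speed curve, you encode the same clip once per cpu-used value, holding the first pass at the fastest setting and varying only the second pass. The Python sketch below assembles those two-pass command lines from the flags in the original command string. The helper names, output filenames, and the -y added to pass 2 are assumptions, and it uses os.devnull in place of the hard-coded Windows NUL so it works on either platform. It only builds the commands unless you call run_timed, which requires an ffmpeg build with libaom-av1 on your PATH.

```python
import os
import subprocess
import time

# Rate-control flags copied from the article's original command string.
COMMON_ARGS = ["-c:v", "libaom-av1", "-strict", "-2",
               "-b:v", "3000K", "-maxrate", "6000K"]

def two_pass_commands(src: str, dst: str, cpu_used_pass2: int,
                      cpu_used_pass1: int = 8) -> list[list[str]]:
    """Argument lists for one two-pass encode: fast first pass, variable second pass."""
    pass1 = ["ffmpeg", "-y", "-i", src, *COMMON_ARGS,
             "-cpu-used", str(cpu_used_pass1), "-pass", "1",
             "-f", "matroska", os.devnull]  # first-pass video output is discarded
    pass2 = ["ffmpeg", "-y", "-i", src, *COMMON_ARGS,  # -y added (assumption) so reruns overwrite
             "-cpu-used", str(cpu_used_pass2), "-pass", "2", dst]
    return [pass1, pass2]

def plan_sweep(src: str, presets=range(9)) -> dict[int, list[list[str]]]:
    """One two-pass encode per cpu-used value (0 to 8, per the libaom-av1 help text)."""
    return {cpu: two_pass_commands(src, f"output_AV1_cpu{cpu}.mkv", cpu)
            for cpu in presets}

def run_timed(commands: list[list[str]]) -> float:
    """Run both passes in order and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    for cmd in commands:
        subprocess.run(cmd, check=True)
    return time.perf_counter() - start

if __name__ == "__main__":
    for cpu, cmds in plan_sweep("input.mp4").items():
        print(cpu, " ".join(cmds[1]))
```

If you do run the encodes, run them one at a time: every pair writes to FFmpeg’s default two-pass log file, so parallel sweeps would overwrite each other’s first-pass statistics.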

Figure 1 shows the graph I typically create for each codec/preset/encoder before I start serious testing or production encoding. The red line tracks quality while the blue line tracks encoding time. At cpu-used 5, for example, encoding time is 6.63% of the maximum (00:20:06 compared to 5:03:16) while quality is 99.64% of the maximum (95.56 VMAF compared to 95.91).
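The percentages above are ratios of the cpu-used 5 result to the cpu-used 0 result. Here is a minimal Python check using the numbers quoted from Figure 1; hms_to_seconds and percent_of_max are hypothetical helper names.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an H:MM:SS (or HH:MM:SS) duration to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def percent_of_max(value: float, maximum: float) -> float:
    """Express value as a percentage of the maximum (cpu-used 0) result."""
    return 100 * value / maximum

# cpu-used 5 versus cpu-used 0, as read from Figure 1
time_pct = percent_of_max(hms_to_seconds("00:20:06"), hms_to_seconds("5:03:16"))
quality_pct = percent_of_max(95.56, 95.91)
print(f"time: {time_pct:.2f}% of max, quality: {quality_pct:.2f}% of max")
# time: 6.63% of max, quality: 99.64% of max
```

Note that 5:03:16 works out to 18,196 seconds, the same cpu-used 0 encoding time reported for the current FFmpeg build.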

Figure 1. AV1’s quality/speed curve

If you’re a researcher trying to measure the absolute best quality available from a particular codec, you’d ignore the graph and encode at cpu-used 0. If you’re a video producer, you’d probably encode at cpu-used 5, since lower settings deliver minimal quality improvements and higher settings deliver minimal time savings. Of course, based on the numbers shown in Figure 1, no one would call you crazy if you opted for cpu-used 8. Assuming that you encoded with cpu-used 5, Table 4 shows how encoding times compare.
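If you want to apply this rule of thumb mechanically across a full sweep, one hypothetical way to formalize it is to pick the fastest preset whose VMAF stays within some tolerance of the best score measured. This is not the author’s stated method, and the 1% tolerance is an arbitrary illustration. Only the cpu-used 0 and cpu-used 5 data points below come from the article, so substitute your own sweep results for the remaining presets.

```python
def fastest_within_tolerance(results: dict[int, tuple[float, float]],
                             max_loss_pct: float) -> int:
    """results maps cpu-used -> (encode_seconds, vmaf).

    Returns the fastest preset whose VMAF is within max_loss_pct percent
    of the best VMAF in results (hypothetical selection rule).
    """
    best_vmaf = max(vmaf for _secs, vmaf in results.values())
    eligible = {
        cpu: secs
        for cpu, (secs, vmaf) in results.items()
        if 100 * (best_vmaf - vmaf) / best_vmaf <= max_loss_pct
    }
    # The best-scoring preset always qualifies, so eligible is never empty.
    return min(eligible, key=eligible.get)

# Data points from Figure 1: cpu-used -> (seconds, VMAF)
measured = {0: (18_196, 95.91), 5: (1_206, 95.56)}
print(fastest_within_tolerance(measured, 1.0))  # 5
print(fastest_within_tolerance(measured, 0.1))  # 0
```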

Table 4. Current version of FFmpeg, cpu-used 5

Can a single encode of a five-second clip accurately predict the quality/speed curve for a broader range of clips encoded at multiple data rates? In the project I recently completed, I used the same approach with a different codec. There, the curve from a single encode of a five-second test clip predicted a quality differential of 1.3% between the preset I used and maximum quality (and the preset cut encoding time from 18 minutes to 3 minutes). When I later measured the actual differential between that preset and maximum quality across a five-rung ladder and six test clips, it was 1.4%. So, while more data is always better, a single encode should be a reasonably accurate predictor.
