Encoding for YouTube: How to Get the Best Results

[Editor's Note, 7/20/12: After we published Jan's article, he received a note from Colleen Henry, Video Hacker, Google Video Infrastructure, with further recommendations for encoding for YouTube. We've added that information as an addendum at the end of the article.]

If we didn't live with YouTube's scale on a daily basis, and somehow take it for granted, it would almost be too big to imagine. According to one source, in February 2012, YouTube accounted for 75.9% of all visits to video-related websites, with AOL Video and Hulu tied for second at 2.8%. According to YouTube's FAQ, over 800 million unique viewers visit YouTube each month, watching over 3 billion hours of video. In 2011, YouTube had more than 1 trillion views, or around 140 views for every person on Earth. These are U.S. federal deficit types of numbers.

When YouTube started, poor video quality and a young audience caused many organizations to ask, "Why bother?" Today, many Fortune 100 companies have their own corporate YouTube channels, as does the AARP. Many companies, including Deloitte and IBM, use YouTube to host the videos posted on their own corporate websites. At this point, if you're not posting your videos on YouTube, it's more appropriate to ask, "Why not?"

If confusion about the best way to post videos on YouTube was part of the reason, you're reading the right article. Seasoned YouTube posters might also find value in this article, where I synthesize YouTube's encoding recommendations, advice from several compression experts, a look at presets from common encoders, and the results of my own tests, to provide specific advice about how to configure your video for optimal quality and the most efficient upload time.

Let's start by chatting with our experts.

What the Experts Say

Telestream Episode product manager Kevin Louden started by noting that YouTube re-encodes all video uploaded to the site, which has several implications. First, don't try to match YouTube's output in the hope that it won't re-encode your video; it always re-encodes. Second, your job is to provide the highest possible quality file to YouTube, since, as we all know, video is a garbage-in/garbage-out medium. The higher the input quality, the higher the output quality.

Louden also noted that because it's all about quality, you don't have to worry about playability, which frees you to create a file that might not play smoothly on many computers. This relates both to data rate and encoding parameters like IDR frames and adaptive B-frames. For example, the rule of thumb when encoding for playback is that each key frame should be an IDR frame. However, IDR key frames are larger than normal key frames, which is less efficient compression-wise. When encoding for upload to YouTube, playability isn't an issue, so Louden recommends an IDR frame every third keyframe.

Ditto for adaptive B-frames, which improve quality but which some producers avoid because they can cause playback issues on some devices. Since YouTube's encoding engines can handle adaptive B-frames, Louden recommends using them when encoding for upload.

I also spoke with Randon Morford and Coby Rich from Sorenson Media. Randon is general manager of the Squeeze Desktop product, while Coby is an encoding-savvy director of marketing who created some of Squeeze's YouTube presets. Morford and Rich pointed out that while quality is important, there's almost always a tension between quality and upload time. This is particularly true among many small business video producers who might be uploading at under 1 mbps.

When creating their YouTube presets, Sorenson tested at multiple data rates and ran PSNR and SSIM quality measurements to find the data rate that provided the optimal blend of quality and upload time. For 720p files, for example, their tests showed minimal quality improvements beyond 5 mbps. Doubling the data rate would obviously double upload time, and increase the risk of some type of upload failure, but would not appreciably improve quality. The magic number for 1080p was 6 mbps, though Rich recommended bumping that to 8 mbps if the video content contained high motion or was otherwise challenging from a compression perspective.

With these recommendations as perspective, let's have a look at YouTube's own encoding recommendations.

The Rules According to YouTube

Several years ago, when YouTube first came to prominence, the company seemed to enjoy working as an opaque black box, providing little direction regarding the best way to prepare your file for upload. When I started researching this article, I was pleasantly surprised to learn that YouTube now provides very specific encoding parameters.

Note that these are true recommendations in the sense that YouTube won't reject the files if you don't follow them. Rather, YouTube will attempt to encode pretty much any file you throw at it; you'll just get the most predictable results by following the recommendations.

At a high level, YouTube provides recommendations for two classes of users: those uploading standard quality video, and those with "professional quality content" and "enterprise quality internet connections." After covering the basics, I'll share these specific resolution and data rate recommendations with you.

Supported file formats: Although YouTube accepts multiple formats including .MOV, .AVI, .WMV, and .FLV, the advanced specifications page recommends H.264/AAC in an MP4 container format. YouTube recommends putting the moov atom at the start of the file, which for many encoders means activating the Fast Start option.

Audio recommendations: YouTube recommends uploading stereo or 5.1 audio at either 48 or 96 kHz. As you'll see, many of the presets I reviewed use 44.1 kHz, which I recommend changing to 48 kHz.

H.264 parameters: YouTube recommends using the High profile with CABAC entropy coding, with variable bitrate encoding. Other recommendations include a B-frame interval of 2, which is curious given that YouTube produces their H.264 video with no B-frames. Recommendations also include a closed GOP with a GOP size of half the frame rate, which means two key frames a second, an interval that none of the encoders matched.

Frame rate: Don't change the frame rate for uploading; if you shoot at 24p you should upload at 24p.

Frame composition: If you're working with interlaced source content, YouTube recommends deinterlacing before uploading. While you can upload at resolutions up to 4K, you shouldn't burn letterboxing or pillarboxing into your video; produce native 16:9 video at a 16:9 resolution, native 4:3 video at a native 4:3 resolution. YouTube also recommends against uploading files with a pixel aspect ratio other than 1.0, otherwise known as square pixel output.
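The moov atom recommendation in the file format item above is easy to verify before you upload. The following is a minimal sketch, not a tool that YouTube provides: it walks the top-level boxes of an MP4 or MOV file and reports whether moov appears before mdat, which is what the Fast Start option is meant to achieve. The function names and the synthetic test data are illustrative assumptions.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple


def top_level_boxes(stream: BinaryIO) -> Iterator[Tuple[str, int]]:
    """Yield (type, size) for each top-level box in an MP4/MOV stream."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A size field of 1 means a 64-bit size follows the type.
            size = struct.unpack(">Q", stream.read(8))[0]
            header_len = 16
        name = box_type.decode("latin-1")
        if size == 0:
            # A size field of 0 means the box runs to the end of the file.
            yield name, 0
            return
        if size < header_len:
            raise ValueError(f"corrupt box header for {name!r}")
        yield name, size
        stream.seek(size - header_len, 1)  # skip the payload


def moov_before_mdat(stream: BinaryIO) -> bool:
    """Return True if the moov box precedes the mdat box."""
    for name, _ in top_level_boxes(stream):
        if name == "moov":
            return True
        if name == "mdat":
            return False
    return False


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a synthetic box, used here only for demonstration."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic example: ftyp, then moov, then mdat (the Fast Start layout).
fast_start = io.BytesIO(
    make_box(b"ftyp", b"isom") + make_box(b"moov", b"") + make_box(b"mdat", b"\x00" * 16)
)
print(moov_before_mdat(fast_start))  # True
```

For a real file, open it in binary mode and pass the file object, for example `moov_before_mdat(open("upload.mp4", "rb"))`, where the file name is a placeholder.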

Again, YouTube identifies two classes of users, standard and high quality. Here are the suggested data rates for varying video resolutions:

Table 1: Standard quality recommendations.

High Quality Uploads

Table 2: High quality recommendations.

What YouTube Delivers

We've focused on the input side so far. Let's shift to the output side to help focus our later discussions. Briefly, we've all seen the different quality files produced by YouTube, which are selectable via a control on the bottom right of the player. As you would suspect, the number and size of files relates to the size of the file that you upload. This is shown in Table 3, where the columns are the resolutions of the files that I uploaded during my tests, and the rows are the files produced by YouTube from these files.

Table 3. Files produced by YouTube.

On the extreme left, a 1080p upload triggers the creation of six files, which drops to three if you upload a 640x360 file. Table 4 shows the details of the files produced by YouTube, which actually varied very little depending upon the input. That is, the 240p file produced from the 1080p input was configured identically to the file created from the 640x360 input. All files except for the mobile file had a pixel aspect ratio of 1.0; the mobile had a pixel aspect ratio of 1.2.

Table 4. File details.

I derived these configuration details from the file analysis tool MediaInfo, and from Inlet (now Cisco) Semaphore, which, unfortunately, could only load a portion of the files. As mentioned, none of the H.264 files that I could test used B-frames, while all files that I could test had a key frame interval of 60 frames. All my test files were 29.97 fps, so that meant a key frame about every 2 seconds.
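To put the key frame numbers side by side, here is a small arithmetic sketch. It compares the 60-frame interval observed in YouTube's output with the upload recommendation of a GOP size of half the frame rate. Rounding the recommended GOP to a whole frame is an assumption, and the function names are illustrative.

```python
def keyframe_interval_seconds(gop_frames: int, fps: float) -> float:
    """Time between key frames for a given GOP length and frame rate."""
    return gop_frames / fps


def recommended_gop_frames(fps: float) -> int:
    """GOP length of half the frame rate, rounded to a whole frame."""
    return round(fps / 2)


fps = 29.97

observed = keyframe_interval_seconds(60, fps)
print(f"{observed:.2f}")  # 2.00 seconds between key frames in YouTube's output

upload_gop = recommended_gop_frames(fps)
print(upload_gop)  # 15 frames, or about two key frames per second
print(f"{keyframe_interval_seconds(upload_gop, fps):.2f}")  # 0.50
```

In other words, the interval in the delivered files is about four times longer than the one recommended for uploads.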

With this as background, let's review the presets available from several popular encoding programs to see how they conformed to YouTube's recommendations and whether they make sense given the outputs YouTube is producing from the uploaded files. In the interest of time, relevance and my editor's lack of patience with tables, I'm going to focus primarily on 1080p and 720p file creation, with a smattering of attention at DV resolutions.

1080p

I do most of my personal and corporate shooting at 720p when possible, and until very recently, advised most clients to ignore 1080p uploads. However, between the increase of connected television sets in the living room and the very high resolution iPad 3 (I know, I know), 1080p is starting to make much more sense, particularly when it's free. In our conversations, Telestream's Louden shared that Episode users were increasingly focused on the higher resolution as well.

Speaking of this, Episode was the only encoder that actually addressed both the professional and standard video quality groups with two sets of presets that matched most of YouTube's recommendations. For relevance, I'll focus solely on the Standard Quality Recommendations.

Table 5 contains the most prominent configuration parameters from the 1080p preset from the listed programs. As you can see, there's a great deal of non-conformance with the YouTube recommendations, including key frame interval and audio frequency. At the target data rate, I tend to think that key frame interval is not particularly relevant, but I would recommend changing all presets to 48 kHz as YouTube recommends.

Table 5. Summary of encoding parameters from 1080p YouTube preset.

The biggest head scratcher relates to data rate, with Apple Compressor at 20 mbps and the other three much, much lower. As we can see in Table 4, YouTube produces the 1080p files at 5.8 mbps; surely you would see some quality difference between the 20 mbps file produced by Compressor and files produced by the other encoders at less than half that data rate.

To test this, I rendered my 3-minute HD test file from Compressor, Adobe Media Encoder, and Squeeze and uploaded the files to YouTube. For comparison purposes, the file produced by Compressor was 458 MB, while the file produced by Adobe Media Encoder was 197 MB, and the Squeeze file was 152 MB. I didn't track upload times, but with my pathetic 800 kbps DSL upload speed, I'm sure the difference was very significant. Then I downloaded the files created by YouTube using the Firefox plug-in Download Helper, imported them into Adobe Premiere Pro, and compared their quality.
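The upload times weren't measured, but they can be estimated from the file sizes and the 800 kbps uplink quoted above. This is a rough sketch under stated assumptions: 1 MB is taken as 1,000,000 bytes, 1 kbps as 1,000 bits per second, the clip is treated as exactly 180 seconds long, and protocol overhead and connection variability are ignored. The function names are illustrative.

```python
def upload_minutes(size_mb: float, uplink_kbps: float) -> float:
    """Ideal upload time in minutes, ignoring overhead and retries."""
    seconds = size_mb * 8000 / uplink_kbps  # MB -> kilobits, then divide by kbps
    return seconds / 60


def average_mbps(size_mb: float, duration_s: float) -> float:
    """Average total data rate (video plus audio) implied by a file size."""
    return size_mb * 8 / duration_s


# File sizes reported for the 3-minute test clip.
files = {"Compressor": 458, "Adobe Media Encoder": 197, "Squeeze": 152}

for encoder, size_mb in files.items():
    print(
        f"{encoder}: about {upload_minutes(size_mb, 800):.0f} minutes to upload, "
        f"{average_mbps(size_mb, 180):.1f} mbps average"
    )
# Compressor: about 76 minutes to upload, 20.4 mbps average
# Adobe Media Encoder: about 33 minutes to upload, 8.8 mbps average
# Squeeze: about 25 minutes to upload, 6.8 mbps average
```

Under these assumptions, the Compressor file takes roughly three times as long to upload as the Squeeze file.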

By way of background, the file consists of PBS footage provided by SimmonsArt, Inc. and stock footage provided by Artbeats. In short, it's broadcast quality, and it includes a range of scenes, from talking head to fast motion. Remarkably, with the exception of one clip with extreme high motion and lots of fine detail, the YouTube output from all three uploads was virtually identical. You can see frame comparisons in Figure 1, and the differences are exceptionally minor.

Figure 1. I saw some minor quality difference in the text in this clip, but that was it.

Again, video compression is a garbage-in/garbage-out medium, and more is always better when it comes to data rate. However, if upload time is a concern, as it is with me, I wouldn't go higher than 8 mbps unless the footage was extremely difficult to encode.

Not that it's significant, but only Adobe Media Encoder produced the file using the two recommended B-frames. Compressor used 1 B-frame, which you can't change, while the x264 preset used by both Sorenson and Telestream also produced an interval of 1, though you can change this in these programs. I tend to think that B-frames are more overrated than Johnny Depp, which is saying something, and the fact that YouTube produces their H.264 files with no B-frames tends to confirm this view. Long story short, while I would use two B-frames to honor YouTube's recommendations, I wouldn't change encoders if my tool didn't give me this level of control.

Going forward, when producing a 1080p file to upload to YouTube, I will boost the audio data rate to at least 256 kbps. While I'm guessing that you couldn't hear the difference between 128 kbps and 256 kbps input, producing 192 kbps audio from 128 kbps audio just doesn't feel right.

720p

Table 6 summarizes the 720p presets from the listed encoding tools, with most of the same concerns. Again, I would conform audio to 48 kHz with more alacrity than I would drop my key frame setting to 2 per second, and would boost audio to at least 256 kbps.

Table 6. Summary of encoding parameters from 720p YouTube presets.

I ran the same quality-related tests with video uploaded from Compressor, Adobe Media Encoder, and Squeeze, with the same results; the 5 mbps files looked identical to Apple's 10 mbps files. The bottom line is that while you're free to encode at any data rate, you're unlikely to see any benefit from rates beyond 5 mbps unless you're working with exceptionally hard-to-encode video.

SD Resolutions and Thereabouts

If you're working with HD video, it's hard to imagine a use case where you'd want to upload a file smaller than 720p; see above for recommendations there. If you're uploading DV or similar footage, I would heed YouTube's recommendations and convert to a square pixel resolution before uploading. I tested by uploading both 4:3 and 16:9 DV files, and the aspect ratios were slightly off in both. Nothing tragic, but definitely discernible to the trained eye.

Specifically, YouTube encoded the 4:3 DV file to a resolution of 654x480, rather than 640x480, and produced the 16:9 DV file at 854x470 rather than 854x480. Rather than uploading at DV resolutions and aspect ratios, I would convert 4:3 DV footage to a progressive 640x480 file, and the 16:9 DV file to 854x480. Three of the vendors, Adobe, Sorenson, and Telestream, provided 640x480 presets, and the data rate ranged from 2 to 2.5 mbps, which sounds about right. The 854x480 file has more pixels, so I would boost this to no more than 3 mbps, with 128 kbps audio good for both resolutions.
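One plausible explanation for those odd resolutions, offered here as an assumption rather than anything confirmed by YouTube, is that the 720x480 DV frames were scaled using the pixel aspect ratios commonly quoted for NTSC DV: 10/11 for 4:3 and 40/33 for 16:9. The sketch below works through that arithmetic. Rounding to an even number of pixels is also an assumption, and the function names are illustrative.

```python
from fractions import Fraction


def to_even(value: Fraction) -> int:
    """Round to the nearest even integer, as many encoders require."""
    return 2 * round(value / 2)


def square_pixel_width(stored_width: int, par: Fraction) -> int:
    """Width after converting non-square pixels to square pixels."""
    return to_even(stored_width * par)


def height_for_width(stored_width: int, stored_height: int,
                     par: Fraction, target_width: int) -> int:
    """Height that preserves the display aspect ratio at a target width."""
    display_width = stored_width * par
    return to_even(stored_height * target_width / display_width)


# 4:3 DV: 720 stored pixels per line, assumed PAR of 10/11.
print(square_pixel_width(720, Fraction(10, 11)))  # 654

# 16:9 DV: assumed PAR of 40/33, fitted to a width of 854.
print(height_for_width(720, 480, Fraction(40, 33), 854))  # 470
```

Both results match what YouTube produced in these tests, which is consistent with the advice to convert to 640x480 or 854x480 square pixel files yourself before uploading.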

So, that's it. Hopefully, you know a lot more about encoding for YouTube upload than you did at the start, with links to get more information should you need it.

A Word from Google

Well, that's almost it. I received a note after this article posted from Colleen Henry, Video Hacker, Google Video Infrastructure, with some great suggestions to help you get the best video quality on YouTube:

It's important to think of the files you upload to YouTube as golden masters, as they will be used as source material to generate video streams for years to come. Simply put, the better the quality of the file you upload to YouTube today, the better quality the viewer's experience will be throughout your video's life on YouTube.

As displays increase in size, compression techniques become more efficient, playback devices become more sophisticated, and internet connections improve, so will the quality YouTube will be able to provide to the viewers of your videos. This means, while you may reach a limit on perceived benefits from higher bitrates or more efficient encoding if you were to test it today, that does not mean you should stop there. You will see a huge benefit over the lifetime of your video being available on YouTube, as internet speeds, hardware, and software evolve. Upload the best quality video that you can create and squeeze through your internet connection!

Bonus tips:

Many encoders can spend more CPU time to create a much more efficient file. If you have a powerful computer, but a slow internet connection, look into using more complex and efficient encoding to save upload time.

You can noticeably improve the quality of your video on YouTube by using a sophisticated, scene aware, denoising filter prior to uploading.

Keyframe interval doesn't really matter much at this moment in time, but please keep it under 5 for VOD.

The sample rate of your audio should match your source's sample rate in which it was produced.

If you make sure to use a streaming format, like an .mkv, .mp4 or a .mov, with the metadata at the front of the container we will begin processing your video WHILE you are uploading it, drastically reducing overall turnaround time. This will make things MUCH faster, with no negative side effect. You can add the metadata atom to the front of your file with something like qtfaststart, or select it when you are creating the file in Squeeze, Episode, etc.

It is ideal to use constant quality encoding. This will let you create a high quality variable bitrate file, at the speed of a single pass. It will maintain a consistent target quality throughout the file, rather than trying to allocate bits to hit an arbitrary bitrate, which can easily under-shoot or over-shoot, and with two pass, take extra long to create.

You can put uncompressed PCM audio in an .mov or .mkv container and deliver it to us if you like. However, make sure not to create multiple discrete mono streams when you do it.
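As an illustration of how several of these tips might be combined, here is a hedged sketch that builds an ffmpeg command line using the x264 encoder. ffmpeg was not one of the tools tested in this article, so treat the whole thing as an assumption: the CRF value of 18 and the "slow" preset are illustrative starting points, not recommendations from Google, and the file names are placeholders. The command uses constant quality encoding (-crf), trades CPU time for efficiency (-preset), moves the metadata to the front of the file (-movflags +faststart), and leaves the audio sample rate untouched so that it matches the source.

```python
import shlex
from typing import List


def youtube_upload_command(src: str, dst: str,
                           crf: int = 18, preset: str = "slow") -> List[str]:
    """Build an ffmpeg argument list for a constant quality upload file.

    dst should end in .mp4 or .mov, since +faststart applies to those containers.
    """
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264",          # H.264 video via x264
        "-crf", str(crf),           # constant quality; lower means higher quality
        "-preset", preset,          # slower presets spend more CPU for smaller files
        "-c:a", "aac",
        "-b:a", "256k",             # audio data rate suggested earlier in the article
        "-movflags", "+faststart",  # put the moov atom at the front of the file
        dst,
    ]


command = youtube_upload_command("master.mov", "upload.mp4")
print(shlex.join(command))
```

The sketch only prints the command. To run it, pass the list to `subprocess.run(command, check=True)` on a machine with ffmpeg installed.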

