Ogg and the multimedia container format struggle

Did you know...? LWN.net is a subscriber-supported publication; we rely on subscribers to keep the entire operation going. Please help out by buying a subscription and keeping LWN on the net.

Video codecs attract most of the attention in the multimedia format wars, from Theora adoption in HTML5 to debates about the subjective quality and objective technical demands of Dirac versus H.264. But the oft-overlooked container format is just as important; it adds overhead, it determines seekability, subtitle support, and other important features, and it can introduce patent-licensing issues for open source projects. Xiph.org's Ogg container format is the most well-known in open source, though as recent events show it has its critics and its competition.

Ogg has been under development since the beginnings of Xiph.org in 1994, and was originally designed for use with the Vorbis audio codec. As the Xiph project undertook additional codecs, Ogg continued to evolve to support them. FFmpeg developer Mans Rullgard posted a lengthy criticism of Ogg on his personal blog in March, accusing the format of falling short in six areas: poor generality, excessive overhead, high end-to-end latency, lack of random-access seeking, ill-defined timestamps, and unnecessary complexity. Rullgard cites examples from the specification and several "typical usage" numbers to support his claims.

Ogg debates

As the blog post was picked up, debate about the merits of the complaints quickly erupted in the comments and on web discussion forums. Several commenters chided Rullgard for claiming that Ogg's latency and seek times were unsuitably "bad" without citing any numbers from other formats for comparison, and for overstating the size of the problem (such as the overhead created by Ogg's headers at close to 1%) or its relative importance.

On some points, there was more of a genuine disagreement on principle, however. Rullgard said that Ogg wastes space by using a full byte for the "version" field, where a single flag bit would suffice. Xiph's Greg Maxwell explained in the Reddit discussion of the article that a byte is used for the field in order to keep the header structure byte-aligned for simplicity. Maxwell also disagreed with Rullgard's assertion that Ogg's 32-bit checksum was a waste of space, noting that Ogg also uses a 32-bit capture pattern at the beginning of each page, as opposed to the 64-bit capture pattern in FFmpeg's NUT format — thus using the same number of bits, but providing error-detection "for free."

Rullgard also argues that Ogg's ability to concatenate different streams one-after-another creates undue complexity for the decoder, without providing any practical benefit. But one blog commenter claimed to take advantage of this feature when ripping CDs with seamless track transitions.

Xiph's Christopher "Monty" Montgomery replied at length in the Slashdot discussion of the critique, admitting that Ogg has its flaws, and conceding that several design decisions made years ago would be made differently today, but attributing more of Rullgard's complaints to long-standing bad blood between the projects. Moreover, he said, even with its flaws, Ogg remains the best free option for important cases like streaming video. Neither the popular MP4 nor Matroska container formats are well-suited for streaming (particularly live streaming), and MP4 is also patent-encumbered. Additionally, he said, making changes to the Ogg format as suggested by Rullgard might improve performance at high-bitrate video, but would be detrimental to low-bitrate and audio payloads where Ogg excels.

Further work

Montgomery said that after the Rullgard blog post gained attention, Xiph decided that part of the problem with its reception was poor documentation on the Ogg format. He subsequently began rewriting and expanding the documentation, some of which is already available online. There are changes that Xiph would like to make, he added, as well as ongoing work in the metadata layer. "One of the legitimately weird things about Ogg is that we knew metadata was going to be a source of constant flux, so we moved as much as we possibly could out of the container itself. The Ogg container only does framing and delivery. [ ...] Most folks are used to this being part of the container, and so consider it 'part of Ogg' which it isn't really."

The Ogg Skeleton project is the primary focus in this area. Skeleton is essentially a "metadata track" that can hold information like MIME-types, protocol messages, and timestamps to allow the decoder to easily seek within the media. A Skeleton track could then be multiplexed or interleaved within an Ogg container file, alongside video and audio tracks.

Skeleton's timestamping capabilities are documented at the Ogg Index page, and are introduced in Skeleton 3.3. A sample indexer called OggIndex is available, and both the ffmpeg2theora converter and development builds of Firefox support it.

Montgomery concludes his Slashdot comments by noting that breaking compatibility with the existing hardware and software Ogg decoders (most of which see only Vorbis and Theora content) is probably not going to happen until the next major new codec release from Xiph.org.

The competition

Regardless of whether one finds any or all of Rullgard's criticisms valid, there are other container format options out there for content providers. The most popular on the Internet writ large is the .MP4 file, which is properly known as MPEG-4 Part 14, and was approved by ISO as ISO/IEC 14496-14:2003. A part of the larger MPEG-4 specification, MPEG-4 Part 14 is a revision of two earlier standards, MPEG-4 Part 1 and MPEG-4 Part 12. Part 14 is based on the QuickTime container format created by Apple and recognized by the .MOV file extension. It can hold content in any codec (including free codecs like Vorbis and Theora), although there is a "registered" codec list maintained by the MPEG.

There is a degree of uncertainty regarding the ability to write MPEG-4 Part 14 decoders, however. The rest of the MPEG-4 specification, like all MPEG standards, is patented, and implementations must adhere to the license terms made available by the MPEG-LA licensing authority. Part 14 was once available as part of the MPEG-4 Systems patent pool, which has subsequently been withdrawn. Many people on mailing lists and discussion forums assume that the format is free to implement since it is not explicitly mentioned in the remaining MPEG-4 patent pools: "MPEG-4 Visual" and "AVC/H.264," but this is not officially stated. MPEG-LA makes it difficult to find specific information about specific patents in its technologies, preferring instead to steer all customers into the "patent pool" products instead. The ISO specification, which should document specific patent claims, is only available to paying customers. When asked, MPEG-LA representatives said that they did not know the specific status of Part 14 in the current patent pools.

The Matroska format, like Ogg, was created to serve as an open, patent-unencumbered container. The two formats do differ in emphasis, a fact that both projects readily acknowledge. Whereas Ogg was designed alongside Vorbis with streaming audio as its primary use case, Matroska was designed to support as many codecs as possible. Xiph.org says that Matroska has better support for seeking, editing files, and using menus and chapter markers, while Matroska says that Ogg is superior at streaming media delivery (for example, Matroska only recently added support for interleaving frames from different tracks).

Matroska was launched in 2002 as a derivative of the older Multimedia Container Format (MCF). The copyright on the specification and the trademark of the name Matroska are both held by CoreCodec, Inc., but the specification is available free-of-charge. A reference library is available for download under the LGPL, and a "core parser" is offered upon request under the BSD license. The format is generally seen with the .MKV file extension for video content, although .MKA for audio is also valid.

The NUT file format Maxwell mentioned on Reddit was created by developers on the FFmpeg and MPlayer teams, but appears to be supported only in that project. The NUT project site is sparse, with a broken link to the actual specification, but there is a mailing list that indicates that development is still underway, albeit slowly. Montgomery described it as very Ogg-like in its design, incorporating some design choices that would improve Ogg, such as a simpler way of encoding the packet-length in each header (which was one of Rullgard's complaints).

Container formats are far less exciting than multimedia codecs, but the choice of containers has a very real impact on what a content provider can do. Quickly and accurately seeking within a file — while important — is just one example; another active topic right now is support for subtitle tracks. As multimedia content on the Internet grows, having subtitles accessible in their own track (or tracks, with multiple languages supported) has implications for accessibility, internationalization, and subtitle-based searching. For the record, Ogg, MP4, Matroska, and NUT all support subtitles.

As usual, the right choice depends on the usage. To some, non-free formats like MP4 ought to be avoided at all costs, even if MPEG-LA is not likely to request licensing fees. If streaming, audio-only, or low-bitrate performance are important, Ogg remains the simplest and probably best option. For seeking, video editing, menu/chapter support, or combining a wide array of codecs, Matroska offers functionality Ogg cannot. Moving forward, the relative weight of those factors may shift as either the codecs or the container formats evolve — but until then, choice is good.