LPC: The past, present, and future of Linux audio

The history, status, and future of audio for Linux systems was the topic of two talks—coming at the theme from two different directions—at the Linux Plumbers Conference (LPC). Ardour and JACK developer Paul Davis looked at audio from mostly the professional audio perspective, while PulseAudio developer Lennart Poettering, unsurprisingly, discussed desktop audio. Davis's talk ranged over the full history of Linux audio and gave a look at where he'd like to see things go, while Poettering focused on the changes since last year's conference and "action items" for the coming year.

Davis: origins and futures

Davis started using Linux in 1994, while working as the second employee at Amazon, and began writing audio and MIDI software for Linux in 1998, so he has been working on Linux audio for more than ten years. His presentation was meant to provide a historical overview of why "audio on linux still sucks, even though I had my fingers in all the pies that make it suck". In addition, Davis believes there are lessons to be learned from the other two major desktop operating systems, Windows and Mac OS X, which may help in getting to better Linux audio.

He outlined what kind of audio support is needed for Linux or, really, any operating system. Audio data should be able to be brought into or sent out of the system via any available audio interface, as well as over the network. Audio data, along with audio routing information, should be able to be shared between applications, and that routing should be able to be changed on the fly based on user requests or hardware reconfiguration. There needs to be a "unified approach" to mixer controls as well. Most important, perhaps, is that the system needs to be "easy to understand and to reason about".

Some history

Linux audio support began in the early 1990s with the Creative SoundBlaster driver, which became the foundation for the Open Sound System (OSS). By 1998, Davis said, there was growing dissatisfaction with the design of OSS, which led Jaroslav Kysela and others to begin work on the Advanced Linux Sound Architecture (ALSA).

Between 1999 and 2001, ALSA was redesigned several times, each time requiring audio applications to change because they would no longer compile. The ALSA sequencer, a kernel-space MIDI router, was also added during this time frame. By the end of 2001, ALSA had been adopted as the official Linux audio system in place of OSS. But OSS didn't disappear; it is still developed and used on both Linux and other UNIXes.

In the early part of this decade, the Linux audio developer community started discussing techniques for connecting audio applications together, something that is not supported directly by ALSA. At roughly the same time, Davis started working on the Ardour digital audio workstation; its audio-handling engine was split out to become JACK, an "audio connection kit" that works on most operating systems. JACK is mostly concerned with the low-latency requirements of professional audio and music creation, rather than the needs of desktop users.

Since that time, the kernel has made strides in supporting the realtime scheduling that JACK and others can use to provide skip-free audio performance, but much of that work is not available to users. Access to realtime scheduling is tightly controlled, and a significant amount of per-system configuration is required to enable it. Most distributions do not provide a means for regular users to grant realtime scheduling to audio applications, so most users are not benefiting from those changes.

In the mid-2000s, Poettering started work on the PulseAudio server, KDE stopped using the aRts sound server, GStreamer emerged as a means for intra-application audio streaming, and so on. Desktops wanted "simple" audio access APIs and created things like Phonon and libsydney, but meanwhile JACK was the only way to access Firewire audio. All of that led to great confusion for Linux audio users, Davis said.

Audio application models

At the bottom, audio hardware works in a very simple manner. For record (or capture), there is a circular buffer in memory to which the hardware writes, and from which the software reads. Playback is just the reverse. In both cases, user space can add buffering on top of the circular buffer used by the hardware, which is useful for some purposes, and not for others.
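
As a rough illustration of the capture side (a sketch, not any particular driver's code; all names are made up), the hardware advances a write pointer as samples arrive, and software chases it with a read pointer:

    /* Sketch of the capture-side circular buffer described above; real
     * applications would go through an API such as ALSA rather than
     * touching raw pointers. */
    #include <stddef.h>

    #define RING_FRAMES 4096            /* size of the circular buffer */

    struct capture_ring {
        short frames[RING_FRAMES];      /* hardware DMAs samples in here */
        volatile size_t hw_write;       /* advanced by hardware as data arrives */
        size_t sw_read;                 /* advanced by software as it reads */
    };

    /* Copy up to 'want' frames out of the ring; returns how many we got. */
    static size_t ring_read(struct capture_ring *r, short *dst, size_t want)
    {
        size_t got = 0;
        while (got < want && r->sw_read != r->hw_write) {
            dst[got++] = r->frames[r->sw_read];
            r->sw_read = (r->sw_read + 1) % RING_FRAMES;
        }
        return got;  /* fall too far behind and the hardware overwrites data */
    }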

There are two separate models that can be used between the software and the hardware. In a "push" model, the application decides when to read or write data and how much, while the "pull" model reverses that, requiring the hardware to determine when and how much I/O needs to be done. Supporting a push model requires buffering in the system to smooth over arbitrary application behavior. The pull model requires an application that can meet deadlines imposed by the hardware.

Davis maintains that supporting push functionality on top of pull is easy, just by adding buffering and an API. But supporting pull on top of push is difficult and tends to perform poorly. So, audio support needs to be based on the pull model at the low levels, with a push-based API added in on top, he said.
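
A minimal sketch of both models, using hypothetical names (JACK's real pull interface is jack_set_process_callback()): the pull-side callback is invoked when the hardware needs data, and a push-style write can be layered on top of it with nothing more than a FIFO:

    /* Hypothetical API: a sketch of the two models, not a real library. */
    #include <stddef.h>

    #define FIFO_SIZE 65536

    static float  fifo[FIFO_SIZE];      /* the buffering that makes push work */
    static size_t fifo_head, fifo_tail;

    /* Push model: the application decides when, and how much, to write. */
    size_t audio_write(const float *data, size_t n)
    {
        size_t i;
        for (i = 0; i < n && (fifo_head + 1) % FIFO_SIZE != fifo_tail; i++) {
            fifo[fifo_head] = data[i];
            fifo_head = (fifo_head + 1) % FIFO_SIZE;
        }
        return i;                       /* may be short: the buffer was full */
    }

    /* Pull model: the audio system calls this when the hardware needs 'n'
     * frames, and it must return in time or the audio will glitch. */
    void process_callback(float *out, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            if (fifo_tail != fifo_head) {
                out[i] = fifo[fifo_tail];
                fifo_tail = (fifo_tail + 1) % FIFO_SIZE;
            } else {
                out[i] = 0.0f;          /* underrun: the push side fell behind */
            }
        }
    }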

Audio and video have much in common

OSS is based around the standard POSIX system calls, such as open(), read(), write(), mmap(), etc., while ALSA (which supports those same calls) is generally accessed through libasound, which has a "huge set of functions". Those functions provide ways to control hardware and software configuration, along with a large number of commands to support various application styles.
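
To make the contrast concrete, minimal playback setup looks roughly like this in each API; the ioctl()s and libasound calls shown are the standard ones, but this is an error-handling-free sketch rather than complete code:

    /* OSS: ordinary POSIX calls plus ioctl()s on a device node. */
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/soundcard.h>

    void oss_play(const short *buf, size_t bytes)
    {
        int fd = open("/dev/dsp", O_WRONLY);
        int fmt = AFMT_S16_LE, channels = 2, rate = 44100;
        ioctl(fd, SNDCTL_DSP_SETFMT, &fmt);
        ioctl(fd, SNDCTL_DSP_CHANNELS, &channels);
        ioctl(fd, SNDCTL_DSP_SPEED, &rate);
        write(fd, buf, bytes);          /* the app behaves as if it owns the device */
        close(fd);
    }

    /* ALSA: the same job through libasound. */
    #include <alsa/asoundlib.h>

    void alsa_play(const short *buf, size_t frames)
    {
        snd_pcm_t *pcm;
        snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0);
        snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                           SND_PCM_ACCESS_RW_INTERLEAVED,
                           2, 44100, 1, 500000 /* 0.5s latency */);
        snd_pcm_writei(pcm, buf, frames);
        snd_pcm_drain(pcm);
        snd_pcm_close(pcm);
    }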

In many ways, audio is like video, Davis said. Both generate a "human sensory experience" by rescanning a data buffer and "rendering" it to the output device. There are differences as well, mostly in refresh rates and the effect of missing refresh deadlines. Unlike audio, video data doesn't change that frequently when someone is just running a GUI—unless they are playing back a video. Missed video deadlines are often imperceptible, which is generally not true for audio.

So, Davis asked, does anyone seriously propose that video/graphics applications should talk to the hardware directly via open/read/write/etc.? For graphics, that has been mediated by a server or server-like API for many years. Audio should be the same way, even though some disagree, "but they are wrong", he said.

The problem with UNIX

The standard UNIX methods of device handling, using open/read/write/etc., are not necessarily suitable interfaces for interacting with realtime hardware. Davis noted that he has been using UNIX for 25 years and loves it, but that the driver API lacks some important pieces for handling audio (and video). Neither temporal semantics nor data-format semantics are part of that API, yet both are necessary for handling audio and video data. The standard interfaces can be used, but they don't promote a pull-based application design.

What is needed is a "server-esque architecture" and API that can explicitly handle data format, routing, latency inquiries, and synchronization. That server would mediate all device interaction, and would live in user space. The API would not require that various services be put into the kernel. Applications would have to stop believing that they can and should directly control the hardware.

The OSS API must die

The OSS API requires that any services (like data format conversion, routing, etc.) be implemented in the kernel. It also encourages applications to do things that do not work well with other applications trying to do some kind of audio task at the same time. OSS applications are written as if they completely control the hardware.

Because of that, Davis was quite clear that the "OSS API must die". He noted that Fedora no longer supports OSS and was hopeful that other distributions would follow that lead.

When ALSA was adopted, there might have been an opportunity to get rid of OSS, but, at the time, there were a number of reasons not to do that, Davis said. Backward compatibility with OSS was felt to be important, and there was concern that doing realtime processing in user space was not going to be possible—which turned out to be wrong. He noted that even today there is nothing stopping users or distributors from installing OSS, nor anything stopping developers from writing OSS applications.

Looking at OS X and Windows audio

Apple took a completely different approach when they redesigned the audio API for Mac OS X. Mac OS 9 had a "crude audio architecture" that was completely replaced in OS X. No backward compatibility was supported and developers were just told to rewrite their applications. So, the CoreAudio component provides a single API that can support users on the desktop as well as professional audio applications.

On the other side of the coin, Windows has had three separate audio interfaces along the way. Each maintained backward compatibility at the API level, so that application developers did not need to change their code, though driver writers were required to. Windows has taken much longer to get low latency audio than either Linux or Mac OS X.

The clear implication is that backward compatibility tends to slow things down, which may not be a big surprise.

JACK and PulseAudio: are both needed?

JACK and PulseAudio currently serve different needs, but, according to Davis, there is hope that there could be convergence between them down the road. JACK is primarily concerned with low latency, while PulseAudio is targeted at the desktop, where application compatibility and power consumption are two of the highest priorities.

Both are certainly needed right now, as JACK conflicts with the application design of many desktop applications, while PulseAudio is not able to support professional audio applications. Even if an interface were designed to handle all of the requirements that are currently filled by JACK and PulseAudio, Davis wondered if there were a way to force the adoption of a new API. Distributions dropping support for OSS may provide the "stick" to move application developers away from that interface, but could something similar be done for a new API in the future?

If not, there are some real questions about how to improve the Linux audio infrastructure, Davis said. The continued existence of both JACK and PulseAudio, along with supporting older APIs, just leads to "continued confusion" about what the right way to do audio on Linux really is. He believes a unified API is possible from a technical perspective—Apple's CoreAudio is a good example—but it can only happen with "political and social manipulation".

Poettering: The state of Linux audio

The focus of Poettering's talk was desktop audio, rather than embedded or professional audio applications. He started by looking at what had changed since last year's LPC, noting that EsounD and OSS were officially gone ("RIP"), at least in Fedora. OSS can still be enabled in Fedora, but it was a "great achievement" to have it removed, he said.

Bugs were reported against only three applications because of the OSS removal, VMware and quake2 among them. He said that there "weren't many complaints", but an audience member noted the "12,000 screaming users" of VMware as a significant problem. Poettering shrugged that off, saying that he encouraged other distributions to follow suit.

Confusion at last year's LPC led him to create the "Linux Audio API Guide", which has helped clarify the situation, though there were complaints about what he said about KDE and OSS.

Coming in Fedora 12, and in other distributions at "roughly the same time", is the use of realtime scheduling by default for desktop audio applications. There is a new mechanism to hand out realtime priority (RealtimeKit) that will prevent buggy or malicious applications from monopolizing the CPU and essentially causing a denial of service. The desktop now makes use of high-resolution timers, because they "really needed to get better than 1/HZ resolution" for audio applications.
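
For reference, here is roughly what an audio thread asks for; without RealtimeKit, or per-system rtprio configuration, the call simply fails with EPERM for unprivileged users. RealtimeKit accepts the equivalent request over D-Bus and grants it with safeguards (the D-Bus interface itself is not shown here):

    /* Sketch: an audio thread asking for realtime scheduling directly.
     * Unprivileged processes normally get EPERM here; RealtimeKit's job
     * is to grant the same priority on the thread's behalf, with
     * safeguards against runaway CPU use. */
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    int make_realtime(int priority)
    {
        struct sched_param param;
        memset(&param, 0, sizeof(param));
        param.sched_priority = priority;    /* must lie between
                                               sched_get_priority_min/max() */
        int err = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
        if (err != 0)
            fprintf(stderr, "no realtime scheduling: %s\n", strerror(err));
        return err;
    }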

Support for buffers of up to two seconds has been added. ALSA used to restrict the buffer size to 64K, which equates to roughly 370ms of CD-quality audio. Allowing bigger buffers is "the best thing you can do for power consumption" as well as for avoiding dropouts, he said.
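
That old limit is easy to work out: CD-quality audio is 44100 frames per second, with two channels of two bytes each, or 176400 bytes per second, so a 64KB (65536-byte) buffer holds 65536 / 176400 ≈ 0.37 seconds of audio.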

Several things were moved into the audio server, including timer-based audio scheduling, which allows the server to "make decisions with respect to latency and interrupt rates". A new mixer abstraction was added, even though four already exist in ALSA. Those are very hardware-specific, Poettering said, while the new one is a very basic abstraction.
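
The general technique behind timer-based scheduling looks something like the following sketch (the idea, not PulseAudio's actual code; fill_hardware_buffer() is a hypothetical stand-in for the real write path): sleep on a timer the server chooses, and top up a large hardware buffer on each wakeup, instead of taking a sound-card interrupt per buffer fragment:

    /* Sketch of timer-based scheduling with timerfd. */
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/timerfd.h>

    extern void fill_hardware_buffer(void);   /* hypothetical */

    void timer_loop(long wakeup_ms)
    {
        int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
        struct itimerspec its;
        its.it_interval.tv_sec  = wakeup_ms / 1000;
        its.it_interval.tv_nsec = (wakeup_ms % 1000) * 1000000L;
        its.it_value = its.it_interval;
        timerfd_settime(tfd, 0, &its, NULL);

        for (;;) {
            uint64_t expirations;
            read(tfd, &expirations, sizeof(expirations)); /* sleep until tick */
            fill_hardware_buffer();   /* longer intervals mean fewer wakeups
                                         and less power; shorter ones mean
                                         lower latency */
        }
    }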

Audio hardware has acquired udev integration over the last year, and there is now "Bluetooth audio that actually works". Poettering also noted that audio often didn't work "out of the box" because there was no mixer information available for the hardware. Since last year, an ALSA mixer initialization database has been created and populated: "It's pretty complete", he said.

Challenges for the next year

There were a number of issues with the current sound drivers that Poettering listed as needing attention in the coming year. Currently, for power saving purposes, PulseAudio shuts down devices two seconds after they become idle. That can lead to problems with drivers that make noise when they are opened or closed.

In addition, there are areas where the drivers do not report correct information to the system. The decibel range of the device is one of those, along with device strings that are broken or missing in many drivers, which makes it difficult to automatically identify the hardware. The various mixer element names are often wrong as well; in the past that "usually didn't matter much", but it is becoming increasingly important for those elements to be consistently named by drivers. Some drivers are also missing from the mixer initialization database, which should be fixed.

The negotiation logic for sample rates, data formats, and so on is not standardized. The order in which those parameters are changed can be interpreted differently by each driver, which leads to problems at the higher levels, he said. There are also problems with timing for synchronization between audio and video that need to be addressed at the driver level.

Poettering also had a whole slew of changes that need to be made to the ALSA API so that PulseAudio (and others) can get more information about the hardware: things like routing and mixer-element mappings, jack status (and any re-routing that is done on jack insertion), and data-transfer parameters such as the timing and granularity of transfers. Many of the current assumptions are based on consumer-grade hardware, which doesn't work for professional or embedded hardware, he said. It would be "great if ALSA could give us a hint how stuff is connected".

There is also a need to synchronize multiple PCM clocks within a device, along with adding atomic mixer updates that sync to the PCM clock. Latency control, better channel mapping, atomic status updates, and HDMI negotiation are all on his list as well.

Further out, there are a number of additional problems to be solved. Codec pass-through—sending unaltered codec data, such as SPDIF, HDMI, or A2DP, to the device—is "very messy" and no one has figured out how to handle synchronization issues with that. There is a need for a simpler, higher-level PCM API, Poettering said, so that applications can use the pull model, rather than being forced into the push model.

Another area that needs work is handling 20 seconds of buffering, which brings a whole new set of problems with it. As an example, Poettering pointed out the problems that can occur if the user changes some setting after that much audio data has been buffered. There need to be ways to revoke data that has already been buffered, or there will be lags of up to 20 seconds between a user action and any audible change.
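
One way to picture revoking buffered data (PulseAudio calls this rewinding): keep the write pointer far ahead of the hardware's read pointer, and on a settings change roll it back to just ahead of the read pointer and re-render from there. A sketch, with all names hypothetical and wraparound details omitted:

    /* Illustrative only: rolling back queued audio after a settings
     * change. Everything between 'hw_read' (where the hardware is
     * playing) and 'sw_write' was rendered ahead of time with the old
     * settings. */
    #include <stddef.h>

    #define SAFETY_FRAMES 1024               /* stay a bit ahead of hardware */

    extern volatile size_t hw_read;          /* advanced by the hardware */
    extern size_t sw_write;                  /* advanced as audio is rendered */
    extern void render(size_t from, size_t to);  /* hypothetical re-renderer */

    void on_settings_change(void)
    {
        /* Throw away everything queued beyond a small safety margin... */
        size_t old_write = sw_write;
        sw_write = hw_read + SAFETY_FRAMES;

        /* ...and regenerate it with the new settings, so the change is
         * heard within SAFETY_FRAMES rather than after ~20 seconds. */
        render(sw_write, old_write);
    }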

Conclusion

Both presentations gave a clear sense that things are getting better in the Linux audio space, though perhaps not with the speed that users would like to see. Progress has clearly been made and there is a roadmap for the near future. Whether Davis's vision of a unified API for Linux audio can be realized remains to be seen, but there are lots of smart hackers working on Linux audio. Sooner or later, the "one true Linux audio API" may come to pass.

