The problems with HTML5 <audio>

Background

Problem #1: It is missing functionality

Problem #2: Latency

Music apps and games sometimes require playback of short buffers

Especially on mobile

Microsoft deserves credit

I've been having a back-and-forth Twitter discussion with Giorgio Sardo, Microsoft's HTML5/IE evangelist, about audio, but I find 140 characters too limiting to explain the issues, and Giorgio seems more interested in snark and attacking Chrome than in attacking the root of the problem.

This all started because after the release of Angry Birds at Google I/O, people noticed that it was requesting Flash. Angry Birds is written in GWT and uses a GWT library written by Fred Sauer called GWT-voices. This library not only supports HTML5 audio, but has fallbacks to Flash and even <bgsound> on IE6! There was speculation that the Flash requirement was there for nefarious purposes (to block iOS) or because Chrome bundles Flash, but the reality is, it was there *both* because Chrome has some audio bugs and because HTML5 <audio> just isn't good enough for games or professional music applications.

I first noticed the shortcomings of the audio tag last year when we ported Quake2 to HTML5 (GwtQuake), shown at last year's I/O, where I also demoed a Commodore 64 SID music emulator. There are two issues with using HTML5 <audio>, which was originally designed to support applications like streaming music players.

The HTML5 audio element permits operations like seeking, looping, and volume control, which are great for jukebox applications, but you cannot synthesize sound on the fly, retrieve sound samples, process sound samples, apply environmental effects, or even do basic stereo panning. Quake2 required 3D sound based on OpenAL's inverse-distance damping model as well as stereo panning. I did my best and implemented distance damping with the volume control, but had no ability to position sounds left or right.

For sound synthesis, there is no official way to play back dynamically created buffers.
The workaround is to use JavaScript to encode sample buffers into PCM or Ogg in real time, convert them to data URLs, and use those as the source for an audio element, which is very computationally expensive and chews up browser memory. For developers wishing to create even basic music visualizers, it creates huge difficulties.

Audio applications require low latency. Studies have shown that human beings can perceive audio latency down to the millisecond, but in general, lower than 7ms is considered good enough. This means that in some circumstances you need to schedule sounds within 7ms of one another: for example, if you need to start two sounds simultaneously, one in the left ear and one in the right, or if you need to concatenate several sounds together in series.

Giorgio has a neat demo of playing piano notes in sequence, and hats off to Microsoft for providing a great <audio> implementation. It's a cool demo, but I still hear latency variation between notes and occasional glitches. No one is going to build something even one-tenth as good as GarageBand on the iPad using this technique. That's because the only way you can schedule audio in HTML5 is via the browser's event loop, using setInterval or setTimeout, and that's problematic for several reasons.

First, it's unreliable. Over the years, setInterval/setTimeout has been clamped to different minimum resolutions depending on the browser and operating system. On some systems the timer was tied to vertical refresh and would clamp to 16ms; then vendors started clamping to 10ms, and now they clamp as low as 4ms. But 4ms isn't a guarantee, it's a request. Many things can stand in the way of that request: merely mousing over the page can fire user-interface events that trigger JavaScript handlers, CSS rules can force a relayout, and excessive JavaScript work can trigger garbage collection.

Second, aggressive setInterval periods can delay response to user input, making the browser feel sluggish.
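The data-URL workaround described above can be sketched roughly as follows. This is an illustrative reconstruction, not code from GWT-voices: it hand-packs float samples into a 16-bit mono WAV container, then base64-encodes the bytes into a data URL for an audio element. Every cycle and every copy here happens in JavaScript, which is exactly the expense being complained about.

```javascript
// Illustrative sketch of the data-URL workaround: hand-encode raw samples
// as 16-bit mono PCM WAV, then base64 the bytes into an <audio> source.
function encodeWav(samples, sampleRate) {
  const buf = new ArrayBuffer(44 + samples.length * 2);
  const v = new DataView(buf);
  const str = (off, s) => [...s].forEach((c, i) => v.setUint8(off + i, c.charCodeAt(0)));
  str(0, "RIFF"); v.setUint32(4, 36 + samples.length * 2, true); str(8, "WAVE");
  str(12, "fmt "); v.setUint32(16, 16, true);  // fmt chunk size
  v.setUint16(20, 1, true);                    // format: PCM
  v.setUint16(22, 1, true);                    // channels: mono
  v.setUint32(24, sampleRate, true);
  v.setUint32(28, sampleRate * 2, true);       // byte rate
  v.setUint16(32, 2, true);                    // block align
  v.setUint16(34, 16, true);                   // bits per sample
  str(36, "data"); v.setUint32(40, samples.length * 2, true);
  samples.forEach((s, i) =>
    v.setInt16(44 + i * 2, Math.max(-1, Math.min(1, s)) * 0x7fff, true));
  return buf;
}

function wavDataUrl(samples, sampleRate) {
  const bytes = new Uint8Array(encodeWav(samples, sampleRate));
  let bin = "";
  for (const b of bytes) bin += String.fromCharCode(b);
  const b64 = typeof btoa !== "undefined" ? btoa(bin)
            : Buffer.from(bin, "binary").toString("base64");
  return "data:audio/wav;base64," + b64;
}

// 100 ms of a 440 Hz sine; in a browser you would then do: new Audio(url).play()
const sr = 44100;
const sine = Array.from({ length: sr / 10 }, (_, i) =>
  Math.sin(2 * Math.PI * 440 * i / sr));
const url = wavDataUrl(sine, sr);
```

Note that the entire buffer must be re-encoded and re-parsed for every dynamically generated sound; none of this work can be handed off to the browser's native audio pipeline.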
If the user tabs to another window, the browser must decide whether or not to clamp timeouts to a much higher value (say, 1 second) to avoid needlessly burning CPU, which could harm background playback. Unlike requestAnimationFrame, which solves this problem for graphics, there is no "requestSoundEvent".

Some of the sounds in Quake2, the hyper-blaster for example, are sample buffers as small as 300 bytes. At 44kHz, that is a hard deadline of 8ms to schedule the playback of the next sound in the sequence. With everything else going on within a frame (processing physics, AI, rendering), that deadline is highly unlikely to be met consistently, and do we really want JavaScript performing this scheduling task?

Remember, mobile devices are HTML5 devices as well, and they are continually getting better at HTML5, but they are much more resource constrained, and JavaScript on them is even slower. There, native scheduling is even more beneficial: intensive JavaScript scheduling of playback would be difficult and would waste battery.

That's why the Web Audio API is important. It permits complex audio scheduling tasks, the application of environmental effects, convolutions, and so on to be performed natively, without involving the JavaScript engine in many cases. This takes pressure off the CPU, off memory and the garbage collector, and makes timing more consistent overall. A neat demo of it was recently shown at Google I/O.

Microsoft deserves credit. They made massive improvements in HTML5 support from IE8 to IE9, and they deserve the right to feel proud and to evangelize them. We celebrate that. It's why Angry Birds works, to some people's shock, on other browsers, and it's not by accident.
We built fallbacks into our core library for 2D canvas and tested on non-WebGL-capable browsers like IE9, which has excellent GPU-accelerated 2D support. Angry Birds was not an attempt to make non-Chrome browsers look bad, but to make HTML5 look good, because when developers start realizing that professionally developed and polished games and applications can be done in HTML5, we all win.

But now is not the time to rest on our laurels. HTML5 is not done. There are many things incomplete and broken in the spec. I am sad to see Microsoft trying to talk down the experimentation going on in Firefox and Chrome, vis-a-vis WebGL and the new audio APIs, just because they are on a slower release cycle and do not have these bleeding-edge features.

Giorgio seems to be suggesting in his tweets that the basic HTML5 <audio> tag is "good enough" and that the current IE9 implementation covers the use cases sufficiently. I disagree with that strongly.

We need 3D on the web. We need high-quality, low-latency audio. We need to be able to do the things that OpenAL and DirectX can do with sound on the web. And we are not going to get there by sticking our heads in the sand and declaring premature victory.
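To make that concrete, here is a minimal sketch, using the Web Audio API, of the two things the plain <audio> element cannot do: sample-accurate scheduling on the audio clock, and stereo panning. The noteTimes helper is illustrative and not from any library; OscillatorNode stands in for a real sample buffer.

```javascript
// Sketch: sample-accurate scheduling plus stereo panning with the
// Web Audio API -- the capabilities the <audio> element lacks.
function noteTimes(startAt, count, spacing) {
  // Pure helper: absolute audio-clock start time for each of `count` notes.
  return Array.from({ length: count }, (_, i) => startAt + i * spacing);
}

if (typeof AudioContext !== "undefined") { // browser only
  const ctx = new AudioContext();
  // Eight notes, 250 ms apart, alternating hard left / hard right.
  noteTimes(ctx.currentTime + 0.1, 8, 0.25).forEach((t, i) => {
    const osc = ctx.createOscillator();      // stand-in for a sample buffer
    osc.frequency.value = 440;
    const pan = new StereoPannerNode(ctx, { pan: i % 2 ? 1 : -1 });
    osc.connect(pan).connect(ctx.destination);
    osc.start(t);  // scheduled on the audio clock, immune to timer jitter
    osc.stop(t + 0.2);
  });
}
```

The key difference from the setTimeout approach: the start times are handed to the audio subsystem up front, so playback no longer depends on the JavaScript event loop firing on time.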

Labels: angry birds, audio, chrome, gwt, html5, ie9