Simply put, the Web Audio API is awesome and powerful! It makes it possible to synthesize, manipulate, spatialize, and visualize any sound, limited only by your imagination and processing power. The ubiquity of the web browser provides an unprecedented environment in which to compose, instantly share your music, and collaborate with the world. The Web Audio API also gives you a unique advantage over any other music-making tool: the ability to connect and control your sound with any API on the web (SoundCloud, NASA, Twitter) or on your laptop/smartphone (accelerometer [motion/tilt], geolocation, vibration). You can connect to them all.

Unlike the previously covered Web MIDI API, the Web Audio API has been available in every major browser since late 2013 and its features are, for the most part, pretty stable. In the next two installments of this series, I’ll be covering many of the key aspects of using the Web Audio API, giving you plenty to get started making music in the browser.

In Part 1 and Part 2 of Making Music in the Browser, Web Audio API, I’ll cover the core concepts and give interactive examples of using the various components, or nodes, of the API.

Note: As usual, basic understanding of JavaScript, or another C-based language, will help make this article a lot more digestible.*

There’s a lot to cover, so let’s get started!

AudioContext

Everything in the Web Audio API lives inside of an AudioContext. The AudioContext can be thought of as a canvas for sound, where everything is processed/decoded, connected together, and controlled. The AudioContext comprises a variety of AudioNodes which process audio and output the new values to other AudioNodes or AudioParams. Nothing can be done in the Web Audio API without an AudioContext.

The AudioContext also exposes various properties and methods, such as sampleRate, currentTime (in seconds, counted from the creation of the AudioContext), destination, and the methods for creating all of the various AudioNodes: OscillatorNode, GainNode, BiquadFilterNode, etc…

All of your sound is routed to the AudioContext’s destination node, typically your computer’s speakers or audio-interface. It is the final destination for all of your sound, so it can’t be connected to any other node, as it doesn’t have any outputs. The ‘sampleRate’ property is determined by either your system setting or audio-interface and, as of this writing, can’t be changed programmatically. The Web Audio API supports sample-rates within the range 22,050Hz – 96,000Hz and can work with up to 32 channels. These ranges and limits come from the browser/OS that the API is implemented in and are not limitations of the API itself. The Web Audio Spec states that these limits and ranges are ‘minimal requirements’ of implementation, so it’s feasible to work with higher limits and ranges, depending on implementation and hardware.

Since the AudioContext is the foundation of anything you do with the Web Audio API, the first thing that you need to do is to create an AudioContext.

As of this writing, you still need to prefix the declaration of the AudioContext on all webkit-based browsers. Luckily, that’s all of the browser prefixing that you need to deal with.
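Here is a minimal sketch of creating an AudioContext with the webkit prefix fallback. The helper name createAudioContext and its scope argument are my own, hypothetical additions, there only so the snippet can be read and tried outside of a page; in a browser you would simply pass in window.

```javascript
// Hypothetical helper: uses the unprefixed constructor when it exists,
// otherwise falls back to the webkit-prefixed one.
function createAudioContext(scope) {
  var Ctor = scope.AudioContext || scope.webkitAudioContext;
  if (!Ctor) {
    throw new Error('Web Audio API is not supported in this environment');
  }
  return new Ctor();
}

// In a browser:
// var audioContext = createAudioContext(window);
```

Throwing when neither constructor exists is just one choice; you may prefer to show a friendly message instead.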

Modular Routing

One of the many great aspects of the API is that every AudioNode is modular and can connect not only to any other node, but also to any other node’s AudioParams, very similar to the way modules in a Eurorack system connect to each other. For instance, you can connect the output of an OscillatorNode set to a low frequency, creating an LFO (Low Frequency Oscillator), to the gain AudioParam of a GainNode, producing a periodic change in a sound’s gain: a tremolo effect.

Connections are made with the AudioNode’s connect() method as shown in the tremolo example. The connections you make don’t have to be in any particular order in your code, but it’s a good idea to write them in a logical route flow so you can reference your routing easily by looking at your connections. It’s also possible to have complex connections between AudioNodes and their parameters, such as multiple connections from 1 or more outputs to 1 or more inputs, typically referred to as ‘fan-out’ and ‘fan-in.’ For instance, you could have one source node’s output (tone, mp3, etc…) connect to 4 BiquadFilterNodes, 2 GainNodes, and a PannerNode.
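A rough sketch of the tremolo routing described above, assuming an AudioContext created elsewhere is passed in as ctx. The function name buildTremolo, the extra ‘depth’ GainNode, and all of the numbers are illustrative choices of mine, not values from the original interactive example.

```javascript
// Sketch of a tremolo patch. `ctx` is an AudioContext created elsewhere.
function buildTremolo(ctx) {
  var tone = ctx.createOscillator();   // the audible tone
  var volume = ctx.createGain();       // the gain we will modulate
  var lfo = ctx.createOscillator();    // low frequency oscillator
  var depth = ctx.createGain();        // scales how strong the tremolo is

  tone.frequency.value = 440;  // A4
  lfo.frequency.value = 5;     // 5 Hz, well below the audible range
  volume.gain.value = 0.5;     // resting gain
  depth.gain.value = 0.5;      // LFO output is added to the resting gain: 0 to 1

  // Audio path: tone -> volume -> speakers
  tone.connect(volume);
  volume.connect(ctx.destination);

  // Control path: lfo -> depth -> the 'gain' AudioParam of volume
  lfo.connect(depth);
  depth.connect(volume.gain);

  return { tone: tone, lfo: lfo, volume: volume, depth: depth };
}

// In a browser:
// var patch = buildTremolo(new AudioContext());
// patch.tone.start(0);
// patch.lfo.start(0);
```

Notice that the audio path and the control path are written as two separate groups, so the routing can be read straight from the connect() calls.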

AudioNodes

AudioNodes are the building blocks of the API. They are the various modules that you would use while building web audio applications: audio sources, processing modules, and the audio destination. Each performs a specific task, such as waveform generation, gain, filtering, convolution, compression, spatialization, etc… Some important examples of AudioNodes are the OscillatorNode, GainNode, and BiquadFilterNode. We’ll cover all of them, around 20 AudioNodes, in detail throughout this series. Each AudioNode can have inputs and outputs which are used to connect it to other AudioNodes/AudioParams. Some have no inputs or no outputs, typically the nodes at the start and end of a signal chain, like source nodes (0 inputs, 1 output) and the AudioDestinationNode (1 input, 0 outputs). Generally, an AudioNode takes in audio data, processes it in some way, and sends the new values to its output.

AudioParams

AudioParams are the audio-related parameters of AudioNodes, such as ‘gain’ or ‘frequency’. Each parameter can be set to a value directly, changed over a period of time, or scheduled to change in the future. These changes can be linear, exponential, or defined by a custom curve of your making. When setting an AudioParam directly, you will always need to use AudioNode.param.value = yourvalue . When scheduling an AudioParam, ‘.value’ needs to be omitted: AudioNode.param.exponentialRampToValueAtTime(value, time) . You can get the current value of any AudioParam by using the same syntax as when setting it directly, just without the assignment: GainNode.gain.value // 1 . You can also get an AudioParam’s default value, useful for resetting a param, by accessing its ‘defaultValue’ property: OscillatorNode.frequency.defaultValue // 440
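As a small sketch of setting, reading, and resetting AudioParams directly, assuming an OscillatorNode and a GainNode that were created elsewhere. The function name tweakAndReset and the values 220 and 0.25 are only for illustration.

```javascript
// `osc` is an OscillatorNode and `gainNode` a GainNode created elsewhere.
function tweakAndReset(osc, gainNode) {
  // Set values directly through .value
  osc.frequency.value = 220;
  gainNode.gain.value = 0.25;

  // Read the current values back with the same syntax
  var before = {
    frequency: osc.frequency.value,
    gain: gainNode.gain.value
  };

  // Reset each param to its default
  osc.frequency.value = osc.frequency.defaultValue;
  gainNode.gain.value = gainNode.gain.defaultValue;

  return before;
}
```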

Timing and Scheduling

The Web Audio API has an extremely high-resolution and reliable timing system, which differs in many ways from the JavaScript timer system (setTimeout and setInterval), whose resolution is at best 1 millisecond, 0.001 seconds. Times in the Web Audio API are double-precision floating-point numbers, in seconds, which can express fractions of a second far smaller than the length of a single sample. This resolution is designed to be able to specify alignment on the sample-level, even with high-resolution audio. The timing system is always relative to the creation of the AudioContext, always starting at 0, and progressing in high-resolution seconds. The value can be accessed at any time by reading the ‘currentTime’ property of the AudioContext. This timing system makes it possible to build accurate sequencers, drum-machines, or time-reliant effects.

var audioContext = new AudioContext();

audioContext.currentTime; // example: 507.570793650793631
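To give an idea of how a sequencer might use this clock, here is a minimal sketch that works out the start time of each beat relative to a given starting point. The function name beatTimes and the tempo are my own, hypothetical choices; the times it returns would be handed to start() or to an AudioParam scheduling method.

```javascript
// Returns the time, in seconds, at which each of `count` beats should start.
function beatTimes(startTime, bpm, count) {
  var secondsPerBeat = 60 / bpm;
  var times = [];
  for (var i = 0; i < count; i++) {
    times.push(startTime + i * secondsPerBeat);
  }
  return times;
}

// In a browser, four beats at 120 BPM starting 0.1 seconds from now:
// var times = beatTimes(audioContext.currentTime + 0.1, 120, 4);
```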

Any numerical AudioParam can be automated and scheduled. For instance, you could schedule the cutoff frequency of a filter to change over time, creating a filter sweep, or fade out a sound over time. There are a variety of scheduling methods that we’ll be using throughout this series: setValueAtTime(), linearRampToValueAtTime(), exponentialRampToValueAtTime(), setTargetAtTime(), setValueCurveAtTime(), and cancelScheduledValues().
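A quick sketch of the filter sweep and fade-out mentioned above, assuming a BiquadFilterNode and a GainNode that are already connected somewhere in your graph. The function name scheduleSweepAndFade and the 200 Hz and 8000 Hz endpoints are illustrative assumptions.

```javascript
// `filter` is a BiquadFilterNode, `gainNode` a GainNode, and `now` is
// usually audioContext.currentTime.
function scheduleSweepAndFade(filter, gainNode, now, seconds) {
  // Pin the starting values at `now` so each ramp has a defined start point
  filter.frequency.setValueAtTime(200, now);
  gainNode.gain.setValueAtTime(1, now);

  // Filter sweep: the cutoff rises exponentially from 200 Hz to 8000 Hz
  filter.frequency.exponentialRampToValueAtTime(8000, now + seconds);

  // Fade out: the gain falls linearly to 0 over the same period
  gainNode.gain.linearRampToValueAtTime(0, now + seconds);
}

// In a browser:
// scheduleSweepAndFade(filter, gainNode, audioContext.currentTime, 4);
```

The fade uses a linear ramp because an exponential ramp can’t reach a target of exactly 0.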

Simple Synthesis

We’ll go in-depth with effects/synthesis in our addendum to Emmett Corman’s Simple Synthesis series in future posts in this series. I’ll build an interactive Web Audio API example for each topic he covered in his series, with the goal of giving you a deeper understanding of each topic through experimentation.

Getting the most out of these examples:

Ears:

Some of these examples may be rather loud to you, so please start off with your computer volume set low until you find a comfortable level for your ears.

Code:

The code for each example can be explored by clicking on the ‘JavaScript’ tab at the top of each example. I’ve added comments to each important section in the code and will try to let the example code guide you rather than take you through each line. If you want to copy or modify the code, I suggest clicking on the ‘Edit in JSFiddle’ link in the top right of each example. If you’re using Firefox, its developer tools include a Web Audio Editor, which allows you to see AudioNode connections visually and change their values in realtime.

Values:

The sliders are intentionally set to very large ranges to better showcase the power of the Web Audio API. Unfortunately, this makes it a bit awkward to make small adjustments. I recommend clicking on a slider’s knob to select it and then using the left/right arrow keys to make smaller adjustments.

OK! Let’s put all of this together and make some sound!

Tone and Volume, The OscillatorNode and GainNode:

At the basic level of sound creation we have a simple waveform and a way to control that waveform’s amplitude, or volume. The OscillatorNode produces a periodic wave of which you can set the shape, using ‘ OscillatorNode.type ‘, and frequency, using ‘ OscillatorNode.frequency.value ‘. The built-in types are: sine (default), square, sawtooth, and triangle. The Web Audio API also allows you to create your own type of waveform by building a PeriodicWave with AudioContext.createPeriodicWave() and passing it to the OscillatorNode.setPeriodicWave() method, which sets the type to ‘custom’ for you. For this tutorial, we’ll only cover the built-in types. An OscillatorNode is also a source node (the starting-point of an audio signal), so it has one output, and no inputs. In addition to a ‘frequency’ param, the OscillatorNode has a ‘detune’ param which can be used to control the node’s frequency. This param works relative to the oscillator’s frequency.value, whether that is the default, 440, or a value you set explicitly, such as oscillator.frequency.value = 440 , and detunes in cents. When you want to change an oscillator’s note in a musical way, changing intervals, it’s best to use detune so you don’t have to calculate what the frequency difference between two notes is. For example, to go down a major third (-400 cents), from A4 (440 Hz) to F4 (349.23 Hz), use oscillator.detune.value = -400 .
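If you’re curious what detune does to the frequency, here is a minimal sketch of the arithmetic. There are 1200 cents in an octave and an octave doubles the frequency, so a detune of c cents multiplies the frequency by 2^(c/1200). The function name detunedFrequency is hypothetical; the OscillatorNode does this for you internally.

```javascript
// Frequency that results from detuning `baseHz` by `cents`.
// 1200 cents = 1 octave = a doubling of frequency.
function detunedFrequency(baseHz, cents) {
  return baseHz * Math.pow(2, cents / 1200);
}

// Down a major third from A4:
// detunedFrequency(440, -400) is about 349.23 (F4)
```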

Each OscillatorNode needs to be started by using the OscillatorNode.start(time) method, which takes the start-time as its parameter. As noted above, the start-time is on the same timeline as the AudioContext.currentTime value. If you pass in a start-time of 0, or any time that has already passed, your OscillatorNode will start immediately. If you want to schedule an OscillatorNode to start in the future, simply reference the currentTime and add the number of seconds you want the OscillatorNode to start in. Similarly, you can use the OscillatorNode.stop(time) method to stop an OscillatorNode. It is important to note that once you stop an oscillator it cannot be restarted and must be recreated. This seems inefficient, but the API is highly optimized for this behavior. Another way to emulate the stopping of playback is to either disconnect the oscillator, and later reconnect it to resume playing, or to set a connected GainNode’s gain value to 0. However, an oscillator that is always running in the background when it is not heard will eat up your browser’s resources, so it’s best to use the ‘create, connect, start, stop’ pattern when you need an oscillator to actually stop.
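The ‘create, connect, start, stop’ pattern could be sketched like this, assuming an AudioContext created elsewhere is passed in as ctx. The function name playTone and its arguments are hypothetical; call it again whenever you need another note, since each oscillator is single-use.

```javascript
// Plays one tone: a new OscillatorNode is made for every note.
function playTone(ctx, frequency, delaySeconds, durationSeconds) {
  var osc = ctx.createOscillator();        // create
  osc.type = 'sine';
  osc.frequency.value = frequency;
  osc.connect(ctx.destination);            // connect

  var startAt = ctx.currentTime + delaySeconds;
  osc.start(startAt);                      // start
  osc.stop(startAt + durationSeconds);     // stop
  return osc;
}

// In a browser, an A4 that starts in half a second and lasts one second:
// playTone(audioContext, 440, 0.5, 1);
```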

The GainNode has one input and one output. It simply multiplies the amplitude of its input and sends the result to its output, causing a change in volume. Its only settable parameter is the value representing the amount of gain to apply to the sound source. GainNode.gain.value // defaults to 1

The example below allows you to start/stop an oscillator, change its waveform, change its frequency, and control its gain.

Using soundfiles, the AudioBuffer and AudioBufferSourceNode

When you want to use a soundfile as your audio source you need to load your soundfile into an AudioBuffer. An AudioBuffer holds the decoded audio data of a soundfile in memory and can be used by multiple BufferSourceNodes for playback. The AudioBuffer can be thought of as a record and a BufferSourceNode can be thought of as a record player. Typically, you will want to use an AudioBuffer when the sample you’re working with is around 1 minute long or shorter. For longer soundfiles, you should use an audio or video HTML element along with the AudioContext.createMediaElementSource() method to create a source node, as seen in figure 2. I’ll cover this type of soundfile loading in detail in Part 2. We’ll be using an AudioBuffer in our example since our soundfile is around 1 minute long.

When loading samples into AudioBuffers, you’ll want to avoid blocking your webpage and leaving the interface unresponsive while the soundfile loads, so the best approach is to load the samples in via XMLHttpRequest(), XHR/AJAX. We’ll need to request an ‘arraybuffer’ for our responseType in our XHR request. Once we have our data we need to decode it, using the audioContext.decodeAudioData(request.response) method, which gives us an AudioBuffer to use. Next, we will need to create a BufferSourceNode and hand it our decoded AudioBuffer so we can use it for playback.
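A minimal sketch of the loading steps above, using the callback form of decodeAudioData(). The function name loadSample, the url, and the callbacks are placeholders. The last argument, XhrCtor, is a hypothetical extra of mine that lets the request object be swapped out; leave it off in a browser.

```javascript
// Loads a soundfile and decodes it into an AudioBuffer.
// `onLoad` receives the decoded AudioBuffer; `onError` receives any error.
function loadSample(ctx, url, onLoad, onError, XhrCtor) {
  var request = new (XhrCtor || XMLHttpRequest)();
  request.open('GET', url, true);         // true = asynchronous
  request.responseType = 'arraybuffer';   // raw bytes, not text

  request.onload = function () {
    // Decode the (possibly compressed) file into an AudioBuffer
    ctx.decodeAudioData(request.response, onLoad, onError);
  };
  request.onerror = onError;
  request.send();
  return request;
}

// In a browser (the file name is a placeholder):
// loadSample(audioContext, 'sample.ogg', function (buffer) {
//   // buffer is ready to be played by a BufferSourceNode
// }, function (err) { console.error(err); });
```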

The BufferSourceNode is what you will want to connect to any AudioNodes for processing, or to your AudioContext.destination to listen to. With the BufferSourceNode you can set things like looping, loop start and end points, playbackRate, and detuning. If you want seamless loops you will need to use samples in the ‘ogg‘ file format. Mp3s, and some other compressed formats, produce a brief pause before starting the loop each time. This is because each frame holds a fixed number of samples, so even if the audio data has stopped, the rest of the frame is filled with silence. Starting and stopping playback is similar to starting and stopping an OscillatorNode. Thus, it is important to note that once you stop a BufferSourceNode from playing, you cannot restart it. You can use the same ‘create, connect, start, stop’ pattern we saw with the OscillatorNode to stop and start samples. Fortunately, if you’re using an HTML media element like <audio> or <video> as your sound source, there is a built-in pause() method.
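Here is a rough sketch of looping playback with a BufferSourceNode, assuming you already have a decoded AudioBuffer. The function name playLoop and its arguments are hypothetical. Like an oscillator, the node is single-use, so call the function again to restart the sample.

```javascript
// Plays `buffer` in a loop. loopStart and loopEnd are in seconds.
function playLoop(ctx, buffer, rate, loopStart, loopEnd) {
  var source = ctx.createBufferSource();   // the 'record player'
  source.buffer = buffer;                  // the 'record'
  source.playbackRate.value = rate;        // 1 = normal speed, 0.5 = half speed
  source.loop = true;
  source.loopStart = loopStart;
  source.loopEnd = loopEnd;
  source.connect(ctx.destination);
  source.start(0);                         // start immediately
  return source;
}

// In a browser, looping the first two seconds at half speed:
// var source = playLoop(audioContext, buffer, 0.5, 0, 2);
// source.stop(0); // later, to stop it for good
```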

The example below loads our sample via XHR and allows you to control playbackRate and looping.

Filtering sounds, The BiquadFilterNode

The BiquadFilterNode is used to attenuate or amplify ranges of frequencies in a sound. Multiple BiquadFilterNodes can be connected together to form more complex filters or to build a graphic equalizer. The Web Audio API has 8 built-in types: “lowpass”, “highpass”, “bandpass”, “lowshelf”, “highshelf”, “peaking”, “notch”, and “allpass.” Most filter types have frequency, Q Factor, and gain parameters. The gain parameter is expressed in dB (decibels) in the range of -40dB to 40dB. The higher the Q Factor, the narrower and ‘sharper’ the peak/knee is in the filter curve. Notable exceptions are the “lowshelf” and “highshelf” types, which do not use Q, and along with the ‘peaking’ type, use gain, while the others do not. The BiquadFilterNode allows you to get the frequency response of a set of frequencies using the getFrequencyResponse() method.
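As a small sketch of which parameters matter for which filter type, the helper below only touches Q and gain when the chosen type actually uses them. The function name configureFilter and the two lookup lists are my own; the node itself simply ignores a parameter that its current type doesn’t use.

```javascript
// Filter types that use each parameter (frequency is used by all of them).
var USES_Q = ['lowpass', 'highpass', 'bandpass', 'peaking', 'notch', 'allpass'];
var USES_GAIN = ['lowshelf', 'highshelf', 'peaking'];

// `filter` is a BiquadFilterNode created with audioContext.createBiquadFilter().
function configureFilter(filter, type, frequency, q, gainDb) {
  filter.type = type;
  filter.frequency.value = frequency;
  if (USES_Q.indexOf(type) !== -1) {
    filter.Q.value = q;
  }
  if (USES_GAIN.indexOf(type) !== -1) {
    filter.gain.value = gainDb;   // in dB
  }
  return filter;
}

// In a browser, boosting a band around 1000 Hz by 6 dB:
// configureFilter(audioContext.createBiquadFilter(), 'peaking', 1000, 2, 6);
```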

The example below allows you to try out all of the available filter types and set their respective frequency, Q, and gain values.

Simple Spatialization, The StereoPannerNode

The Web Audio API can be used for complex multi-channel 3D spatialization using the PannerNode, which is a more complex concept to cover than we have time for in this post. I will cover 3D spatialization using the PannerNode in Part 2.

The API offers a much simpler way to spatialize audio by using the StereoPannerNode, which positions a sound left or right using equal-power panning. The pan position is set using the ‘pan’ parameter in the range of -1 (hard left) to 1 (hard right), for example StereoPannerNode.pan.value = -0.5 .
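To show what ‘equal-power’ means, here is a minimal sketch of the left and right gains for a mono input at a given pan position, following the cosine/sine formula for mono sources. The function names equalPowerGains and mouseXToPan are hypothetical, and the mouse mapping is just one assumed way to drive the ‘pan’ parameter; the StereoPannerNode does the gain math for you.

```javascript
// Left/right gains for a mono source at `pan` (-1 = hard left, 1 = hard right).
function equalPowerGains(pan) {
  var x = (pan + 1) / 2;   // map -1..1 onto 0..1
  return {
    left: Math.cos(x * Math.PI / 2),
    right: Math.sin(x * Math.PI / 2)
  };
}

// Maps a horizontal mouse position inside an area `width` pixels wide to a pan value.
function mouseXToPan(mouseX, width) {
  return (mouseX / width) * 2 - 1;
}

// In a browser:
// stereoPanner.pan.value = mouseXToPan(event.clientX, window.innerWidth);
```

At every pan position the squares of the two gains add up to 1, which is why the sound keeps the same perceived loudness as it moves across the stereo field.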

The example below allows you to move a sound left or right, with optional mouse control.



Wow, we covered a lot! I hope that you have a decent overview of what the Web Audio API can do. In Part 2, and possibly Part 3, we’ll cover AudioParam scheduling, DelayNode, ConvolverNode, DynamicsCompressorNode, 3D spatialization, getting live audio from your computer’s mic, visualizing your audio, and using the HTML audio and video elements as our audio sources.

Play around with the code and see what you can build! See you soon for Part 2!

* Note: Basic understanding of JavaScript, or another C-based language, will help make this article a lot more digestible.

You can use Codecademy to get up to speed with JavaScript pretty quickly if you’re interested, and you can and should consult the Mozilla Developer Network often to get a more thorough understanding of the language.

Tools: Developer Console ((Mac) Command + Opt + J, (Windows) Control + Shift + J), Code Editor, such as SublimeText.

Live stats on Web Audio browser support are available from caniuse.com, under the audio-api feature.

Highly Recommended Resources:

https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API

https://webaudio.github.io/web-audio-api/

https://webaudiodemos.appspot.com/

http://blog.chrislowis.co.uk/

The samples used in this tutorial all come from Matt Hettich’s post about ALM Dinky’s Taiko, which can be downloaded from there.