Introducing: The Web Audio API

If you know anything about generating sound on the web, then you’ll have guessed that this would be the first section. There’s a ton of introductions to the Web Audio API online, so feel free to have a read of those if you have no idea what I’m talking about. I’ll explain the parts I use as and when I use them, though, so you can stick around here too, which would be better for my bounce rate.

To get started, we need something called an AudioContext, which provides us with the capabilities we need to work with the Web Audio API. An AudioContext allows us to create sound inputs, route them through various effects, and then send that processed sound through to an output. This creates an audio processing graph. Each input, effect, and output is a node on this graph (and all inherit from the AudioNode interface of the Web Audio API). We can get our hands on an AudioContext thanks to this line of JavaScript:

const audioContext = new AudioContext();

The first thing an AudioContext gives us is the ability to create a sound in the browser. We can do this by using its createOscillator method, which gives us back an oscillator, or ‘OscillatorNode’, to play with. We can specify a periodic waveform that the oscillator should use (for example, a sine wave or a square wave), the frequency in Hz that the wave should oscillate at (which dictates its pitch), and then tell the oscillator to generate one of these waves:

const oscillator = audioContext.createOscillator();
oscillator.type = 'sine';
oscillator.frequency.value = 100;
oscillator.connect(audioContext.destination);
oscillator.start();

That penultimate line connects the OscillatorNode up to the AudioContext’s AudioDestinationNode, which is usually your device’s speakers. Without this line, we wouldn’t be able to hear anything. So, here we have an input node connected to an output node, and our audio graph is born.

The speakers then start to propagate this wave through the air, and if it hits somebody in the ear, they’ll be able to hear it. My task is to arrange a bunch of these waves so that they hit somebody in the ear together in a way that makes some musical sense.

I’m playing all the right notes, but not necessarily in the right order

Ok, so our browser is generating a sound. That’s our first task successfully completed. Unfortunately for us, very few people would call a constant sine wave at 100 Hz particularly musical, so we need to figure out how to generate different pitches at different times. Some would say that’s the minimum requirement for a piece of music.

Let’s start with pitch. This one’s easy. We can change the pitch of the sound created by the oscillator by telling it to oscillate at a different frequency. If we know the frequency values of some musical notes, we can use those values in our oscillators, and end up with something ‘musical’. But which frequencies correspond to which musical notes? There’s a whole load of different tuning systems out there, but I’ll use the one most common in popular western music, which is 12 Tone Equal Temperament. A discussion of what that is would be a long blog post in itself; for now, all we need to know is that it maps each note to an exact frequency value. So, for example, we have a C at 16.4 Hz, a D flat at 17.3 Hz, a D at 18.4 Hz, and so on. To make life easy for myself, I’ve stored a mapping of these in a file called notes.js.
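I won’t paste the whole file here, but a minimal sketch of what notes.js might look like is below. The export shape matches how it’s imported later on, and the frequency values are just the standard 12 Tone Equal Temperament ones (with A4 at 440 Hz):

// notes.js: a small excerpt; the real file maps every note
export const notes = {
  C0: 16.35,
  Db0: 17.32,
  D0: 18.35,
  F1: 43.65,
  A1: 55,
  C2: 65.41,
  D2: 73.42,
  A2: 110,
  C3: 130.813,
  A4: 440
  // ...and so on, one entry per note
};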

What about time? We need a way of scheduling when these notes are going to be played, and the Web Audio API helps us out here too. JavaScript’s own clock isn’t precise enough to deal with the timing of these audio events, and can be delayed by other activity in the browser. The Web Audio API gives us a far, far more precise timestamp, which can handily be passed into the oscillator’s start method that we used previously. This has the effect of starting the oscillator at the timestamp we provided, rather than immediately. And unless you’re a drummer playing between songs at a band practice, you usually want to be able to stop too:

oscillator.start(audioContext.currentTime);
oscillator.stop(audioContext.currentTime + 1);

Ok, awesome. So we can get an oscillator to play us a note at a pitch, at a given time, for a given duration. Let’s put all this together to create our first simple synthesiser:

import { notes } from './notes';

const { A2, C3 } = notes;

const audioContext = new AudioContext();

function playNote(startTime, duration, pitch) {
  const oscillator = audioContext.createOscillator();
  oscillator.connect(audioContext.destination);
  oscillator.frequency.value = pitch;
  oscillator.type = 'sine';
  oscillator.start(startTime);
  oscillator.stop(startTime + duration);
}

playNote(audioContext.currentTime, 1, A2);
playNote(audioContext.currentTime + 1, 1, C3);

A2 and C3 are two of the pitch values I saved earlier, which correspond to an A at 110 Hz, and a C at 130.813 Hz. The code above will play the A for one second, immediately followed by the C for one second. I guess this counts as a song?

You may have noticed that I created a new oscillator for every note, rather than creating one outside the function and then changing its pitch and re-using it. That’s because once you stop an oscillator, you can’t start it up again. The Web Audio API has been deliberately designed this way, and is optimised for exactly this sort of create-and-throw-away usage pattern.
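If you’re curious, here’s a minimal sketch of what that failure looks like; the browser rejects the second start call:

const osc = audioContext.createOscillator();
osc.connect(audioContext.destination);
osc.start();
osc.stop();
osc.start(); // throws an InvalidStateError: oscillators are strictly one-shot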

That’s not a song.

Ok, it’s a stretch to call those two notes a piece of music. We could call our playNote function over and over again, passing it the pitch we want to hear and when that note should be played, but that’s going to get tedious. What we need is some way of scoring our music. In real life, a musician will read something like this:

[Image: a passage of sheet music]

In this image, each black dot represents a note that the musician needs to play. Those horizontal lines that go across the page are called a staff. The higher the dot is placed on the staff, the higher the pitch of the note. The musician knows which order to play the notes in by reading them from left to right down the page. You can also see that the staff is broken up into sections by vertical lines. This splits the staff up into ‘bars’, which each contain the same number of beats. We can number these bars, to let us know where we are in the song.

I’m not going to write any sheet music, but I am going to need to be able to represent the same things in code. Here’s what I came up with:

import { notes } from './notes';

const { A1, F1, C2, D2 } = notes;

const score = [
  [
    { tick: 0, pitch: A1 },
    { tick: 4, pitch: C2 },
    { tick: 8, pitch: A1 },
    { tick: 12, pitch: C2 }
  ],
  [
    { tick: 0, pitch: F1 },
    { tick: 4, pitch: D2 },
    { tick: 8, pitch: F1 },
    { tick: 12, pitch: D2 }
  ]
];

So, we have an array here called score, which (unsurprisingly) represents the score that our instrument will play from. Each element inside this array represents one bar of music. Each bar is an array of objects, and each object represents one note that our instrument needs to play. The object contains two entries: the pitch, which is the frequency of the note that should be played, and the tick within the bar on which that note should be played.

What the hell is a tick?

Take one bar of music. This bar can be broken down into ‘beats’; in pop music, there are usually four beats in a bar. But we need to get more granular than that, because we need to be able to play notes that don’t fall exactly on a beat. So, instead of splitting the bar up into four equal sections, I’ve decided to split it up into 16 equal sections, and I’ve called each of these a ‘tick’. With four beats in a bar, each beat is four ticks long, which is why the notes in the score above sit on ticks 0, 4, 8, and 12: they all land squarely on a beat. 16 divisions probably won’t be granular enough if we want to write anything interesting, but it will do for now.

I’ve stolen this terminology from Logic Pro, which is the software I use to write music when I’m not trying to make music in the browser.

Let’s play the thing

We have a very simple synthesiser, and a very simple score. It’s time to get the synthesiser to play the score:

const tempo = 120;

const secondsPerBeat = 60 / tempo;
const secondsPerBar = secondsPerBeat * 4;
const secondsPerTick = secondsPerBeat / 4;

score.forEach((bar, barNumber) => {
  bar.forEach((note) => {
    const timeToAdd =
      secondsPerBar * barNumber + note.tick * secondsPerTick;
    playNote(
      audioContext.currentTime + timeToAdd,
      secondsPerTick,
      note.pitch
    );
  });
});

Here we’ve set a tempo for the track of 120 beats per minute, then figured out how long a beat, a bar, and a tick should each last in seconds based on that tempo. At 120 BPM, that works out at 0.5 seconds per beat, 2 seconds per four-beat bar, and 0.125 seconds per tick. We then iterate over each bar of the score and call our playNote function for each note within that bar, working out how far into the future each note should be played from its bar number and tick, and passing its pitch along. The duration of each note has been hardcoded to one tick (my score doesn’t specify how long a note should last, which is something we’ll need to add in at some point).

Awesome, now we have our sine wave synthesiser playing back the notes we asked it to, in the right order at the right time. I’m getting bored of listening to sine waves though…

Making our synthesiser sound more interesting

There are two ways we’re going to change up the sound of our synthesiser. We can experiment with different waveforms until we find a sound we like, and we can introduce some audio effects to sculpt that sound further. The first synthesiser I want to create is a nice, full bodied, bass synth.

Currently, we have two AudioNodes in our audio graph, an oscillator and a destination. We can route our oscillator’s output through some more AudioNodes before it reaches its destination, and these nodes can each process the sound before sending it on to the next node in the chain. This is analogous to my usual life as a guitarist — I plug my guitar into a distortion pedal, and then into a reverb pedal, before routing that effected sound out into an amplifier.

My bass synth is going to use two of these AudioNodes:

BiquadFilterNode : this takes an input sound, and either filters out or boosts frequencies from that sound. The frequencies that are filtered depend on the type of filter that you use. For example, you can use a ‘highpass’ filter at a frequency of 1000 Hz to only allow frequencies above 1000 Hz through the filter, or a ‘lowpass’ filter at 500 Hz to only allow frequencies below 500 Hz through the filter.

GainNode: this is effectively a volume control. A gain of 1 allows the input sound through at full volume, whereas a gain of 0 will mute the sound, and you can provide any value in between.
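Before we build the full synth, here’s a minimal sketch of how these two nodes slot into the audio graph, reusing the oscillator and audioContext from earlier (the filter settings and gain value are arbitrary, purely for illustration):

// oscillator -> filter -> gain -> speakers
const filter = audioContext.createBiquadFilter();
filter.type = 'lowpass';
filter.frequency.value = 500; // only let frequencies below 500 Hz through

const gain = audioContext.createGain();
gain.gain.value = 0.5; // half volume

oscillator.connect(filter);
filter.connect(gain);
gain.connect(audioContext.destination);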

Here’s the code to create the bass synth, putting together the elements I just introduced:

export default function createBass(audioContext) {
  const filter = audioContext.createBiquadFilter();
  filter.type = 'lowpass';
  filter.frequency.value = 1000;
  filter.connect(audioContext.destination);

  const gain = audioContext.createGain();
  gain.connect(filter);

  return function(time, pitch) {
    const subOsc = audioContext.createOscillator();
    subOsc.connect(gain);
    subOsc.frequency.value = pitch;
    subOsc.type = 'sine';

    const triangleOsc = audioContext.createOscillator();
    triangleOsc.connect(gain);
    triangleOsc.frequency.value = pitch;
    triangleOsc.type = 'triangle';

    gain.gain.setValueAtTime(0.0001, time);
    gain.gain.exponentialRampToValueAtTime(0.3, time + 0.3);

    subOsc.start(time);
    subOsc.stop(time + 0.5);

    triangleOsc.start(time);
    triangleOsc.stop(time + 0.5);
  };
}

This function takes an audioContext, which it uses to create our oscillators and processing nodes. It creates a filter to cut out any frequencies above 1000 Hz, and a gain node to control the volume level, then returns a function that we can call to play one note on this synth. That inner function creates two oscillators at the given pitch, one sine wave and one triangle wave, and ramps the gain up from near silence to 0.3 over the first 0.3 seconds, giving each note a soft attack rather than an abrupt click.
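To actually hear it, we’d swap this in for the bare playNote from earlier. Here’s a sketch, assuming the function above lives in a file called createBass.js:

import createBass from './createBass';
import { notes } from './notes';

const { A1 } = notes;
const audioContext = new AudioContext();

// createBass wires up the filter and gain once, and hands back a play function
const playBass = createBass(audioContext);

// play an A at 55 Hz, starting immediately
playBass(audioContext.currentTime, A1);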