OK, full disclosure: the title should probably be “A failed Web Audio experiment…”. Never mind; what follows is a journey through learning the Web Audio API and attempting to make a nifty little app. I succeeded on all counts, with the exception of producing the app.

Last things first, the app in all its shoddy glory. The objective was to detect which piano key is being played.

Pardon the vigorous thumping of the keys

The plan

The original plan was to make a webapp for practising sight reading sheet music. That’s where you look at a piece of music and can immediately pump out da tune on a keyboard. The app would show a note, you would play the note, and the app would tell you if you got it right. Or play sad-trombone if you got it wrong.

This started with attempting to detect the pitch of a note, and finished abruptly because I got a job and had to stop aimlessly messing around with APIs all day.

The code

Step one was to get my hands on the sounds coming from the user’s microphone. I thought this was going to be some giant pain and involve learning some complex API. I was delighted to find that it’s a piece of cake.

The audio context

You may have worked with <canvas> before and know that the ‘context’ is the object you use to pass instructions to the canvas. Well, it’s the same with audio: there’s an ‘audio context’ at the center of everything you do with the audio.

Naturally, the next step was to create an audio context. Then I could tell the context that its source of audio would be the stream from the user’s mic. Then I’d connect a thing called an ‘analyser’ which would give me some data about the audio. Sounds complex, right?
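In code it looks roughly like this (a minimal sketch using the standard getUserMedia and Web Audio APIs; the variable names are just placeholders):

```js
// Ask the browser for the user's microphone, then wire it into the audio graph.
navigator.mediaDevices.getUserMedia({ audio: true }).then(stream => {
  const audioContext = new AudioContext();
  const source = audioContext.createMediaStreamSource(stream);
  const analyser = audioContext.createAnalyser();

  // The analyser sits on the end of the mic stream, ready to be queried.
  source.connect(analyser);

  // ...everything that follows happens in here, once we have the analyser.
});
```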

People stuck on ES5 have my sympathy

That’s pretty cool. It almost reads like English. “Hello audio context, I’d like to create a media stream source for you to listen to, here’s the stream. I’d then like to connect an analyser to that stream.”

Next I needed to turn that rather abstract ‘stream’ into something more useful, like an array of numbers.

The analyser is my little helper, listening all the time to the audio coming in from the mic, and I can ask it to give me a snapshot of what it’s hearing whenever I like. But first I ask the analyser how much data it will give me each time I request it (analyser.frequencyBinCount) and create an empty array of that length.
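In this sketch (carrying on inside the getUserMedia callback from earlier, and assuming the default fftSize of 2048) that’s a single line:

```js
// frequencyBinCount is half the fftSize, so 1024 values per snapshot by default.
const dataArray = new Uint8Array(analyser.frequencyBinCount);
```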

Then, I wait…

…for 300 milliseconds, just to let the page warm up.

At this point I had no idea what the analyser would actually give me. So, faced with either reading the documentation or dumping it to the console, I did what any sane person would do.
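Roughly like so (a sketch; getByteTimeDomainData copies the analyser’s current waveform into the array):

```js
// Give the page (and the mic) 300 ms to warm up, then grab a snapshot and dump it.
setTimeout(() => {
  analyser.getByteTimeDomainData(dataArray);
  console.log(dataArray);
}, 300);
```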

Now that’s a good lookin’ list of numbers

This shocks a lot of people when I mention it, but I understand things better if I can visualise them. So I’ll loop through the array and spit it out onto a canvas.
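A rough version of that loop, assuming a <canvas> element is already sitting on the page:

```js
const canvas = document.querySelector('canvas');
const canvasContext = canvas.getContext('2d');

// Plot each sample left to right. Values run 0–255, with silence hovering around 128.
canvasContext.clearRect(0, 0, canvas.width, canvas.height);
canvasContext.beginPath();
dataArray.forEach((value, i) => {
  const x = (i / dataArray.length) * canvas.width;
  const y = (value / 255) * canvas.height;
  if (i === 0) {
    canvasContext.moveTo(x, y);
  } else {
    canvasContext.lineTo(x, y);
  }
});
canvasContext.stroke();
```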

The output is this squiggly little fellow:

If you’re not familiar with audio stuff, it’s actually quite simple. Here are three bullet points:

- The Web Audio API is recording at 44.1 kHz. That means it’s recording 44,100 points of data every second. (Hz just means “times per second”)
- When I ask my friendly analyser for some data, it gives me an array of 1024 numbers (about 23 milliseconds’ worth).
- The ‘pitch’ of a note is defined by how many waves there are in one second.

Let’s do some math! Here I have measured the length of one wobble.
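To keep the arithmetic honest, here it is with a made-up measurement (assume one wave spans 100 samples; this is not the actual figure from the recording):

```js
const sampleRate = 44100;    // samples per second
const samplesPerWave = 100;  // hypothetical length of one wobble, in samples
const pitch = sampleRate / samplesPerWave;

console.log(pitch); // 441 Hz — just above the A above middle C (440 Hz)
```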