You can’t pass the class itself (or an instance of it) to Polymer, because without sugar a class is just a function, and the Web Components registerElement function that Polymer calls expects an object as its second parameter, not a function. It also expects a tag name as its first, which is why I used a getter for is: a getter declared in the class body lands on the prototype, right where Polymer looks for it. I guess I could have done this.constructor.prototype.is = 'my-rad-element', but getters look neater to me.

Another side-effect of this approach is that you never get to use an instance of the class anywhere, so anything you would have done in the constructor now needs to happen in the created and attached callbacks. That’s a bit limiting, but no big deal; it’s just the nature of handing over a class / function instead of an object. There’s a sketch of the whole pattern below.
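
In sketch form it looks something like this. This is a rough illustration of the pattern rather than my actual element code, assuming the Polymer 1.x-era Polymer() call; the element name and property are made up.

```javascript
// A sketch of the class-to-Polymer pattern; names are illustrative.
class MyRadElement {

  // Polymer wants `is` on the prototype, which is exactly where a
  // getter in the class body ends up. An instance property set in a
  // constructor wouldn't be visible there.
  get is() {
    return 'my-rad-element';
  }

  // There's no constructor to use, so initialisation moves into
  // the lifecycle callbacks.
  created() {
    this.someProperty = 'set up in created';
  }

  attached() {
    console.log(this.is + ' is now in the document');
  }
}

// Polymer expects an object, so hand over the class's prototype.
Polymer(MyRadElement.prototype);
```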

All of this isn’t strictly necessary, or even remotely so; there’s nothing wrong with giving Polymer an object. But I like ES6 Classes (controversial, I know) and if I’m in ES6 world, or want to be, why not just try and get it all working nicely? Yes? Winner.

Audio Analysis the wrong way

With elements in place, let’s talk about analysing audio, because I thought this bit was going to be relatively easy to do. I was wrong. Very wrong. Essentially I’m an idiot and still haven’t learned to estimate work well. But let me see if I can’t make it easier for the next troubled soul who attempts to do something similar.

To begin with, let me tell you about failing. Not real failing, though; the Edison style of failing:

"I have not failed. I've just found 10,000 ways that won't work." Thomas A. Edison

Attempt number one, then: Fast Fourier Transforms, or FFTs. If you’re not familiar with them, what they do is give you a breakdown of the current audio in frequency buckets. The Web Audio API gives you access to that data – in, say, a requestAnimationFrame – with an AnalyserNode, on which you call getFloatFrequencyData.
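
In sketch form that looks something like this, assuming an audioContext and a source (a microphone stream, say) already wired up; those two names are mine, not the API’s:

```javascript
// A sketch of reading FFT data each frame; `audioContext` and
// `source` are assumed to already exist and be connected to input.
const analyser = audioContext.createAnalyser();
analyser.fftSize = 32768; // The biggest available, for maximum resolution.
source.connect(analyser);

// One value per frequency bucket, in decibels.
const frequencies = new Float32Array(analyser.frequencyBinCount);

function update () {
  // Populate the array with the current FFT snapshot.
  analyser.getFloatFrequencyData(frequencies);

  // ...step through `frequencies` looking for peaks here...

  requestAnimationFrame(update);
}

requestAnimationFrame(update);
```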

An FFT of some audio. I assumed I would be looking for peaks, and peaks would tell me what note I was playing.

I thought that if I took an FFT of the audio, I would be able to step through it and find the most active frequency. Then it’s a case of figuring out which string is most likely being played based on that frequency, and providing “tune up”, “tune down”, or “in tune” messages accordingly.
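
The peak-finding part of that plan is only a few lines. Here’s a rough sketch, reusing the analyser and frequencies array from the snippet above:

```javascript
// A sketch of the peak-picking plan: find the loudest bucket and
// convert its index back to a frequency.
let peakIndex = 0;
for (let i = 1; i < frequencies.length; i++) {
  if (frequencies[i] > frequencies[peakIndex]) {
    peakIndex = i;
  }
}

// Each bucket covers (sampleRate / fftSize) Hz, so the index
// converts straight back to a frequency.
const peakFrequency =
    peakIndex * audioContext.sampleRate / analyser.fftSize;

// From here it's a lookup: which string's target frequency is
// closest, and is the peak above or below it?
```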

Then performance happened. And harmonics. Mainly harmonics.

Performance

In order to get enough resolution on frequencies, you need a colossal FFT for this approach. Even with an FFT size of 32K (the largest you can get), at a 48kHz sample rate each bucket in the array represents a frequency range of roughly 1.5Hz (48,000 / 32,768).

Filling up an array of that size takes somewhere in the region of 11ms on a Nexus 5 on a good day with a following wind. If you’re trying to do that in a requestAnimationFrame callback, you’re going to have a bad time. Doubly bad is the fact that you’re also going to have to process the audio data after getting it. For 60fps you have about 8-10ms of JavaScript time at the absolute maximum. The browser has housekeeping to do, so you have to share CPU time. In the end this approach yielded something with a frame rate that fluctuated wildly between 30 and 60fps, and something which can only be described by its friends as a “CPU melter”.

Harmonics

And then the harmonics. A B3 note has a fundamental frequency of 246.94Hz, so one may reasonably expect an FFT like the one above, but with a single clear peak at ~247Hz.

In fact, this is what the frequencies look like when you hit a B3 string:

An FFT when a B3 string is plucked. Harmonics give you peaks in unexpected places, just to mess with you.

See how there are peaks all over the place? Each string brings its own special combination of frequencies with it, called harmonics. One thing is for sure: it’s not a “pure” sample where you can infer which string is being hit just from the most active frequency.

I’m a little hard of understanding sometimes, so I attempted to work around this with some good ol’ fashioned number fishing and fudging. It kind of worked under very specific circumstances, but it really wasn’t robust.

Audio Analysis the better way

Then Chris Wilson helped me. For context, I’d got to the end of my hack-fudge approach and started googling things like “please i am an idiot how do you do simple pitch detection?” As you might expect, the top results were Wikipedia articles that may as well be written in Ancient Egyptian hieroglyphics for all the sense they make. They’re written by people who already understand the topic, and whose sole aim, it seems, is to ensure that you never will. I got the same deal when I made a 3D engine a few years back and, as with that period in my life, all of me screamed out for simple, treat-me-like-a-normal-human explanations. Thankfully that’s exactly what Chris provided over the course of several hours.

Autocorrelation

Attempt number two: autocorrelation. To be fair, autocorrelation had come up in my hieroglyphics studies, but it seemed to be an ancient sacrificial method that required the innards of at least two doves, a penguin, and 17 pounds of lard. I was surprised that Chris suggested such a barbaric approach, given what a nice chap he is. Turns out I misunderstood what autocorrelation involves.

In retrospect I guess the name is a clue: auto- (self-) and correlation (matching). The idea is that if you have an audio wave, you can compare it to itself at various offsets. If you find a match, you’ve found the point at which the wave repeats itself, even factoring in harmonics (more on that in a moment). And once you know how far you had to shift the wave to find that repeat, you know its period, and therefore its frequency.

Autocorrelation is where you attempt to match a wave to itself. The amount you have to move it gives you its periodicity, and therefore the pitch.

You can get the wave data from the Web Audio API (of course you can, what a lovely API) with getFloatTimeDomainData, which has nearly zero documentation and sounds like a function named after buzzwords’ greatest hits. But it does precisely what we need: it populates an array with floating point wave data, with values ranging between -1 and 1.

The fftSize property on the AnalyserNode determines how much data you get. If you were to set fftSize to 48,000 (which you can’t, because the maximum is 32K and it has to be a power of two, but stick with me) and you had a sample rate of 48kHz, you would get one second’s worth of wave audio data. As it happens I set my fftSize to 4,096, which gives me 4,096 / 48,000 ≈ 85ms of wave data. And because I planned to compare the wave against itself, only half of that, around 42ms of audio data, was usable for each pass.
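
Grabbing the wave data, then, looks roughly like this (reusing the analyser from earlier, which is assumed to be connected to a source):

```javascript
// A sketch of grabbing raw wave data, using the fftSize from the text.
analyser.fftSize = 4096;

// For time-domain data you get fftSize entries: one per sample.
const wave = new Float32Array(analyser.fftSize);

// Populates the array with floating point samples between -1 and 1.
analyser.getFloatTimeDomainData(wave);
```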

My first attempt at autocorrelation compared the wave across all offsets (half the buffer, or 2,048 elements) and then returned the offset which had provided the nearest match.
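
Simplified down, that first pass looked something like this (buffer being the Float32Array of wave data from getFloatTimeDomainData; the function name is illustrative):

```javascript
// A simplified sketch of naive autocorrelation: slide the wave along
// itself and return the offset where it most closely matches.
function findBestOffset (buffer) {

  // Only half the buffer is available, because the other half is
  // needed for the comparison at the largest offset.
  const SIZE = buffer.length / 2;

  let bestOffset = -1;
  let bestDifference = Infinity;

  // Try every possible offset, one sample at a time.
  for (let offset = 1; offset < SIZE; offset++) {

    // Average the difference between the wave and its offset twin.
    let difference = 0;
    for (let i = 0; i < SIZE; i++) {
      difference += Math.abs(buffer[i] - buffer[i + offset]);
    }
    difference /= SIZE;

    // The smallest average difference is the nearest match.
    if (difference < bestDifference) {
      bestDifference = difference;
      bestOffset = offset;
    }
  }

  // The returned offset is the wave's period in samples.
  return bestOffset;
}
```

Divide the sample rate by the returned offset and you have a candidate frequency: an offset of 194 samples at 48kHz, for instance, gives 48,000 / 194 ≈ 247Hz, which is about right for a B3.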