joesul.li/van

Beat Detection Using JavaScript and the Web Audio API

By Joe Sullivan

This article explores using the Web Audio API and JavaScript to accomplish beat detection in the browser. Let’s define beat detection as determining (1) the location of significant drum hits within a song in order to (2) establish a tempo, in beats per minute (BPM).

If you’re not familiar with using the Web Audio API and creating buffers, this tutorial will get you up to speed. We’ll use “Go” by Grimes as our sample track for the examples below.

Presence of "peaks" in a song above 0.8, 0.9, and 0.95 respectively.

Premise

The central premise is that the peak/tempo data is readily available; people can, after all, very easily dance to music. Therefore, applying common sense to the data should reveal that information.

Setup

I’ve selected a sample song. We’ll take various cracks at analyzing it and coming up with a reasonable algorithm for determining tempo, then we’ll look out-of-sample to test our algorithm.

Song data

A song is a piece of music. Here we’ll deal with it as an array of floats representing the waveform of that music, sampled at 44,100 Hz.
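A small helper pair (names are my own, not from the article) makes the sample-rate bookkeeping used throughout explicit: converting between an index into the 44,100 Hz sample array and a time in seconds.

```javascript
// The sample rate assumed throughout this article.
var SAMPLE_RATE = 44100;

// Convert an index into the sample array to a time in seconds.
function sampleIndexToSeconds(index) {
  return index / SAMPLE_RATE;
}

// Convert a time in seconds to the nearest sample index.
function secondsToSampleIndex(seconds) {
  return Math.round(seconds * SAMPLE_RATE);
}
```

One second into the song is sample 44,100, and a quarter-second is about 11,025 samples, which is why the peak detector below can skip ahead by ~10,000 samples to get past a peak.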

Getting started

I placed my initial effort at the top of the screen. What you see is a chart of all the floats in the buffer (again, representing the song) above a certain threshold, mapped to their position within the song. There does appear to be some regularity to the peaks, but their distribution is hardly uniform within any threshold.

If we can find a way to graph the song with a very uniform distribution of peaks, it should be simple to determine tempo. Therefore I’ll try to manipulate the audio buffer in order to achieve that type of distribution. This is where the Web Audio API comes in: I can use it to transform the audio in a relatively intuitive way.

// Function to identify peaks
function getPeaksAtThreshold(data, threshold) {
  var peaksArray = [];
  var length = data.length;
  for (var i = 0; i < length;) {
    if (data[i] > threshold) {
      peaksArray.push(i);
      // Skip forward ~1/4s to get past this peak.
      i += 10000;
    }
    i++;
  }
  return peaksArray;
}
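Here’s a quick sanity check of that function on a synthetic signal (the function is repeated so the snippet is self-contained): two loud samples far apart should both register, while a third spike within the ~1/4s skip window should be passed over.

```javascript
function getPeaksAtThreshold(data, threshold) {
  var peaksArray = [];
  var length = data.length;
  for (var i = 0; i < length;) {
    if (data[i] > threshold) {
      peaksArray.push(i);
      // Skip forward ~1/4s to get past this peak.
      i += 10000;
    }
    i++;
  }
  return peaksArray;
}

// A synthetic signal: spikes at samples 5 and 12000, plus one at 12005
// that falls inside the skip window of the second spike.
var signal = new Array(30000).fill(0);
signal[5] = 1;
signal[12000] = 1;
signal[12005] = 1;

var peaks = getPeaksAtThreshold(signal, 0.5);
// peaks is [5, 12000]; the spike at 12005 is skipped
```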

Filtering

How simple is it to transform our song with the Web Audio API? Here’s an example that passes the song through a low-pass filter and returns it.

// Create offline context
var offlineContext = new OfflineAudioContext(1, buffer.length, buffer.sampleRate);

// Create buffer source
var source = offlineContext.createBufferSource();
source.buffer = buffer;

// Create filter
var filter = offlineContext.createBiquadFilter();
filter.type = "lowpass";

// Pipe the song into the filter, and the filter into the offline context
source.connect(filter);
filter.connect(offlineContext.destination);

// Schedule the song to start playing at time:0
source.start(0);

// Render the song
offlineContext.startRendering();

// Act on the result
offlineContext.oncomplete = function (e) {
  // Filtered buffer!
  var filteredBuffer = e.renderedBuffer;
};

Now I’ll run low-, mid-, and high-pass filters over the song and graph the peaks measured from each. Because certain instruments are more faithful representatives of tempo (kick drum, snare drum, claps), I’m hoping that these filters will draw them out, so that “song peaks” turn into “instrument peaks”, or instrument hits.

Peaks with a lowpass, highpass, and bandpass filter, respectively.

Each of these graphs has some pattern to it; hopefully, we’re now looking at the pattern of individual instruments. The low-pass graph, in fact, is looking particularly good. I’d venture to guess that each peak we see on the low-pass graph represents a kick drum hit.

At this point it might be a good idea to take a look at what the data should look like, so we can start evaluating our method visually.

The actual tempo of the song is around 140 beats per minute. Let’s take a look at that by drawing a line for each measure, then our lowpass filter again:

Each measure of the song is represented by a single line (top); Lowpass peaks as seen before (bottom)

We seem to have a problem in that much of the song, especially the beginning and the end, is not being picked up in the peaks. Of course, this is because the beginning and end of the song are quieter, i.e. have fewer peaks, than the rest of the song.

The volume of the song at a given measure.

Low-pass peaks, with the height of each bar representing the track volume at that point.

Let's try something...

I think that the lowpass graph is tracking the drum rhythm well enough that we can now measure the length of a drum pattern and jump from there to tempo. But how to measure the length of a pattern?

Let’s draw a histogram listing the average interval of time between peaks:

Interval  Count
3.42s     109
6.85s     72
2.14s     22
1.71s     19
5.14s     17
5.57s     15
4.71s     14
2.79s     13
6.22s     12
4.05s     12
4.06s     11
2.12s     11
6.22s     11
6.21s     10
4.06s     10
2.76s     10
6.22s     10
8.14s     10
2.79s     10
2.80s     10

Average interval between peaks in a song.

This measurement is not just of the closest neighbor but of all neighbors in the vicinity of a given peak.

This is very cool. I didn’t expect to see a strong mode emerge, but with 3.42 and 3.42*2 taking first and second place, I think we can make one conclusion: 3.42s is a very important interval in this song!

Just how does 3.42s relate to the tempo? Might it be the length of a measure? Well, a song with measures of length 3.42s, at four beats per measure, would have beats of length 0.855s, of which you can fit 70.175 into a minute. In other words: measure length 3.42s maps to roughly 70BPM.
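That arithmetic can be written out directly (the helper name is mine, not from the article’s code), assuming four beats per measure:

```javascript
// Convert a measure length in seconds to a tempo in beats per minute,
// assuming four beats per measure.
function measureLengthToTempo(measureSeconds) {
  var beatSeconds = measureSeconds / 4; // 3.42s / 4 = 0.855s per beat
  return 60 / beatSeconds;              // 60 / 0.855 ≈ 70.175 BPM
}

// measureLengthToTempo(3.42) is about 70.175
```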

Of course, the magic number we’re looking for is double that: 140. If you double the tempo, you halve the measure length, so the interval we’re looking for is 1.71s, and as you can see we measured that duration 19 times. Isn’t that nice?

// Function used to return a histogram of peak intervals
function countIntervalsBetweenNearbyPeaks(peaks) {
  var intervalCounts = [];
  peaks.forEach(function (peak, index) {
    // Compare each peak with up to ten of its following neighbors,
    // without running off the end of the array.
    for (var i = 0; i < 10 && index + i < peaks.length; i++) {
      var interval = peaks[index + i] - peak;
      var foundInterval = intervalCounts.some(function (intervalCount) {
        if (intervalCount.interval === interval) return intervalCount.count++;
      });
      if (!foundInterval) {
        intervalCounts.push({
          interval: interval,
          count: 1
        });
      }
    }
  });
  return intervalCounts;
}

Let’s draw the histogram again, this time grouping by target tempo. Because most dance music is between 90BPM and 180BPM (actually, most dance music is between 120-140BPM), I’ll just assume that any interval that suggests a tempo outside of that range is actually representing multiple measures, i.e. some power-of-two multiple of the measure length. So: an interval suggesting a BPM of 70 will be interpreted as 140. (This logical step is probably a good example of why out-of-sample testing is important.)

Tempo      Count
140 BPM    200
112 BPM    22
93.33 BPM  17
172.3 BPM  15
101.8 BPM  14

Frequency of intervals grouped by the tempo they suggest.

It’s quite clear which candidate is a winner: nearly 10x more intervals point to 140BPM than to its nearest competitor.

// Function used to return a histogram of tempo candidates.
function groupNeighborsByTempo(intervalCounts) {
  var tempoCounts = [];
  intervalCounts.forEach(function (intervalCount) {
    // Skip the zero-length interval each peak forms with itself
    if (intervalCount.interval === 0) return;
    // Convert an interval to tempo
    var theoreticalTempo = 60 / (intervalCount.interval / 44100);
    // Adjust the tempo to fit within the 90-180 BPM range
    while (theoreticalTempo < 90) theoreticalTempo *= 2;
    while (theoreticalTempo > 180) theoreticalTempo /= 2;
    var foundTempo = tempoCounts.some(function (tempoCount) {
      if (tempoCount.tempo === theoreticalTempo) return tempoCount.count += intervalCount.count;
    });
    if (!foundTempo) {
      tempoCounts.push({
        tempo: theoreticalTempo,
        count: intervalCount.count
      });
    }
  });
  return tempoCounts;
}

Review

As far as I’m concerned, we’ve now established the tempo of this song. Let’s go back over our algorithm before doing some out-of-sample testing:

1. Run the song through a low-pass filter to isolate the kick drum
2. Identify peaks in the song, which we can interpret as drum hits
3. Create an array composed of the most common intervals between drum hits
4. Group the count of those intervals by the tempos they might represent:
   - We assume that any interval is some power of two of the length of a measure
   - We assume the tempo is between 90-180BPM
5. Select the tempo that the highest number of intervals point to
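That last selection step isn’t shown in the listings above, but it amounts to a one-liner over the histogram that groupNeighborsByTempo produces (the function name here is my own):

```javascript
// Pick the tempo candidate with the highest count from a histogram of
// { tempo, count } objects like the one groupNeighborsByTempo returns.
function getTopTempo(tempoCounts) {
  return tempoCounts.reduce(function (best, candidate) {
    return candidate.count > best.count ? candidate : best;
  }).tempo;
}

// With the histogram measured above:
// getTopTempo([{ tempo: 140, count: 200 }, { tempo: 112, count: 22 }]) is 140
```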

Testing

Let’s now test it! For a dataset, I’m using the 22 Jump Street soundtrack, because it’s full of contemporary dance music. I pulled down the 30-second iTunes previews, then grabbed the actual tempos from Beatport. Below are the top five guesses for each song; the actual tempo appears in parentheses after each track name.

turn-down-for-what (100):   100.0 BPM 12 | 115.2 BPM 7  | 130.6 BPM 5  | 99.94 BPM 5  | 99.00 BPM 4
work-hard-play-hard (140):  140 BPM 22   | 112 BPM 19   | 93.33 BPM 7  | 125.2 BPM 6  | 127.5 BPM 5
nrg (128):                  127.9 BPM 23 | 146.2 BPM 14 | 170.7 BPM 13 | 102.3 BPM 10 | 102.4 BPM 8
3030 (128):                 127.9 BPM 77 | 146.2 BPM 41 | 170.7 BPM 30 | 102.3 BPM 27 | 113.8 BPM 22
rattle (128):               127.9 BPM 32 | 146.2 BPM 16 | 102.3 BPM 15 | 170.7 BPM 14 | 113.7 BPM 9
stupid-facedd (112):        112 BPM 32   | 137.8 BPM 20 | 162.8 BPM 10 | 127.9 BPM 10 | 94.24 BPM 9
wasted (112):               112 BPM 17   | 149.2 BPM 15 | 100.8 BPM 8  | 98.27 BPM 7  | 130.0 BPM 7
a**-n-t****es (130):        No tempo guesses!
cant-you-see (128):         143.2 BPM 5  | 156.1 BPM 3  | 134.6 BPM 3  | 121.5 BPM 3  | 109.4 BPM 3
check-my-steezo (130):      130.0 BPM 16 | 148.5 BPM 9  | 173.3 BPM 7  | 104.0 BPM 6  | 115.5 BPM 5
midnight-city (105):        105 BPM 58   | 140 BPM 58   | 168 BPM 38   | 120 BPM 24   | 93.33 BPM 17
models-and-bottles (160):   160 BPM 11   | 127.3 BPM 4  | 179.3 BPM 3  | 134.4 BPM 3  | 98.54 BPM 3
too-turnt-up (145):         144.9 BPM 15 | 178.4 BPM 13 | 116.0 BPM 7  | 122.0 BPM 6  | 115.9 BPM 5
i-own-it (124):             123.9 BPM 13 | 124.0 BPM 7  | 90.38 BPM 5  | 99.19 BPM 5  | 122.0 BPM 4
express-yourself (108):     144 BPM 18   | 108 BPM 16   | 172.8 BPM 8  | 123.4 BPM 6  | 96 BPM 3
boy-oh-boy (130):           130.0 BPM 33 | 115.5 BPM 18 | 173.3 BPM 16 | 104.3 BPM 13 | 129.7 BPM 11
freak (128):                102.3 BPM 6  | 103.0 BPM 4  | 144.4 BPM 4  | 157.7 BPM 4  | 128.1 BPM 3
22-jump-street (133):       132.9 BPM 43 | 106.3 BPM 36 | 177.2 BPM 25 | 151.9 BPM 17 | 118.1 BPM 16

Well, that’s pretty reassuring! The algorithm accurately predicts the tempo of 14 out of 18 tracks, or 78% of the time.

Furthermore, it looks like there’s a strong correlation between the number of peaks measured and guess accuracy: there’s no instance in which the algorithm makes more than three observations of the correct tempo yet fails to place it in the top two guesses. That suggests to me that (1) our assumptions about the relationship between neighboring peaks and tempo are correct and (2) by tweaking the algorithm to surface more peaks we can increase its accuracy.

There’s an interesting pattern present in wasted, midnight-city, and express-yourself: the two top tempo candidates are related by a ratio of 4/3, and the lower of the two is correct. In midnight-city, the two are tied, whereas in express-yourself the higher of the two knocks the actual tempo out of the top spot. Listening to each sample, they all have triplets in the drum section, which explains the presence of the higher ratio. The algorithm should be tweaked to identify these sympathetic tempos and fold their results into the lower of the two.
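A speculative sketch of that tweak (the function name and tolerance are my own, not part of the article’s code): scan the candidate histogram for pairs whose tempos sit in a ~4/3 ratio and fold the higher tempo’s count into the lower one.

```javascript
// Fold "sympathetic" tempo candidates: when two candidates are related by
// a ratio of roughly 4/3 (triplets), credit the higher tempo's count to
// the lower, correct tempo.
function foldSympatheticTempos(tempoCounts, tolerance) {
  tolerance = tolerance || 0.02;
  tempoCounts.forEach(function (low) {
    tempoCounts.forEach(function (high) {
      if (Math.abs(high.tempo / low.tempo - 4 / 3) < tolerance) {
        low.count += high.count;
        high.count = 0;
      }
    });
  });
  return tempoCounts;
}

// For express-yourself, 144 BPM (18) would fold into 108 BPM (16),
// making the correct 108 BPM the clear winner with a count of 34.
```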

However, this article has exhausted its scope. The bottom line is: you can do pretty good beat detection using JavaScript, the Web Audio API, and a couple of rules of thumb.