Generating More of My Favorite Aphex Twin Track

Have you ever heard a song you liked so much you wished it would last forever?

Take a moment to listen if you’ve never heard this before.

“aisatsana” is the final track off Aphex Twin’s 2012 release, Syro. A departure from the synthy dance tunes which make up the majority of Aphex Twin’s catalog, aisatsana is quiet, calm, and perfect for listening to during activities which require concentration. But with a measly running time just shy of five and a half minutes, the track isn’t nearly long enough to sustain a session of reading or coding. Playing the track on repeat isn’t satisfactory; exact repetition becomes monotonous quickly. I wished there were an hour-long version of the track, or even better, some system which could generate an endless performance of the track without repetition. Since I build software for a living, I decided to try creating such a system.

Musical Structure

Allow me to explain the relatively simple structure of aisatsana. If you’re not familiar with music theory, I’ll do my best to define the terms I’ll be using and avoid any that aren’t necessary. Try not to get hung up on the vocabulary though.

A beat is an abstract unit of time. For example, I could tell you to snap your fingers on beats one and three, and clap your hands on beats two and four, and the resulting song would go snap, clap, snap, clap. The speed at which a song is played can be measured in beats per minute, or BPM. If I told you to play that same song at 60 BPM (or one beat per second), you’d either be snapping or clapping with each tick of a clock, and it would take four seconds to play the whole song. aisatsana is played at 102 BPM.
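The tempo arithmetic is simple enough to write down. Here is a minimal Python sketch; the function names are my own, chosen for illustration:

```python
def seconds_per_beat(bpm: float) -> float:
    """Length of one beat in seconds: one minute (60 s) divided by beats per minute."""
    return 60.0 / bpm


def duration_seconds(beats: int, bpm: float) -> float:
    """How long a run of `beats` beats lasts at the given tempo."""
    return beats * seconds_per_beat(bpm)


# The snap/clap song: four beats at 60 BPM.
print(duration_seconds(4, 60))           # 4.0
# A single beat of aisatsana at 102 BPM.
print(round(seconds_per_beat(102), 3))   # 0.588
```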

aisatsana follows a very simple pattern. If you start counting at the first note, every block of 16 beats contains a sequence of notes, which I call a phrase. For example, the stretch from about 3 seconds to 12 seconds is one phrase, and the stretch from 12 seconds to 21 seconds is another. The piece continues to play a new phrase every 16 beats until the end, for a total of 32 phrases.
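At 102 BPM, 16 beats last 16 × 60 / 102 ≈ 9.41 seconds, which matches the roughly nine-second windows in that example. The sketch below maps a playback time to a phrase number. It assumes the first note lands at about 3 seconds, which is only my rough reading of the example above, not a measured value:

```python
import math
from typing import Optional

BPM = 102
BEATS_PER_PHRASE = 16
PHRASE_COUNT = 32
FIRST_NOTE_AT = 3.0  # assumption: approximate offset of the first note, in seconds

PHRASE_SECONDS = BEATS_PER_PHRASE * 60.0 / BPM  # about 9.41 s


def phrase_index_at(t: float) -> Optional[int]:
    """Return which phrase (0-based) is playing t seconds into the track,
    or None before the first note or after the last phrase has ended."""
    if t < FIRST_NOTE_AT:
        return None
    index = math.floor((t - FIRST_NOTE_AT) / PHRASE_SECONDS)
    return index if index < PHRASE_COUNT else None


print(phrase_index_at(5.0))    # 0
print(phrase_index_at(13.0))   # 1
print(phrase_index_at(400.0))  # None
```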

This is much simpler than most popular music, which usually has markedly different sections like the chorus, the verses, and maybe a bridge or a pre-chorus.

With this perspective, one could describe aisatsana succinctly as an algorithm:

Every 16 beats, play a 16-beat phrase.

I think about this algorithm in two parts:

1. Every 16 beats, do something.
2. Play a 16-beat phrase.

A system that could play aisatsana-like music endlessly would need to fulfill both of these requirements. Part one is easy (remember, a beat is just a unit of time, so it can be read as “every X seconds, do something”). Part two is a little trickier.
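Part one really is the easy bit. Below is a minimal sketch of a “do something every 16 beats” loop. `every_n_beats` and `play_phrase` are hypothetical names; `play_phrase` is a placeholder for part two, which is where the real work lies. The clock and sleep functions are passed in so the loop can be tested without waiting in real time:

```python
import time
from typing import Callable


def every_n_beats(
    action: Callable[[int], None],
    bpm: float = 102,
    beats: int = 16,
    repetitions: int = 4,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Part one: call `action(i)` once every `beats` beats, `repetitions` times.

    Each start time is measured from a fixed origin instead of sleeping a
    constant amount after every call, so small delays don't pile up.
    """
    interval = beats * 60.0 / bpm  # seconds between calls (~9.41 s by default)
    start = clock()
    for i in range(repetitions):
        target = start + i * interval
        delay = target - clock()
        if delay > 0:
            sleep(delay)
        action(i)


def play_phrase(i: int) -> None:
    """Part two (hypothetical placeholder): a real system would send
    16 beats' worth of notes to a synthesizer here."""
    print(f"phrase {i} starts")
```

Calling `every_n_beats(play_phrase)` would print a line immediately and then one roughly every 9.4 seconds. Scheduling against absolute times is a common choice for this kind of loop, because sleeping a fixed interval after each call slowly drifts out of time.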

An Algorithm For Writing Music

One totally valid strategy for making aisatsana last forever would be to separate the 32 phrases and write a program to select and play a phrase every 16 beats. The result would probably be nicely varied, and no doubt enjoyable to listen to for longer than simply playing the original track over and over. However, I feel this approach would still be too repetitive. My brain would learn to recognize all 32 phrases and the output of such a system would become boring.
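For comparison, that “shuffle the originals” strategy is only a few lines. This is a sketch: `ORIGINAL_PHRASES` holds hypothetical placeholder labels that stand in for the 32 transcribed phrases:

```python
import random
from typing import Iterator, List, Sequence

# Hypothetical stand-ins: each entry would really be the notes of one
# 16-beat phrase transcribed from the track.
ORIGINAL_PHRASES: List[str] = [f"phrase {n}" for n in range(1, 33)]


def endless_selection(phrases: Sequence[str], rng: random.Random) -> Iterator[str]:
    """Yield one original phrase for each 16-beat slot, chosen uniformly
    at random. Nothing new is ever generated."""
    while True:
        yield rng.choice(phrases)


selection = endless_selection(ORIGINAL_PHRASES, random.Random(0))
first_eight = [next(selection) for _ in range(8)]
```

Every value it yields is one of the 32 originals, which is exactly the limitation described above.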

It was important to me that the system could create and play new phrases in addition to the original ones. The challenge was to generate new phrases which sounded similar to the originals; the system shouldn’t just play random notes for 16 beats. Ideally, someone who’d never heard aisatsana before could listen to my system without knowing which phrases were new and which were from the original track.

I began researching methods for generating new music which was similar to some input. Naturally, I found many solutions which involved deep learning techniques. However, with a paltry sample size of 32 phrases, I was worried these techniques would require significantly more input than I had to offer. Instead, I decided to try an older technique which used Markov chains.

You can learn all about Markov chains with a quick internet search, but I’ll try to explain them with an example. A Markov chain records a set of possible states and the probabilities of transitioning from one state to another. Let’s pretend in your whole life you only ever go to three places: your home, your workplace, and the grocery store. In this sad existence of yours there are three states: either you’re at home (state one), or you’re at your workplace (state two), or you’re at the grocery store (state three).

If I were creepy, I could follow you around for some time and record where you go. Eventually, I could analyze my data to determine the probability of where you’ll go next based on where you currently are. For example, perhaps I’ve observed that when you’re at home there’s an 80% chance the next place you’ll go is your workplace, and a 20% chance the next place you’ll go is the grocery store. When you’re at work, there’s a 50% chance you’ll next go to the grocery store and a 50% chance you’ll go home. Finally, when you’re at the grocery store, there’s a 95% chance you’ll go home next and only a 5% chance you’ll go to work next.
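Those observations fit in a small lookup table, and “where do you go next?” becomes a weighted random pick from the row for your current location. A minimal Python sketch using exactly the probabilities above (the function names are mine, for illustration):

```python
import random
from typing import Dict, List

# Current place -> {next place: probability}, taken from the example above.
TRANSITIONS: Dict[str, Dict[str, float]] = {
    "home": {"work": 0.80, "grocery": 0.20},
    "work": {"grocery": 0.50, "home": 0.50},
    "grocery": {"home": 0.95, "work": 0.05},
}


def next_state(current: str, rng: random.Random) -> str:
    """Pick the next place using only the current place's row of probabilities."""
    options = TRANSITIONS[current]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


def walk(start: str, steps: int, seed: int = 0) -> List[str]:
    """Simulate `steps` moves, starting from `start`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(next_state(path[-1], rng))
    return path


print(walk("home", 6))  # a list of 7 places, beginning with "home"
```

The key property is that `next_state` looks only at where you are now, not at how you got there.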

This is all that’s needed to create a Markov chain: states, and the probabilities of transitioning from each state to the others. I’ll apply this to music with a simple example.

Below I’ve drawn out two four-beat phrases in standard musical notation. I’ve also written the note names next to each note and the beat count at the top in case you don’t read music. So to play phrase one, you’d play an A on beat one, an F on beat two, an A again on beat three, and an F again on beat four. To play phrase two, you’d play an E on beat one, a C on beat two, an A on beat three, and a C on beat four. Easy!
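To connect this back to the “follow you around and record” step, here is one simple way to build a chain from those two phrases. This is my own simplification for illustration, not necessarily how the rest of this system works: each note name is a state, beat positions are ignored, and transitions are counted only within a phrase:

```python
from collections import Counter, defaultdict
from typing import Dict, List

# The two four-beat phrases from the example, one note name per beat.
PHRASES: List[List[str]] = [
    ["A", "F", "A", "F"],  # phrase one
    ["E", "C", "A", "C"],  # phrase two
]


def build_chain(phrases: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Count note-to-next-note transitions within each phrase, then turn
    the counts for each note into probabilities (a first-order chain)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for phrase in phrases:
        for current, following in zip(phrase, phrase[1:]):
            counts[current][following] += 1
    return {
        note: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for note, c in counts.items()
    }


chain = build_chain(PHRASES)
print(chain["A"])  # {'F': 0.6666666666666666, 'C': 0.3333333333333333}
```

A is followed by F twice (both in phrase one) and by C once (in phrase two), so from A the chain moves to F two times out of three.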