Over the past weeks I’ve been slowly learning about recent developments in Machine Learning, specifically Neural Networks.

I’ve seen really mind-blowing examples of the power of such architectures, from recreating images using particular art styles to automatically forming word representations that account for pretty high-level semantic relations.

Recurrent Neural Networks (their most frecuent form, LSTMs) are particularly interesting in that they learn patterns in sequences. A great article showing their tremendous power is Andrej Karpathy’s Unreasonable Effectivenes of RNNs. If you haven’t, you should go read it now.

After seeing the wonderful things they can do, I decided to try and use them to tackle a problem I’ve been interested in for years: programmatic music composition. This is something I’ve attempted in the past using Markov models, Genetic Algorithms, and other techniques with pretty disappointing results.

The Experiment

I was really surprised when a project I thought would take me a whole weekend actually took me ~5 hours of coding at a busy Starbucks. (!)

Let me show you a short 8-second sample first, and then tell you how I did it:

That’s one of the first pieces my RNN composed. Not very good, right? But bear in mind that it learned how to generate that just by looking at examples of existing music. I didn’t have to program (or know about) any rule about how to structure notes to form a melody. Let’s see how:

First thing I did was looking for a good representation for the music to be composed. I eventually found the abc notation, which was particularly good for my purposes because it includes concepts of chords and melodies, which makes it easier to procedurally create more meaningful sounds.

Here’s a simple example I just wrote:

X : 1 T :"Hello world in abc notation" M : 4 / 4 K :C "Am" C , D , E , F , | "F" G , A , B , C | "C" D E F G | "G" A B e c

I won’t go over the details of the format, but if you know some musical notation it will look familiar. If you’re curious about how this sounds, you can listen to this in Ubuntu by saving the above snippet into a hello.abc file and running:

$ sudo apt-get install abcmidi timidity $ abc2midi hello.abc -o hello.mid && timidity hello.mid

Given that abc is a text format, I decided to give Karpathy’s char-rnn a spin. I actually ended up using Sherjil Ozair’s TensorFlow version of char-rnn, because TensorFlow (Google’s new Machine Learning framework) is way easier to play with than Torch. You should know that hardcore Machine Learning researchers don’t often use TensorFlow because it’s 3x slower (Justin Johnson told this to me and although I haven’t benchmarked it myself, most serious work I’ve seen uses either Caffe or Torch.

I then needed the data to train the RNN on. After googling a bit I came across the Nottingham Music Database, which has over 1000 Folk Tunes in abc notation. I downloaded this pretty small dataset, compared the >1M data points typically used to train Neural Nets for real.

I concatenated all the abc files together and started training the network on a AWS g2.2xlarge instance:

$ python train.py --data_dir data/music/

And voilà:

Without much work I was training a RNN on a folk music dataset.

After only 500 batches of training, the network produces mostly noise, but you could begin to guess a trace of the abc notation:

After 500 batches of training the RNN produced invalid abc notation.

Wait a couple more minutes, and with 1000 trained batches the outcome changes completely: While still strictly invalid abc format, this looks much like the correct notation.

As you can see, the network even starts generating titles for its creations (found in the T: field).

After 7200 batches, the network produces mostly fully correct abc notation pieces. Here’s an example: A valid abc notation piece produced by the trained RNN after 7200 batches.

OK, so the RNN can learn to produce valid abc notation files, but that doesn’t say anything about their actual musical quality. I’ll let you judge by yourself (I think they’re quite shitty but at least non-random. I don’t think I could have programmed an algorithm to generate better non-trivial music)

Results

I’ll list 4 more non-hand-picked pieces I just generated from the fully-trained network, along with their abc version and music sheet. The names are hilarious.

A GanneG Maman [0:33] (music sheet)

X : 1 T :A GanneG Maman P :A / 1 M : 4 / 4 L : 1 / 4 K :D z / 2 | :"D" FD / 2 F / 2 FA | "G" GB / 2 d / 2 BG | "D" F / 2 D / 2 A / 2 B / 2 A3 / 2 F / 2 | "Em" GE GG | "A7" E3 / 2 E / 2 "D" ED :: : "D" a / 2 f / 2 d / 2 f / 2 "Bm" fd / 2 f / 2 | "A" ea fe | "D" d2 A :|

Ad 197, via PR [1:01] (music sheet)

X : 1 T :Ad 197 , via PR M : 4 / 4 L : 1 / 4 K :D G | :"A" B / 2 c / 2 A / 2 G / 2 A / 2 G / 2 E / 2 | "D" FF / 2 E / 2 Fd | "D" FD "A" FF / 2 E / 2 | \ "A" EA / 2 G / 2 c3 / 2 B / 2 | \ "A" c / 2 c / 2 A A / 2 G / 2 E / 2 G / 2 | "A" Ac2 -|| "A" \ P : 3 f '/2f/2e/2 f/2a/2g/2f/2|ef' 3 / 2 b / 2 a / 2 g / 2 a / 2 c / 2 | "A" ae / 2 a / 2 a / 2 g / 2 f / 2 a / 2 | "D" d #"fe "E"c2:: "A" ef / 2 e / 2 "Bm" d3 / 2 f / 2 | "Em" g / 2 f / 2 e / 2 f / 2 "A" e / 2 d / 2 c / 2 B / 2 | "A" AB / 2 c / 2 "D" d / 2 e / 2 f / 2 e / 2 | \ "Em" de "A7" d / 2 c / 2 B / 2 A / 2 | "D" BA / 2 G / 2 A / 2 F / 2 D / 2 F / 2 | "D" D / 2 F / 2 A / 2 F / 2 DA | "D" df / 2 e / 2 a / 2 g / 2 f / 2 e / 2 | "G" d / 2 c / 2 B / 2 G / 2 "E7" BG / 2 A / 2 || "Bm" df / 2 d / 2 "E7" e / 2 d / 2 c / 2 B / 2 | "A" AA / 2 B / 2 eA

X : 1 T :Thacrack , via EF M : 6 / 8 K :D fg | fed e2d | "D" AFD F2A | "D" fgf "A" c2A | "D" ABf "C" ege | "D" d3 d2 :| P :B f / 2 e / 2 | "D" ddf f2f | "Bm" edc ABc | BdB BdB | "A" A3 "D" Ace | "A" agf fde | "E7" gfe ddB | "A7" ABA B2A | "D" FGF "A7" EDE | "D" d , 2 D "A7" AGF | "G" GF G2B | GBd gfg | "D" FdA A2d | "D" fed "A7" edc | "D" d2A AFG | "G" GFG "E7/f+" G2B | "A" A2c c2e | "D" dfa d2f | "Bm" fag B = cd | "Em" gfe "A7" A ^ GA | "D" d3 - d2 :|

Lea Oxlee [1:07] (music sheet) - my favorite!

X : : 2 T :Lea Oxlee % Nottingham Music Database S :Kevin Briggs , via EF M : 4 / 4 L : 1 / 4 K :D P :A A / 2 G / 2 | "D" F / 2 A / 2 F / 2 A / 2 FA | "G" Bd / 2 c / 2 B / 2 d / 2 B / 2 d / 2 | \ "A" c3c / 2 B / 2 | "D" Af f / 2 e / 2 f | "Em" e / 2 d / 2 c / 2 B / 2 "A7" c / 2 = c / 2 B / 2 A / 2 | "D" G / 2 A / 2 G / 2 F / 2 "Em" E / 2 F / 2 D / 2 E / 2 :| "F#" df fe / 2 f / 2 | "E7" gf z "A7" A / 2 B / 2 d / 2 ^ c / 2 | \ "A" ce "E7" e / 2 d / 2 c / 2 B / 2 | "A" AA A :| P :B e / 2 f / 2 | "A" ea ee / 2 f / 2 | "D" gf "A" ec | "D" dd / 2 f / 2 "A" ec | "D#m" d3 / 2 e / 2 dA | "A7" GE E2 | "D" FD D :|

Conclusions

Machine Learning is at a tipping point. The tools are getting better. Complexities are being abstracted. You don’t have to have a PhD to make apps that use state of the art network models. Very good frameworks and models are available for everyone to use. It’s a good time to hack something in Machine Learning. Even if you’re not an expert, you can achieve fun results.