The Problem

Before delving into their relationship, let me first define the problem. I began this project with the simple desire to generate pop music using deep learning, or ‘A.I.’ as laymen call it. This quickly led me to LSTMs (Long Short-Term Memory units), a particular kind of Recurrent Neural Network (RNN) that is very popular for generating text and music.

But as I read more into the subject, I began to question the logic of applying RNNs and their variants to generate pop music. The logic seemed to be based on several assumptions about the internal structure of (pop) music that I did not fully agree with.

One specific assumption is that the harmony and the melody are independent of each other (both are described above).

Take for instance the 2017 publication from the University of Toronto: Song from Pi: A Musically Plausible Network for Pop Music Generation (Hang Chu et al.). In this article, the authors explicitly “assume…the chords are independent given the melody” (3, italics mine). Based on this specification, the authors build a complex, multi-layered RNN model. The melody has its own layers for generating notes (the key layer and the press layer), which are autonomous from the chord layer. On top of this independence, the model conditions the harmony on the melody for generation. In other words, the chords depend on the melody notes, rather than the melody notes depending on the chords.

Hang Chu, et al.’s stacked RNN model. Each layer is responsible for addressing a different aspect of a song.

This kind of modeling feels odd to me, as it does not seem to approximate how humans approach composing popular music. Speaking personally as a classically trained pianist, I would never consider writing down melody notes without first considering the harmony notes. This is because the harmony notes both define and limit what my melody notes can be. Axis of Awesome, in their once-viral YouTube video, demonstrated this idea long ago.

Video demonstrating how different pop melodies are all dependent on the same four chords.

Their video displays a defining attribute of western pop music: that harmony, or those four chords, strongly determines what the melody will be. In data science language, we can say that the statistical relationship between the harmony and the melody is a conditional probability: the probability of a melody note given the chord it is played over. This is because the melody notes are naturally dependent on what the harmony notes are. One could thus argue that the harmony notes both inherently limit and enable which melody notes can be chosen in a particular song.
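To make the idea of a conditional probability a little more concrete, here is a minimal Python sketch. It assumes a tiny, hand-made list of (chord, melody note) pairs that I invented purely for illustration; it is not taken from any real song or from the paper discussed above. The names TOY_BEATS, melody_given_chord and melody_marginal are likewise hypothetical. The sketch simply counts how often each melody note occurs over each chord and compares that with how often the note occurs overall.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus (an assumption for illustration only):
# one (chord, melody_note) pair per beat.
TOY_BEATS = [
    ("C", "E"), ("C", "G"), ("C", "E"), ("C", "C"),
    ("G", "D"), ("G", "B"), ("G", "D"), ("G", "G"),
    ("Am", "A"), ("Am", "C"), ("Am", "E"), ("Am", "A"),
    ("F", "F"), ("F", "A"), ("F", "C"), ("F", "A"),
]


def melody_given_chord(beats):
    """Estimate P(melody note | chord) by counting co-occurrences."""
    counts = defaultdict(Counter)
    for chord, note in beats:
        counts[chord][note] += 1

    table = {}
    for chord, note_counts in counts.items():
        total = sum(note_counts.values())
        table[chord] = {note: n / total for note, n in note_counts.items()}
    return table


def melody_marginal(beats):
    """Estimate P(melody note), ignoring the chord entirely."""
    note_counts = Counter(note for _, note in beats)
    total = sum(note_counts.values())
    return {note: n / total for note, n in note_counts.items()}


if __name__ == "__main__":
    conditional = melody_given_chord(TOY_BEATS)
    marginal = melody_marginal(TOY_BEATS)
    for chord, dist in conditional.items():
        print(chord, dist)
    print("marginal", marginal)
```

In this toy data, knowing the chord changes which melody notes are likely: the distribution returned for a single chord, such as melody_given_chord(TOY_BEATS)["C"], differs from the overall distribution returned by melody_marginal(TOY_BEATS). That difference is what it means, statistically, for the melody to depend on the harmony.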