I am a big fan of classical music and while listening to Bach the other day, I wondered if it would be possible to get numerical data on the actual notes used by a particular composer.

I remember as a kid I used to download MIDI files and was always fascinated by how programs could turn them into visual notes. A MIDI file stores a sequence of messages, and a note message is made up of a status byte carrying a “command” like “start playing note”, followed by data bytes such as the note number. In my case, I wanted to read just the data related to the actual notes. This was easy enough to do with Java’s built-in javax.sound.midi package. This is the workhorse class I put together to get out the note information:

Reading note information from MIDI in Java

package midireader;

import java.util.ArrayList;
import java.util.List;
import javax.sound.midi.ShortMessage;

public class MIDIAnalyzer {
    private static final String[] sharpNoteNames = {"C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"};
    private static final String[] flatNoteNames = {"C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"};

    // Each track of the MIDI has its own array of ShortMessages
    private final List<ShortMessage[]> messages;

    public MIDIAnalyzer(List<ShortMessage[]> messages) {
        // Copy instead of casting, so any List implementation is accepted
        this.messages = new ArrayList<>(messages);
    }

    // Counts the NOTE_ON messages of one track per note name (C = 0 ... B = 11)
    private int[] countNotes(ShortMessage[] smArray) {
        int[] notes = new int[12];
        for (ShortMessage sm : smArray) {
            if (sm != null && sm.getCommand() == ShortMessage.NOTE_ON) {
                // Taking mod 12 removes the octave information
                notes[sm.getData1() % 12]++;
            }
        }
        return notes;
    }

    // Adds up the counts of all tracks and prints one tab-separated line per note
    public void noteCountSummary() {
        int[] notes = new int[12];
        for (ShortMessage[] smArray : messages) {
            int[] trackNotes = countNotes(smArray);
            for (int i = 0; i < trackNotes.length; i++) {
                notes[i] += trackNotes[i];
            }
        }
        for (int i = 0; i < notes.length; i++) {
            System.out.println(sharpNoteNames[i] + "\t" + notes[i]);
        }
    }
}
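The class above expects one array of ShortMessages per track, but the step that produces those arrays is not shown. Here is a minimal sketch of how that could be done with javax.sound.midi. The class name TrackMessageReader and its methods are hypothetical names made up for this illustration, not part of the original program.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.sound.midi.InvalidMidiDataException;
import javax.sound.midi.MidiMessage;
import javax.sound.midi.MidiSystem;
import javax.sound.midi.Sequence;
import javax.sound.midi.ShortMessage;
import javax.sound.midi.Track;

// Hypothetical helper: collects the ShortMessages of every track in a sequence.
class TrackMessageReader {

    // Returns one ShortMessage array per track, skipping meta and sysex messages.
    static List<ShortMessage[]> extract(Sequence sequence) {
        List<ShortMessage[]> perTrack = new ArrayList<>();
        for (Track track : sequence.getTracks()) {
            List<ShortMessage> shortMessages = new ArrayList<>();
            for (int i = 0; i < track.size(); i++) {
                MidiMessage message = track.get(i).getMessage();
                if (message instanceof ShortMessage) {
                    shortMessages.add((ShortMessage) message);
                }
            }
            perTrack.add(shortMessages.toArray(new ShortMessage[0]));
        }
        return perTrack;
    }

    // Loads a MIDI file from disk and extracts its per-track ShortMessages.
    static List<ShortMessage[]> read(File midiFile)
            throws InvalidMidiDataException, IOException {
        return extract(MidiSystem.getSequence(midiFile));
    }

    public static void main(String[] args) throws Exception {
        // Assumes the path to a MIDI file is passed as the first argument
        List<ShortMessage[]> tracks = read(new File(args[0]));
        for (int i = 0; i < tracks.size(); i++) {
            System.out.println("Track " + i + ": " + tracks.get(i).length + " short messages");
        }
    }
}
```

The list returned by read could then be handed to the MIDIAnalyzer constructor, provided both classes are placed in the same package or imported accordingly.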

With this code written, it was time to go back to my roots and download a bunch of MIDIs.

Notes Analysis

I wanted to compare two composers from different eras to see if there were any detectable stylistic differences based on the note selection. I chose Bach and Mozart, who were both prolific (i.e., lots of data) and are quintessential examples of their respective eras.

To control for differences in key, I only downloaded pieces labelled as G major. My dataset had 128,280 notes written by Bach and 131,107 notes written by Mozart.

Here is a display of the two composers’ note usage:

And this compares each composer’s note percentages:

If you’re unfamiliar with music theory, in a G major piece we would expect to hear lots of G, A, B, D, and F#. This is because the chords G-B-D and D-F#-A, the tonic and dominant chords of G major, sound good together. Notice that D is the one note the two chords share, and it kind of ties them together. This is probably why both composers use D the most. You can see that the other four notes are also high in both distributions.
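To make this kind of comparison concrete, here is a rough sketch of how raw note counts could be turned into percentages, along with the combined share taken by G, A, B, D, and F#. The class name NoteShare, its method names, and the toy counts in main are all made up for illustration; they are not the Bach or Mozart numbers.

```java
// Hypothetical helper for turning 12 note counts into percentages.
class NoteShare {

    // Indices follow the note-number-mod-12 convention (C = 0 ... B = 11):
    // G = 7, A = 9, B = 11, D = 2, F# = 6
    private static final int[] CHORD_TONES = {7, 9, 11, 2, 6};

    // Converts raw counts into percentages of the total note count.
    static double[] toPercentages(int[] counts) {
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        double[] percentages = new double[counts.length];
        if (total == 0) {
            return percentages; // no notes at all: every share is 0
        }
        for (int i = 0; i < counts.length; i++) {
            percentages[i] = 100.0 * counts[i] / total;
        }
        return percentages;
    }

    // Combined percentage of notes that are G, A, B, D, or F#.
    static double chordToneShare(int[] counts) {
        double[] percentages = toPercentages(counts);
        double share = 0.0;
        for (int pitchClass : CHORD_TONES) {
            share += percentages[pitchClass];
        }
        return share;
    }

    public static void main(String[] args) {
        // Made-up toy counts for C, C#, D, ..., B (not real data)
        int[] toyCounts = {10, 0, 20, 0, 10, 0, 15, 25, 0, 10, 0, 10};
        System.out.println("Chord tone share (%): " + chordToneShare(toyCounts));
    }
}
```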

What I find most interesting is that Bach’s notes tend to be more spread out. Bach’s pieces often modulate through different keys, sometimes from one measure to the next, so we see more instances of D♯ and G♯, for example. Mozart, on the other hand, sticks more closely to the notes of the title key (G major). However, Mozart makes a lot more use of E, which can act as the sixth degree of G major or as the root of the relative minor (E minor), depending on how it is used. It’s really surprising to me how infrequently this note shows up for Bach. Maybe someone who knows more about music than I do can let me know why this is.

I would like to look into some later composers and see how the different notes are distributed. I would expect the note counts to be more even as atonality and twelve-tone music became more popular in the 20th century.
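One possible way to put a number on how “even” a set of note counts is would be the Shannon entropy of the distribution, scaled so that 0 means every note is the same and 1 means all twelve notes are used equally often. This is my own suggestion for a follow-up and was not part of the analysis above; the class and method names below are hypothetical.

```java
// Hypothetical helper: measures how evenly notes are spread over the bins.
class NoteEvenness {

    // Shannon entropy in bits; empty bins contribute nothing.
    static double entropyBits(int[] counts) {
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        if (total == 0) {
            return 0.0;
        }
        double entropy = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                entropy -= p * (Math.log(p) / Math.log(2));
            }
        }
        return entropy;
    }

    // Entropy divided by its maximum, log2(number of bins).
    // Assumes more than one bin (12 for the note counts used here).
    static double evenness(int[] counts) {
        double maxEntropy = Math.log(counts.length) / Math.log(2);
        return entropyBits(counts) / maxEntropy;
    }

    public static void main(String[] args) {
        // Made-up toy counts for C, C#, D, ..., B (not real data)
        int[] toyCounts = {10, 0, 20, 0, 10, 0, 15, 25, 0, 10, 0, 10};
        System.out.println("Evenness: " + evenness(toyCounts));
    }
}
```

Under this measure, a twelve-tone piece that uses every note equally often would score close to 1, while a piece that stays strictly within one key would score lower.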