Wrapping it all up.

How good is 21.8% accuracy?

Well, to begin, it’s much better than random chance. It’s pretty amazing that a statistical model can pick up on some of these nuanced relationships that I have internalized for years, especially considering that it knows nothing about what these songs actually sound like. However, the reality is that the model (like us humans) is very good at learning the songs that occur in a few particular patterns and pretty bad at the rest. These particular patterns happen when songs occur:

As common segues: Phish has a handful of songs that [almost always] occur side-by-side, one after another. Our model gets those subsequent songs right a good amount of the time. (Ex. “Mike’s Song” > “I am Hydrogen” > “Weekapaug Groove” or “The Horse” > “Silent in the Morning” or “Swept Away” > “Steep”)

As set openers/closers

As encores

And when guessing it’s time for the set breaks/encores themselves

Songs that the model performs best on (sorted by F1 Score)
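
For anyone who wants to reproduce a ranking like this, here is a minimal sketch of a per-song F1 calculation. It is not the code used for this post: the function name `per_song_f1` and the placeholder labels are assumptions made purely for illustration. It treats each song as its own class and computes F1 as 2·TP / (2·TP + FP + FN).

```python
from collections import Counter

def per_song_f1(y_true, y_pred):
    """Return (song, F1) pairs sorted from best to worst F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1   # correct prediction for this song
        else:
            fp[guess] += 1   # guessed a song that was not played
            fn[truth] += 1   # missed the song that was played
    scores = {}
    for song in set(y_true) | set(y_pred):
        denom = 2 * tp[song] + fp[song] + fn[song]
        scores[song] = 2 * tp[song] / denom if denom else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Placeholder labels, not real setlist data.
ranking = per_song_f1(["A", "A", "B", "C"], ["A", "B", "B", "C"])
```

A song that is only ever predicted correctly (like "C" in the placeholder data) lands at the top of the list, which is why near-guaranteed segues tend to score so well.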

Room for Improvement

A huge problem with this modeling approach is that it is only looking at sequential data… meaning it has no concept of categorical and abstract knowledge surrounding Phish. For example, the model doesn’t recognize what a [newer] 3.0 song is, and consequently, doesn’t understand that these songs are more likely to be played now vs. an [older/now rarer] 1.0 song. A huge improvement would be to incorporate categorical data (era, venue, year, album, etc.) into the neural network along with the setlist sequences.
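
To make that idea a bit more concrete, here is a rough sketch of how a categorical feature such as era could be encoded alongside a song sequence. Everything in it is an assumption for illustration: the era boundaries follow the common fan convention by year, and `era_for_year`, `one_hot`, and `build_features` are hypothetical helpers, not part of the model described above.

```python
# Assumed era labels, following the common fan convention.
ERAS = ["1.0", "2.0", "3.0"]

def era_for_year(year):
    """Map a show year to an era label (boundaries are an assumption)."""
    if year <= 2000:
        return "1.0"
    if year <= 2004:
        return "2.0"
    return "3.0"

def one_hot(value, vocabulary):
    """Encode a categorical value as a list of 0.0/1.0 floats."""
    return [1.0 if value == item else 0.0 for item in vocabulary]

def build_features(song_ids, show_year):
    """Pair a song-ID sequence with a one-hot context vector for the show."""
    return {
        "sequence": list(song_ids),
        "context": one_hot(era_for_year(show_year), ERAS),
    }

# Placeholder song IDs, not real setlist data.
example = build_features([12, 87, 3], show_year=2019)
```

In a real network, the context vector would typically be concatenated with the output of the sequence layers before the final prediction layer, so the model can weigh "what was just played" against "what era, venue, or year are we in".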

Another means of improvement (or at least improved relevance) could be to exclude the first ~10-15 years of data. As shown below, Phish played the majority of their shows in the early ’90s (128 shows in 1994!), when they had relatively few unique songs (~375 of today’s >850), meaning the majority of our training data is heavily skewed toward patterns associated with those 375 songs (during Phish 1.0). A good example of this is “Cold as Ice” > “Cracklin Rosie” > “Cold As Ice”, which was played 46 times between 1992 and 1995 and only 4 times since.
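
Dropping the early years is mostly a data-filtering step before training. Here is a minimal sketch, assuming each show is a dict with an ISO-formatted `date` string and a `songs` list; that layout, the `filter_shows` name, and the 1998 cutoff are all hypothetical choices for illustration.

```python
def filter_shows(shows, min_year):
    """Keep only shows played in min_year or later."""
    return [show for show in shows if int(show["date"][:4]) >= min_year]

# Placeholder shows, not real setlists.
shows = [
    {"date": "1994-06-18", "songs": ["A", "B"]},
    {"date": "2003-02-28", "songs": ["C", "D"]},
    {"date": "2019-07-14", "songs": ["E", "F"]},
]

# The cutoff year is an assumption; it is the knob to experiment with.
recent = filter_shows(shows, min_year=1998)
```

The trade-off is that a later cutoff makes the training data more relevant to today’s shows but leaves far fewer sequences to learn from.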

To make things more complicated, Phish played certain songs regularly back then that are very rarely played now. Not to mention, the new songs that have emerged since Phish 1.0 [and continue to keep coming] have been played far less frequently overall, so there are fewer patterns to learn from. Consequently, this is a very difficult problem to model.

Setlist Generation
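
Generating a setlist means feeding the model its own output: predict the next song from the most recent window, append it to the history, slide the window, and repeat. Here is a minimal sketch of that loop. The `generate_setlist` function, the `<END>` stop token, the `max_len` cap, and the toy predictor are all assumptions for illustration; the toy predictor simply stands in for a trained model.

```python
def generate_setlist(seed, predict_next, window=50, max_len=30, stop_token="<END>"):
    """Repeatedly predict the next item from the last `window` items."""
    history = list(seed)
    generated = []
    for _ in range(max_len):
        context = history[-window:]      # most recent songs only
        next_item = predict_next(context)
        if next_item == stop_token:      # assumed end-of-show marker
            break
        generated.append(next_item)
        history.append(next_item)        # the prediction becomes new input
    return generated

def toy_predict(context):
    """Stand-in for a trained model: walks a fixed, made-up order."""
    order = ["A", "B", "<SET BREAK>", "C", "<END>"]
    last = context[-1]
    if last in order:
        return order[(order.index(last) + 1) % len(order)]
    return order[0]

setlist = generate_setlist(["X"], toy_predict)
```

Because every prediction is appended to the history, an early miss can influence everything that follows, so generated setlists are best read as plausible shows rather than precise forecasts.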

Using the newly trained neural network [artfully named TrAI], we can recursively make predictions to generate Phish’s next setlist from an input of the 50 most recently played songs. Without further ado, here are TrAI’s predictions for the November 29th, 2019 Fall Tour opener in Providence, RI: