3. Building the classifier

A decision tree, a popular tool in machine learning, makes decisions by following a tree-like structure of questions.

Image by Stephen Milborrow on Wikipedia — “A tree showing survival of passengers on the Titanic (“sibsp” is the number of spouses or siblings aboard). The figures under the leaves show the probability of survival and the percentage of observations in the leaf. Summarizing: Your chances of survival were good if you were (i) a female or (ii) a male younger than 9.5 years with less than 2.5 siblings.”

Above is an image from the Decision tree learning Wikipedia article. The tree is built from data on Titanic passengers, giving us an intuitive example of how decision trees work.

The audio-based classifier will use a decision tree in the same way: each internal node of the tree asks a question about one attribute, and each leaf holds a predicted genre. For instance, a node might ask “is acousticness > 0.5?”, and the answer determines whether the song in question moves toward a leaf that predicts hip-hop or one that predicts punk/metal. The songs in our data set have far more attributes than the Titanic passengers did, so our tree will look much more complex than the one above. The Titanic tree considers only three attributes (sex, age, and sibsp); our songs have 12. These attributes are what the decision tree bases its predictions on, so it makes sense that our tree will be a lot bigger, not to mention the enormous number of songs we will be feeding into it.
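To make this concrete, here’s a tiny sketch (with invented feature values and genres, not data from the actual project) that fits a shallow tree and prints the questions it learned to ask:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented examples: [acousticness, energy] values for a handful of songs
X = [[0.9, 0.2], [0.8, 0.3], [0.1, 0.9], [0.2, 0.8]]
y = ["hip-hop", "hip-hop", "punk/metal", "punk/metal"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Prints learned rules such as "acousticness <= 0.50",
# with a predicted genre at each leaf
print(export_text(tree, feature_names=["acousticness", "energy"]))
```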

I chose to use scikit-learn’s Python tools, specifically the train_test_split function and the DecisionTreeClassifier. When building a classifier, the first step we must take is called fitting the classifier. We can think of fitting as exposing the classifier to labeled data: showing it each example and telling it which class that example belongs to.
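As a sketch of what fitting looks like in code, assuming the songs live in a pandas DataFrame with a genre column (the file name and column names are placeholders of mine, not the project’s actual code):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data set: one row per song, its audio attributes, plus a "genre" label
songs = pd.read_csv("songs.csv")
X = songs.drop(columns=["genre"])  # the audio attributes
y = songs["genre"]                 # the class each song belongs to

# Hold some songs back so the classifier can later be tested on data it has never seen
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)  # fitting: expose the classifier to labeled examples
```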

It’s pretty similar to the idea of studying. When studying for tests, you’re exposed to a lot of information that you must remember when it comes time to take the test. The key here (this is important) is that on the test, you will see problems you must solve that you have never seen before — but these problems have similarities to the ones you studied before taking the test. Using what you know about the problems you’ve studied, you solve these test problems to the best of your ability, hoping that you get a good exam grade in the end.

When the decision tree classifier is fitted, the tree’s splits and leaves are generated from the data it’s exposed to. After fitting, we can test the classifier by exposing it to data it has never seen before and checking whether its predictions are accurate.
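Continuing the sketch above, testing means asking the fitted tree for genre predictions on the held-out songs and measuring how often it’s right:

```python
from sklearn.metrics import accuracy_score

predictions = clf.predict(X_test)           # genres guessed for unseen songs
print(accuracy_score(y_test, predictions))  # fraction predicted correctly
```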

I built my decision tree classifier by splitting the data set into 70% training data and 30% testing data. I also dropped two audio attributes from consideration, since they varied too little between genres to help distinguish them. I made both decisions after researching how to avoid overfitting.
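In code, those two decisions might look like the lines below. The two dropped attributes aren’t named here, so “attribute_a” and “attribute_b” stand in for them:

```python
# The two dropped attributes are placeholders, not the real column names
X = songs.drop(columns=["genre", "attribute_a", "attribute_b"])

# 70% of the songs train the tree; the remaining 30% are held back for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```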

Before fitting, I used a method called bagging to further improve the classifier. Bagging takes your initial classifier, creates a bunch of copies of it, and trains each copy on a random sample (drawn with replacement) of the original data set. After training, the copies’ predictions are combined, by majority vote or by averaging their predicted probabilities, into one single ensemble classifier. Bagging introduces randomization into how the classifiers are constructed, making it a great way to improve on a single complex tree classifier.
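A sketch of how this could be done with scikit-learn’s BaggingClassifier, using illustrative parameter values rather than the project’s exact settings:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Train 100 trees, each on a bootstrap sample (drawn with replacement)
# of the training data, then combine their votes into one classifier
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=42)
bagged.fit(X_train, y_train)
print(bagged.score(X_test, y_test))  # accuracy of the ensemble on unseen songs
```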