As part of the development team at OPUS, the world’s first decentralized music streaming platform, my responsibilities include working on the player’s AI features. I conduct research, experiments, and look into various methods of how AI can be implemented and what its greatest advantages and disadvantages are. — Mateusz Gruszka

In this article, I discuss music recommendation, a feature many of us take for granted these days, but behind which stand principles from mathematics and computer science. The methods described below will be used and perfected during the development process of the OPUS platform.

How do they do it?

People tend to have preferred sources for printed news, for example newspapers or magazines; however, in an increasingly online world this reality is constantly changing as users are flooded with information from a large range of different sources.

The same principle holds for music. With such a vast amount of information, it is becoming more and more difficult to select the songs that users will enjoy listening to. Consequently, users limit their consumption of music.

To overcome the problem of being overwhelmed by information, which can cause troubles when making decisions, recommender systems can be used. These systems use various different types of feedback from users to narrow down the options presented to a smaller, more manageable number. But what exactly is going on behind the scenes when we are recommended what track to listen to next or what movie to watch?

According to Netflix executives Carlos A. Gomez-Uribe and Neil Hunt, their recommender system and its suggestion algorithms save the company about $1 billion each year.

What we perceive as magic is, in fact, a combination of machine learning and algorithms used to ensure that you never run out of content to enjoy. If you are not familiar with the concept of machine learning, here is the formal definition given by Tom M. Mitchell in 1997: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.” A simpler, more concise definition states: “Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.”

But how exactly is a learning algorithm or recommender system that figures out your preferences constructed, and how can a single algorithm be built to be used by millions of people simultaneously? Before these questions can be answered, recommendation systems need to be explained more carefully. A recommender system is traditionally composed of:

- Users

- Items (e.g. movies, songs, articles)

- Preferences or Ratings (e.g. movies that can be rated out of 5 stars)

Approaches to recommendation systems can be divided into four categories:

- Content-Based Filtering — Here, the system tries to recommend items similar to those a given user has liked in the past. A purely content-based recommender system makes recommendations based solely on the profile built up by analyzing the content of items which that user has rated in the past.

- Collaborative Filtering — The idea is to find users in a community who share mutual preferences. If two users have similarly rated items in common, it can be assumed that they have the same taste. Such users constitute a group, otherwise known as a neighborhood. A given person’s recommendations are then based on what was positively rated by other members of the neighborhood of which that person is a part. Collaborative filtering can also be split according to whether it focuses on users (User Based Approach) or on items (Item Based Approach).

- Hybrid Filtering — A combination of the above approaches.

- Non-personalized systems — These involve summary statistics and, in some cases, product associations derived from external community data, such as a product that is a current best seller, most popular, trending, or hot. They may also provide a summary of community ratings, for example, how much the population likes a restaurant, presented as a list.
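To make the Item Based Approach concrete, here is a minimal sketch in Python: it computes cosine similarities between the rating vectors of songs in a toy matrix. The data and helper names are invented for illustration; a real system would use a sparse matrix and far larger catalogues.

```python
import numpy as np

# Toy ratings matrix: rows are users, columns are songs, 0 = unrated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two rating vectors."""
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return (a @ b) / norm if norm else 0.0

def item_similarities(R):
    """Pairwise cosine similarity between item (column) vectors."""
    n_items = R.shape[1]
    S = np.zeros((n_items, n_items))
    for i in range(n_items):
        for j in range(n_items):
            S[i, j] = cosine_sim(R[:, i], R[:, j])
    return S

S = item_similarities(R)
# Songs 0 and 1 are rated similarly by the same users, so they end up
# far more similar to each other than either is to songs 2 and 3.
```

Given these similarities, a recommender would suggest to each user the songs most similar to those he or she has already rated highly.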

So, what problems are there?

While the solution seems perfectly reasonable and simple to implement, every developer who undertakes this task faces a range of obstacles. First, there is the Cold Start Problem, which occurs when new users enter the system or new items are added to the catalogue. Given that a new item has no ratings and a new user has no rating history, the absence of data creates a significant challenge. Next, there is the Synonymy Problem, which arises when an item is represented with two or more different names, or when entries have similar meanings. Lastly, Shilling Attacks happen when a malicious user or a competitor enters the system and starts giving false ratings, either to increase or to decrease the popularity of an item. Of course, there are many more obstacles for developers to overcome.

To tackle these obstacles, it is crucial to pick the most suitable algorithm and develop solutions while also planning ahead for future events. For the OPUS platform, we have not yet fully completed a recommendation algorithm for music. One of the things I am currently working on is implementing Model Based Collaborative Filtering using a matrix factorization algorithm called SVD++, an extension of Singular Value Decomposition. To tackle the Cold Start Problem, which appears in the Collaborative Filtering approach, I want to add a non-personalized system that shows new or popular items beside recommended songs, as well as increase the focus on new users and encourage them to interact with the system from the very beginning in order to generate data.

In 2009, Netflix awarded $1 million to a team of researchers who developed an algorithm that improved the accuracy of Netflix’s predictions by 10%. While the winning algorithm was actually an ensemble of over 100 algorithms, SVD++ was one of the key algorithms that contributed the most to this achievement and is still used in production.

What is SVD++?

Boiling the concept down to a single sentence, SVD++ is a method of computing an approximate low-rank factorization of a matrix by minimizing a squared error loss. The basic idea is that we want to decompose our original and very sparse matrix of data (where the rows are user IDs, the columns are song IDs, and the entries at their intersections are ratings) into two low-rank matrices that represent user factors and item factors. In this particular case, the items are songs and the factors can be, for example, the genres of the songs.
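In matrix form, if R is the m × n ratings matrix (m users, n songs), we look for an m × k user-factor matrix P and an n × k item-factor matrix Q, with k much smaller than m and n, such that:

R ≈ P · Qᵀ

Each row p_u of P describes how strongly user u is associated with each latent factor, and each row q_i of Q does the same for song i.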

This is done by using an iterative approach to minimize the loss function. The most common way is Stochastic Gradient Descent (SGD), but other methods, such as Alternating Least Squares (ALS), are also possible. Essentially, SGD iteratively adjusts the model’s parameters to reduce its prediction error on the existing dataset. The actual loss function minimized by SGD includes a general bias term and two biases, one for the user and one for the item.
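In its generic form, SGD picks one observed rating at a time and nudges every parameter θ a small step against the gradient of the loss L on that single example:

θ ← θ − γ · ∂L/∂θ

where γ is the learning rate. All the bias and factor updates in this kind of model are instances of this rule.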

There are systematic tendencies for some users to give higher ratings than others, and for some items to receive higher ratings than others. It is customary to adjust the data by accounting for these effects, which we encapsulate within the baseline estimates. Denote by μ the overall average rating. A baseline estimate for an unknown rating r_ui is denoted by b_ui and accounts for the user and item effects:
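In standard notation, this baseline estimate is written:

b_ui = μ + b_u + b_i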

The parameters b_u and b_i indicate the observed deviations of user u and item i, respectively, from the average. The formulation is as follows:
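Written out, the biases are learned by solving a regularized least-squares problem over the set K of (user, item) pairs with known ratings:

min over b_*  of  Σ_{(u,i)∈K} (r_ui − μ − b_u − b_i)² + λ · (Σ_u b_u² + Σ_i b_i²)

Here λ controls the strength of the regularization.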

The regularization term, following the addition symbol, is used to prevent overfitting, which occurs when the algorithm fits the training dataset so closely that its predictions on new data become inaccurate. Regularization reins in the parameters and helps the model generalize to new inputs.
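The whole training loop can be sketched in a few lines of Python. This is a minimal, illustrative implementation of biased matrix factorization trained with SGD on a toy dataset; the data, hyperparameters (learning rate, regularization strength λ, rank k), and epoch count are invented for the example and are not OPUS production values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy observed ratings as (user_id, song_id, rating) triples.
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1),
           (2, 1, 1), (2, 2, 5), (3, 2, 4), (3, 1, 2)]
n_users, n_items, k = 4, 3, 2              # k = number of latent factors

mu = np.mean([r for _, _, r in ratings])   # overall average rating
b_u = np.zeros(n_users)                    # user biases
b_i = np.zeros(n_items)                    # item biases
P = 0.1 * rng.standard_normal((n_users, k))  # user factor vectors p_u
Q = 0.1 * rng.standard_normal((n_items, k))  # item factor vectors q_i

lr, lam = 0.01, 0.05    # learning rate and regularization strength

for epoch in range(1000):
    for u, i, r in ratings:
        pred = mu + b_u[u] + b_i[i] + P[u] @ Q[i]
        e = r - pred    # prediction error for this single rating
        # Step each parameter along the gradient of the regularized
        # squared error (old values are used on the right-hand sides).
        b_u[u] += lr * (e - lam * b_u[u])
        b_i[i] += lr * (e - lam * b_i[i])
        P[u], Q[i] = (P[u] + lr * (e * Q[i] - lam * P[u]),
                      Q[i] + lr * (e * P[u] - lam * Q[i]))

def predict(u, i):
    """Predict the rating of user u for song i."""
    return mu + b_u[u] + b_i[i] + P[u] @ Q[i]
```

After training, `predict` can fill in any empty cell of the original sparse matrix, including (user, song) pairs never observed in the data.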

Once the user and item factors are computed and the model is trained, we can turn to the task we initially set out to solve: predicting the unknown values in the original matrix. To do this, we simply multiply the user and item factors and add the result to the biases:
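With p_u and q_i denoting the factor vectors of user u and item i, the prediction rule is:

r̂_ui = μ + b_u + b_i + q_iᵀ · p_u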

This is the standard SVD, sometimes also called the latent factor model. Now, let us investigate what the enhanced SVD++ model contributes to it, which I will cover only briefly here. SVD++ proposes the following prediction rule:
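In standard notation, with N(u) denoting the set of items rated by user u and y_j an extra factor vector associated with each item j, the rule is:

r̂_ui = μ + b_u + b_i + q_iᵀ · ( p_u + |N(u)|^(−1/2) · Σ_{j∈N(u)} y_j )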

If you compare this to the previous equation on the SVD, you will notice that the only difference is the addition of the factor:
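Namely:

|N(u)|^(−1/2) · Σ_{j∈N(u)} y_j

the sum of the implicit item factors y_j over all items the user has rated, normalized by the size of that set.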

This is because it includes the effect of the implicit information, as opposed to p_u, which only includes the effect of the explicit information. To interpret this correctly, it is vital to understand that the fact that a user rates an item is in itself an indication of preference. In other words, the chance that a user likes an item he or she has rated is higher than for a random non-rated item.

Conclusion

Ultimately, there is no real magic involved at all. With the formulae stated above, anyone can create recommendation algorithms at their whim.

Of course, you can use other concepts and algorithms, for instance KNN or DNN. It is possible that in the future I will modify this matrix factorization model, transform it into a deep learning model, and compare the performance of the old and new versions of the system, but this is a discussion best left for my next article. For now, I will stick to the primary matrix factorization model. You will soon be able to see the above theory implemented in real life on the OPUS player, a seemingly simple feature that has the power to distinguish amateur projects from professional streaming platforms like OPUS. No music app has quite perfected this just yet, but there are some great ideas and solutions being worked on already. Incorporating smart algorithms at OPUS can bring the company to unprecedented heights.