2. Spotlight

Spotlight is a well-implemented Python framework for building recommender systems. It contains two major types of models: factorization models and sequence models. The former builds on the idea behind SVD, decomposing the utility matrix (the matrix that records the interactions between users and items) into latent user and item matrices and feeding them into the network. The latter is built with time-series models such as Long Short-Term Memory (LSTM) networks and one-dimensional Convolutional Neural Networks (CNNs). Since Spotlight's backend is PyTorch, make sure you have a compatible version of PyTorch installed before using it.

Interactions

The utility matrix is called an interaction in Spotlight. To create an implicit interaction, we specify the ids of the users and items for all the user-item interaction pairs. Supplying additional rating information turns the interaction into an explicit one.

Factorization Model

A factorization model takes in an implicit or explicit interaction. We will use the implicit one for ease of illustration.

Implicit Feedback (https://github.com/maciejkula/netrex/)

The idea is very similar to SVD: users and items are mapped into a shared latent space so that they are directly comparable. Essentially, we use two embedding layers to represent users and items, respectively. The target is the interaction (utility matrix) that we passed in. To compute the score for a user-item pair, we take the dot product of the latent representations of that user and item and pass it through a sigmoid activation function. By computing the loss (more on that later) for all user-item pairs with respect to the true interaction, we can back-propagate and optimize the embedding layers. The network structure is shown in the figure below.
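The scoring step can be sketched in plain NumPy, with randomly initialised embedding tables standing in for the learned ones:

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_items, latent_dim = 4, 6, 8

# One row per user / item. Randomly initialised here; in the real model
# these tables are learned by back-propagation.
user_embeddings = rng.normal(size=(num_users, latent_dim))
item_embeddings = rng.normal(size=(num_items, latent_dim))

def score(user_id, item_id):
    """Dot product of the two latent vectors, squashed by a sigmoid."""
    dot = user_embeddings[user_id] @ item_embeddings[item_id]
    return 1.0 / (1.0 + np.exp(-dot))

s = score(2, 5)  # a probability-like score in (0, 1)
```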

Neural Network Structure (https://github.com/maciejkula/netrex/)

We only need a few lines of code to train such a model with Spotlight, and the API looks very similar to scikit-learn's:

Sequential Model

A sequence model treats recommendation as a sequential prediction problem: given the past interactions, we want to know which item the user is most likely to interact with at the next time step. For instance, assume user A has interacted with the items whose ids form the sequence [2, 4, 17, 3, 5]. Then we will have the following expanding-window predictions.

[2] -> 4

[2, 4] -> 17

[2, 4, 17] -> 3

[2, 4, 17, 3] -> 5

The array on the left stores the past interactions, while the integer on the right is the item that user A will interact with next.
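This expanding-window construction is easy to sketch in plain Python:

```python
def expanding_windows(sequence):
    """Yield (history, next_item) pairs from a chronological item sequence."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]

pairs = expanding_windows([2, 4, 17, 3, 5])
# First pair: history [2], target 4; last pair: history [2, 4, 17, 3], target 5.
```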

To train such a model, we simply convert the original interaction object into a sequential one; the rest of the workflow stays the same.

Note that the to_sequence() function left-pads shorter sequences with zeros so that every sequence has the same length.

Therefore, any item with id 0 must be re-mapped to an unused id for this function to work, since 0 is reserved for padding.

Choice of Loss Function

When specifying the model, we have the flexibility to change the loss function, and models with different loss functions can differ significantly in performance. I will briefly describe the two main types of loss function defined in Spotlight.

‘pointwise’: This is the simplest form of loss function. Because of the sparsity of the data (a lot of zeros in the utility matrix), it is not computationally feasible to take every item into account. Thus, instead of computing the loss over all items for a given user, we consider all the positive samples but only a randomly selected portion of the negative samples (items the user has not interacted with).
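A NumPy sketch of the pointwise idea, which as far as I can tell mirrors Spotlight's implementation: observed items are pushed toward a score of 1 and sampled negatives toward 0.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pointwise_loss(positive_scores, negative_scores):
    """Penalise low scores on observed items and high scores on
    randomly sampled negative items."""
    positives = 1.0 - sigmoid(positive_scores)
    negatives = sigmoid(negative_scores)
    return np.mean(positives + negatives)

# Raw (pre-sigmoid) scores for observed items and sampled negatives.
pos = np.array([2.0, 1.5, 3.0])
neg = np.array([-1.0, 0.5, -2.0])
loss = pointwise_loss(pos, neg)
```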

‘bpr’: Bayesian Personalized Ranking (BPR) ranks every item for each user and tries to ensure that positive samples are ranked higher than negative samples, using the following formula.

ranking loss
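The pairwise comparison can be sketched in NumPy as follows. It penalises pairs where a sampled negative item scores close to, or above, the positive item; as far as I can tell, Spotlight's variant uses 1 - σ(pos - neg) rather than the -log σ(pos - neg) of the original BPR paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_loss(positive_scores, negative_scores):
    """Pairwise ranking loss: small when positives clearly outscore
    the negatives they are paired with."""
    return np.mean(1.0 - sigmoid(positive_scores - negative_scores))

pos = np.array([2.0, 1.0])
neg = np.array([0.0, 1.5])
loss = bpr_loss(pos, neg)
```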

Now you have learned how to build a recommender system with Spotlight. It is very simple to use, with a decent amount of flexibility to fulfill your needs. Although sequence models outperform factorization models on a majority of problems, they take much longer to train. In addition, a sequence model will not be very helpful if the data lacks a clear sequential correlation.

3. Item2Vec

The Item2Vec idea came to my mind last month when I was participating in a competition, the International Data Analysis Olympiad (IDAO). The challenge required participants to build a recommender system for Yandex. Since I was learning Word2Vec at the time, I thought that a similar concept could also be used in recommender systems. I'm not sure whether this idea has been described in any paper or article, but I have not seen a similar application of Word2Vec's concept in this field. The rough idea behind Word2Vec is that we leverage distributed representations to encode each word: each word is represented by a vector determined by the words surrounding it. If you would like to know more about Word2Vec, you can refer to my previous blog post: Word2Vec and FastText Word Embedding with Gensim. Similarly, I tried to use distributed representations to encode each item, based on the items that a user interacted with before and after interacting with it.

For each user, I first created an item list in chronological order. Then, Gensim's Word2Vec model was trained on these item lists. The trained item vectors were stored on disk so that we can load them later.

After that, we load the trained item vectors into an embedding matrix.

We then define our model for predicting the user's future interactions. Basically, it is a GRU model accelerated by cuDNN. If you don't have an Nvidia GPU, don't worry: you can simply replace CuDNNGRU with GRU.

Note that I loaded the pre-trained embedding matrix into the embedding layer of the model and froze it by setting trainable to False.
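A sketch of such a model in Keras (layer sizes are arbitrary, and the pre-trained matrix is faked with random numbers here; in practice you would pass in the matrix built from the Word2Vec vectors):

```python
import numpy as np
from tensorflow.keras.layers import Dense, Embedding, GRU
from tensorflow.keras.models import Sequential

num_items, vector_size, seq_len = 18, 16, 5

# Stand-in for the pre-trained item-vector matrix.
embedding_matrix = np.random.normal(size=(num_items, vector_size))

embedding = Embedding(input_dim=num_items, output_dim=vector_size,
                      mask_zero=True)  # id 0 is padding
model = Sequential([
    embedding,
    GRU(32),
    # One score per item; softmax gives next-item probabilities.
    Dense(num_items, activation='softmax'),
])
model.build(input_shape=(None, seq_len))

# Load the pre-trained vectors and freeze the layer.
embedding.set_weights([embedding_matrix])
embedding.trainable = False
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# One left-padded example sequence; the model outputs next-item probabilities.
probs = model.predict(np.array([[0, 0, 2, 4, 17]]), verbose=0)
```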

4. Performance

I have tested some of the models mentioned in both parts of this series on IDAO's public leaderboard test set. The public leaderboard score for each of them is shown in the following table.

Public Board Score

It seems that neural-network-based models do not necessarily beat traditional methods for building a recommender system. While being simple to understand and implement, SVD scores on par with the Spotlight sequence model, which takes much longer to train. Item2Vec, surprisingly, is the best of all the models I tested. Of course, we cannot judge all these models on a single test set, but it gives you a rough idea of how good each model is.

5. Conclusion

I have discussed the two types of models implemented in the Spotlight toolkit, as well as my own Item2Vec approach, which relies on the Word2Vec concept. I also compared the models I have mentioned on IDAO's public leaderboard test set. It turns out that SVD is an efficient solution for a recommender system, and that Item2Vec demonstrated its ability to recommend items more accurately. Should you have any problem or question regarding this article, please do not hesitate to leave a comment below or drop me an email: khuangaf@connect.ust.hk. If you like this article, make sure you follow me on Twitter for more Machine Learning / Deep Learning blog posts!