Recommender systems suggest items to users based on their past preferences. Broadly, recommender systems can be split into two types: content-based and collaborative filtering.

Content-based recommendation : Recommends items to a user based on that user's past purchases or ratings. One way to do this is to build a predictive model on a table of, say, characteristics of items the user has bought, then run the model over a list of new items and predict whether the user would want to buy each one. This can be done with standard binary-classification supervised learning methods such as logistic regression.
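A minimal sketch of this idea, assuming a hypothetical table of binary item features (the feature names and data are purely illustrative, and scikit-learn's logistic regression stands in for whichever classifier you prefer):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical feature table for items the user interacted with in the past
# (e.g. columns could be genre flags or a price bucket -- illustrative only)
X_past = np.array([
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
])
y_past = np.array([1, 1, 0, 0, 1, 0])  # 1 = user bought/liked the item

clf = LogisticRegression().fit(X_past, y_past)

# Run through a list of new items: probability the user will like each one
X_new = np.array([[1, 0, 1], [0, 1, 0]])
probs = clf.predict_proba(X_new)[:, 1]
print(probs)
```

The items with the highest predicted probabilities would then be the ones recommended to the user.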

The disadvantage of this method is that there is no serendipity: the items recommended tend to be those you already know you want. There is no "hey, how did the system know I wanted this?" moment. Which brings us to …

Collaborative filtering : Matches users to people with similar tastes. Users with similar tastes are algorithmically grouped into a “basket”, and recommendations are based on what these users like as a whole. There are three approaches: user-user collaborative filtering, item-item collaborative filtering, and matrix factorization.

We will concentrate on collaborative filtering for the purposes of this article. Here, we will use the Surprise Python package, an excellent open-source library by Nicolas Hug that implements most of the fundamental algorithms: http://surpriselib.com/. We will also make use of Surprise's built-in MovieLens dataset for this method.

User-user Collaborative Filtering

Things to note :

The output is the prediction of user u’s rating on item i.

We utilize the similarity measure between user u and user v in this case.

Other than that, the algorithm is already coded for you in Surprise.

http://surprise.readthedocs.io/en/stable/knn_inspired.html

Item-Item Collaborative Filtering

This means that instead of using user similarity, we use an item similarity measure to calculate the prediction.

Note that similarity in the above equation is now between item i and item j, instead of user u and v as before.

Advantages of item-based filtering over user-based filtering :

Scales better : User likes and interests may change frequently, so a user-based model needs to be retrained often, whereas item-item similarities are comparatively stable.

Computationally cheaper : In many cases there are far more users than items, so it makes sense to compute similarities over the smaller item set.

A famous example of item-based filtering is Amazon’s recommendation engine.

https://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf

Matrix Factorization

While user-based and item-based collaborative filtering methods are simple and intuitive, matrix factorization techniques are usually more effective because they allow us to discover the latent features underlying the interactions between users and items. We don't specify these latent features ourselves; they are learned from the data. The algorithm shown here, named after the famous singular value decomposition (SVD), uses gradient descent to minimize the squared error between predicted and actual ratings, eventually arriving at the best model.
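Concretely, in Surprise's SVD algorithm the predicted rating and the objective minimized by stochastic gradient descent are:

```latex
\hat{r}_{ui} = \mu + b_u + b_i + q_i^\top p_u
\qquad
\min \sum_{r_{ui} \in R_{\mathrm{train}}}
  \left( r_{ui} - \hat{r}_{ui} \right)^2
  + \lambda \left( b_u^2 + b_i^2 + \lVert q_i \rVert^2 + \lVert p_u \rVert^2 \right)
```

where \(\mu\) is the global mean rating, \(b_u\) and \(b_i\) are the user and item biases, and \(p_u\) and \(q_i\) are the latent factor vectors for user u and item i; \(\lambda\) is the regularization strength.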

Reference: math420-UPS-spring-2014-gower-netflix-SVD.pdf

Again, Surprise has done the hard plumbing for you; all that is needed is to use the SVD() class.

See http://surprise.readthedocs.io/en/stable/matrix_factorization.html for more information.

In the above code, we use GridSearchCV to do a brute-force search over the hyper-parameters of the SVD algorithm. Once cross-validation confirms that these are indeed the best values, we use these hyper-parameter values to train on the training set.

Eventually, we evaluate the model on the test set.

Evaluating RMSE, MAE of algorithm KNNWithMeans on 5 split(s).

                 Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std
MAE (testset)    0.7191  0.7158  0.7138  0.7166  0.7254  0.7181  0.0040
RMSE (testset)   0.9173  0.9162  0.9125  0.9179  0.9271  0.9182  0.0048
Fit time         1.04    1.11    1.03    1.01    0.90    1.02    0.07
Test time        2.08    2.06    2.06    2.05    2.06    2.06    0.01

SVD : Test Set RMSE: 0.9000

All the code from this article can be found here:

https://github.com/lppier/Recommender_Systems