I’ve been involved in building several different types of recommendation systems, and one thing I’ve noticed is that each use case is different from the next, as each aims to solve a different business problem. Let’s consider a few examples:

Movie/Book/News Recommendations — Suggest new content that increases user engagement. The aim is to introduce users to new content that may interest them and encourage them to consume more content on our platform.

Stock Recommendations — Suggest stocks that are most profitable to the clients. The recommendations may be stocks that they have traded in historically. Novelty does not matter here; profitability of the stock does.

Product Recommendations — Suggest a mix of old and new products. The old products from users’ historical transactions serve as a reminder of their frequent purchases. Also, it is important to suggest new products that the users may like to try.

In all of these problems, the common thread is that they aim to increase customer satisfaction and in turn drive business in the form of increased commissions, greater sales, etc. Whatever the use case may be, the data is typically in the following format:

Customer ID, Product ID (Movie/Stock/Product), No. of Units/Rating, Transaction Date

Any other features, such as details of the product or demographics of the customer
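As a quick illustration, a transaction log in this format might look like the following (all IDs, values, and dates here are hypothetical):

```python
# A toy transaction log in the format described above
# (customer IDs, product IDs, and values are invented for illustration).
transactions = [
    # (customer_id, product_id, units_or_rating, transaction_date)
    ("C001", "P10", 4.5, "2021-03-01"),
    ("C001", "P22", 3.0, "2021-03-05"),
    ("C002", "P10", 5.0, "2021-03-02"),
]

# Grouping by customer is a typical first step before building profiles.
by_customer = {}
for cust, prod, value, date in transactions:
    by_customer.setdefault(cust, []).append(prod)
```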

Going forward, here are the topics I will be covering:

Methods used for building recommendation systems — Content-based, Collaborative Filtering, Clustering

Evaluation Metrics — Statistical accuracy metrics, Decision Support accuracy metrics

Things to keep in mind

Methods

There are 2 major approaches for building recommendation systems — content-based and collaborative filtering. In the following section, I will discuss each one of them and when they are suitable.

Content-based

The gist of this approach is that we match users to the content or items they have liked or bought. Here the attributes of the users and the products are important. For example, for movie recommendations, we use features such as director, actors, movie length, genre, etc. to find similarity between movies. Furthermore, we can extract features like sentiment score and tf-idf scores from movie descriptions and reviews. (The tf-idf score of a word reflects how important a word is to a document in a collection of documents). The aim of content-based recommendation is to create a ‘profile’ for each user and each item.

Consider an example of recommending news articles to users. Let’s say we have 100 articles and a vocabulary of size N. We first compute the tf-idf score for each of the words for every article. Then we construct 2 vectors:

1. Item vector: This is a vector of length N. It contains 1 for words that have a high tf-idf score in that article, and 0 otherwise.

2. User vector: Again a 1xN vector. For every word, we store the probability of the word occurring (i.e. having a high tf-idf score) in articles that the user has consumed. Note here, that the user vector is based on the attributes of the item (tf-idf score of words in this case).
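The two vectors above can be sketched in pure Python. This is a minimal illustration with invented articles, where "high tf-idf" is taken to mean a score above a fixed threshold — the threshold value and the toy corpus are assumptions, not part of the original method:

```python
import math

# Hypothetical mini-corpus of news articles (IDs and text invented).
articles = {
    "a1": "markets rally as tech stocks surge",
    "a2": "tech giants report record earnings",
    "a3": "local team wins championship game",
}
vocab = sorted({w for text in articles.values() for w in text.split()})

def tfidf(article_words, word):
    # tf: relative frequency in this article; idf: log inverse document frequency.
    tf = article_words.count(word) / len(article_words)
    df = sum(1 for text in articles.values() if word in text.split())
    idf = math.log(len(articles) / df)
    return tf * idf

THRESHOLD = 0.05  # assumed cutoff for calling a tf-idf score "high"

def item_vector(article_id):
    # 1 for words with a high tf-idf score in this article, 0 otherwise.
    words = articles[article_id].split()
    return [1 if tfidf(words, w) > THRESHOLD else 0 for w in vocab]

def user_vector(read_ids):
    # For each word, the fraction of the user's read articles in which
    # that word scored high -- an empirical probability.
    vecs = [item_vector(a) for a in read_ids]
    return [sum(col) / len(vecs) for col in zip(*vecs)]
```

Note that the user vector is built entirely from item attributes, which is the defining trait of the content-based approach.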

Once we have these profiles, we compute similarities between the users and the items. The items recommended are the ones that 1) the user has the highest similarity with, or 2) have the highest similarity with the other items the user has read. There are multiple ways of doing this. Let’s look at 2 common methods:

1. Cosine similarity:

To compute similarity between the user and item, we simply take the cosine similarity between the user vector and the item vector. This gives us user-item similarity.

To recommend items that are most similar to the items the user has bought, we compute cosine similarity between the articles the user has read and other articles. The ones that are most similar are recommended. Thus this is item-item similarity.

Cosine similarity is best suited when you have high-dimensional features, especially in information retrieval and text mining.
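Both the user-item and item-item variants reduce to the same computation. Here is a minimal sketch with toy vectors (the values are invented, not derived from real data):

```python
import math

def cosine_similarity(a, b):
    # cos(a, b) = (a . b) / (|a| * |b|); 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy user profile and item vectors over a 4-word vocabulary.
user = [0.8, 0.1, 0.0, 0.5]
items = {
    "article_1": [1, 0, 0, 1],  # overlaps with the user's high-probability words
    "article_2": [0, 1, 1, 0],  # mostly different words
}

# Rank items by user-item similarity, highest first.
ranked = sorted(items, key=lambda i: cosine_similarity(user, items[i]),
                reverse=True)
```

Item-item similarity works the same way, just comparing two item vectors instead of a user vector and an item vector.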

2. Jaccard similarity:

Also known as intersection over union, the formula is as follows:

J(A, B) = |A ∩ B| / |A ∪ B|

This is used for item-item similarity. We compare item vectors with each other and return the items that are most similar.

Jaccard similarity is useful only when the vectors contain binary values. If they have rankings or ratings that can take on multiple values, Jaccard similarity is not applicable.
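On binary item vectors, intersection over union can be computed directly; a minimal sketch with toy vectors:

```python
def jaccard(a, b):
    # Intersection: positions where both vectors are 1.
    # Union: positions where at least one vector is 1.
    intersection = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    union = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return intersection / union if union else 0.0

# Toy binary item vectors over a 4-word vocabulary.
item_a = [1, 1, 0, 1]
item_b = [1, 0, 0, 1]
```

Here the two items share 2 of the 3 words that appear in either, giving a similarity of 2/3.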

In addition to the similarity methods, for content-based recommendation, we can treat recommendation as a simple machine learning problem. Here, regular machine learning algorithms like random forest, XGBoost, etc. come in handy.

This method is useful when we have a whole lot of ‘external’ features, like weather conditions, market factors, etc., which are not a property of the user or the product and can be highly variable. For example, the previous day’s opening and closing prices play an important role in determining the profitability of investing in a particular stock. This comes under the class of supervised problems, where the label is whether the user liked/clicked on a product or not (0/1), the rating the user gave that product, or the number of units the user bought.
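Framing it this way, each row combines user, item, and external features with a click label. A minimal sketch — the feature names and values here are hypothetical, and the commented line assumes scikit-learn is available:

```python
# Hypothetical training rows: user features, item features, and
# "external" features, with a 0/1 clicked label.
rows = [
    {"user_age": 34, "item_price": 120.0, "prev_close": 101.2, "clicked": 1},
    {"user_age": 27, "item_price": 45.0,  "prev_close": 99.8,  "clicked": 0},
    {"user_age": 41, "item_price": 120.0, "prev_close": 100.5, "clicked": 1},
]

# Assemble the feature matrix X and label vector y.
X = [[r["user_age"], r["item_price"], r["prev_close"]] for r in rows]
y = [r["clicked"] for r in rows]

# With scikit-learn installed, a classifier could then be fit:
# from sklearn.ensemble import RandomForestClassifier
# model = RandomForestClassifier(n_estimators=100).fit(X, y)
# Ranking candidate items by model.predict_proba then yields recommendations.
```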