Building a Simple RBM Model Using PyTorch

To install PyTorch, head over to the official PyTorch website and follow the installation instructions for your operating system. We’ll use the movie ratings dataset available from GroupLens.

Importing the necessary libraries

We kick off by importing the libraries that we’ll need, namely:

NumPy for scientific computation

pandas for loading our dataset

torch, the PyTorch library itself

torch.nn as nn for initializing the neural network

torch.nn.parallel for parallel computations

torch.optim as optim for the optimizer

torch.utils.data for data loading and processing

autograd for implementing automatic differentiation
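The imports listed above could be written as follows. This is a sketch that assumes NumPy, pandas, and PyTorch are already installed; importing autograd from the torch package is one possible spelling, chosen here as an assumption.

```python
import numpy as np                # scientific computation
import pandas as pd               # loading the dataset
import torch                      # the PyTorch library itself
import torch.nn as nn             # neural network building blocks
import torch.nn.parallel          # parallel computations
import torch.optim as optim       # optimizers
import torch.utils.data           # data loading and processing
from torch import autograd        # automatic differentiation
```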

Loading the dataset

In the next step, we import the users, ratings, and movies datasets. The fields in these files are separated by double colons, so we pass :: as the separator. The files have no header row, so we set header to None. We use the latin-1 encoding because some movie titles contain special characters.

We also set the engine to python, because the double-colon separator is longer than one character and the default C parser in pandas does not support such separators. The first column of the ratings dataset is the user ID, the second is the movie ID, the third is the rating, and the fourth is the timestamp.
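A minimal sketch of this loading step is shown here. To keep it runnable without the real download, it writes a few made-up rows to temporary files first; the file names, the sample rows, and the variable names are assumptions for illustration only, and the users file is left out for brevity.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample rows, not real data. Fields are separated by "::".
sample_movies = "1::Toy Story (1995)::Animation\n2::Misérables, Les (1995)::Drama\n"
sample_ratings = "1::1::5::978300760\n1::2::3::978302109\n2::1::4::978298709\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    movies_path = os.path.join(tmp_dir, "movies.dat")
    ratings_path = os.path.join(tmp_dir, "ratings.dat")
    with open(movies_path, "w", encoding="latin-1") as f:
        f.write(sample_movies)
    with open(ratings_path, "w", encoding="latin-1") as f:
        f.write(sample_ratings)

    # No header row, "::" separator, Python parsing engine, latin-1 encoding.
    movies = pd.read_csv(movies_path, sep="::", header=None,
                         engine="python", encoding="latin-1")
    ratings = pd.read_csv(ratings_path, sep="::", header=None,
                          engine="python", encoding="latin-1")

# Columns of ratings: 0 = user ID, 1 = movie ID, 2 = rating, 3 = timestamp.
print(ratings)
```

With the real dataset, the temporary files would be replaced by the paths to the downloaded files.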

Preparing the test set and training set

Let’s now prepare our training set and test set. Both files are tab-separated, so we pass \t as the delimiter argument. pandas loads the data as a DataFrame, but we need a NumPy array that we can later turn into PyTorch tensors, so we convert it with numpy.array. We also specify an integer dtype, since all of the values are integers.
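The conversion from a tab-separated file to an integer array might look like the sketch below. The sample rows are invented and read from an in-memory buffer instead of a file, and passing header=None is an assumption made here so that the first data row is not consumed as column names.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical tab-separated rows: user ID, movie ID, rating, timestamp.
sample_training = "1\t1\t5\t874965758\n1\t2\t3\t876893171\n2\t1\t4\t888550871\n"

# With a real file, the StringIO object would be replaced by the file path.
training_df = pd.read_csv(io.StringIO(sample_training), delimiter="\t", header=None)

# Convert the DataFrame to a NumPy array of integers.
training_set = np.array(training_df, dtype="int")

print(training_set)
```

The test set would be prepared the same way from its own file.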

Generating the matrix for use in the RBM

In order to build the RBM, we need a matrix with the users’ ratings. This matrix will have the users as the rows and the movies as the columns. The matrix will contain a user’s rating of a specific movie. Zeros will represent observations where a user didn’t rate a specific movie.

To create this matrix, we need the number of users and the number of movies in our dataset. For no_users we index column zero, since that is the user ID column. We take the maximum user ID in the training set and in the test set, and then use max again to get the larger of the two, because the highest ID may appear in either set. We wrap the whole expression in int to make sure the result is an integer. We obtain the number of movies in the same way from column one, the movie ID column.
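As a sketch of this computation, the following uses two tiny made-up arrays in place of the real training and test sets; the name no_movies is an assumption that mirrors no_users.

```python
import numpy as np

# Hypothetical arrays with columns: user ID, movie ID, rating, timestamp.
training_set = np.array([[1, 1, 5, 0], [1, 3, 3, 0], [2, 2, 4, 0]], dtype=int)
test_set = np.array([[3, 1, 2, 0], [2, 4, 1, 0]], dtype=int)

# Column 0 holds user IDs, column 1 holds movie IDs.
no_users = int(max(max(training_set[:, 0]), max(test_set[:, 0])))
no_movies = int(max(max(training_set[:, 1]), max(test_set[:, 1])))

print(no_users, no_movies)
```

In this toy example the highest user ID and the highest movie ID both appear only in the test set, which is why both sets have to be checked.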

Next, we create a function that will create the matrix. The reason for doing this is to set up the dataset in a way that the RBM expects as input. We create a function called convert , which takes in our data as input and converts it into the matrix.

First, we create an empty list called new_data . We then create a for loop that will go through the dataset, fetch all the movies rated by a specific user, and the ratings by that same user. Notice that we loop up to no_users + 1 to include the last user ID since the range function doesn’t include the upper bound.

Since there are movies that the user didn’t rate, we first create an array of zeros with one entry per movie. We then overwrite the zeros with the user’s ratings. When indexing by movie, we use id_movies - 1 because movie IDs start at one while indices in Python start at zero; subtracting one maps the first movie ID to index zero. We append the ratings to new_data as a list, which produces a list of lists. Later, we’ll convert this into Torch tensors; the function that performs that conversion expects a list of lists.
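One way the convert function described above could be written is sketched here. The toy array and the values of no_users and no_movies are made up for illustration, and the names id_users and id_ratings are assumptions that follow the id_movies naming in the text. User IDs are assumed to start at one.

```python
import numpy as np

# Hypothetical data with columns: user ID, movie ID, rating, timestamp.
training_set = np.array([[1, 1, 5, 0], [1, 3, 3, 0], [2, 2, 4, 0]], dtype=int)
no_users = 3
no_movies = 4


def convert(data):
    """Return one list of ratings per user, with zeros for unrated movies."""
    new_data = []
    # range excludes its upper bound, so no_users + 1 includes the last user ID.
    for id_users in range(1, no_users + 1):
        # Movies rated by this user, and the matching ratings.
        id_movies = data[:, 1][data[:, 0] == id_users]
        id_ratings = data[:, 2][data[:, 0] == id_users]
        # Start from all zeros, then fill in the rated movies.
        ratings = np.zeros(no_movies)
        ratings[id_movies - 1] = id_ratings  # movie ID 1 maps to index 0
        new_data.append(list(ratings))
    return new_data


user_matrix = convert(training_set)
print(user_matrix)
```

Here the third user has no rows in the toy data, so that user's row stays all zeros.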

Now let’s use our function and convert our training and test data into a matrix.