Context

Neural networks are now used in a wide range of applications, from image caption generation to breast cancer prediction. This diversity is a natural consequence of the wide variety of neural architectures available (feed-forward neural networks, convolutional neural networks, etc.). Among these architectures, Long Short-Term Memory (LSTM) networks, a particular kind of recurrent neural network, have proven very successful on tasks such as machine translation, time series prediction, and generally anything where the data is sequential. This is mainly due to their ability to memorize relatively long-term dependencies, which they achieve by taking previous information into account when making further predictions.

But LSTMs alone aren’t always enough. Sometimes we need to tweak these layers and adapt them to the task at hand.

At Kwyk, we provide online math exercises. Every time an exercise is answered, we collect data which is then used to tailor homework for each student using the website. In order to determine which exercises are the most likely to make a student progress, we need to know how likely he/she is to succeed on each exercise at any given point in time. By correctly predicting these probabilities of success, we can choose the homework that maximizes the overall progress.

This post introduces our modified LSTM cell, the “Multi-state LSTM”, on which we based our model to tackle this problem.

A quick data overview

When an exercise is answered, we save information about both the student (or “user”) and the exercise, along with an additional score value that is either 0 or 1 depending on whether the user succeeded. The information we collect takes the form of categorical features that tell us: “which exercise is it?”, “which chapter is it?”, “which student is it?”, “which grade is he/she in?”, and so on. This results in features with anywhere from tens to thousands of modalities, which we use to predict the “score” variable and obtain success probabilities.

[Figure: a sample of our data]
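Concretely, one observation looks roughly like this (the feature names and values below are illustrative, not our exact schema):

```python
# One observation: categorical features describing the exercise and the
# student, plus the binary score we want to predict.
# Names and values are illustrative, not our exact schema.
sample = {
    "user_id": 13425,    # which student is it? (thousands of modalities)
    "exercise_id": 721,  # which exercise is it? (thousands of modalities)
    "chapter_id": 12,    # which chapter is it? (tens of modalities)
    "grade": "9th",      # which grade is he/she in? (a few modalities)
    "score": 1,          # 1 if the answer was correct, 0 otherwise
}
```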

LSTM-based model

There are two main motivations behind the choice of an LSTM-based model.

The first one comes from the fact that all the features we’re using are categorical, and they are the real source of information we want to learn from. By hand-crafting features such as moving averages of previous scores, we would introduce biases through the subjective choices we make. We therefore chose to rely on neural networks, feed them our raw data directly, and let the features form automatically and “objectively”.
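As a sketch of what “directly feeding” categorical data can look like in practice (an illustrative PyTorch example with assumed cardinalities and embedding size, not our actual model), each raw id goes through its own embedding layer whose weights are learned jointly with the rest of the network:

```python
import torch
import torch.nn as nn

class CategoricalEncoder(nn.Module):
    """Turn raw categorical ids into learned dense vectors,
    one embedding table per feature. Illustrative sketch."""

    def __init__(self, cardinalities, dim=16):
        super().__init__()
        self.embeddings = nn.ModuleList(
            nn.Embedding(n, dim) for n in cardinalities
        )

    def forward(self, ids):
        # ids: (batch, n_features) integer tensor, one column per feature
        return torch.cat(
            [emb(ids[:, i]) for i, emb in enumerate(self.embeddings)], dim=-1
        )

# Assumed cardinalities: users, exercises, chapters, grades.
encoder = CategoricalEncoder(cardinalities=[50_000, 3_000, 40, 6])
vectors = encoder(torch.tensor([[13425, 721, 12, 3]]))  # shape (1, 64)
```

The concatenated vectors then feed the downstream layers, so the representation of each modality is learned from the data rather than designed by hand.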

The second motivation concerns the choice of LSTMs among all the available architectures. It simply reflects our belief that the data is inherently sequential: for each student, the probability of success on an exercise depends on all of his/her previous results.

With this in mind, we designed an architecture in which, for each categorical feature, a shared LSTM keeps a history of previous results for each modality.

Let’s take the “user_id” feature as an example. In this case, we need to adapt the LSTM cell so that it memorizes, on the fly yet independently for each student, a history of his/her past successes. This is done by adding what we call a “Multi-state” to the basic LSTM cell:
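In code, the idea looks roughly like the following minimal PyTorch sketch (an illustrative reconstruction, not our production implementation): the cell keeps a dictionary mapping each user_id to its own (h, c) state pair, creates the pair on the fly the first time a user appears, and swaps the right state in before each update.

```python
import torch
import torch.nn as nn

class MultiStateLSTMCell(nn.Module):
    """A standard LSTM cell whose (h, c) state is kept per modality
    (here, per user) instead of per sequence. Illustrative sketch."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.hidden_size = hidden_size
        self.states = {}  # user_id -> (h, c), built on the fly

    def forward(self, user_id, x):
        # Retrieve this user's state, or initialize it on first appearance.
        zeros = torch.zeros(1, self.hidden_size)
        h, c = self.states.get(user_id, (zeros, zeros))
        h, c = self.cell(x, (h, c))
        # Save the updated state for this user's next exercise; detach so
        # gradients do not flow back through the entire past history.
        self.states[user_id] = (h.detach(), c.detach())
        return h

cell = MultiStateLSTMCell(input_size=16, hidden_size=32)
output = cell(13425, torch.randn(1, 16))  # one answered exercise for user 13425
```

With this mechanism, a single shared set of LSTM weights serves every student, while each student keeps his/her own memory of past results; the same trick applies to any other categorical feature.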