Gaining a better understanding of the model’s behaviour w.r.t. the data

Toxic comment classification task

The first example we’ll use here is an interesting Natural Language Processing contest from Kaggle that was going on at the time I was developing this tool. The goal was to classify text comments into different categories (toxic, obscene, threat, insult and so on), making it a multi-label classification problem.

Among the neural network models, I tried several architectures, starting from the simplest (feed-forward networks without convolutions or recurrences) and moving to more complex ones. I used binary cross-entropy loss with a sigmoid activation in the final layer of the network, so it outputs an independent probability for each label, thereby enabling multi-label classification. We will use the hidden representations from a Bi-directional LSTM initialized with untuned pre-trained word embeddings for this demonstration.
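For reference, a minimal sketch of this kind of setup in Keras might look like the following; the layer names and sizes here are illustrative assumptions, not the exact configuration used:

```python
# Minimal sketch of a Bi-LSTM multi-label classifier: frozen pre-trained
# embeddings, one sigmoid output per label, binary cross-entropy loss.
# All sizes and names are illustrative assumptions.
from tensorflow.keras import layers, models

NUM_LABELS = 6        # toxic, obscene, threat, insult, ...
VOCAB_SIZE = 50_000
EMBED_DIM = 300
MAX_LEN = 200

inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
# the embedding weights would hold the untuned pre-trained vectors
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM, trainable=False)(inputs)
x = layers.Bidirectional(layers.LSTM(64))(x)
hidden = layers.Dense(32, activation="relu", name="hidden")(x)    # representation we visualize
outputs = layers.Dense(NUM_LABELS, activation="sigmoid")(hidden)  # one probability per label

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```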

So I did the same steps described above: extracted the hidden representations of each text comment in the validation set from the final layer, performed t-SNE/UMAP to shrink them to 2 dimensions, and visualized them using the tool. The training went on for 5 epochs before early stopping kicked in. An advantage of using UMAP is that it’s an order of magnitude faster than t-SNE and still produces a high-quality representation. Google did release a real-time t-SNE recently, but I haven’t gotten to explore it yet.
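The extraction and projection step might look roughly like this, using the umap-learn package; `model`, the “hidden” layer name and `x_val` are carried over from the sketch above and are assumptions, not the tool’s actual code:

```python
# Sketch: extract the hidden representation of each validation comment and
# shrink it to 2 dimensions with UMAP. `model`, the "hidden" layer and `x_val`
# are assumptions from the sketch above.
import umap
from tensorflow.keras import models

feature_extractor = models.Model(
    inputs=model.input,
    outputs=model.get_layer("hidden").output,
)
hidden_reps = feature_extractor.predict(x_val)        # shape: (num_comments, 32)

coords_2d = umap.UMAP(n_components=2).fit_transform(hidden_reps)
# coords_2d can now be scatter-plotted, coloured by the label being inspected
```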

Here’s a zoomed-in version of the visualization at the end of epoch 5. The class being visualized is insult. So red dots are insults and green dots are non-insults.

Hidden representations after epoch 5 on the toxic comment classification task

Let’s start with a fun one and look at the two points the blue arrows above are pointing to. One of them is an insult and the other one is not. What do the texts say?

Text1 (green dot with blue arrow): “bullshit bullshit bullshit bullshit bullshit bullshit”

Text2 (red dot with blue arrow): “i hate you i hate you i hate you i hate you i hate you i hate you i hate you”

It’s kind of funny how the model placed the two repetitive texts close together. The notion of an insult also seems quite subtle here!

I was also curious to look at some of the green points in the center of the red cluster. Why might the model have been confused about them? What would their texts look like? For example, here’s what the text of the point that the black arrow in the figure above points to says:

“don’t call me a troublemaker you p&&&y you’re just as much of a racist right wing nut as XYZ” (the censors and name omissions are mine — they are not present as such in the text).

Well, that does seem like an insult, so it just looks like a bad label! It should’ve been a red dot instead!

Not all of these misplaced points are necessarily bad labels, but digging deeper by visualizing them as above can lead to discovering such characteristics of the data.

I also think this helps us uncover the effects of things such as tokenization and pre-processing on a model’s performance. In Text2 above, it might have helped the model if there were proper punctuation, maybe a full stop after each “i hate you”. There are other examples where I felt capitalization might have helped.

Yelp reviews sentiment classification task

I also wanted to try this approach on a different dataset, so I picked this Yelp reviews data from Kaggle and decided to implement a simple sentiment classifier. To make things a bit easier, I converted the star ratings to binary labels: 1, 2 and 3 stars are negative reviews, and 4 and 5 stars are positive ones. Again, I started with a simple feed-forward neural network architecture that operates on embeddings, flattens them, sends them through a fully connected layer and outputs a probability. It’s an unconventional architecture for NLP classification tasks, but I was curious to see how it does. The training went on for 10 epochs before early stopping kicked in.
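A rough sketch of the label conversion and that architecture (again in Keras, with sizes that are my own illustrative assumptions) could look like:

```python
# Sketch of the binary label conversion and the simple feed-forward architecture
# described above (embed -> flatten -> fully connected -> sigmoid).
# Sizes are illustrative assumptions.
from tensorflow.keras import layers, models

def stars_to_binary(stars: int) -> int:
    """1-3 stars -> negative (0), 4-5 stars -> positive (1)."""
    return 1 if stars >= 4 else 0

VOCAB_SIZE = 50_000
EMBED_DIM = 100
MAX_LEN = 300

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.Flatten(),                       # no convolutions/recurrences: each word position is treated independently
    layers.Dense(64, activation="relu", name="hidden"),
    layers.Dense(1, activation="sigmoid"),  # probability that the review is positive
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```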

Here’s what the visualization’s like at the end of the last epoch:

Hidden representations after epoch 10 on the Yelp binary sentiment classification task

The text pointed to by the black arrow says:

“food has always been delicious every time that i have gone here. unfortunately the service is not very good. i only return because i love the food.”

This seems like a neutral review, probably leaning a bit towards the positive side, so maybe it isn’t too unreasonable for the model to put that point in the positive cluster. Furthermore, this model treats words individually (no n-grams), which might explain things like missing the “not” in “not very good” above. Below is the text of the closest positive point to the negative point above.

“love this place. simple ramen joint with a very basic menu, but always delicious and great service. very reasonably priced and a small beautiful atmosphere. definitely categorize it as a neighborhood gem.”

The fact that the model placed the two texts above very close together in space probably reaffirms the model’s limitations (things such as not capturing n-grams).
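As a toy illustration of what “capturing n-grams” could mean in practice, adding bigram features lets a phrase like “not very” become a unit of its own. The snippet below uses scikit-learn’s CountVectorizer purely for illustration; it is not the model discussed above:

```python
# Toy illustration: add bigrams alongside unigrams so that phrases like
# "not very" become features in their own right. Not the model used above.
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(ngram_range=(1, 2))
vectorizer.fit([
    "unfortunately the service is not very good",
    "always delicious and great service",
])
print(vectorizer.get_feature_names_out())
# ... includes bigrams such as 'not very', 'very good', 'great service'
```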

I sometimes imagine this analysis can help us understand which examples are “hard” vs “easy” for the model, just by looking at the points that seem misclassified w.r.t. their neighbors (one way to score this is sketched below). Once we gain some understanding, we could use that knowledge to either add more hand-crafted features that help the model understand such examples better, or change the architecture of the model so that it handles those “hard” examples better.
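One possible way to operationalize “misclassified w.r.t. their neighbors” is to score each point by how many of its nearest neighbors in the 2-D projection carry a different label; `coords_2d` and `labels` below are assumed to come from a run like the one sketched earlier:

```python
# Sketch: flag points whose nearest neighbours in the 2-D projection mostly
# carry a different label. `coords_2d` and `labels` are assumed from earlier
# steps; k is an arbitrary choice.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighbor_disagreement(coords_2d, labels, k=10):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(coords_2d)
    _, idx = nn.kneighbors(coords_2d)            # idx[:, 0] is the point itself
    neighbor_labels = labels[idx[:, 1:]]
    return (neighbor_labels != labels[:, None]).mean(axis=1)

labels = np.asarray(labels)
scores = neighbor_disagreement(coords_2d, labels)
hard_or_mislabeled = np.argsort(scores)[::-1][:20]   # top candidates to inspect by hand
```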