Model Training

Using the Keras library, we’ll build and train neural networks for both aspect category and sentiment classification. Keras is a high-level neural networks API that enables fast experimentation through a user-friendly, modular, and extensible interface. Keras was developed and is maintained by François Chollet and can run on both CPU and GPU.

Defining the Neural Network Architecture

Let’s start by defining the aspect classifier architecture:

We’ll use a fully-connected network structure with three layers. We first create a Sequential object and use the add function to add layers to our model. The Dense class is used to define a fully-connected layer, where each neuron in the network receives input from all the neurons in the previous layer.

The input shape is set to 6000, the size of the vocabulary produced when vectorizing the reviews, with relu used as the non-linear activation function. Nonlinear activation functions transform the data so that the transformed points can be separated into different classes. The output layer comprises 13 neurons, one for each aspect category. The softmax activation function used in our model returns a probability for each class—the target class will have the highest probability.
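A minimal sketch of this architecture, assuming hidden-layer widths of 512 and 256 (the original layer sizes are not shown, so these are illustrative choices):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

vocab_size = 6000   # size of the Bag-of-Words vocabulary
num_aspects = 13    # one output neuron per aspect category

# Build the fully-connected network layer by layer with add().
aspect_model = Sequential()
aspect_model.add(Dense(512, input_shape=(vocab_size,), activation='relu'))
aspect_model.add(Dense(256, activation='relu'))
aspect_model.add(Dense(num_aspects, activation='softmax'))
```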

Once the architecture is specified, we need to configure the learning process by specifying an optimizer, a loss function, and accuracy metrics. “Learning” simply means finding a combination of model parameters that minimizes a loss function for a given set of training data samples and their corresponding targets. Since the problem at hand is multi-class classification, we specify categorical cross-entropy as the loss function.
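A sketch of this configuration step (the model below is a stand-in for the aspect classifier defined earlier, and the adam optimizer is an assumption, since the original code is not shown):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Stand-in for the aspect classifier defined above.
model = Sequential([
    Dense(512, input_shape=(6000,), activation='relu'),
    Dense(256, activation='relu'),
    Dense(13, activation='softmax'),
])

# Categorical cross-entropy suits multi-class classification;
# we also track accuracy during training.
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
```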

Vector Representations of Words

To encode the reviews as vectors, we use a text-vectorization technique known as Bag-of-Words (BoW). In this approach, we tokenize each review and count the frequency of each token:
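A sketch of BoW vectorization with the Keras Tokenizer (the sample reviews are hypothetical; the 6000-word cap matches the model’s input size):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

reviews = [
    "the pasta was delicious but the service was slow",
    "great ambience and friendly staff",
]

# Keep only the 6000 most frequent words.
tokenizer = Tokenizer(num_words=6000)
tokenizer.fit_on_texts(reviews)

# mode='count' yields a Bag-of-Words matrix of token frequencies.
reviews_tokenized = tokenizer.texts_to_matrix(reviews, mode='count')
print(reviews_tokenized.shape)  # (2, 6000)
```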

We need to encode the aspect category column as well:
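One way to do this is with scikit-learn’s LabelEncoder followed by one-hot encoding, so the targets match the softmax output layer (the labels below are hypothetical stand-ins for the real column):

```python
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical

# Hypothetical aspect labels standing in for the real column.
aspect_categories = ['food', 'service', 'ambience', 'food']

# Map each category name to an integer id.
label_encoder = LabelEncoder()
integer_category = label_encoder.fit_transform(aspect_categories)

# One-hot encode the integer ids.
encoded_y = to_categorical(integer_category)
print(encoded_y.shape)  # (4, 3): four samples, three distinct labels here
```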

The above three steps are repeated for the sentiment classifier. The output layer of this classifier, however, has 3 neurons, since there are three sentiment classes—namely positive, neutral, and negative.
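The sentiment classifier can be sketched the same way, changing only the output layer (hidden-layer widths remain illustrative assumptions):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Same structure as the aspect classifier, but with 3 output neurons:
# positive, neutral, and negative.
sentiment_model = Sequential([
    Dense(512, input_shape=(6000,), activation='relu'),
    Dense(256, activation='relu'),
    Dense(3, activation='softmax'),
])
sentiment_model.compile(loss='categorical_crossentropy',
                        optimizer='adam',
                        metrics=['accuracy'])
```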

The Training Process

Our two models are ready to be trained. To train, we’ll use the fit() function on our models, passing the training data (reviews_tokenized), the target data (encoded_y), the number of epochs, and the verbose parameter. Setting verbose lets us see the training progress for each epoch.
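A sketch of the training call, using random stand-in data (the dataset, model size, and epoch count below are illustrative, not the original values):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Random stand-ins for the real BoW matrix and one-hot targets.
reviews_tokenized = np.random.randint(0, 2, size=(20, 6000)).astype('float32')
encoded_y = np.eye(13)[np.random.randint(0, 13, size=20)]

model = Sequential([
    Dense(64, input_shape=(6000,), activation='relu'),
    Dense(13, activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])

# verbose=1 prints a progress bar and metrics for each epoch.
history = model.fit(reviews_tokenized, encoded_y, epochs=3, verbose=1)
```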

Training progress

The accuracy of our models can be improved further by tuning the hyperparameters.

Finally, we test our models using a list of reviews, as shown below. Minor pre-processing, which includes changing the reviews to lowercase, is done on the reviews.
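An end-to-end sketch of this prediction step for the aspect classifier (the reviews, labels, and model size are hypothetical stand-ins; the real pipeline would reuse the tokenizer, encoder, and trained models from earlier):

```python
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical

# Tiny hypothetical training set standing in for the real data.
train_reviews = ["good fast service", "the pasta was delicious",
                 "lovely ambience", "the waiter was rude"]
train_aspects = ["service", "food", "ambience", "service"]

tokenizer = Tokenizer(num_words=6000)
tokenizer.fit_on_texts(train_reviews)
X = tokenizer.texts_to_matrix(train_reviews, mode='count')

label_encoder = LabelEncoder()
y = to_categorical(label_encoder.fit_transform(train_aspects))

aspect_model = Sequential([
    Dense(64, input_shape=(6000,), activation='relu'),
    Dense(y.shape[1], activation='softmax'),
])
aspect_model.compile(loss='categorical_crossentropy', optimizer='adam')
aspect_model.fit(X, y, epochs=5, verbose=0)

# Minor pre-processing: lowercase the test reviews, then vectorize.
test_reviews = ["Good, fast service.", "The hostess was very pleasant."]
test_matrix = tokenizer.texts_to_matrix(
    [r.lower() for r in test_reviews], mode='count')

# The predicted class is the one with the highest softmax probability.
probs = aspect_model.predict(test_matrix)
predicted = label_encoder.inverse_transform(probs.argmax(axis=1))
print(predicted)
```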

Test Reviews Results

The models performed fairly well in classifying our test reviews.

Conclusion

Aspect-based sentiment analysis (ABSA) can help businesses become customer-centric and place their customers at the heart of everything they do. It’s about listening to customers, understanding their voices, analyzing their feedback, and learning more about customer experiences, as well as their expectations for products or services.

While computational approaches to mining customer opinions have proven quite promising, more needs to be done to improve the performance of these models. A plausible path to better performance would be to use advanced word-representation techniques, specifically contextual word embeddings.

Contextual word representations derived from pre-trained bidirectional language models (biLMs) have recently been shown to provide significant improvements to the state of the art for a wide range of NLP tasks. Among the popular contextual word embedding techniques is Google’s Bidirectional Encoder Representations from Transformers (BERT).

Contextual representation is characterized by generating a representation of each word based on the other words in the sentence. Therefore, the performance of the classification model is expected to improve significantly as a result of the rich contextual representations learned from the user-generated content.
