Introduction

Text classification is a supervised machine learning method used to classify sentences or text documents into one or more defined categories. It’s a widely used natural language processing task that plays an important role in spam filtering, sentiment analysis, categorisation of news articles and many other business-related problems.

Most current state-of-the-art approaches rely on a technique called text embedding. It transforms text into a numerical representation in a high-dimensional space, allowing a document, sentence, word or character (depending on which embedding we use) to be expressed as a vector in that space.

Flair is exciting news for NLP because a recent paper from Zalando Research, Contextual String Embeddings for Sequence Labelling, covers an approach that consistently outperforms previous state-of-the-art solutions. The approach is implemented and fully supported in Flair and can be used to build text classifiers.

1. Getting Ready

To install Flair you will need Python 3.6. If you do not have it yet, here’s a guide on how to do that. Then, to install Flair, run:

pip install flair

This will install all the packages required to run Flair. It also includes PyTorch, which Flair sits on top of.

2. Using a Pre-trained Classification Model

The new 0.4 release comes with two pre-trained models: a sentiment analysis model trained on the IMDB dataset and an ‘offensive language detection’ model (which currently only supports German).

Using, downloading and storing the model has all been incorporated into a single method that makes the whole process of using pre-trained models surprisingly straightforward.

To use the sentiment analysis model simply run:

from flair.models import TextClassifier
from flair.data import Sentence

classifier = TextClassifier.load('en-sentiment')
sentence = Sentence('Flair is pretty neat!')
classifier.predict(sentence)

# print sentence with predicted labels
print('Sentence above is: ', sentence.labels)

When running this for the first time, Flair will download the sentiment analysis model and by default store it into the .flair sub-folder of the home directory. It can take up to a few minutes.

The code above first loads the required libraries, then loads the sentiment analysis model into memory (downloading it first if needed) and then predicts the sentiment score of the sentence “Flair is pretty neat!” on a scale from 0 to 1. The final command prints out: Sentence above is: [Positive (1.0)] .
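Conceptually, each predicted label pairs a class name with a confidence score. A minimal stand-in class (not Flair’s actual Label implementation, just a sketch of the shape of the output) makes the printed result easy to read:

```python
class Label:
    """Minimal stand-in for a predicted label: a class name plus a confidence score."""
    def __init__(self, value, score):
        self.value = value   # e.g. 'Positive'
        self.score = score   # confidence between 0 and 1

    def __repr__(self):
        return f"{self.value} ({self.score})"

# A prediction is a list of labels -- here one label with full confidence.
labels = [Label("Positive", 1.0)]
print("Sentence above is: ", labels)  # Sentence above is:  [Positive (1.0)]
```

So “[Positive (1.0)]” reads as: the predicted class is Positive, with a confidence of 1.0.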

That’s it! You can now, for example, incorporate this code into a REST API and offer a service comparable to the sentiment analysis in Google’s Cloud Natural Language API, which can prove quite expensive when used in production on a high volume of requests.

3. Training a Custom Text Classifier

To train a custom text classifier we will first need a labelled dataset. Flair’s classification dataset format is based on Facebook’s FastText format. The format requires one or more labels to be defined at the beginning of each line, starting with the prefix __label__ . The format is as follows:

__label__<class_1> <text>

__label__<class_2> <text>
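A line in this format can be split back into its label and text with a couple of string operations. The sketch below (a minimal helper written for this article, not part of Flair or FastText) uses the spam/ham labels from the dataset we are about to build:

```python
def parse_fasttext_line(line):
    """Split a FastText-formatted line into (label, text)."""
    prefix, _, text = line.partition(' ')
    assert prefix.startswith('__label__'), "line must start with __label__"
    return prefix[len('__label__'):], text

label, text = parse_fasttext_line('__label__spam WINNER!! Claim your prize now')
print(label, '->', text)  # spam -> WINNER!! Claim your prize now
```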

For this article we will use Kaggle’s SMS Spam Detection Dataset to build a spam/not-spam classifier with Flair. The dataset is suitable for learning as it contains only 5572 lines and is small enough to train a model in a few minutes on a CPU.

SMS messages from the dataset labelled as either spam or ham (not spam)

3.1 Preprocessing — Building the Dataset

We first download the dataset from Kaggle to obtain spam.csv . Then, in the same directory as the dataset, we run the preprocessing snippet below, which shuffles and de-duplicates the data and splits it into train, dev and test sets.

Make sure you have Pandas installed. If not, run pip install pandas first.

import pandas as pd

data = pd.read_csv("./spam.csv", encoding='latin-1').sample(frac=1).drop_duplicates()
data = data[['v1', 'v2']].rename(columns={"v1": "label", "v2": "text"})
data['label'] = '__label__' + data['label'].astype(str)

data.iloc[0:int(len(data)*0.8)].to_csv('train.csv', sep='\t', index=False, header=False)
data.iloc[int(len(data)*0.8):int(len(data)*0.9)].to_csv('test.csv', sep='\t', index=False, header=False)
data.iloc[int(len(data)*0.9):].to_csv('dev.csv', sep='\t', index=False, header=False)

This will remove duplicates from our dataset, shuffle it (randomise the rows) and split it into train, dev and test sets using an 80/10/10 split.
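The split boundaries are simple index arithmetic. Assuming for the sake of the sketch that all 5572 rows survived de-duplication (in practice drop_duplicates will leave slightly fewer), the sizes work out like this:

```python
n = 5572  # dataset size before de-duplication (actual count will be slightly smaller)

train_end = int(n * 0.8)   # first 80% of rows -> train
test_end = int(n * 0.9)    # next 10% of rows  -> test

train_size = train_end
test_size = test_end - train_end
dev_size = n - test_end    # remaining 10%     -> dev

print(train_size, test_size, dev_size)  # 4457 557 558
```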

If this runs successfully you will end up with train.csv , test.csv and dev.csv formatted in the FastText format ready to be used with Flair.

3.2 Training a Custom Text Classification Model

To train the model run this snippet in the same directory as the generated dataset.

from flair.data_fetcher import NLPTaskDataFetcher
from flair.embeddings import WordEmbeddings, FlairEmbeddings, DocumentLSTMEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer
from pathlib import Path

corpus = NLPTaskDataFetcher.load_classification_corpus(Path('./'), test_file='test.csv', dev_file='dev.csv', train_file='train.csv')
word_embeddings = [WordEmbeddings('glove'), FlairEmbeddings('news-forward-fast'), FlairEmbeddings('news-backward-fast')]
document_embeddings = DocumentLSTMEmbeddings(word_embeddings, hidden_size=512, reproject_words=True, reproject_words_dimension=256)
classifier = TextClassifier(document_embeddings, label_dictionary=corpus.make_label_dictionary(), multi_label=False)
trainer = ModelTrainer(classifier, corpus)
trainer.train('./', max_epochs=10)

When running this code for the first time, Flair will download all required embedding models which can take up to a few minutes. The whole training process will then take another 5 minutes.

This snippet first loads the required libraries and datasets into a corpus object.

Next, we create a list of embeddings (two Flair contextual string embeddings and a GloVe word embedding). This list is then used as input for our document embedding object. Stacked and document embeddings are two of the most interesting concepts in Flair. They provide a means to combine different embeddings: you can use traditional word embeddings (like GloVe, word2vec or ELMo) together with Flair contextual string embeddings. In the example above we use an LSTM-based method of combining word and contextual string embeddings to generate document embeddings. You can read more about it here.

Finally, the snippet trains the model which produces final-model.pt and best-model.pt files which represent our stored trained model.

3.3 Using the Trained Model for Predictions

We can now use the exported model to generate predictions by running the following snippet from the same directory:

from flair.models import TextClassifier
from flair.data import Sentence

classifier = TextClassifier.load_from_file('./best-model.pt')
sentence = Sentence('Hi. Yes mum, I will...')
classifier.predict(sentence)
print(sentence.labels)

The snippet prints out ‘[ham (1.0)]’ meaning that the model is 100% sure our example message is not spam.

How does it Perform Compared to Other Frameworks?

Unlike Facebook’s FastText or even Google’s AutoML Natural Language platform, doing text classification with Flair is still a relatively low-level task. We have full control over how text embedding and training are done, with options to set parameters such as the learning rate, batch size, anneal factor, loss function and optimiser selection. To achieve optimal performance, these hyperparameters need to be tuned. Flair provides a wrapper around the well-known hyperparameter tuning library Hyperopt (described here) which we can use to tune our hyperparameters for optimal performance.
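To illustrate what a tuner does without running any actual training, here is a toy random search over two hyperparameters (the search space and the dummy scoring function are invented for this sketch; Flair’s actual wrapper hands this job to Hyperopt):

```python
import random

random.seed(0)  # make the sketch reproducible

# Hypothetical search space -- values loosely modelled on common choices.
search_space = {
    "learning_rate": [0.05, 0.1, 0.15, 0.2],
    "batch_size": [16, 32, 64],
}

def evaluate(params):
    """Stand-in for a real training run returning a dev-set score.
    A real tuner would train and evaluate a model here; this dummy
    simply prefers a moderate learning rate and a small batch size."""
    return 1.0 - abs(params["learning_rate"] - 0.1) - 0.001 * params["batch_size"]

best_params, best_score = None, float("-inf")
for _ in range(10):  # ten random trials
    params = {name: random.choice(values) for name, values in search_space.items()}
    score = evaluate(params)
    if score > best_score:
        best_params, best_score = params, score

print(best_params, best_score)
```

Real tuning libraries like Hyperopt are smarter than pure random sampling (they model which regions of the space look promising), but the loop structure, propose parameters, evaluate, keep the best, is the same.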

In this article, we used the default hyperparameters for the sake of simplicity. With mostly default parameters, our Flair model achieved an f1-score of 0.973 after 10 epochs.
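For reference, the f1-score is the harmonic mean of precision and recall. Computed by hand (the precision and recall values below are illustrative stand-ins chosen to land near our 0.973, not the actual confusion-matrix numbers from this run):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Illustrative values only, e.g. precision 0.98 and recall 0.966.
print(round(f1_score(0.98, 0.966), 3))  # 0.973
```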

For comparison, we trained text classification models with FastText and on the AutoML Natural Language platform. We first ran FastText with default parameters and achieved an f1-score of 0.883, meaning that our Flair model outperformed FastText by a large margin. FastText, however, needed only a few seconds to train, as opposed to 5 minutes for our custom Flair model.

Then we compared our results to those obtained on Google’s AutoML Natural Language platform. The platform first needed 20 minutes just to parse the dataset. After that, we started the training process, which took almost 3 hours to finish (costing almost $10 of free credits) but achieved an f1-score of 99.211%, a slightly better result than our custom Flair model.

Final Thoughts

This article should give you a rough understanding of how to use Flair for text classification.

In the next publication we will explain how to tune Flair’s hyperparameters to achieve optimal performance and beat Google’s AutoML in text classification.