Natural Language Processing (NLP) with BERT

BERT is widely regarded as one of the most significant breakthroughs in NLP since its release by Google in 2018.

But what is it? And why is it such a big deal?

Let’s start at the beginning. BERT stands for Bidirectional Encoder Representations from Transformers. Still none the wiser?

Let’s simplify it.

BERT is a deep learning framework, developed by Google, that can be applied to NLP.

Bidirectional (B)

This means that the BERT framework learns context from both the left and the right side of a word (or token, in NLP parlance). This makes it more effective at understanding context.

For example, consider these two sentences:

Jimmy sat down in an armchair to read his favorite magazine.

Jimmy took a magazine and loaded it into his assault rifle.

The same word – two meanings, also known as a homonym. Because BERT is bidirectional, it interprets both the left-hand and right-hand context in each of these sentences. This allows the framework to more accurately predict a token given its context, or vice versa.
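The difference can be sketched in a few lines of plain Python: a traditional left-to-right model only conditions on the words before the target, while a bidirectional model like BERT conditions on both sides. The function names here are illustrative, not part of BERT's actual API:

```python
# Illustrative sketch: what "context" each kind of model sees for a target word.

def left_context(tokens, i):
    """A left-to-right model only sees the words before position i."""
    return tokens[:i]

def bidirectional_context(tokens, i):
    """A bidirectional model like BERT sees the words on both sides of position i."""
    return tokens[:i] + tokens[i + 1:]

sentence = "Jimmy took a magazine and loaded it into his assault rifle".split()
i = sentence.index("magazine")

print(left_context(sentence, i))           # ['Jimmy', 'took', 'a']
print(bidirectional_context(sentence, i))  # adds 'and loaded it into his assault rifle'
```

With only the left context ("Jimmy took a"), both readings of "magazine" are plausible; the right context ("loaded it into his assault rifle") is what disambiguates it.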

Encoder Representations (ER)

This refers to an encoder, which is a program or algorithm used to learn a representation from a set of data. In BERT’s case, the set of data is vast, drawing from both English Wikipedia (2,500 million words) and the BooksCorpus (800 million words).

The vast number of words used in the pretraining phase means that BERT has developed an intricate understanding of how language works, making it a highly useful tool in NLP.
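During pretraining, BERT learns these representations with a masked language modeling objective: roughly 15% of the input tokens are hidden and the model must predict them from the surrounding context. Below is a minimal, seeded sketch of just the masking step; the real procedure is more involved (it sometimes keeps the original token or swaps in a random one), so treat this as an illustration rather than BERT's actual preprocessing code:

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace roughly mask_rate of the tokens with [MASK].

    Simplified sketch of BERT's masked-language-modeling setup: the model is
    trained to recover the original tokens at the masked positions.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append("[MASK]")
            targets[i] = tok  # the training target at this position
        else:
            masked.append(tok)
    return masked, targets

tokens = "Jimmy sat down in an armchair to read his favorite magazine".split()
masked, targets = mask_tokens(tokens)
print(masked)
print(targets)
```

Predicting the hidden words, billions of times over, is how the encoder ends up with such a rich representation of language.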

Transformer (T)

This means that BERT is based on the Transformer architecture. We’ll discuss this in more detail in the next section.

Why is BERT so revolutionary?

Not only has it been pre-trained on one of the largest text corpora used for language modeling at the time, it is also remarkably easy to adapt to different NLP applications by adding additional output layers – a process known as fine-tuning. This allows users to create sophisticated and precise models to carry out a wide variety of NLP tasks.
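The idea of adding an output layer can be sketched in plain Python. Everything here is a stand-in: the encoder stub, the 2-dimensional features (real BERT produces a 768-dimensional vector), and the weights are all hypothetical, chosen only to show the shape of the approach:

```python
import math

def pretend_bert_encoding(sentence):
    """Stand-in for BERT's sentence representation (real BERT outputs 768 dims)."""
    return [len(sentence) % 7 / 7.0, sentence.count("a") / 10.0]

def classification_head(features, weights, bias):
    """The task-specific output layer added on top: here, a logistic classifier."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # probability of the positive class

features = pretend_bert_encoding("Jimmy sat down in an armchair")
prob = classification_head(features, weights=[0.5, -0.3], bias=0.1)
print(round(prob, 3))
```

In real fine-tuning, the pre-trained encoder and the new head are trained together on the downstream task, so the general language knowledge from pretraining is reused rather than relearned.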