The recent rapid development of pretrained language models has produced significant performance improvements on downstream NLP tasks. These pretrained language models compile and store relational knowledge they encounter in training data, which prompted Facebook AI Research and University College London to introduce their LAMA (LAnguage Model Analysis) probe to explore the feasibility of using language models as knowledge bases.

The term “knowledge base” was introduced in the 1970s. Unlike databases, which store figures, tables, and other straightforward data in computer memory, a knowledge base can store more complex structured and unstructured information. A knowledge base system can be likened to a library that stores facts in a specific field. Knowledge bases also contain an inference engine that can reason about those facts and use rules and logic to deduce new facts. A typical example is an expert system, which can respond to questions by using the range of knowledge contained in the system.

One reason researchers are interested in using language models as knowledge bases is that language models require no schema engineering, allowing users to query an open class of relations. Language models are also easier to extend to more data and, more importantly, require no human supervision to train.

The researchers used two of the most popular pretrained high-capacity language models, ELMo and BERT, as examples. These models are very good at predicting the next word or masked words in a sequence (e.g., “Canada’s capital city is __”), which suggests that their parameters store a great deal of linguistic knowledge.

Comparison of querying a knowledge base and language models for factual knowledge
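
As a rough illustration of that comparison, the sketch below contrasts a symbolic lookup with the cloze-style statement a language model would be asked to complete. The toy facts, the templates, and the function names are assumptions made for this example, not the LAMA code; a real probe would pass the masked sentence to a pretrained model and read off its predictions for the masked slot.

```python
# Toy knowledge base: (subject, relation) -> object. Entries are illustrative only.
KB = {
    ("Canada", "capital"): "Ottawa",
    ("Dante", "born-in"): "Florence",
}

# Hypothetical cloze templates, one per relation.
# [X] marks the subject, [Y] marks the slot the language model must fill.
TEMPLATES = {
    "capital": "The capital of [X] is [Y].",
    "born-in": "[X] was born in [Y].",
}


def query_kb(subject, relation):
    """Symbolic lookup: return the stored object, or None if the fact is absent."""
    return KB.get((subject, relation))


def make_cloze(subject, relation, mask_token="[MASK]"):
    """Build the masked sentence a language model would be asked to complete."""
    template = TEMPLATES[relation]
    return template.replace("[X]", subject).replace("[Y]", mask_token)


if __name__ == "__main__":
    print(query_kb("Dante", "born-in"))
    print(make_cloze("Dante", "born-in"))
```

The knowledge base can only answer for relations that appear in its schema, whereas the cloze query is just text, so in principle any relation that can be phrased as a sentence can be posed to the model.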

Given the above characteristics of language models as potential representations of relational knowledge, the researchers undertook an in-depth analysis of a wide range of state-of-the-art pretrained language models, using the LAMA probe to test their factual and commonsense knowledge. The goal was to have LAMA determine how much relational knowledge the language models store; how they differ with regard to types of knowledge, such as facts about entities or common sense; and how their performance compares to traditional knowledge bases.

The researchers evaluated each model based on how highly it ranked the ground-truth answer against the other words in its vocabulary. The higher a model ranked the correct answer, the more factual knowledge it was judged to contain.
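
A minimal sketch of this rank-based scoring is shown below. It assumes the model's output for one query is available as a dictionary mapping each vocabulary word to a score; the function names and the toy scores are hypothetical and are not taken from the LAMA implementation.

```python
def rank_of(scores, truth):
    """1-based rank of `truth` when vocabulary words are sorted by descending score."""
    target = scores[truth]
    # Count how many words score strictly higher than the ground truth.
    return 1 + sum(1 for s in scores.values() if s > target)


def precision_at_k(scores, truth, k=1):
    """1.0 if the ground truth is among the top k predictions, else 0.0."""
    return 1.0 if rank_of(scores, truth) <= k else 0.0


def mean_precision_at_k(examples, k=1):
    """Average precision at k over (scores, truth) pairs."""
    return sum(precision_at_k(s, t, k) for s, t in examples) / len(examples)


if __name__ == "__main__":
    # Made-up scores for the query "Canada's capital city is __".
    scores = {"Ottawa": 0.6, "Toronto": 0.3, "Montreal": 0.1}
    print(rank_of(scores, "Ottawa"))
    print(mean_precision_at_k([(scores, "Ottawa"), (scores, "Toronto")], k=1))
```

With k set to 1, the average over all queries corresponds to the mean precision at one: the fraction of queries for which the model's top-ranked word is the correct answer.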

Mean precision at one for all evaluated models

The results show that, without fine-tuning, BERT-large does remarkably well on open-domain question answering compared to baselines and non-neural competitors. The researchers determined that it is nontrivial to extract a knowledge base from text that performs as well as pretrained BERT-large. Moreover, the performance of language models like BERT improves on large corpora. Considering the ever-growing size of corpora in today’s data-hungry AI environment, it’s possible that in the near future language models may become a viable alternative to traditional knowledge bases extracted from text.

The paper Language Models as Knowledge Bases? is on arXiv.