Natural language processing (NLP) has become very popular, a trend that the progress of deep learning has made especially noticeable. NLP is a field of artificial intelligence concerned with understanding text, extracting important information from it, and training models on text data. Its main tasks include speech recognition and generation, text analysis, sentiment analysis, and machine translation.

In past decades, only experts with appropriate philological training could work on natural language processing. Besides mathematics and machine learning, they had to be familiar with key linguistic concepts. Now we can simply use ready-made NLP libraries. Their main purpose is to simplify text preprocessing, so that we can focus on building machine learning models and tuning hyperparameters.

There are many tools and libraries designed to solve NLP problems. Today, we want to outline and compare the most popular and helpful natural language processing libraries, based on our experience. Keep in mind that the tasks these libraries cover overlap only partially, so it is sometimes hard to compare them directly. We will walk through some features and compare only those libraries for which a comparison is possible.

General overview

NLTK (Natural Language Toolkit) is used for tasks such as tokenization, lemmatization, stemming, parsing, and POS tagging. This library has tools for almost all NLP tasks.
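As a rough illustration of two of these tasks, here is a minimal sketch of tokenization and stemming with NLTK. The sample sentence is our own, and we chose `wordpunct_tokenize` and `PorterStemmer` only because they work without downloading extra NLTK data; other tokenizers and stemmers in the library may suit a real project better.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Sample sentence chosen for illustration only.
text = "The cats are running."

# Regex-based tokenizer: splits into runs of word characters and punctuation.
tokens = wordpunct_tokenize(text)

# Porter stemmer: strips common English suffixes (it lowercases by default).
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

print(tokens)
print(stems)
```

With this input, "cats" is reduced to "cat" and "running" to "run". Stemming is a heuristic, so some stems will not be dictionary words.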

spaCy is the main competitor of NLTK. These two libraries can be used for the same tasks.

Scikit-learn provides a large library for machine learning. It also includes tools for text preprocessing.
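To give an idea of what text preprocessing in scikit-learn can look like, the sketch below turns two toy documents into a bag-of-words matrix with `CountVectorizer`. The documents are made up for the example, and the default settings are used; a real pipeline would likely tune them.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, invented for this example.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Learn the vocabulary and build a sparse document-term count matrix.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# vocabulary_ maps each term to its column index in X.
vocab = sorted(vectorizer.vocabulary_)
the_count = X[0, vectorizer.vocabulary_["the"]]

print(vocab)
print(X.toarray())
```

Each row of `X` corresponds to a document and each column to a vocabulary term, so the matrix can be passed directly to most scikit-learn estimators.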

Gensim is a package for topic modeling, vector space modeling, and document similarity.

The main purpose of the Pattern library is to serve as a web mining module, so it supports NLP only as a side task.

Polyglot is yet another Python package for NLP. It is not very popular, but it can be used for a wide range of NLP tasks.

To make the comparison clearer, we prepared a table that shows the pros and cons of the libraries.