Ok, we tried to write an objective review of this paper, ie. of this toolkit, but we had a problem with it. Namely, everything we wrote sounded like a commercial. However, we are not paid to write anything like that, we are just truly excited about this awesome new toolkit from Standford – Stanza. Transfer learning seems to be the future of deep learning even when it comes to NLP. In the market, there are several NLP toolkits that provide various NLP options to engineers, like CoreNLP, Flair and spaCy. However, all these toolkits have some limitations. Majority of them supports only a few major languages, they are often under-optimized and have problem with input text that has been tokenized with some other tool. This means that these tools are limited when it comes to text that comes from multiple sources and a number of languages.

That is why authors present Stanza, an open-source Python natural language processing toolkit supporting whooping 66 human languages. Stanza has multiple advantages. It has a fully neural pipeline that takes raw text as input and as outputs annotations including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition. Apart from that, their design is language agnostic, which gives the ability to support 66 languages. Finally, it’s performance is fantastic. On top of that Stanza provides Python API for Java Standford CoreNLP and pre-trained models.

In an essence, Stanza is composed of two main layers: