I consider myself an aspiring NLP (Natural Language Processing) engineer. I have tried my hand at a few projects that deal with NLP, with a wariness towards Deep Learning, which I don’t fully comprehend.

I became aware of spaCy close to two years ago. I read about its concept of a pipeline for NLP and found it both practical and interesting. Still, I neither felt the inclination to take it for a spin, nor did fate bestow an opportunity on me to do so (I know, I know, I have to make my own opportunities rather than waiting ….). It reared its head again when I was checking out rasa_nlu (an open-source platform that allows you to build NLU systems), which supports the use of spaCy pipelines.

While working on another NLP-related project, one that would become more versatile with a pipeline flow and plug-n-play components, I finally had the motivation to try it out.

Luckily for me, spaCy now has an interactive course (similar to those on Codecademy) specifically built to teach you how to use spaCy (with Python, of course) effectively.

I have only finished the first of four chapters, but it was good enough to motivate me to write this review-ish (puff-piece?) article.

Over the course of the first chapter, they introduce topics that most coders who have ever tried their hand at NLP will have heard of, if not worked with, such as tokenization, POS (part-of-speech) tagging, NER (named-entity recognition) and statistical models.
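To give a feel for the first of those topics, here is a minimal sketch of tokenization with spaCy. It is my own illustration, not an exercise taken from the course, and it assumes spaCy is installed. It uses a blank English pipeline, so only the rule-based tokenizer runs and no trained model has to be downloaded. The sample sentence is just an arbitrary choice.

```python
import spacy

# A blank English pipeline contains only the rule-based tokenizer,
# so no trained model needs to be downloaded for this sketch.
nlp = spacy.blank("en")

text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)

# Each token keeps its text plus simple lexical attributes.
for token in doc:
    print(token.i, token.text, token.is_alpha, token.like_num)

# Tokenization is non-destructive: joining every token together with
# its trailing whitespace rebuilds the original text.
rebuilt = "".join(token.text_with_ws for token in doc)

# POS tags (token.pos_) and named entities (doc.ents) need a trained
# pipeline, e.g. spacy.load("en_core_web_sm"), which is assumed to be
# installed separately and is not loaded here.
```

With a trained pipeline loaded instead of the blank one, the same loop could print token.pos_ for each token, and doc.ents would hold the recognized entities.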

So here are some of the main reasons I have an infatuation with spaCy, which I hope turns into more.

Understanding how to deal with text

During one of my adventures (reinventing the wheel, slower and slower), I had tried encapsulating text in a way that made it easy to process (text statistics were my focus). This made me realize the importance of the Document-based structure that spaCy employs. It allows for a quick and painless way of dealing with text without all the melodrama.
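To make the idea concrete, here is a toy sketch in plain Python of what such an encapsulation might look like: tokenize once, then hand out tokens, slices and statistics from a single object. The TinyDoc class, its methods and the regex tokenizer are all hypothetical and written only for this illustration. This is not how spaCy implements its Doc, which is far richer.

```python
import re
from collections import Counter


class TinyDoc:
    """Toy container: tokenizes once, then serves tokens and statistics."""

    def __init__(self, text):
        self.text = text
        # Naive tokenizer (an assumption for this sketch): runs of word
        # characters, or single non-space punctuation marks.
        self.tokens = re.findall(r"\w+|[^\w\s]", text)

    def __len__(self):
        # Number of tokens, punctuation included.
        return len(self.tokens)

    def __getitem__(self, index):
        # An int returns one token; a slice returns a list of tokens.
        return self.tokens[index]

    def words(self):
        # Lowercased alphabetic tokens only, for simple statistics.
        return [t.lower() for t in self.tokens if t.isalpha()]

    def word_counts(self):
        # Frequency of each lowercased word.
        return Counter(self.words())


doc = TinyDoc("The cat sat. The cat slept!")
print(len(doc), doc[1:3], doc.word_counts().most_common(2))
```

The point of the sketch is the shape of the interface: once the text lives inside one object, counting, slicing and iterating stop being separate chores, which is roughly the convenience spaCy's Doc provides out of the box.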