Natural Language Processing (NLP) is the ability of a computer system to understand human language. Natural Language Processing is a subfield of Artificial Intelligence (AI). There are many resources available online that can help you develop expertise in Natural Language Processing.

In this blog post, we list resources for the beginners and intermediate level learners.

Natural Language Resources for Beginners

A beginner can get started with Natural Language Processing in two ways: Traditional Machine Learning and Deep Learning. These two methods are very different from each other. For the inquisitive, here’s the difference between these two.

Traditional Machine Learning

Traditional machine learning algorithms are complex and often not easy to understand. Here are a few resources which will help you get started in learning NLP using machine learning:

Speech and Language Processing by Jurafsky and Martin is the popularly acclaimed bible for traditional Natural Language Processing. You can access it here.

For a more practical approach, you can try out Natural Language Toolkit.

Deep Learning

Deep learning is a subfield of machine learning built on Artificial Neural Networks, and it outperforms traditional machine learning on many NLP tasks. Beginners can start with the following resources:

CS 224n: This is the best course to get started with using Deep Learning for Natural Language Processing. This course is hosted by Stanford and can be accessed here.

Yoav Goldberg’s free and paid books are great resources to get started with Deep Learning in Natural Language Processing. The free version can be accessed here and the full version is available here.

A very thorough coverage of all algorithms can be found in Jacob Eisenstein’s notes from Georgia Tech’s NLP class, which cover almost all NLP methods. You can access the notes on GitHub here.

Natural Language Resources for Practitioners

If you are a practicing Data Scientist, you will need three types of resources:

Quick Getting Started guides / Knowing about what is hot and new

Problem-Specific Surveys of Methods

Blogs to follow regularly

Quick Getting Started guides / Knowing about what is hot and new

One can start with Otter et al.’s Deep Learning for Natural Language Processing survey. You can access it here.

A survey paper by Young et al. tries to summarize everything hip in Deep Learning based Natural Language Processing, and is recommended for practitioners getting started with Natural Language Processing. You can access the paper here.

You can refer to this article to understand the basics of LSTMs and RNNs, which are used a lot in Natural Language Processing. Another much more cited (and highly reputed) survey of LSTMs is here. This is an interesting paper to understand how the hidden states of RNNs work. It is an enjoyable read and can be accessed here. I always recommend the following two blog posts to anyone who hasn’t read them:

Convolutional Neural Networks (Convnets) can be used to make sense of Natural Language. You can visualize how Convnets work in NLP by reading this paper here.

How Convnets and RNNs compare with each other has been highlighted in this paper by Bai et al. All of its PyTorch code is open sourced here (I have stopped, or at least greatly reduced, reading deep learning code not written in PyTorch) and gives you a feel of Godzilla vs King Kong or Ford Mustang vs Chevy Camaro (if you enjoy(ed) that type of thing). Who will win?

Problem-specific Surveys of Methods

Another type of resource practitioners need is an answer to questions of the type: “I have to train an algorithm to do X, what is the coolest (and easily accessible) thing I can apply?”.

Here’s what you will need for this:

TEXT CLASSIFICATION

What’s the first problem people solve? Text Classification, mostly. Text Classification can be in the form of categorizing text into different categories or detecting sentiment/emotion within the text.
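To make the task concrete, here is a minimal sketch of a sentiment classifier using Naive Bayes over word counts. It is only a classic baseline, not one of the methods from the surveys below, and the function names and the tiny training set are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a real system would use a proper tokenizer."""
    return text.lower().split()


def train_naive_bayes(examples):
    """examples: list of (text, label) pairs. Returns the counts needed for prediction."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability, using add-one smoothing."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, made up for illustration only.
TRAIN = [
    ("i loved this movie", "positive"),
    ("what a great and fun film", "positive"),
    ("i hated this movie", "negative"),
    ("what a boring and terrible film", "negative"),
]

model = train_naive_bayes(TRAIN)
print(predict(model, "a great movie"))
print(predict(model, "a terrible movie"))
```

The same code works for topic categorization: only the labels in the training pairs change.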

I would like to highlight an easy read about different surveys of Sentiment Analysis described in this ParallelDots blog. Though the survey is for sentiment analysis technologies, it can be extended to most text classification problems.

Our (ParallelDots) surveys are slightly less technical and aim to direct you to cool resources to understand a concept. The Arxiv survey papers I point you to will be very technical and will need you to read other important papers to deeply understand a topic. Our suggested way is to use our links to get familiar and have fun with a topic but then to be sure to read the thorough guides we point to. (Dr. Oakley’s course talks about chunking, where you first try to get small bits here and there before you jump deep). Remember, it is great to have fun but unless you understand the techniques in detail, it will be hard to apply concepts in a new situation.

Another survey of Sentiment Analysis algorithms (by people at Linked University and UIUC) is here.

The Transfer Learning revolution has already hit Deep Learning. Just like in images, where a model trained on ImageNet classification can be fine-tuned for any classification task, NLP models trained for language modeling on Wikipedia can now be transferred to text classification with a relatively small amount of data. Here are two papers, from OpenAI and from Ruder and Howard, which deal with these techniques.

Fast.ai has friendlier documentation for applying these methods here.

If you are transfer learning between two different tasks (not transferring from the Wikipedia language modeling task), tricks for using Convnets are mentioned here.

IMHO, such approaches will slowly take over all other classification methods (simple extrapolation from what has happened in vision). We also released our work on Zero Shot Text Classification, which gets good accuracy without any training on a dataset, and are working on its next generation. We have built our custom text classification API, commonly called the Custom Classifier, in which you can define your own categories. You can check it out for free here.

SEQUENCE LABELING

Sequence Labeling is a task which labels words with different attributes. These include part-of-speech tagging, Named Entity Recognition, Keyword Tagging etc.
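As a rough illustration of what "labeling words with attributes" means, here is a sketch of the simplest possible part-of-speech tagger: it assigns each word the tag it carried most often in the training data. This is a common baseline, not one of the neural methods from the resources below, and the tag names and sentences are hypothetical.

```python
from collections import Counter, defaultdict


def train_tagger(tagged_sentences):
    """Learn, for every word, the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    word_to_tag = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    default_tag = all_tags.most_common(1)[0][0]  # fallback for unseen words
    return word_to_tag, default_tag


def tag(tagger, words):
    """Return one (word, tag) pair per input word."""
    word_to_tag, default_tag = tagger
    return [(w, word_to_tag.get(w.lower(), default_tag)) for w in words]


# Hypothetical toy training data, made up for illustration only.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sees", "VERB"), ("the", "DET"), ("dog", "NOUN")],
]

tagger = train_tagger(TRAIN)
print(tag(tagger, ["the", "cat", "barks"]))
```

The tagger looks at each word in isolation. Sequence models such as LSTMs and CRFs improve on this by also using the surrounding words and tags.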

We wrote a fun review of methods for tasks like these here.

An excellent resource for such problems is the research paper from this year’s COLING which gives optimal guidelines to train Sequence labeling algorithms. You can access it here.

MACHINE TRANSLATION

One of the biggest advances in NLP in recent years has been the development of algorithms that can translate text from one language to another. Google’s system is an insane 16-layer LSTM (which requires no dropout because they have tons of data to train on) and gives state-of-the-art translation results.

The media blew the hype out of proportion with hyperbolic reports claiming “Facebook had to shut down AI which invented its own language”. Here are some of these.

For an extensive tutorial on Machine Translation, refer to Philipp Koehn’s research paper here. A specific review of using Deep Learning for Machine Translation (which we call NMT or Neural Machine Translation) is here.

A couple of my favorite papers are here –

This paper by Google tells you how to solve a problem end-to-end when you have a lot of money and data.

Facebook’s Convolutional NMT system (just because of its cool convolutional approach), whose code is released as a library here.

https://marian-nmt.github.io/ is a framework for fast translation in C++ (paper: http://www.aclweb.org/anthology/P18-4020)

http://opennmt.net/ enables everyone to train their NMT systems.

QUESTION ANSWERING

IMHO this is going to be the next “Machine Translation”. There are many different types of Question Answering tasks: choosing from options, selecting answers from a paragraph or a knowledge graph, and answering questions based on an image (also known as Visual Question Answering). There are different datasets for each, which help you get to know the state-of-the-art methods.

The SQuAD dataset is a question answering dataset which tests an algorithm’s ability to read a passage and answer questions about it. Microsoft published a paper earlier this year claiming they have reached human-level accuracy for the task. The paper can be found here. Another important algorithm (which I feel is the coolest) is Allen AI’s BiDAF and its improvements.

Another important set of algorithms is Visual Question Answering which answers questions about images. Teney et al.’s paper from VQA 2017 challenge is an excellent resource to get started. You can also find its implementations on Github here.

Extractive Question Answering on large documents (like how Google highlights answers to your queries in the first few results) can be done in real life using Transfer Learning (and thus with few annotations), as shown in this ETH paper here. A very good paper criticizing the “understanding” of Question Answering algorithms is here. It is a must read if you are working in this field.

PARAPHRASE, SENTENCE SIMILARITY OR INFERENCE

This is the task of comparing sentences. NLP has three different tasks for this: Sentence Similarity, Paraphrase Detection and Natural Language Inference (NLI), each requiring more semantic understanding than the last. Stanford NLI (SNLI) and MultiNLI, which extends the SNLI approach to multiple genres of text, are the most well-known benchmark datasets for NLI and have become the focus of research lately. There are also the MS Paraphrase Corpus and the Quora Corpus for paraphrase detection, and a SemEval dataset for STS (Semantic Text Similarity). A good survey of advanced models in this domain can be found here. Applied NLI in the clinical domain is very important (finding out about the right medical procedures, side effects and cross effects of drugs etc.). This tutorial on applied NLI in the medical domain is a good read if you are looking to apply the tech in a specific domain.
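For the sentence similarity task, a useful first baseline is cosine similarity between word-count vectors. The sketch below is a standard textbook baseline, not a method from the survey above, and the example sentences are made up.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Represent a sentence as a count of its lowercased, whitespace-separated words."""
    return Counter(sentence.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between the word-count vectors of two sentences."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example sentences.
print(cosine_similarity("how do i learn python", "how can i learn python"))
print(cosine_similarity("how do i learn python", "the weather is nice today"))
# Word order is ignored, so these two score as near-identical despite opposite meanings:
print(cosine_similarity("the dog bit the man", "the man bit the dog"))
```

The last example shows why paraphrase detection and NLI need more than word overlap: the two sentences use exactly the same words but say different things.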

Here is a list of my favorite papers in this domain

Natural Language Inference over Interaction Space — It highlights a very clever approach of applying a DenseNet (a Convolutional Neural Network) to sentence representations. The fact that this was the outcome of an internship project makes it even cooler! You can read the paper here.

This research paper from Omer Levy’s group shows that even simple algorithms can perform the task. This is because algorithms are still not learning “inference”.

BiMPM is a cool model to predict paraphrases and can be accessed here.

We also have new work on Paraphrase Detection, which applies Relation Networks on top of sentence representations and has been accepted at this year’s AINL conference. You can read it here.

OTHER FIELDS

Here are some of the more detailed survey papers to get information about research for other tasks you might encounter while making an NLP system.

Language Modelling (LM) — Language Modelling is the task of learning an unsupervised representation of a language. This is done by predicting the (n+1)th word of a sentence given the first n words. These models have two important real-world uses: autocomplete, and acting as a base model for transfer learning for text classification as mentioned above. A detailed survey is here. If you are interested in learning how autocomplete LSTMs in cellphones/search engines work based on search history, here is a cool paper you should read.
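Next-word prediction can be sketched with the simplest language model there is, a bigram model, which conditions only on the last word instead of all n previous words. It is a count-based stand-in for the LSTM models discussed here, and the corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each other word."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()  # <s> marks the sentence start
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following


def suggest_next(model, prefix, k=2):
    """Autocomplete: return up to k most likely next words after the prefix's last word."""
    words = prefix.lower().split()
    last = words[-1] if words else "<s>"
    return [w for w, _ in model[last].most_common(k)]


# Hypothetical toy corpus, made up for illustration only.
CORPUS = [
    "i like deep learning",
    "i like natural language processing",
    "i enjoy deep learning",
]

lm = train_bigram_model(CORPUS)
print(suggest_next(lm, "i"))
print(suggest_next(lm, "i like deep", k=1))
```

No labels are needed: the raw text supplies both the input (the previous word) and the target (the next word), which is why the task counts as unsupervised.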

Relation Extraction — Relation extraction is the task of extracting relations between entities present in a sentence. Given a sentence “A is related as r to B”, it gives the triplet (A, r, B). A survey of the research work in the field is here. Here is a research paper that I found to be really interesting. It uses BiDAF for Zero Shot Relation Extraction (that is, it can recognize relations it was not even trained to recognize).
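To show what the (A, r, B) triplet looks like in practice, here is a toy sketch that pulls triplets out of sentences with one hand-written pattern, "A is the r of B". Real systems learn to do this from data; the pattern and the sentences here are assumptions chosen purely for illustration.

```python
import re

# One hand-written pattern: "<A> is the <r> of <B>", with an optional final period.
PATTERN = re.compile(r"^(?P<a>.+?) is the (?P<r>.+?) of (?P<b>.+?)\.?$")


def extract_triplet(sentence):
    """Return (A, r, B) if the sentence matches the pattern, else None."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    return (match.group("a"), match.group("r"), match.group("b"))


# Hypothetical example sentences.
print(extract_triplet("Paris is the capital of France."))
print(extract_triplet("Marie Curie is the discoverer of polonium"))
print(extract_triplet("It rained today."))
```

A pattern like this only covers the phrasings someone thought to write down, which is the gap learned relation extractors try to close.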

Dialog Systems — With the onset of the chatbot revolution, Dialog systems are now the rage. Many people (including us) make dialog systems as a combination of models such as intent detection, keyword detection, question answering etc., while others try to model it end-to-end. A detailed survey of dialog system models by the team at JD.com is here. I would also like to mention Parl.ai, a framework by Facebook AI for the purpose.

Text Summarization — Text Summarization is used to get a condensed text from a document (paragraph/news article etc.). There are two ways to do this: extractive and abstractive summarization. While extractive summarization gives out the sentences from the article with the highest information content (an approach that has been available for decades), abstractive summarization aims to write a summary just like a human would. This demo from Einstein AI brought abstractive summarization into mainstream research. There is an extensive survey of techniques here.
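Extractive summarization is easy to sketch: score each sentence by how frequent its words are in the document and keep the top ones. The version below is a simple frequency heuristic, not a method from the survey, and the stopword list and sample text are made up.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real systems use a much longer one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "on", "for", "was", "were", "that"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased words of the sentence with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, num_sentences=1):
    """Keep the sentences whose words are most frequent in the whole document."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore original document order
    return " ".join(sentences[i] for i in chosen)


# Hypothetical sample text.
TEXT = (
    "The new phone has a great camera. "
    "The camera takes sharp photos in low light. "
    "I bought it on Tuesday."
)
print(summarize(TEXT, num_sentences=1))
```

Every sentence in the output is copied verbatim from the input, which is exactly what separates extractive from abstractive summarization.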

Natural Language Generation (NLG) — Natural Language Generation is the research area where the computer aims to write like a human would. This could be stories, poetry, image captions etc. Out of these, current research has been able to do very well on image captions, where the combination of LSTMs and attention mechanisms has given outputs usable in real life. A survey of techniques is available here.

Blogs to follow

Here’s a list of blogs which we highly recommend for anyone interested in keeping track of what’s new in NLP research.

Einstein AI — https://einstein.ai/research

Google AI blog — https://ai.googleblog.com/

WildML — http://www.wildml.com/

DistillPub — https://distill.pub/ (distillpub is unique, blog and publication both)

Sebastian Ruder — http://ruder.io/

If you liked this article, you must follow our blog. We come up with resource lists quite frequently here.

That’s all folks! Enjoy making neural nets understand language.

You can also read about Machine Learning algorithms you should know to become a Data Scientist here.

We hope you liked the article. Please Sign Up for a free ParallelDots account to start your AI journey. You can also check out free demos of ParallelDots AI APIs here.

You can read the original article here.