How do neural networks learn to understand language?

Faculty Interview: Sam Bowman

Sam Bowman is one of the leading researchers in the field of natural language processing (NLP), and recently joined NYU as an Assistant Professor in Computational Linguistics, a joint position between NYU’s Linguistics department and the Center for Data Science. This fall, he will be teaching a course titled “Seminar in Semantics: Artificial Neural Networks.”

Can you talk about the course that you’re teaching?

This fall, I’ll be teaching a seminar-style course on the use of neural network models for language understanding. I suspect that it is one of the first courses of its kind, because almost all of the research conducted on these models is done in computer science departments, while this course offers a more linguistics-oriented perspective.

I want the course to enable graduate students to begin conducting their own research on how these models are learning human language, and I also want to prepare students to try to build new models that incorporate more of what linguists know about the mechanics of language.

Can you give us a bit of background on the field of natural language processing?

As a field, NLP is entering a really exciting period. Neural network models are at the point where they might begin to solve a lot of longstanding language problems, such as translation, but we don’t yet have a clear understanding of what exactly makes these models so effective.

This combination has left open a ton of questions that I find fascinating, and I think that there’s more opportunity than ever to conduct great science and great engineering in this field.

Can you give a brief definition of neural network models, and talk about how these models fit into the field of natural language processing?

Artificial neural networks are machine learning models: computational systems that are designed to learn how to perform some kind of behavior, given previous examples shown through data. This can include anything from labeling an image, to translating a text, to steering a robot.

Despite the name, most artificial neural networks are only loosely modeled on anything in real brains. Instead, what makes them distinctive and exciting is that they can learn to perform incredibly complex behaviors, ones that involve rich internal representations and reasoning processes, without any human engineer telling them anything about the internal representations and processes required.

Language has a huge amount of structure that we either don’t understand, or that we understand but that is too complex to build into an NLP system by hand. By using models that can discover structure in language without explicit guidance, we can use large language datasets as a powerful ally in building effective systems.
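As a rough illustration of what "learning from examples" means here, the sketch below trains a single artificial neuron, the simplest building block of a neural network, in plain Python. It is not drawn from Bowman's own work: the task (logical OR), the function names `train` and `predict`, and the learning rate and epoch count are all assumptions chosen for illustration. Nothing in the code states the rule for OR; the weights are adjusted only in response to the examples.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def train(examples, epochs=2000, lr=0.5):
    """Fit one neuron (two weights and a bias) by gradient descent.

    `epochs` and `lr` are illustrative values, not tuned settings.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in examples:
            pred = sigmoid(w[0] * x1 + w[1] * x2 + b)
            err = pred - target  # gradient of the log loss w.r.t. the pre-activation
            w[0] -= lr * err * x1
            w[1] -= lr * err * x2
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Return 1 if the neuron's output is at least 0.5, else 0."""
    return 1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5 else 0

# Training data: input pairs with the desired output (logical OR).
EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

w, b = train(EXAMPLES)
predictions = [predict(w, b, x) for x, _ in EXAMPLES]
print(predictions)
```

Real language models stack many such units and train on far larger datasets, but the principle of adjusting internal parameters from examples is the same.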

What are some of the research projects where you’re applying neural network models?

I’m broadly interested in how neural network models learn to extract meaning from sentences. In particular, I’m interested in how grammar and sentence structure fit into neural network learning. I’m working on models that are designed to learn sentence structure — roughly as young children do — instead of relying on experts to build that structure explicitly into the models, or skipping it altogether.

I’ve also been trying to simulate human inference judgments. That is, if you tell me that “Spot barks a lot” and I understand you, I can work out that “Spot is a dog” or “Spot is an animal,” and I can pretty much rule out the possibility that “Spot is a cafe in the lobby of an art museum.” This problem arises when building NLP-based technologies, and framing language understanding as an inference problem also lets it serve as a bridge between NLP and formal linguistics.
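One way to picture how such inference judgments become a machine learning problem is to write each judgment down as a labeled pair of sentences and then score a model by how often it agrees with the human label. The sketch below is only an assumed framing, not a description of Bowman's datasets or models: the `InferenceExample` class, the two-way label set (`"follows"` and `"ruled_out"`), and the trivial `always_follows` baseline are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InferenceExample:
    premise: str     # what the speaker said
    hypothesis: str  # a candidate conclusion
    label: str       # hypothetical label set: "follows" or "ruled_out"

# The three judgments from the "Spot" example in the interview.
EXAMPLES = [
    InferenceExample("Spot barks a lot", "Spot is a dog", "follows"),
    InferenceExample("Spot barks a lot", "Spot is an animal", "follows"),
    InferenceExample(
        "Spot barks a lot",
        "Spot is a cafe in the lobby of an art museum",
        "ruled_out",
    ),
]

def accuracy(model, examples):
    """Fraction of examples where the model's label matches the human label."""
    correct = sum(
        1 for ex in examples if model(ex.premise, ex.hypothesis) == ex.label
    )
    return correct / len(examples)

def always_follows(premise, hypothesis):
    """Trivial baseline that ignores its input; a real model would read both sentences."""
    return "follows"

score = accuracy(always_follows, EXAMPLES)
print(f"baseline accuracy: {score:.2f}")
```

A learned model would replace `always_follows`, and the same `accuracy` function would measure how closely it matches human judgments.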

Can you talk about some of the practical applications for extracting sentence meaning?

These deep learning models are promising for almost any application where you need an automated system to understand text beyond simple keyword matching. Some of the biggest applications that would benefit from better technology are question answering (as in a program like Siri), web search, and translation. Automatic text summarization is not as widely used now, but I think it could have a huge impact if backed by better NLP models. Imagine being able to instantly receive a human-readable summary of a big swath of medical literature when you’re trying to learn about a new condition.

What domains or fields do you think could benefit from natural language processing tools, and have not yet seen these advancements?

As a linguist, I have to say linguistics! For those of us studying human language, it’s a huge obstacle that we can’t directly inspect the representations that humans use in their brains, at least not with any amount of precision. Similarly, we can’t experimentally alter the kinds of language that humans are exposed to when they first learn language as young children, which makes it difficult to test hypotheses about how that learning takes place. Now that machine learning models in NLP are starting to learn to use language in sophisticated (if imperfect) ways, there’s a tremendous opportunity to try to use models like these as proxies to get a clearer picture of what might be possible in humans.