Natural language processing has made significant progress in the past year, but few frameworks focus directly on NLP or sequence modeling. Google Brain recently released Lingvo, a deep learning framework built on TensorFlow. Lingvo focuses on sequence-to-sequence models for language-related tasks such as machine translation, speech recognition, and speech synthesis, and is designed to improve code reuse and iteration speed. Models supported by Lingvo include traditional RNN sequence models, Transformer models, and models with VAE components. Lingvo is now open-sourced on GitHub.

“Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily extensible, and experiment configurations are centralized and highly customizable. Distributed training and quantized inference are supported directly within the framework, and it contains existing implementations of a large number of utilities, helper functions, and the newest research ideas. Lingvo has been used in collaboration by dozens of researchers in more than 20 papers over the last two years.” (arXiv)

Synced invited Ni Lao, Chief Science Officer at Mosaix, to share his thoughts on Lingvo.

How would you describe Lingvo?

Google has been at the forefront of research on, and productionization of, sequence (and sequence-to-sequence) models. Unlike the many open-source sequence models already available, Lingvo is a framework for developing new models (the Lingvo website cites 30 research papers). As Google Research Scientist Patrick Nguyen explains, “Lingvo means ‘language’ in Esperanto (the most widely spoken constructed international auxiliary language). This stems from the fact that the framework started with the infrastructure underlying our research efforts in speech and language.”

Because of the scale of its engineering efforts, Google has developed good habits and design patterns that help avoid common development problems (e.g. D. Sculley et al., 2015), and has published them for the public good. Lingvo represents an effective pattern for organizing deep learning R&D, one that supports quick prototyping, reproducibility, and production readiness. “Lingvo is designed for research at scale. So that means more boilerplate, and a steeper learning curve. The payoff is researcher productivity and computational efficiency,” says Nguyen. This “boilerplate” is the extra code that keeps the models and experiments of many researchers organized.

Why does this research matter?

Developing a new deep learning system is complicated. It involves exploring a large space of design choices: the training data, the data processing logic, the size and type of model components, the optimization procedures, and the deployment strategy. This complexity calls for a framework that makes it quick to derive new combinations and modifications from existing experiments, and to document and share the results. Lingvo is a ready-made workspace for deep learning developers and researchers. Says Nguyen: “We have researchers, working on state-of-the-art products and research algorithms, basing their research off of the same codebase. This ensures that the code is battle-tested. Our collective experience is encoded in terms of good defaults and primitives (e.g. attention layers) that we have found useful over these tasks.”
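The idea of deriving new experiments from existing ones, together with the paper's description of centralized, customizable experiment configurations, can be illustrated with a minimal sketch of a name-keyed experiment registry. Everything below (`ExperimentParams`, `REGISTRY`, `register`, `get_params`, the experiment names and the default values) is hypothetical plain Python written for illustration; it is not Lingvo's actual API, and only the general pattern is meant to carry over.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict


@dataclass(frozen=True)
class ExperimentParams:
    """Hypothetical flat configuration for one sequence-to-sequence experiment."""
    model: str = "transformer"
    num_layers: int = 6
    model_dim: int = 512
    learning_rate: float = 1e-3


# Central registry (hypothetical): experiment name -> function that builds its params.
REGISTRY: Dict[str, Callable[[], ExperimentParams]] = {}


def register(name: str):
    """Decorator that records an experiment under a unique name."""
    def wrap(fn: Callable[[], ExperimentParams]) -> Callable[[], ExperimentParams]:
        if name in REGISTRY:
            raise ValueError(f"duplicate experiment: {name}")
        REGISTRY[name] = fn
        return fn
    return wrap


@register("mt/transformer_base")
def transformer_base() -> ExperimentParams:
    # Shared defaults that every derived experiment starts from.
    return ExperimentParams()


@register("mt/transformer_big")
def transformer_big() -> ExperimentParams:
    # Derived experiment: copy the base config and override only what changes.
    return replace(transformer_base(), num_layers=12, model_dim=1024)


@register("mt/rnn_base")
def rnn_base() -> ExperimentParams:
    # Swap the model family while keeping the other shared defaults.
    return replace(transformer_base(), model="rnn", num_layers=4)


def get_params(name: str) -> ExperimentParams:
    """Rebuild an experiment's full configuration from its registered name."""
    return REGISTRY[name]()


if __name__ == "__main__":
    for experiment in sorted(REGISTRY):
        print(experiment, get_params(experiment))
```

Because each variant is written as a small diff against a shared base, a change to a default propagates to every derived experiment, and any run can be reproduced from its name alone. This is the kind of extra structure, or “boilerplate”, that trades a steeper learning curve for easier sharing among researchers.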

What impact might this research bring to the research community?

As deep learning becomes more prevalent, its R&D efforts consume more of an organization's resources. Lingvo provides a well-thought-out framework for managing these efforts in research labs at universities and corporations. It can potentially reduce R&D costs by speeding up the adoption of new techniques, and it enables components and their improvements to be shared across projects. I have seen people working on great ideas who were unable to realize them because of a lack of discipline in experimentation. Lingvo helps to overcome that barrier.

Can you identify any bottlenecks in the research?

Currently, Lingvo assumes that everything is stored in a centralized, version-controlled system, which is only achievable within a single R&D lab. In the public domain, however, data, code, training procedures, and trained models are scattered across loosely coupled version control systems, making Lingvo’s design goals harder to achieve.

Can you predict any potential future developments related to this research?

Sharing and collaboration have been welcomed by the research and data science communities. For example, CodaLab is a platform for sharing the experiment setups of many research papers, and Kaggle (acquired by Google in 2017) is a cloud-based workbench for sharing data, code, and analysis. For most organizations, it would be valuable to have a framework that helps integrate resources from the public domain into a lab's private domain.

The paper Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling is on arXiv.