Description

We have recently started investigating how to scale deep learning techniques to much larger models, in an effort to improve the accuracy of such models in the domains of computer vision, speech recognition, and natural language processing. Our largest models to date have more than 1 billion parameters, and we use both supervised and unsupervised training in our work. To train models at this scale, we use clusters of thousands of machines and exploit both model parallelism (distributing the computation within a single replica of the model across multiple cores and multiple machines) and data parallelism (distributing the computation across many replicas of these distributed models). In this talk I'll describe the progress we've made in building training systems for models at this scale, and also highlight a few results from using these models for tasks that are important to improving Google's products. This talk describes joint work with Kai Chen, Greg Corrado, Matthieu Devin, Quoc Le, Rajat Monga, Andrew Ng, Marc'Aurelio Ranzato, Paul Tucker, and Ke Yang.
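To make the data-parallelism idea concrete, here is a minimal toy sketch, not the actual system described in the talk: several model replicas each compute gradients on their own shard of the training data and push updates to a shared parameter server, from which they pull the latest parameters. The names `ParameterServer` and `replica_gradient` are illustrative, and the replicas run sequentially here rather than asynchronously across machines as in a real cluster deployment.

```python
import numpy as np

class ParameterServer:
    """Holds the shared model parameters; replicas pull them and push gradients."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr

    def pull(self):
        return self.w.copy()

    def push(self, grad):
        # Apply a gradient update sent by one replica.
        self.w -= self.lr * grad

def replica_gradient(w, X, y):
    """Gradient of mean squared error for a linear model on one data shard."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

# Toy data: a linear target with true weights [2, -1].
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = X @ w_true

ps = ParameterServer(dim=2, lr=0.1)
shards = np.array_split(np.arange(200), 4)  # four data-parallel replicas

for step in range(100):
    for shard in shards:       # each replica trains on its own data shard
        w = ps.pull()          # fetch the current shared parameters
        g = replica_gradient(w, X[shard], y[shard])
        ps.push(g)             # send the gradient back to the server

print(np.round(ps.w, 2))  # converges toward w_true ≈ [2, -1]
```

In a real deployment the pull/compute/push loop runs concurrently in each replica, so parameters can change between a pull and the corresponding push; tolerating that staleness is part of what makes asynchronous data parallelism scale.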

Questions and Answers