The world has changed remarkably over the last few decades—from command-line interfaces to graphical interfaces and touch screens. The next great horizon can be seen—or heard—in the daily speech patterns of citizens around the globe. But language and dialect are as individual as a fingerprint, and technology must learn to decipher the subtle context, implications, and complex structures of human speech.

In June 2017, Mozilla’s Open Innovation team launched Common Voice with the goal of building the world’s largest open collection of human voice data, giving startups, innovators, and research universities reliable datasets for training machine learning models for speech technologies. Currently, Common Voice is used to train Mozilla’s TensorFlow implementation of Baidu’s DeepSpeech architecture, as well as Kaldi, the speech recognition toolkit that was core to the development of Siri. The project aims to collect up to 10,000 hours of speech in as many distinct languages as possible.

Common Voice has seen remarkably accelerated growth, supported by eager, vocal contributors and technology collaborations with the likes of Mycroft, Snips, the Dat Project, and Bangor University in Wales. Today, Common Voice is the second-largest open speech dataset, with more than 500 hours of English voice data collected from contributors in 112 countries. To put that in perspective, the public collection of TED Talks amounts to about 200 hours, while LibriSpeech, which is essentially public-domain books on tape, represents about 1,000 hours.

Communities have also adapted the platform to collect Macedonian and Welsh, and community translations of the site are underway for 17 new languages, which will open for voice contributions later this year.

To learn more and to contribute your own voice, go to voice.mozilla.org.