Last week two independent research groups published results of their work on human language. One group discovered that human language is “universal”: the relation between words and meaning is not arbitrary. The other group announced a convolutional neural net that can learn to emulate human speech by “listening” to human voices. Connecting those two fascinating pieces of work can tell us a lot about the future of human-machine collaboration...

First, the discovery on human language: until last week, the de facto assumption in linguistics had been that the relation between words and meaning is arbitrary. This meant that the different peoples living in different geographical areas of our planet developed their different vocabularies by chance. And although Chomsky has suggested the existence of a “universal grammar” hardwired in the human brain, the consensus amongst linguists has been that when it comes to words there is no such thing as a “universal dictionary”. This assumption has now been seriously challenged, following the publication of a research paper by an international scientific consortium led by the Cognitive Neuroscience Lab of Cornell University.

Using statistical methods, the research scientists of the consortium analyzed word lists from two thirds of the 3,000+ languages currently in existence and found that the sounds used to make the words for the most common objects were strikingly similar. For example, words that describe the nose tend to include the sounds “neh” or “oo”, as in “ooze”. Their analysis showed statistically significant patterns emerging across every language, irrespective of the geographical region where the language is spoken or has evolved. The similar sounds were particularly evident in words describing parts of the human body, family relationships, and aspects of the natural world. Their discovery suggests that humans speak, in effect, one universal language; which in turn suggests that there is something about the human brain that ascribes sounds to meanings in a non-arbitrary way.

We have yet to discover what that might be. My personal bet would be that synaesthesia is a good place to start looking for an explanation of a universal language. People with synaesthesia have sensory outcomes triggered by signals arriving at a different sensory pathway. For example, they can “see” sounds, or “taste” numbers. If you are not one of them, you can get a sense of how synaesthesia feels by testing the bouba/kiki effect on yourself. “Bouba” brings to mind something round and curved – right? How about “kiki”? How does that “feel”? But I’m digressing…
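To get a feel for the kind of statistical reasoning involved, here is a toy sketch of a sound-symbolism test. The word lists, the target sound, and the permutation test are all my own invented illustration – the consortium's data and statistics are far more sophisticated – but the underlying question is the same: does a sound occur in words for a concept more often than chance would predict?

```python
# Toy sound-symbolism test (illustrative only: the word lists and the
# target sound "n" are invented for this sketch).
import random

random.seed(0)

# Hypothetical words for "nose" from made-up languages, plus a control
# set of words for unrelated concepts.
nose_words = ["neus", "nan", "nez", "anit", "nos", "hana", "nena", "sn"]
control_words = ["kato", "lumi", "bir", "talo", "ruka", "ship", "aqua", "domo"]

def sound_rate(words, sound):
    """Fraction of words that contain a given sound (here, a letter)."""
    return sum(sound in w for w in words) / len(words)

observed = sound_rate(nose_words, "n") - sound_rate(control_words, "n")

# Permutation test: shuffle the pooled words many times and count how
# often a difference at least as large arises by pure chance.
pool = nose_words + control_words
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pool)
    a, b = pool[:len(nose_words)], pool[len(nose_words):]
    if sound_rate(a, "n") - sound_rate(b, "n") >= observed:
        extreme += 1

p_value = extreme / trials
print(f"observed difference: {observed:.2f}, p ~ {p_value:.4f}")
```

A small p-value means the association between the sound and the concept is unlikely to be accidental – which, repeated across thousands of languages and dozens of concepts, is essentially the shape of the consortium's finding.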

The second announcement came from DeepMind, a Google company that is pioneering research in machine learning and artificial intelligence. The big news here was about computer speech synthesis. The traditional approach to this complex problem has been to record and store fragments of human speech in a database, then use these fragments to synthesise speech in a vocoder, an approach called “concatenative” synthesis. DeepMind approached the problem of human speech synthesis more directly, by sampling human voices at 16,000 samples per second, then using convolutional neural networks to create WaveNet, a deep generative model of raw audio waveforms. According to the DeepMind announcement, generating voices this way mimics the human voice better than concatenative text-to-speech (TTS) built from speech fragments. But there is something about WaveNet that goes beyond the practical implications of making computers speak in more realistic human voices.
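The building block that lets WaveNet model raw audio one sample at a time is the dilated causal convolution: each output sample depends only on past samples, and stacking layers with doubling dilations grows the context window exponentially. The sketch below is a bare-bones NumPy illustration of that idea, not DeepMind's implementation – the real WaveNet adds gated activations, residual and skip connections, and a softmax over quantised sample values.

```python
# Minimal sketch of stacked dilated causal convolutions, the core idea
# behind WaveNet-style audio models (illustrative, not DeepMind's code).
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution: output at time t sees only x[t], x[t-d], ..."""
    k = len(w)
    pad = dilation * (k - 1)
    xp = np.concatenate([np.zeros(pad), x])  # left-pad: no future samples leak in
    return np.array([sum(w[i] * xp[t + pad - i * dilation] for i in range(k))
                     for t in range(len(x))])

# Stack layers with dilations 1, 2, 4, 8. With a kernel of size 2, each
# layer doubles the receptive field, so context grows exponentially in depth.
dilations = (1, 2, 4, 8)
w = np.array([0.5, 0.5])
y = np.random.randn(32)
for d in dilations:
    y = causal_dilated_conv(y, w, d)

receptive_field = 1 + sum(d * (len(w) - 1) for d in dilations)
print(receptive_field)  # 16 samples of past context per output sample
```

The exponential growth matters because 16,000 samples per second would otherwise demand impractically deep networks to capture even a fraction of a second of context.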

One day, a system such as WaveNet coupled with a language recognition system – such as Google voice search – might be able to speak any human language simply by listening to people speaking. Its deep neural nets may indeed form an internal representation of human language not dissimilar to the innate representations that our brains have when we are born. If this is true, then we would get a computer representation of what the universal human language looks like. This is a falsifiable hypothesis that could be tested by using WaveNet to validate the results of the linguistic research paper. To do so would require WaveNet to evolve further towards the “general intelligence” spectrum of artificial intelligence. Intriguingly, the system has already exhibited a form of general, or “transfer”, learning. As reported by DeepMind, the creators of WaveNet discovered that training the system on many speakers made it better at modelling a single speaker than training on that speaker alone. This behaviour suggests that the system can generalize its internal representation and learn the characteristics of many different voices, male and female.

Should such experimental validation ever occur, we would have an artificial system “wired” for language in a way equivalent to the human brain. If human language is indeed universal, then computers are getting closer and closer to becoming like us.