
Ba, ba, ba, brick (Image: Pete Stevens)

AT FIRST it’s just noise: a stream of incoherent sounds, burbling away. But, after a few minutes, a fully formed word suddenly emerges: “red”. Then another: “box”. In this way, a babbling robot learns to speak its first real words, just by chatting with a human.

Seeing this developmental leap in a machine may lead to robots that speak in a more natural, human-like way, and help uncover how children first start to make sense of language.

Between the ages of 6 and 14 months, children move from babbling strings of syllables to uttering actual words. It’s a necessary step en route to acquiring full language. Once a few “anchor” words have been established, they provide clues as to where words may start and finish, and so it becomes easier for a child to learn to speak.


Inspired by this process, a team led by computer scientist Caroline Lyon at the University of Hertfordshire, UK, programmed their iCub humanoid robot, called DeeChee, with almost all the syllables that exist in English – around 40,000 in total. This allowed it to babble rather like a baby, by arbitrarily stringing syllables together.

The researchers also enlisted 34 people to act as teachers, who were told to treat DeeChee as if it were a child. DeeChee took part in an 8-minute dialogue with each teacher. Between sessions, its memory was saved, wiped and reset, so that the experiment started anew with each teacher. At the outset of each dialogue, every syllable in DeeChee’s lexicon had an identical score.

Lexicon score

All that changed once the lesson began. Programmed to take turns listening and then speaking, DeeChee turned the teacher’s speech into syllables, totting up the number of instances of each one. It then updated the scores in its own lexicon, giving extra points to syllables the teacher had used. When it next spoke, it would be more likely to repeat the syllables the teacher had uttered because these now had higher scores.
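The scoring scheme described above can be sketched in a few lines of Python. This is only an illustration, not the team's actual software: the function names, the starting score of 1 and the one-point bonus per heard syllable are all assumptions made for the example.

```python
import random
from collections import Counter


def new_lexicon(syllables, initial_score=1.0):
    """Give every syllable the same starting score (value is an assumption)."""
    return {syllable: initial_score for syllable in syllables}


def listen(lexicon, heard_syllables, bonus=1.0):
    """Tot up the teacher's syllables and add points for each instance heard."""
    counts = Counter(heard_syllables)
    for syllable, n in counts.items():
        # Syllables missing from the lexicon are ignored in this sketch.
        if syllable in lexicon:
            lexicon[syllable] += bonus * n


def babble(lexicon, length, rng):
    """Pick syllables at random, weighted by their current scores."""
    syllables = list(lexicon)
    weights = [lexicon[s] for s in syllables]
    return rng.choices(syllables, weights=weights, k=length)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run can be repeated
    lexicon = new_lexicon(["ba", "gu", "red", "box", "da"])
    print(babble(lexicon, 6, rng))  # uniform scores: pure babble
    listen(lexicon, ["red", "box", "red", "red"])  # teacher's turn
    print(babble(lexicon, 6, rng))  # "red" and "box" are now more likely
```

Because the choice is weighted rather than fixed, the robot in this sketch never stops babbling. Familiar syllables simply become more probable with each turn.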

This is reminiscent of human infants, says Lyon. “When they hear frequent sounds, they become sensitive to them,” she says. “They prefer what’s familiar.”

This learning by imitation was then reinforced by encouraging remarks from the teacher when DeeChee spoke a recognisable word. DeeChee was programmed to detect these comments and give extra points to the syllables that preceded the teacher’s approval. Inevitably, some nonsense syllables would get extra points too. But as this process was repeated, only those syllables that made up words would keep showing up in strings that gained approval.
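The reinforcement step can be sketched in the same spirit. Again, this is a hedged illustration rather than the researchers' code: the list of approval words, the five-point reward and the function names are hypothetical, and real approval detection would be far less crude than matching a handful of words.

```python
import re

# Hypothetical approval vocabulary; the article does not list the real cues.
APPROVAL_WORDS = {"good", "yes", "clever"}


def is_approval(teacher_utterance):
    """Return True if the teacher's remark contains an approval word."""
    words = set(re.findall(r"[a-z']+", teacher_utterance.lower()))
    return bool(words & APPROVAL_WORDS)


def reinforce(lexicon, last_robot_syllables, teacher_utterance, reward=5.0):
    """Reward the syllables the robot spoke just before the teacher approved."""
    if not is_approval(teacher_utterance):
        return False
    for syllable in last_robot_syllables:
        lexicon[syllable] = lexicon.get(syllable, 0.0) + reward
    return True


if __name__ == "__main__":
    lexicon = {"red": 1.0, "ba": 1.0, "gu": 1.0}
    # "red" appears in both approved strings; the nonsense syllables vary.
    reinforce(lexicon, ["ba", "red"], "Yes, good!")
    reinforce(lexicon, ["red", "gu"], "Good, that's red!")
    reinforce(lexicon, ["ba", "gu"], "What is that?")  # no approval, no reward
    print(lexicon)
```

In this toy run the nonsense syllables "ba" and "gu" each pick up one reward by accident, but "red" is rewarded both times. Over many rounds, that gap is what lets word-forming syllables pull ahead.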

Towards the end of the 8 minutes, the robot was still uttering nonsense syllables, but real words were popping up more often than they would have if DeeChee had still been selecting syllables at random.

That words can emerge from babble using a statistical learning process not specific to language demonstrates that this stage of language acquisition does not require hard-wired grammar faculties, says Lyon.

Paul Vogt, a cognitive scientist at Tilburg University in the Netherlands, is impressed: “It’s a very interesting first step towards having robots that can help us study language acquisition.”

Right now, DeeChee’s speech is a far cry from full-blown language, but starting from babble could be the best way to create robots that speak naturally. “If you want the robot to work with natural speech, then you might need to teach it from the very beginning,” says Lyon.

Journal reference: PLoS One, DOI: 10.1371/journal.pone.0038236

Only the ones that matter

Not all words are created equal. When Caroline Lyon’s team at the University of Hertfordshire, UK, taught a robot rudimentary speech, it favoured some types of words over others. Shapes and colours – including “red”, “green”, “heart”, “square” and “box” – appeared much more often than “the” or “and”. Lyon says “and” or “the” are uttered very frequently but are sometimes squashed together with another word, making it hard to make them out. Salient words like “red” or “green” tend to be pronounced the same way no matter where in a sentence they appear. It’s possible that this difference helps children learn, as these words have higher “information value” at a young age, Lyon’s team suggests.