An iPhone app will let anyone donate their voice to be cut up and customised into synthetic speech for someone who needs a similar voice

Putting words in another’s mouth (Image: Ken Davies/Corbis)

I AM locked in a quiet, carpeted room, listening to a robotic voice on my iPhone. When the voice pauses, I repeat after it: “Jo’s gentlemanly demeanour amused and set him at his ease.”

This rather odd sentence, along with a few hundred others, forms part of a test for a new phone app that will allow people to donate their voices to help other people. VocaliD wants to use speech recordings to create personalised synthetic voices for those who are unable to speak on their own. One day, my voice may be cut up and customised for a person who needs a voice that sounds a bit like mine.

Millions of people have severe speech impediments caused by conditions such as stroke, Parkinson’s or cerebral palsy. In the past, a lucky few have had synthetic voices built for them. When movie critic Roger Ebert lost his ability to speak due to cancer, Scottish text-to-speech company CereProc was able to build a substitute that sounded close to his own voice.


But most people don’t have a rich supply of audio recordings at their disposal to help stitch a new voice together. Generally, they are stuck with generic computerised voices – think Stephen Hawking, who uses an early synthesiser called DECtalk.

“For these individuals, this is the only way that they interact with people around them,” says Rupal Patel, a speech scientist at Northeastern University in Boston and VocaliD’s co-director. It’s crucial that a synthetic voice fits, just like any other prosthetic, she says.


So her group listens to the limited sounds that her patients are able to produce. These utterances provide clues to what that person’s speech might sound like – whether it’s high-pitched, raspy or breathy.

A surrogate who is similar in age and of the same sex is selected to donate their voice. That person reads through several thousand sample sentences, sourced from classic books like White Fang, The Wonderful Wizard of Oz and The Velveteen Rabbit.

Then the two voices are blended together and, using a software tool called ModelTalker, stripped down into the tiny units that make up speech. Even a single vowel sound might be broken into two or three parts that can then be assembled into new words. “You probably wouldn’t recognise it as having come from the donor any more,” says Timothy Bunnell of the University of Delaware in Wilmington, who created ModelTalker and is also VocaliD’s co-director.

Using this method, the group has built a handful of personalised voices. The impact it has on recipients is huge. As one anonymous user put it: “I was almost in tears when I first heard it and I can’t express what it means to know that, whatever happens to me, I will be able to communicate with my own voice.”

But the process is slow, since surrogates must come to a studio to record for several hours. It takes at least 800 sentences to create a usable voice, and around 3000 for one that sounds relatively natural.

VocaliD wants people all over the world to donate their voices so that what Patel calls their “voice bank” will have a whole range of speaking styles on tap.

“If we were successful at being able to do this data collection via the iPhone, we’d really get to capture the variation of voices in the world,” she says.

The team hopes to encourage children to contribute their voices by building a game around the recording process, which can feel a little tedious at times.

“This is a significant step forward in using technology in a way that’s quite novel,” says David Pisoni, director of the Speech Research Laboratory at Indiana University. It is important because how someone speaks gives a listener much more information about them than just the content of what they say, he says.

“Those attributes of your speech tell the listener whether you are familiar or unfamiliar, male or female, from New England, New York City or the South. It tells you about the emotional and mental space of a talker,” Pisoni says. “This is what makes you you and me me.”

This article appeared in print under the headline “A voice to call your own”