Technology is moving so fast it can make our heads spin, especially in the world of text to speech (TTS). As voice over actors, we’re certainly aware of TTS – and some of us may even fear the technology is advancing us right out of our careers. But it’s really not. Despite the rapid advances in the field, TTS remains unable to replace the real deal. Keep reading to find out why.

How TTS Has Advanced

Text to speech (TTS) is a system that converts the written word into the spoken word. Simple enough, right? But it gets more complex from there. TTS systems store speech units that can include phones, diphones, words and entire sentences. They then put those speech units together in specific combinations to create synthetic speech that says anything – all using the voice that initially recorded those speech units.
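For the curious, here is a rough idea of what "putting speech units together" can look like. This is only a toy sketch, not how any particular TTS product works: the voice bank, the clip labels and the `synthesize` function are all made up for illustration, and the "recordings" are placeholder strings rather than real audio. It simply covers a sentence with the longest stored units it can find, working left to right.

```python
# Toy sketch of unit concatenation. Real systems store audio and work with
# phones and diphones; here each "recording" is just a placeholder string.

# Hypothetical voice bank: stored unit text -> recorded clip (placeholder).
VOICE_BANK = {
    "good morning": "<clip:good_morning>",
    "good": "<clip:good>",
    "morning": "<clip:morning>",
    "everyone": "<clip:everyone>",
}


def synthesize(text, bank=VOICE_BANK):
    """Cover the text with the longest stored units, left to right."""
    words = text.lower().split()
    clips = []
    i = 0
    while i < len(words):
        # Try the longest run of words first, then shorter ones.
        for j in range(len(words), i, -1):
            unit = " ".join(words[i:j])
            if unit in bank:
                clips.append(bank[unit])
                i = j
                break
        else:
            # Nothing stored covers this word. A real system would fall
            # back to smaller units such as diphones or phones.
            raise KeyError(f"no recorded unit for {words[i]!r}")
    return clips


print(synthesize("Good morning everyone"))
```

Notice that the stored phrase "good morning" wins over the separate words "good" and "morning." Reusing the longest recorded stretch means fewer joins between clips, which is one reason longer units tend to sound more natural.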

While the first talking machine was introduced back in 1939, advances in the world of TTS over the past several years have been more rapid and dramatic than in the 75 years before them. Some of these advances include the ability to:

Incorporate a model of the vocal tract and other human voice characteristics to sound more human.

Correct synthetic speech mispronunciations, adjust regional pronunciations, add emphasis, and other tricks through Speech Synthesis Markup Language (SSML).

Produce robocalls that stop and ask “Can you hear me?” or wait for a reply, like a human would, before continuing their spiel.

Copy lip movements for dubbing.

Fix small errors in voice over recordings with synthetic edits.

Create a model, or “voice bank,” of a real person’s voice for later use as synthetic speech.
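To make the SSML item in the list above a little more concrete, here is a small sketch of what that markup can look like. The `speak`, `phoneme`, `break` and `emphasis` elements come from the W3C SSML standard, but the sentence itself is invented, the version and namespace attributes a full document would carry are left out to keep it short, and individual TTS engines differ in which tags they honor. The Python around it only checks that the markup is well-formed; it does not produce any audio.

```python
import xml.etree.ElementTree as ET

# Illustrative SSML only: a regional pronunciation, a pause, and emphasis.
SSML = """<speak>
  I say <phoneme alphabet="ipa" ph="təˈmɑːtəʊ">tomato</phoneme>.
  <break time="500ms"/>
  This part is <emphasis level="strong">really</emphasis> important.
</speak>"""

# Parsing fails loudly if a tag is left unclosed or misspelled.
root = ET.fromstring(SSML)

# List the tags in document order to see what instructions we gave.
tags = [el.tag for el in root.iter()]
print(tags)
```

Each tag is a direction to the synthetic voice, much like the notes a director might give in a session: say this word this way, pause here, lean on that word.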

Once TTS began to converge with machine learning, big data and artificial intelligence (AI), it became smarter, more realistic and, as mentioned earlier, a perceived threat to some in the voice over industry.

Potential TTS Threats to the VO Industry

There is no doubt the advances in TTS have aroused a number of concerns across the voice over industry, with some of the most common outlined below.

Losing Ongoing Royalties

The royalty structure keeps giving us a steady flow of money each time our voice is used, regularly paying us even though we’ve already done the work. If we are recording into a voice bank, are we going to get royalties every time our voice is used to create synthetic speech? Probably not. While we can likely expect to be paid a large amount for the initial recording session, we may lose out on royalties each time our voice is used down the line. After all, how can we be paid royalties for a future recording that uses our voice but that we didn’t technically record?

No Control Over Where Your Voice Is Used

Since technology allows for a pre-recorded voice to be used to create any type of message or project down the line, voice over artists may fear they won’t have a say in the type of work that will be attached to their voice. Some work may be unacceptable, but we may have no control or say over the matter.

Being Prohibited from Future Spots

If we offer buyouts on our voice banks, we could be limiting our careers without realizing it. For instance, let’s say our voice is used for a car company. We could then be prohibited from doing spots for any other car company in the future – even though we didn’t know we’d be associated with a car company at the time of the buyout.

Continuously Declining TTS Rates

Recording sessions for TTS are no longer in the $50K range. As the technology advances, the rates continue to decrease. Methods of capturing and synthesizing voice take far less recording time, which means far less pay for the voice over talent.

Why Voice Over Actors Don’t Need to Fret

While TTS concerns may feel valid for us voice over artists, we don’t have to lose sleep over them for several reasons. For starters, TTS still harbors many limitations – like the inability to spontaneously generate the infinite human range of emotions and vocal techniques.

Creating synthetic speech by simply typing in the words you want it to say is also not yet possible. And synthetic speech, no matter how advanced or finely tuned, has still not shown it can match the many nuances and components of a real human voice.

Ongoing payments may even still exist. In addition to a recording fee, we could arrange licensing agreements that outline when and where our voices can be used down the line. Turning our TTS fears into the framework for a clear-cut contract can help ensure we have all bases covered – and continue to thrive in our profession.