Single-Speaker Speech Generation
Samples generated by MelNet trained on the task of unconditional single-speaker speech generation using professionally recorded audiobook data from the Blizzard 2013 dataset.

Samples
Samples from the model without biasing or priming.

Biased Samples
Samples from the model using a bias of 1.0.

Primed Samples
The first 5 seconds of each audio clip are from the dataset, and the remaining 5 seconds are generated by the model.
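The biased samples use a bias of 1.0, but this page does not define the bias. The sketch below assumes Graves (2013)-style biased sampling applied to a Gaussian mixture output. The mixture logits are scaled by (1 + b), and b is subtracted from each log standard deviation, so larger biases give less varied samples. This rule is an assumption; the exact biasing used for MelNet is not specified here. The function names are hypothetical.

```python
import math
import random

def biased_gmm_params(logits, means, log_sigmas, bias):
    """Apply Graves (2013)-style biasing to 1-D Gaussian mixture parameters.

    Assumption: this rule is illustrative and not confirmed to be MelNet's.
    """
    # Sharpen the mixture weights: softmax(logits * (1 + bias)), computed stably.
    scaled = [l * (1.0 + bias) for l in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Shrink every component's standard deviation by a factor of exp(bias).
    sigmas = [math.exp(ls - bias) for ls in log_sigmas]
    return weights, list(means), sigmas

def sample_biased_gmm(logits, means, log_sigmas, bias=1.0, rng=random):
    """Draw one value from the biased mixture (hypothetical helper)."""
    weights, mus, sigmas = biased_gmm_params(logits, means, log_sigmas, bias)
    # Pick a component according to the sharpened weights, then sample from it.
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(mus[k], sigmas[k])
```

With `bias=0.0` the mixture is sampled unchanged. With `bias=1.0` the most likely component gains weight and every standard deviation shrinks by a factor of e.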

Multi-Speaker Speech Generation
Samples generated by MelNet trained on the task of unconditional multi-speaker speech generation using noisy, multi-speaker, multilingual speech data from the VoxCeleb2 dataset.

Samples
Samples from the model without biasing or priming.

Music Generation
Samples generated by MelNet trained on the task of unconditional music generation using recorded piano performances from the MAESTRO dataset.

Samples
Samples from the model without biasing or priming.

Primed Samples
The first 5 seconds of each audio clip are from the dataset, and the remaining 5 seconds are generated by the model.

Single-Speaker Text-to-Speech
Samples generated by MelNet trained on the task of single-speaker TTS using professionally recorded audiobook data from the Blizzard 2013 dataset.

Samples
For each text, the first audio clip is taken from the dataset, and the remaining three are samples generated by the model.

“My dear Fanny, you feel these things a great deal too much. I am most happy that you like the chain,”
Looking with a half fantastic curiosity to see whether the tender grass of early spring,
“I like them round,” said Mary. “And they are exactly the color of the sky over the moor.”
Lydia was Lydia still; untamed, unabashed, wild, noisy, and fearless.
“Oh, he has been away from New York—he has been all round the world. He doesn't know many people here, but he's very sociable, and he wants to know every one.”

Primed Samples
Each unlabelled audio clip is taken from the dataset. The audio clip that directly follows it is a sample generated by the model after priming with that sequence.

Write a fond note to the friend you cherish.
Pluck the bright rose without leaves.
Two plus seven is less than ten.
He said the same phrase thirty times.
We frown when events take a bad turn.

Multi-Speaker Text-to-Speech
Samples generated by MelNet trained on the task of multi-speaker TTS using noisy speech recognition data from the TED-LIUM 3 dataset.

Samples
Samples generated by the model conditioned on text and speaker ID. The conditioning text and speaker IDs are taken directly from the validation set (text in the dataset is unnormalized and unpunctuated).

it wasn't like i was asking for the code to a nuclear bunker or anything like that but the amount of resistance i got from this and what that form is modeling and shaping is not cement that every person here every decision that you've made today every decision you've made in your life you've not really made that decision but in fact syria was largely a place of tolerance historically accustomed and no matter what the rest of the world tells them they should be the years went by and the princess grew up into a beautiful young woman i spent so much time learning this language why do i only and we were down to eating one meal a day running from place to place but wherever we could help we did at a certain point in time in phrases and words even if you have a phd of chinese language you can't understand them and when they came back and told us about it we really started thinking about the ways in which we see styrofoam every day is only a very recent religious enthusiasm it surfaced only in the west chances are that they are rooted in the productivity crisis i cannot face your fears or chase your dreams and you can't do that for me but we can be supportive of eachother the first law of travel and therefore of life you're only as strong

Selected Speakers
Samples generated by the model for selected speakers. Reference audio for each speaker can be found on the TED website.

Speakers: Bill Gates, Daphne Koller, Fei-Fei Li, George Takei, Jane Goodall, Sal Khan, Stephen Wolfram, Stephen Hawking.

A cramp is no small danger on a swim.
He said the same phrase thirty times.
Pluck the bright rose without leaves.
Two plus seven is less than ten.
The glow deepened in the eyes of the sweet girl.
Bring your problems to the wise chief.
Write a fond note to the friend you cherish.
Clothes and lodging are free to new men.
We frown when events take a bad turn.
Port is a strong wine with a smoky taste.

WaveNet Baseline
For comparison, we train WaveNet on the same three unconditional audio generation tasks used to evaluate MelNet: single-speaker speech generation, multi-speaker speech generation, and music generation.

Single-Speaker Speech Generation
Samples without biasing or priming.
Samples with priming: 5 seconds from the dataset followed by 5 seconds generated by WaveNet.

Multi-Speaker Speech Generation
Samples without biasing or priming.

Music Generation
Samples without biasing or priming.
Samples from a two-stage model that first models MIDI notes and then uses WaveNet to synthesize audio conditioned on the generated MIDI notes.