Data Efficient Voice Cloning for Neural Singing Synthesis

Sound examples

Contact: {merlijn.blaauw, jordi.bonada}@upf.edu

[arXiv preprint]

Presented at ICASSP 2019, May 12-17, 2019, Brighton, UK.

[other singing synthesis demos]

Demos

English male voice - All Along The Watchtower

Here the voice timbre is cloned from 2 songs (03:31 total) of the target singer.

The waveform in this demo was generated using a mixed data WaveNet vocoder.

Reference recording of target voice (but not adaptation material).

Japanese female voice - Kanade 奏（かなで）

Here the voice timbre is cloned from 01:49 of pseudo singing (short phrases at a constant pitch).

Reference recording of target voice (but not adaptation material).

Catalan choir - El Rossinyol

Synthesized choir consisting of 8 cloned voices (average of 02:22 pseudo singing per voice) and 4 voices trained on full datasets (average of 51:47 pseudo singing per voice).

All 12 voices trained on full datasets (average of 51:47 pseudo singing per voice).

English female voice - Yellow

Voice cloned from 02:14 of natural singing.

Voice cloned from 02:23 of pseudo singing.

Acknowledgments

The proprietary datasets used in these experiments were generously provided by Zya, Voctro Labs, and Yamaha Corp. The choir synthesis experiments were performed as part of the TROMPA project (H2020 770376). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.