Voice conversion (VC) is a technique that modifies a source speaker's speech so that it sounds as if it were spoken by a target speaker, without changing the linguistic content.
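A classic, simple ingredient of many VC pipelines is pitch (F0) conversion. The source speaker's F0 contour keeps its shape and timing, but it is shifted and scaled in the log domain to match the target speaker's average pitch and range. The sketch below illustrates this log-Gaussian normalized F0 transformation. It assumes F0 has already been extracted per frame by some pitch tracker, with unvoiced frames marked as 0. The function names and contours are hypothetical. It converts only pitch, not timbre, so it is one component of a VC system rather than a complete one.

```python
import math
import statistics


def log_f0_stats(f0_frames):
    """Mean and population standard deviation of log-F0 over voiced frames (F0 > 0)."""
    voiced = [math.log(f0) for f0 in f0_frames if f0 > 0]
    return statistics.mean(voiced), statistics.pstdev(voiced)


def convert_f0(source_f0, source_stats, target_stats):
    """Map a source F0 contour (Hz) onto the target speaker's log-F0 distribution.

    Each voiced frame is z-normalized with the source statistics, then rescaled
    with the target statistics. Unvoiced frames (0 Hz) stay at 0.
    """
    src_mean, src_std = source_stats
    tgt_mean, tgt_std = target_stats
    converted = []
    for f0 in source_f0:
        if f0 <= 0:
            converted.append(0.0)
            continue
        z = (math.log(f0) - src_mean) / src_std
        converted.append(math.exp(z * tgt_std + tgt_mean))
    return converted


# Hypothetical F0 contours (Hz): a lower-pitched source and a higher-pitched target.
source_contour = [0.0, 110.0, 120.0, 130.0, 0.0, 100.0]
target_contour = [0.0, 210.0, 230.0, 250.0, 220.0, 0.0]

converted = convert_f0(
    source_contour, log_f0_stats(source_contour), log_f0_stats(target_contour)
)
print([round(f, 1) for f in converted])
```

After conversion, the voiced frames have the target's log-F0 mean and spread, while the rise and fall of the original contour are preserved.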
Training and evaluating voice conversion models requires a speech dataset. Below are some of the most widely used ones.
LJSpeech
This is a public-domain speech dataset consisting of 13,100 short audio clips of a single English speaker reading passages from seven non-fiction books. Clips range from 1 to 10 seconds in length, for a total of approximately 24 hours.
Download LJSpeech
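The LJSpeech-1.1 release ships a `wavs/` folder plus a `metadata.csv` file whose pipe-separated rows hold a clip ID, the raw transcription, and a normalized transcription. A minimal loader sketch, assuming that layout (the function name `load_ljspeech` is illustrative, not part of any library):

```python
import csv
from pathlib import Path


def load_ljspeech(root):
    """Return (wav_path, normalized_text) pairs from an LJSpeech-style directory.

    Assumes root/metadata.csv with rows `clip_id|transcription|normalized_transcription`
    and audio stored as root/wavs/<clip_id>.wav.
    """
    root = Path(root)
    items = []
    with open(root / "metadata.csv", encoding="utf-8", newline="") as f:
        # QUOTE_NONE: transcripts contain literal quote characters, not CSV quoting.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if not row:
                continue
            clip_id, transcription = row[0], row[1]
            # Fall back to the raw transcription if the normalized field is empty.
            normalized = row[2] if len(row) > 2 and row[2] else transcription
            items.append((root / "wavs" / f"{clip_id}.wav", normalized))
    return items
```

Pointed at an unpacked copy of the full release, this should return one entry per clip.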
VCTK
The VCTK Corpus contains speech from 110 English speakers with various accents. Each speaker reads about 400 sentences.
Download VCTK
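Because VCTK has many speakers who share some prompts, it is often used to pick source and target speakers and to find parallel (same-text) utterance pairs. A minimal sketch, assuming one transcript file per utterance at `txt/<speaker_id>/<utterance_id>.txt`. The function names and the text normalization (lowercasing, collapsing whitespace) are illustrative choices. Matching is done on transcript text rather than on utterance numbers, since the same number does not guarantee the same sentence across speakers.

```python
from collections import defaultdict
from pathlib import Path


def load_transcripts(txt_root):
    """Map speaker_id -> {utterance_id: normalized transcript}.

    Assumes one transcript per utterance at txt_root/<speaker_id>/<utterance_id>.txt.
    """
    transcripts = defaultdict(dict)
    for txt_file in sorted(Path(txt_root).glob("*/*.txt")):
        text = txt_file.read_text(encoding="utf-8")
        # Lowercase and collapse whitespace so trivial differences do not block a match.
        transcripts[txt_file.parent.name][txt_file.stem] = " ".join(text.lower().split())
    return transcripts


def parallel_pairs(transcripts, source, target):
    """Return sorted (source_utt, target_utt) pairs whose transcripts are identical."""
    target_by_text = {text: utt for utt, text in transcripts.get(target, {}).items()}
    return sorted(
        (utt, target_by_text[text])
        for utt, text in transcripts.get(source, {}).items()
        if text in target_by_text
    )
```

Utterances without a same-text partner can still be used by non-parallel VC methods, which do not require matched pairs.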
LibriTTS
LibriTTS is a multi-speaker English corpus of approximately 585 hours of read English speech at a 24 kHz sampling rate, prepared by Heiga Zen with the assistance of Google Speech and Google Brain team members. The LibriTTS corpus is designed for TTS research.
Download LibriTTS
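These corpora do not share a sampling rate. LibriTTS is 24 kHz, while LJSpeech is distributed at 22.05 kHz, so combining them requires resampling to a common rate. The sketch below uses plain linear interpolation only to show the index arithmetic. It applies no anti-aliasing filter, so for real audio a band-limited resampler such as `librosa.resample` or `torchaudio.functional.resample` is the better choice. The function name `resample_linear` is hypothetical.

```python
def resample_linear(samples, orig_sr, target_sr):
    """Resample a mono signal by linear interpolation (no anti-aliasing filter).

    The output length is round(len(samples) * target_sr / orig_sr), so duration is
    preserved. Output sample i is read from input position i * orig_sr / target_sr.
    """
    if orig_sr == target_sr:
        return list(samples)
    n_out = round(len(samples) * target_sr / orig_sr)
    out = []
    for i in range(n_out):
        pos = i * orig_sr / target_sr
        left = int(pos)
        # Clamp so the last output samples do not read past the end of the input.
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# One second of 24 kHz audio becomes 22,050 samples at 22.05 kHz.
one_second = [0.0] * 24000
print(len(resample_linear(one_second, 24000, 22050)))
```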
Common Voice
Common Voice is Mozilla's crowdsourced, multilingual corpus of read speech. Volunteers record themselves reading short sentences, and other volunteers validate the recordings. The dataset is released under the CC0 public-domain license.
Download Common Voice
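Common Voice releases include per-language TSV files (such as `validated.tsv`) that list each clip's file name in a `path` column next to an anonymized `client_id` for the contributor. Grouping by `client_id` gives pseudo-speakers for voice conversion experiments. A minimal sketch, assuming a tab-separated file with a header row containing those two columns. The function name and the `min_clips` threshold are illustrative, and column sets have varied between releases.

```python
import csv
from collections import defaultdict


def clips_by_speaker(tsv_path, min_clips=1):
    """Group clip file names by the anonymized client_id column of a Common Voice-style TSV.

    Speakers with fewer than min_clips clips are dropped, since VC training usually
    needs several utterances per speaker.
    """
    groups = defaultdict(list)
    with open(tsv_path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE: sentences may contain literal quote characters.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            groups[row["client_id"]].append(row["path"])
    return {speaker: paths for speaker, paths in groups.items() if len(paths) >= min_clips}
```

Note that a `client_id` identifies a contributor account rather than a verified individual, so speaker groupings are approximate.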