This dataset contains 15,000 audio fragments of a male Flemish (Belgian Dutch) voice. The sentences read were extracted from the Mozilla Common Voice project.
Dataset: Google Drive (1.5GB)
New: the dataset is now also available for preview on Kaggle
To use this dataset with Mozilla TTS, append the following fragment to TTS/tts/datasets/preprocess.py:
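def rdh_flemish(root_path, meta_file):
    """Load the metadata file as [text, wav_file, speaker_name] items."""
    # preprocess.py already imports os, so no extra import is needed here.
    txt_file = os.path.join(root_path, meta_file)
    speaker_name = "rdh_flemish"
    items = []
    with open(txt_file, 'r', encoding="utf-8") as f:
        for line in f:
            # Each line is pipe-separated: <file id>|<sentence>
            cols = line.split("|")
            # Strip the trailing newline in case the sentence is the last column.
            text = cols[1].strip()
            wav_file = os.path.join(root_path, cols[0] + ".wav")
            items.append([text, wav_file, speaker_name])
    return items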
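As a quick sanity check outside of Mozilla TTS, the formatter can be run against a small pipe-separated metadata file. The sketch below repeats the function so that it runs on its own, and strips surrounding whitespace from the sentence. The file name `metadata.csv`, the `demo` helper and the `clip_0001`-style IDs are assumptions made for illustration, not names taken from the dataset.

```python
import os
import tempfile


def rdh_flemish(root_path, meta_file):
    """Formatter from the fragment above, repeated so this sketch is self-contained."""
    txt_file = os.path.join(root_path, meta_file)
    speaker_name = "rdh_flemish"
    items = []
    with open(txt_file, "r", encoding="utf-8") as f:
        for line in f:
            cols = line.split("|")
            text = cols[1].strip()
            wav_file = os.path.join(root_path, cols[0] + ".wav")
            items.append([text, wav_file, speaker_name])
    return items


def demo():
    """Write a two-line metadata file to a temp directory and parse it."""
    # Hypothetical file IDs and metadata file name, for illustration only.
    rows = [
        "clip_0001|Dat kan met betrekkelijk weinig geld.",
        "clip_0002|De plant van de aardappel is giftig",
    ]
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "metadata.csv"), "w", encoding="utf-8") as f:
            f.write("\n".join(rows) + "\n")
        return root, rdh_flemish(root, "metadata.csv")


if __name__ == "__main__":
    root, items = demo()
    for text, wav_file, speaker in items:
        print(speaker, os.path.basename(wav_file), text)
```

Each returned item is a `[text, wav_file, speaker_name]` list, with the WAV path built from the first column of the metadata line.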
Files in the dataset are 16-bit mono WAV files at 22,050 Hz, downsampled from 44.1 kHz.
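The stated format can be verified per file with Python's standard `wave` module. This is a minimal sketch: `check_wav` and `make_silent_wav` are hypothetical helper names introduced here, and the silent test file only stands in for a real clip from the dataset.

```python
import os
import tempfile
import wave


def check_wav(path, rate=22050, channels=1, sample_width=2):
    """Return True if the WAV header matches the expected format.

    sample_width is in bytes, so 2 corresponds to 16-bit audio.
    """
    with wave.open(path, "rb") as w:
        return (
            w.getframerate() == rate
            and w.getnchannels() == channels
            and w.getsampwidth() == sample_width
        )


def make_silent_wav(path, rate=22050, channels=1, sample_width=2, frames=100):
    """Write a short silent WAV file (illustration only, not dataset audio)."""
    with wave.open(path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(rate)
        w.writeframes(b"\x00" * (frames * channels * sample_width))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        sample = os.path.join(d, "sample.wav")
        make_silent_wav(sample)
        print("format ok:", check_wav(sample))
```

To check the real data, call `check_wav` on each `.wav` path in the extracted dataset directory.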
Unfortunately, the audio may vary slightly between recording sessions.
- De plant van de aardappel is giftig
- De Straat van Gibraltar is een zee-engte aan het uiteinde van de Middellandse Zee.
- Dat kan met betrekkelijk weinig geld.
- Ik geloof in deze dialoog en we zullen deze dialoog voeren.
- We zullen de uitkomsten op deze werkterreinen afwachten.
Models with their corresponding synthesised audio samples are provided in the links below.
Original dataset:
- Tacotron DDC: Google Drive
- Glow: Google Drive
Other dataset:
- Tacotron DDC with transfer learning: Google Drive (subpar results)
Due to a severe lack of quality data (4,000 noise-gated fragments), the second dataset has not been released. The model trained on the first dataset was used as the starting point for transfer learning, but this still proved insufficient.