Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Glow-TTS with a Korean cleaner and multispeaker training enabled (based on several GitHub issues).
This repo is recommended as a reference for multispeaker training.
- _custom: executed with Korean cleaners.
- _custom_multi: executed with Korean cleaners, for multispeaker training.
A single-speaker Korean demo using the KSS dataset is available.
- from Pitchtron https://github.com/hash2430/pitchtron
- Due to the apex (commit 37cdaf4) dependency, I used PyTorch 1.3.0 instead of 1.2.0.
- For the multispeaker setting:
  - The filelist should be in the following format (see related issue):
    audio_path(*.wav)|speaker_id|transcript
  - Adding n_speakers and gin_channels to the config is recommended (see related issue).
  - In init.py and train.py, replace (TextMelLoader, TextMelCollate) with (TextMelSpeakerLoader, TextMelSpeakerCollate). Also change (x, x_lengths, y, y_lengths) to (x, x_lengths, y, y_lengths, g).
  - The speaker information (g) must be passed explicitly to FlowGenerator (see related issue):
    generator(x=x, x_lengths=x_lengths, y=y, y_lengths=y_lengths, g=g, gen=False)
    (I am not sure why this is needed.)
- A 'Gradient overflow' message may be caused by problems in the data (see related issue).
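Because data problems can show up as 'Gradient overflow' and the multispeaker filelist format is strict, a quick check before training may help. The sketch below is not part of this repo: check_multispeaker_filelist is a hypothetical helper, and the 0.5 s to 15 s duration bounds are arbitrary assumptions, not known causes of the overflow. It parses each audio_path|speaker_id|transcript line, flags malformed lines, invalid speaker ids, empty transcripts, and missing, unreadable, or unusually short/long .wav files, and derives n_speakers as the largest speaker_id plus one (assuming ids start at 0).

```python
import os
import wave


def check_multispeaker_filelist(filelist_path, min_seconds=0.5, max_seconds=15.0):
    """Check an `audio_path|speaker_id|transcript` filelist (hypothetical helper).

    Returns (n_speakers, problems): n_speakers is the largest speaker_id + 1,
    problems is a list of (line_number, message) tuples.
    The duration bounds are arbitrary assumptions, not values from this repo.
    """
    problems = []
    max_speaker_id = -1
    with open(filelist_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # ignore blank lines
            fields = line.split("|")
            if len(fields) != 3:
                problems.append((line_no, "expected 3 fields, got %d" % len(fields)))
                continue
            audio_path, speaker_id, transcript = fields

            # Speaker ids are assumed to be non-negative integers starting at 0.
            try:
                sid = int(speaker_id)
            except ValueError:
                sid = -1
            if sid < 0:
                problems.append((line_no, "invalid speaker_id: %r" % speaker_id))
            else:
                max_speaker_id = max(max_speaker_id, sid)

            if not transcript.strip():
                problems.append((line_no, "empty transcript"))

            if not os.path.isfile(audio_path):
                problems.append((line_no, "missing audio file: %s" % audio_path))
                continue
            # The stdlib wave module only reads uncompressed PCM .wav files.
            try:
                with wave.open(audio_path, "rb") as w:
                    seconds = w.getnframes() / float(w.getframerate())
            except (wave.Error, EOFError) as e:
                problems.append((line_no, "unreadable wav: %s" % e))
                continue
            if not min_seconds <= seconds <= max_seconds:
                problems.append((line_no, "duration %.2fs out of range" % seconds))
    return max_speaker_id + 1, problems
```

Calling it on a training filelist returns a candidate value for n_speakers together with the (line number, message) pairs worth fixing before training.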
- Python==3.6.9
- pytorch==1.3.0
- cython==0.29.12
- librosa==0.7.1
- numpy==1.16.4
- scipy==1.5.4
- nltk==3.6.5
Please check the official repository.
sh train_custom_multi_ddi.sh configs/base.json base
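For multispeaker training, adding n_speakers and gin_channels to the config can be scripted. The helper below is a hypothetical sketch, not part of this repo: it assumes both keys belong in the "model" section of the JSON config (the section whose entries are passed to FlowGenerator), and gin_channels=256 is only a placeholder value.

```python
import json


def add_speaker_config(src_path, dst_path, n_speakers, gin_channels=256):
    """Copy a Glow-TTS JSON config, adding n_speakers and gin_channels.

    Assumption: both keys go under "model". gin_channels=256 is a placeholder,
    not a value taken from this repo.
    """
    with open(src_path, encoding="utf-8") as f:
        config = json.load(f)
    model_cfg = config.setdefault("model", {})  # create "model" if it is absent
    model_cfg["n_speakers"] = n_speakers
    model_cfg["gin_channels"] = gin_channels
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, ensure_ascii=False)
    return config
```

For example, add_speaker_config("configs/base.json", "configs/base_multi.json", n_speakers=4) writes a new config (file name and speaker count are illustrative), which can then be given to the training script in place of configs/base.json.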
For inference, see inference.ipynb.