French Tacotron2 DDC release. #539
Hi @erogol, thanks for this. Can you elaborate on why you use `phoneme_cleaners` over `french_cleaners` in your config file?
Because the model uses phonemes.
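For context, when phoneme mode is enabled the input text is converted to phonemes before reaching the model, so the generic `phoneme_cleaners` pipeline is enough and a language-specific cleaner becomes largely redundant. A minimal sketch of the relevant config fields (key names follow the Mozilla TTS config format; treat the exact values as assumptions for a French model):

```json
{
  "text_cleaner": "phoneme_cleaners",
  "use_phonemes": true,
  "phoneme_language": "fr-fr"
}
```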
If I understand well:
Am I misunderstanding something?
I used French phonemes as dictated by
I did the implementation of `french_cleaners`:

```python
def french_cleaners(text):
    '''Pipeline for French text. There is no need to expand numbers; the phonemizer already does that.'''
    text = lowercase(text)
    text = expand_abbreviations(text, lang='fr')
    text = replace_symbols(text, lang='fr')
    text = remove_aux_symbols(text)
    text = collapse_whitespace(text)
    return text
```

As you can see, the text doesn't go through the
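For readers who want to try the cleaning steps standalone, here is a minimal, self-contained sketch of a few of the helpers referenced above. These are illustrative stand-ins only; the real implementations live in the repo's text-cleaning module, and `expand_abbreviations`/`replace_symbols` are omitted here:

```python
import re

def lowercase(text):
    # Lowercase everything; phonemization is case-insensitive anyway.
    return text.lower()

def remove_aux_symbols(text):
    # Drop symbols that carry no pronunciation (brackets, quotes, angle brackets).
    return re.sub(r'[<>()\[\]"]+', '', text)

def collapse_whitespace(text):
    # Squeeze runs of whitespace (including newlines) into single spaces.
    return re.sub(r'\s+', ' ', text).strip()

def simple_french_cleaner(text):
    # Reduced version of the pipeline above, without abbreviation or
    # symbol expansion.
    text = lowercase(text)
    text = remove_aux_symbols(text)
    return collapse_whitespace(text)
```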
@erogol The config present in your shared folder seems to be different from the one at 72a6ac5 (the commit ID corresponding to Tacotron2 DDC in the wiki); they differ in the fields present in the JSON, not in the values. Can you please confirm which one to use? I am trying to do the same sort of thing in a different language; which should be my starting point for training the Tacotron2 part? Thanks
Use the one in the shared folder.
I'm trying to improve French Tacotron2 DDC, because there are some noises you don't have in the English synthesizer made with Tacotron 2. There are also some pronunciation defects on nasal vowels, probably because of missing phonemes (ɑ̃, ɛ̃), as in œ̃n ɔ̃ɡl də ma tɑ̃t ɛt ɛ̃kaʁne ("Un ongle de ma tante est incarné." / "One of my aunt's nails is ingrown."). I started to train text2feat from scratch on a French corpus (MAI_ezwa), but after 10k to 15k steps the loss increases drastically. The result is not so bad with your vocoder, but you said you trained the model for 100k steps, ten times as many as I did. How can I get to 100k steps and beyond? I attach the config file. Thanks
There is a big bug in your
Colab notebook: https://colab.research.google.com/drive/16T5avz3zOUNcIbF_dwfxnkZDENowx-tZ?usp=sharing
Model files: https://drive.google.com/drive/folders/1LpsUx08Z3-JgvNLPQY67y8OjE4IlP4f1?usp=sharing
This release uses Tacotron2 DDC in combination with the universal Fullband-MelGAN vocoder. The model is trained on the M-AILABS dataset subset (fr_FR/by_book/female/ezwa/monsieur_lecoq).
The Tacotron2 model is trained for 100k steps, starting from a pre-trained LJSpeech model.
Tacotron2 and the vocoder model have different sampling rates (16 kHz vs. 24 kHz); this is resolved by interpolating the Tacotron2 output along the time axis before feeding it into the vocoder, as in the sample Colab notebook above.
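To make the interpolation step concrete, here is a minimal sketch of one way to stretch a mel spectrogram's time axis by the sampling-rate ratio using linear interpolation. This is an illustrative stand-in, not the notebook's exact implementation (which may use torch-based interpolation); the function name and signature are hypothetical:

```python
import numpy as np

def interpolate_mel(mel: np.ndarray, src_sr: int = 16000, tgt_sr: int = 24000) -> np.ndarray:
    """Stretch a mel spectrogram of shape (n_mels, T) along the time axis
    by tgt_sr / src_sr, so frame timing matches the vocoder's rate."""
    n_mels, t = mel.shape
    t_new = int(round(t * tgt_sr / src_sr))
    # Fractional source positions for each target frame.
    src_pos = np.linspace(0.0, t - 1, num=t_new)
    out = np.empty((n_mels, t_new), dtype=mel.dtype)
    for i in range(n_mels):
        out[i] = np.interp(src_pos, np.arange(t), mel[i])
    return out
```

With the default 16 kHz to 24 kHz rates this produces 1.5x as many frames, and the first and last frames of each mel band are preserved exactly.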