French Tacotron2 DDC release. #539
Hi @erogol, thanks for this. Can you elaborate on why you use `phoneme_cleaners` over `french_cleaners` in your config file?
Because the model uses phonemes.
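For context, when phoneme mode is enabled the input text is converted to phonemes before reaching the model, so the generic `phoneme_cleaners` pipeline is enough and a language-specific cleaner becomes largely redundant. A minimal sketch of the relevant config fields (key names follow the Mozilla TTS config format; treat the exact values as assumptions for a French model):

```json
{
  "text_cleaner": "phoneme_cleaners",
  "use_phonemes": true,
  "phoneme_language": "fr-fr"
}
```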
If I understand well:
Am I misunderstanding something?
I used French phonemes as dictated by
I did the implementation of `french_cleaners`:

```python
def french_cleaners(text):
    '''Pipeline for French text. There is no need to expand numbers; the phonemizer already does that.'''
    text = lowercase(text)
    text = expand_abbreviations(text, lang='fr')
    text = replace_symbols(text, lang='fr')
    text = remove_aux_symbols(text)
    text = collapse_whitespace(text)
    return text
```

As you can see, the text doesn't go through the
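For readers who want to try the cleaning steps standalone, here is a minimal, self-contained sketch of a few of the helpers referenced above. These are illustrative stand-ins only; the real implementations live in the repo's text-cleaning module, and `expand_abbreviations`/`replace_symbols` are omitted here:

```python
import re

def lowercase(text):
    # Lowercase everything; phonemization is case-insensitive anyway.
    return text.lower()

def remove_aux_symbols(text):
    # Drop symbols that carry no pronunciation (brackets, quotes, angle brackets).
    return re.sub(r'[<>()\[\]"]+', '', text)

def collapse_whitespace(text):
    # Squeeze runs of whitespace (including newlines) into single spaces.
    return re.sub(r'\s+', ' ', text).strip()

def simple_french_cleaner(text):
    # Reduced version of the pipeline above, without abbreviation or
    # symbol expansion.
    text = lowercase(text)
    text = remove_aux_symbols(text)
    return collapse_whitespace(text)
```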
@erogol The config present in your shared folder seems to be different from the one at 72a6ac5 (the commit ID corresponding to Tacotron2 DDC in the wiki); they differ in the fields present in the JSON, not in the values. Can you please confirm which one to use? I am trying to do the same sort of thing in a different language; which should be my starting point for training the Tacotron2 part? Thanks
Use the one in the shared folder.
I'm trying to improve French Tacotron2 DDC, because there are some noises you don't have in the English synthesizer made with Tacotron 2. There are also some pronunciation defects on nasal vowels, probably because of missing phonemes (ɑ̃, ɛ̃), as in œ̃n ɔ̃ɡl də ma tɑ̃t ɛt ɛ̃kaʁne ("Un ongle de ma tante est incarné." / "One of my aunt's nails is ingrown."). I started to train text2feat from scratch on a French corpus (MAI_ezwa), but after 10k to 15k steps the loss increases drastically. The result is not so bad with your vocoder, but you said you trained the model for 100k steps, ten times as many as I did. How can I get to 100k steps and beyond? I attach the config file. Thanks
There is a big bug in your
Colab notebook: https://colab.research.google.com/drive/16T5avz3zOUNcIbF_dwfxnkZDENowx-tZ?usp=sharing
Model files: https://drive.google.com/drive/folders/1LpsUx08Z3-JgvNLPQY67y8OjE4IlP4f1?usp=sharing
This release uses Tacotron2 DDC in combination with the universal Fullband-MelGAN vocoder. The model is trained on the M-AILABS dataset subset (fr_FR/by_book/female/ezwa/monsieur_lecoq).
The Tacotron2 model is trained for 100k steps, starting from a pre-trained LJSpeech model.
Tacotron2 and the vocoder model have different sampling rates (16 kHz vs. 24 kHz); this is resolved by interpolating the Tacotron2 output along the time axis before feeding it into the vocoder, as in the sample Colab notebook above.
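To make the interpolation step concrete, here is a minimal sketch of one way to stretch a mel spectrogram's time axis by the sampling-rate ratio using linear interpolation. This is an illustrative stand-in, not the notebook's exact implementation (which may use torch-based interpolation); the function name and signature are hypothetical:

```python
import numpy as np

def interpolate_mel(mel: np.ndarray, src_sr: int = 16000, tgt_sr: int = 24000) -> np.ndarray:
    """Stretch a mel spectrogram of shape (n_mels, T) along the time axis
    by tgt_sr / src_sr, so frame timing matches the vocoder's rate."""
    n_mels, t = mel.shape
    t_new = int(round(t * tgt_sr / src_sr))
    # Fractional source positions for each target frame.
    src_pos = np.linspace(0.0, t - 1, num=t_new)
    out = np.empty((n_mels, t_new), dtype=mel.dtype)
    for i in range(n_mels):
        out[i] = np.interp(src_pos, np.arange(t), mel[i])
    return out
```

With the default 16 kHz to 24 kHz rates this produces 1.5x as many frames, and the first and last frames of each mel band are preserved exactly.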