
@samuel-lunii
Contributor

Note that the files of the SynPaFlex dataset have to be reorganized to follow the LJSpeech layout. I wrote a script that does this automatically; it is not included here, but I can share it if you want :)
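For reference, an LJSpeech-style dataset is a `wavs/` directory plus a pipe-separated `metadata.csv` with one `id|transcription|normalized transcription` row per utterance and no header. The sketch below is not the script mentioned above; it only shows the target layout, assuming the utterances have already been gathered as hypothetical `(utt_id, wav_path, text)` tuples, and it reuses the raw text as the normalized column (also an assumption).

```python
import os
import shutil
import tempfile


def write_ljspeech_layout(utterances, out_dir):
    """Copy each wav into out_dir/wavs/ and write out_dir/metadata.csv.

    `utterances` is a hypothetical iterable of (utt_id, wav_path, text) tuples;
    collecting them from the SynPaFlex release is not shown here.
    """
    wav_dir = os.path.join(out_dir, "wavs")
    os.makedirs(wav_dir, exist_ok=True)
    meta_path = os.path.join(out_dir, "metadata.csv")
    with open(meta_path, "w", encoding="utf-8", newline="") as meta:
        for utt_id, wav_path, text in utterances:
            if "|" in text:
                # "|" is the column separator, so it cannot appear in the text.
                raise ValueError(f"'|' found in transcription of {utt_id}")
            shutil.copyfile(wav_path, os.path.join(wav_dir, f"{utt_id}.wav"))
            # LJSpeech row: id|transcription|normalized transcription.
            # Assumption: the raw text is reused as the normalized column.
            meta.write(f"{utt_id}|{text}|{text}\n")
    return meta_path


if __name__ == "__main__":
    # Tiny demo with a placeholder file standing in for a real wav.
    root = tempfile.mkdtemp()
    src = os.path.join(root, "source.wav")
    with open(src, "wb") as f:
        f.write(b"RIFF")
    path = write_ljspeech_layout([("synpaflex_0001", src, "Bonjour.")],
                                 os.path.join(root, "synpaflex"))
    with open(path, encoding="utf-8") as f:
        print(f.read(), end="")
```

A real conversion would also need proper text normalization for the third column; copying the raw text keeps the sketch short.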

@dathudeptrai dathudeptrai self-assigned this Aug 4, 2021
@dathudeptrai dathudeptrai added the enhancement 🚀 New feature or request label Aug 4, 2021
@dathudeptrai
Collaborator

@samuel-lunii Is this pull request complete?

@ZDisket
Collaborator

ZDisket commented Aug 4, 2021

@samuel-lunii Is there a trained model?

@samuel-lunii
Contributor Author

@dathudeptrai almost complete :)

  • I forgot to include SynPaFlex in tensorflow_tts/configs/tacotron2.py.
  • I still need to update README.md and create a notebook, similar to this one, that prepares the SynPaFlex data before preprocessing.

@ZDisket I have trained models for Tacotron2 (150k steps) and MB-MelGAN (780k steps), but I am not sure where to include them?

@samuel-lunii
Contributor Author

@dathudeptrai it is complete now

@dathudeptrai
Collaborator

@samuel-lunii

I have trained models for tacotron2 (150k) and MB-MelGAN (780k), but I am not sure where to include them?

You can make a Google Colab notebook for inference that downloads the model from Google Drive; then I will fork it and upload your model to the Hugging Face Hub :D.

@samuel-lunii
Contributor Author

@dathudeptrai
Here is a link to the Google Colab for inference; everything is there :)

@dathudeptrai
Collaborator

@samuel-lunii Many thanks :D, I will review and merge this weekend :D

dathudeptrai previously approved these changes Aug 6, 2021
@dathudeptrai
Collaborator

@samuel-lunii can you fix the failing check :D

@dathudeptrai dathudeptrai self-requested a review August 10, 2021 10:16
@dathudeptrai dathudeptrai merged commit d7415ac into TensorSpeech:master Aug 10, 2021
@samuel-lunii samuel-lunii deleted the sd/synpaflexSupport branch August 11, 2021 07:58
@samuel-lunii
Contributor Author

@dathudeptrai here is a Colab for inference with FastSpeech2 (FS2) trained for 200k steps on the SynPaFlex dataset. Durations were extracted with the Tacotron2 model trained for 150k steps.
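Extracting durations from a Tacotron2 alignment usually comes down to counting, for each input token, how many decoder frames attend to it most strongly. A minimal sketch of that counting step, assuming an alignment matrix of shape `(num_tokens, num_frames)`; it is not necessarily the exact code used to export the durations mentioned above.

```python
import numpy as np


def durations_from_alignment(alignment):
    """Count how many decoder frames attend most strongly to each input token.

    `alignment` is assumed to have shape (num_tokens, num_frames) for a
    single utterance.
    """
    num_tokens = alignment.shape[0]
    # Index of the most-attended token for every decoder frame.
    best_token = np.argmax(alignment, axis=0)
    # Frames assigned to each token; tokens that are never attended get 0.
    return np.bincount(best_token, minlength=num_tokens)


alignment = np.array([
    [0.9, 0.8, 0.1, 0.0],
    [0.1, 0.2, 0.9, 0.2],
    [0.0, 0.0, 0.0, 0.8],
])
print(durations_from_alignment(alignment))
```

Because every frame is assigned to exactly one token, the durations always sum to the number of decoder frames, which is what a duration-based model such as FastSpeech2 expects.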

@samuel-lunii
Contributor Author

samuel-lunii commented Aug 26, 2021

@dathudeptrai
I noticed that in this Colab cell you set the sampling rate to 24000 Hz, whereas I used 22050 Hz for preprocessing.
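The effect of this mismatch can be computed directly. Assuming the audio generated at 22050 Hz is simply played back as if it were sampled at 24000 Hz (no resampling), speech comes out about 8.8% faster and roughly 1.5 semitones higher. A small sketch of that arithmetic (the helper name is illustrative only):

```python
import math


def playback_mismatch(true_sr, playback_sr, num_samples):
    """Effect of playing num_samples rendered at true_sr as if at playback_sr.

    Returns (speed_factor, pitch_shift_semitones, played_duration_seconds).
    """
    speed = playback_sr / true_sr
    # Playing samples at a different rate without resampling scales every
    # frequency by `speed`, i.e. a shift of 12 * log2(speed) semitones.
    semitones = 12 * math.log2(speed)
    return speed, semitones, num_samples / playback_sr


# One second of audio generated at 22050 Hz but played at 24000 Hz.
speed, semitones, seconds = playback_mismatch(22050, 24000, 22050)
print(f"{speed:.4f}x faster, +{semitones:.2f} semitones, lasts {seconds:.4f} s")
```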

Also, adding the code below to this cell, before # vocoder part, finds the end of the synthesized utterance from the alignment data, even when the synthesized mel spectrogram is much longer than it should be:

  import numpy as np

  # find the end of the sentence according to alignment data
  # (alignment_history[0] is indexed as (text token, decoder frame))
  final_text_index = alignment_history[0].shape[0]
  final_frame_index = 0
  for frame in np.swapaxes(alignment_history[0], 0, 1):
    # text token that this decoder frame attends to most strongly
    max_index = np.argmax(frame)
    final_frame_index += 1
    # stop as soon as the decoder reaches the last text token
    if max_index == final_text_index - 1:
      break

and then replace

  audio = vocoder_model.inference(mel_outputs)[0, :-remove_end, 0]

with:

  audio = vocoder_model.inference(mel_outputs[:, :final_frame_index, :])[0, :-1, 0]

This way, vocoder inference is only performed on actual speech :)
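The same end-of-utterance search can be written without the explicit loop. This is a vectorized sketch of the logic in the snippet above, under the same assumption that the alignment is shaped `(num_tokens, num_frames)`; the function name `find_final_frame` is hypothetical, and the toy alignment below is synthetic, not real Tacotron2 output.

```python
import numpy as np


def find_final_frame(alignment):
    """Number of frames up to and including the first frame whose strongest
    attention falls on the last input token.

    `alignment` is assumed to be (num_tokens, num_frames). As in the loop
    version, all frames are kept if the last token is never reached.
    """
    last_token = alignment.shape[0] - 1
    best_token = np.argmax(alignment, axis=0)  # most-attended token per frame
    hits = np.flatnonzero(best_token == last_token)
    if hits.size == 0:
        return alignment.shape[1]
    return int(hits[0]) + 1


# Toy alignment: 3 tokens, 6 frames; the last frame drifts back to token 0.
alignment = np.array([
    [0.9, 0.7, 0.1, 0.0, 0.0, 0.6],
    [0.1, 0.3, 0.8, 0.2, 0.1, 0.2],
    [0.0, 0.0, 0.1, 0.8, 0.9, 0.2],
])
mel_outputs = np.zeros((1, alignment.shape[1], 80))  # (batch, frames, mels)
trimmed = mel_outputs[:, :find_final_frame(alignment), :]
print(trimmed.shape)
```

`np.argmax` returns the first maximum, so it matches `np.where(frame == np.amax(frame))[0][0]` from the loop version.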

@dathudeptrai
Collaborator

dathudeptrai commented Aug 26, 2021

@samuel-lunii Many thanks, I will upload your model to the Hugging Face Hub soon :D


Labels

enhancement 🚀 New feature or request


3 participants