HifiGAN training -- obvious harmonics in test files #339

Open
Kristopher-Chen opened this issue Mar 14, 2022 · 7 comments
Labels
question Further information is requested

Comments

@Kristopher-Chen

I trained HiFi-GAN on the multi-speaker VCTK dataset at a 24 kHz sampling rate. I also normalize the input log-Mel spectrogram (mean = -4, std = 4). The test files show obvious harmonic artifacts, as in the image below.
Have you ever run into this? Any suggestions? Thank you!
image
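
For readers following along, the normalization described above can be sketched as below. This is only an illustration under assumptions: the constants come from the post (mean = -4, std = 4), while the function names and the toy frames are hypothetical and are not taken from this repository or the official code.

```python
# Minimal sketch of the log-Mel normalization described in the post.
# Assumption: a single global mean/std is applied to every Mel bin.
MEL_MEAN = -4.0  # value quoted in the post
MEL_STD = 4.0    # value quoted in the post


def normalize_log_mel(frames, mean=MEL_MEAN, std=MEL_STD):
    """Apply (x - mean) / std to every log-Mel bin of every frame."""
    return [[(value - mean) / std for value in frame] for frame in frames]


def denormalize_log_mel(frames, mean=MEL_MEAN, std=MEL_STD):
    """Invert normalize_log_mel, recovering the raw log-Mel values."""
    return [[value * std + mean for value in frame] for frame in frames]


# Two toy frames with three Mel bins each (made-up numbers).
log_mel = [[-8.0, -4.0, 0.0], [-6.0, -2.0, -11.5]]
normalized = normalize_log_mel(log_mel)
restored = denormalize_log_mel(normalized)
```

One thing that may be worth checking in a setup like this is that the same mean and std are used when extracting features for training and when generating the test files.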

@Kristopher-Chen
Author

BTW, the discriminator losses are quite small. Could that mean the discriminators are too strong?
image
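
To make the "small loss means strong discriminator" reasoning concrete, here is a rough sketch. It assumes the least-squares GAN objective from the HiFi-GAN paper, where the discriminator loss is the mean of (1 - D(x))^2 over real audio plus the mean of D(G(s))^2 over generated audio. The function name and the score values are made up for illustration.

```python
def lsgan_discriminator_loss(real_scores, fake_scores):
    """Least-squares discriminator loss for one discriminator.

    real_scores: discriminator outputs on real audio (target 1).
    fake_scores: discriminator outputs on generated audio (target 0).
    """
    real_term = sum((1.0 - s) ** 2 for s in real_scores) / len(real_scores)
    fake_term = sum(s ** 2 for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
undecided = lsgan_discriminator_loss([0.5, 0.5], [0.5, 0.5])

# A discriminator that separates real and fake almost perfectly.
confident = lsgan_discriminator_loss([0.9, 1.0], [0.1, 0.0])
```

Under this objective an undecided discriminator scores 0.5, and the loss approaches 0 as the discriminator separates real and generated audio more reliably. A very small value is therefore consistent with a discriminator that is winning against the generator.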

@kan-bayashi
Owner

Did you use this repository, or is this a general question about HiFi-GAN?

@kan-bayashi added the question (Further information is requested) label on Mar 14, 2022
@Kristopher-Chen
Author

> Did you use this repository, or is this a general question about HiFi-GAN?

Hi, I actually referred to both your repository and the official implementation. I trained for several epochs with the official code and found the discriminator losses to be around 0.1~0.2, but the output still had obvious harmonics. So I wonder whether this is normal in the early stages of training. The discriminator losses still look quite strange, though...

@kan-bayashi
Owner

OK. How many iterations did you run? In my experiments, around 200k iterations are enough to generate reasonable speech.
I'm not familiar with the official implementation, but in my case the official optimizer settings did not work well.
The following issue may help you:
#278

@Kristopher-Chen
Author

> OK. How many iterations did you run? In my experiments, around 200k iterations are enough to generate reasonable speech. I'm not familiar with the official implementation, but in my case the official optimizer settings did not work well. The following issue may help you: #278

Something seems to be wrong with the discriminators. Their losses keep getting smaller as training goes on. Normal values would be around 0.1~0.2 for each discriminator, but mine are as shown below.
image

@MlWoo

MlWoo commented Feb 22, 2023

@Kristopher-Chen have you resolved the problems?

@Kristopher-Chen
Author

Kristopher-Chen commented May 11, 2023

> @Kristopher-Chen have you resolved the problems?

When I followed the original code, this problem was solved.

As for the discriminator losses, the 2nd and 3rd MSD losses easily become small, while the others look normal.

Moreover, the feature map loss keeps growing gradually. Interestingly, though, the generated samples sound natural...
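
For context on the feature map loss mentioned above (the HiFi-GAN paper calls it the feature matching loss), here is a rough sketch of what it measures: the mean absolute difference between a discriminator's intermediate features for real and generated audio, summed over layers. The function name and the flattened toy feature lists are hypothetical, and any scaling factor an actual implementation applies is left out.

```python
def feature_matching_loss(real_feats, fake_feats):
    """Sum over layers of the mean absolute feature difference.

    real_feats / fake_feats: one flat list of activations per
    discriminator layer, for real and generated audio respectively.
    """
    total = 0.0
    for real_layer, fake_layer in zip(real_feats, fake_feats):
        layer_diff = sum(abs(r - f) for r, f in zip(real_layer, fake_layer))
        total += layer_diff / len(real_layer)
    return total


# Toy activations for a two-layer discriminator (made-up numbers).
real = [[1.0, 2.0], [0.5, 0.5, 0.5, 0.5]]
fake = [[1.5, 1.0], [0.5, 0.0, 0.5, 1.0]]
loss = feature_matching_loss(real, fake)
```

Since the features come from the discriminators themselves, this value depends on how the discriminators change during training as well as on the generator, so it is not a direct measure of audio quality on its own.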
