How to inference using MelGAN given a tacotron mel spec output? #46

OswaldoBornemann · 2020-03-09T09:24:17Z

When I trained MelGAN on mel spectrograms computed from the original wavs, the results were good.

But when I feed the mel spectrogram output from Tacotron into the trained MelGAN model, the audio is nothing but a beeping noise. Would you mind sharing some advice? Thanks a lot. @seungwonpark

CookiePPP · 2020-03-09T09:25:06Z

Could you upload some sound samples?

OswaldoBornemann · 2020-03-09T09:44:03Z

@CookiePPP Please turn the volume all the way down first... I don't want to hurt your ears...

bad result.wav.zip

CookiePPP · 2020-03-09T09:46:14Z

Do you have the code you used to feed the tacotron outputs into melgan uploaded somewhere?
That's definitely bugged out.

OswaldoBornemann · 2020-03-09T09:51:47Z

@CookiePPP The process is roughly as follows:

First I get the mel spec output from Tacotron, like this:

# mel sent shape is (spec_length, 80)
mel_sent = tacotron_out(model, sentence, CONFIG, use_cuda, ap, use_gl=use_gl, figures=True)

Then I unsqueeze and transpose the mel result so it can be fed into MelGAN.

checkpoint_path = "./melgan/chkpt/id_test1/id_test1_aca5990_0700.pt"
config = "./melgan/config/id_test1.yaml"

checkpoint = torch.load(checkpoint_path)
# if args.config is not None:
#     hp = HParam(config)
# else:
hp = load_hparam_str(checkpoint['hp_str'])

melgan_model = Generator(hp.audio.n_mel_channels).cuda()
melgan_model.load_state_dict(checkpoint['model_g'])
melgan_model.eval()

with torch.no_grad():
    mel = torch.from_numpy(mel_sent).unsqueeze(0).transpose(2, 1)
    mel = mel.cuda()

    audio = melgan_model.inference(mel)  # call the MelGAN generator, not the Tacotron `model`
    audio = audio.cpu().detach().numpy()

CookiePPP · 2020-03-09T09:55:29Z

mel_sent = tacotron_out(model, sentence, CONFIG, use_cuda, ap, use_gl=use_gl, figures=True)

Where does this line come from? This repo is designed to interface with NVIDIA/tacotron2.
NVIDIA uses its own spectrogram conversion, which I believe outputs values between roughly -12 and 2.
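
A quick way to check whether the two models agree on scale is to compare the value range of a mel computed the way MelGAN was trained against one produced by Tacotron. Below is a minimal sketch using NumPy. `mel_stats` and `log_compress` are hypothetical helpers, the natural log with a 1e-5 floor is my assumption about what the NVIDIA-style conversion does, and the random arrays are stand-ins for real spectrograms.

```python
import numpy as np


def log_compress(magnitudes, clip_val=1e-5):
    """Natural-log compression with a floor (assumed NVIDIA-style conversion)."""
    return np.log(np.clip(magnitudes, clip_val, None))


def mel_stats(mel):
    """Return (min, max, mean) of a mel spectrogram as plain floats."""
    mel = np.asarray(mel, dtype=np.float64)
    return float(mel.min()), float(mel.max()), float(mel.mean())


# Stand-in data: replace with a real training mel and a real Tacotron output.
rng = np.random.default_rng(0)
reference_mel = log_compress(rng.uniform(0.0, 5.0, size=(80, 100)))
tacotron_mel = rng.uniform(-4.0, 4.0, size=(100, 80))  # hypothetical range

print("reference (min, max, mean):", mel_stats(reference_mel))
print("tacotron  (min, max, mean):", mel_stats(tacotron_mel))

# With a 1e-5 floor, the smallest possible log value is ln(1e-5), about -11.51,
# which would be consistent with a lower bound near -12.
print("floor:", np.log(1e-5))
```

If the two sets of statistics differ a lot, the vocoder is seeing inputs it was never trained on, and a mismatch in scale is a likely cause of the noise.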

OswaldoBornemann · 2020-03-09T09:59:04Z

@CookiePPP I see. I'm using Mozilla TTS instead.

OswaldoBornemann · 2020-03-09T10:00:25Z

@CookiePPP I would also like to know whether we could use Tacotron GTA (ground-truth-aligned) output to train MelGAN.

CookiePPP · 2020-03-09T10:02:31Z

@tsungruihon
You should be able to scale the output and get an audible result. I don't know what range Mozilla TTS uses, but try transforming the Mozilla output to match the NVIDIA one.
e.g.

mel_sent = tacotron_out(model, sentence, CONFIG, use_cuda, ap, use_gl=use_gl, figures=True)
mel_sent = (mel_sent * 0.5) + 2

Replace 0.5 and 2 with whatever values bring the spectrogram into the range -12 to 2.
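
The scale and offset can be derived from the ranges instead of guessed. Here is a minimal sketch, assuming the Tacotron output range has been measured beforehand. `rescale_mel` is a hypothetical helper, the source range of [-4, 4] is only an illustration, and the target range of -12 to 2 is the estimate mentioned above rather than a verified figure.

```python
import numpy as np


def rescale_mel(mel, src_range, dst_range=(-12.0, 2.0)):
    """Affinely map mel values from src_range onto dst_range."""
    src_min, src_max = src_range
    dst_min, dst_max = dst_range
    if src_max <= src_min:
        raise ValueError("src_range must satisfy min < max")
    scale = (dst_max - dst_min) / (src_max - src_min)
    mel = np.asarray(mel, dtype=np.float32)
    return (mel - src_min) * scale + dst_min


# Hypothetical example: suppose the Tacotron output spans [-4, 4].
mel_sent = np.array([[-4.0, 0.0, 4.0]], dtype=np.float32)
rescaled = rescale_mel(mel_sent, src_range=(-4.0, 4.0))
print(rescaled)  # -4 maps to -12, 0 maps to -5, 4 maps to 2
```

It is probably better to take `src_range` from the Tacotron normalization config or from statistics over many utterances than from the min and max of a single utterance, since a per-utterance range would shift the level of every sentence differently.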

> @CookiePPP I would also like to know whether we could use Tacotron GTA (ground-truth-aligned) output to train MelGAN.

Not sure. I'm busy today, so I can't really help you there.

OswaldoBornemann · 2020-03-09T10:05:43Z

@CookiePPP Really appreciated. Thanks a lot.

mennatallah644 · 2020-11-15T13:29:44Z

I'm facing the same problem. Did you find a solution?
@tsungruihon

OswaldoBornemann · 2020-11-16T00:33:54Z

Please visit https://github.com/mozilla/TTS
