It is unclear how mel spectrograms are used by the StyleMelGAN generator module.
I've been trying to figure out how to format mel spectrograms so the generator will accept them. To figure that out, I've been looking at the initialization parameters of the StyleMelGANGenerator module.
The only obvious candidate for defining the format/dimensions of the input spectrogram is the aux_channels parameter. But that wouldn't make sense, for these reasons:
1. Its default value is 80, but a mel spectrogram contains far more than 80 data points.
2. aux_channels controls only one parameter: the in_channels parameter of the first layer in the first TADEResBlock. That would make sense if the mel spectrograms' dimensions corresponded to this parameter, but...
3. The diagram of StyleMelGAN's signal path in the original StyleMelGAN paper conflicts with point 2: the diagram shows the spectrograms being fed into every TADEResBlock, not just the first.
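To make the first reason concrete, here is a rough sketch of the shapes involved. It assumes, without confirmation from the code or docs, that aux_channels would count mel bins per frame rather than the total number of values. The helper mel_shape, the 22050 Hz sample rate, and the hop length of 256 are illustrative choices of mine, not values taken from the module.

```python
def mel_shape(num_samples, hop_length=256, n_mels=80):
    """Return (n_mels, n_frames) for a mel spectrogram of a centered STFT.

    Hypothetical helper for illustration only. hop_length and n_mels are
    assumed values, not read from StyleMelGANGenerator.
    """
    # With centered framing, one frame is produced every hop_length samples,
    # plus one frame at the start.
    n_frames = 1 + num_samples // hop_length
    return n_mels, n_frames


# One second of audio at an assumed 22050 Hz sample rate.
n_mels, n_frames = mel_shape(num_samples=22050)

# The spectrogram holds n_mels * n_frames values in total, which is far
# more than 80, while its first dimension alone is 80.
total_values = n_mels * n_frames
print(n_mels, n_frames, total_values)
```

If that assumption holds, the default of 80 would describe only the first dimension of the spectrogram and not its overall size, but I cannot tell from the parameters alone whether that is the intended meaning.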
So my questions are:
1. What is aux_channels? (What kind of data is considered "auxiliary input"? Am I correct that this is the spectrograms?)
2. If aux_channels does not determine how the input spectrograms should be formatted, what does?
If you can answer these questions for me, I would be happy to improve the documentation/comments myself.
Thank you!
Hello,
This is probably just a documentation problem.