I want to thank all the authors for the great work on this paper.
I am trying to reproduce the LibriSpeech model training to get a better sense of how the model trains, in the hope of building a 25 Hz version of xcodec in the future.
I downloaded the full 960h LibriSpeech training set from here and kept the model config as is. I only changed the batch size from 8 on 8 GPUs to 16 on 4 GPUs.
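As a sanity check on that change, here is a minimal sketch of the arithmetic. It assumes both batch sizes are per-GPU values and that no gradient accumulation is used; the function name `global_batch_size` is only for illustration and is not something from the repo.

```python
def global_batch_size(per_gpu_batch: int, num_gpus: int, grad_accum_steps: int = 1) -> int:
    """Samples consumed per optimizer step, assuming the batch size is per GPU."""
    return per_gpu_batch * num_gpus * grad_accum_steps


# Original setup as I understand it: batch size 8 on each of 8 GPUs.
original = global_batch_size(per_gpu_batch=8, num_gpus=8)

# My setup: batch size 16 on each of 4 GPUs.
mine = global_batch_size(per_gpu_batch=16, num_gpus=4)

print(original, mine)  # 64 64
```

If the numbers really are per GPU, the number of samples per optimizer step should be unchanged by my modification.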
The problem I am running into is that training is not stable. It seems to me that the GAN setup is difficult to train and is the main culprit.


I just wanted to ask whether you experienced this during your experiments and how you dealt with it. I am almost tempted to just resume training from an earlier checkpoint. It would be really helpful if you could guide me here.
Thank you and I appreciate the time you've taken to read this!