Reproducibility problems with LibriSpeech model #13

I want to thank all the authors for the great work on this paper.

I am trying to reproduce the LibriSpeech model training to get a better sense of how the model trains, in the hope of building a 25Hz version of xcodec in the future.

I downloaded the full 960h LibriSpeech training set from [here](https://www.openslr.org/12) and kept the model config as is. The only change I made was the batch size: from 8 on 8 GPUs to 16 on 4 GPUs.
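For reference, if the 8 and 16 above are per-GPU batch sizes (my assumption; the config may define batch size differently), the global batch size per optimizer step is unchanged by this swap. The helper below and its `grad_accum_steps` parameter are hypothetical, just to make the arithmetic explicit:

```python
def global_batch_size(per_gpu_batch: int, num_gpus: int, grad_accum_steps: int = 1) -> int:
    """Samples contributing to one optimizer step under data-parallel training.

    Assumes per_gpu_batch is the per-device batch size (hypothetical helper).
    """
    return per_gpu_batch * num_gpus * grad_accum_steps


original = global_batch_size(per_gpu_batch=8, num_gpus=8)
modified = global_batch_size(per_gpu_batch=16, num_gpus=4)
print(original, modified)  # 64 64
```

So, under that assumption, each optimizer step sees the same number of samples. What does change is the per-GPU memory footprint and, if the model uses unsynchronized BatchNorm, the per-device batch statistics.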

The problem I am running into is that training is unstable. My suspicion is that the GAN setup is hard to train and is the main culprit.  
![image](https://github.com/user-attachments/assets/437d37c8-211d-4cca-a288-fadc7f3c5770)
![image](https://github.com/user-attachments/assets/f6cd89b3-42fc-48f5-8561-41980b78bb0e)
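To pin down roughly where training goes off the rails in the logged curves, a simple heuristic is to flag the first step whose loss exceeds some multiple of the median over a preceding window. This is a minimal sketch, not anything from the xcodec repo. The function name, `window`, and `factor` are hypothetical, and it assumes the tracked loss is non-negative (e.g. a mel/reconstruction loss):

```python
from statistics import median
from typing import List, Optional


def first_divergence_step(
    losses: List[float], window: int = 50, factor: float = 3.0
) -> Optional[int]:
    """Return the index of the first loss that exceeds `factor` times the median
    of the preceding `window` values, or None if no such spike is found.

    Assumes non-negative losses; a hypothetical heuristic, not a standard API.
    """
    for i in range(window, len(losses)):
        baseline = median(losses[i - window:i])
        if losses[i] > factor * baseline:
            return i
    return None


# Usage: a flat loss curve with a single spike at index 60.
losses = [1.0] * 60 + [5.0] + [1.0] * 10
print(first_divergence_step(losses))  # 60
```

Using the median rather than the mean keeps a single earlier spike from inflating the baseline.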

I wanted to ask whether you ran into this during your experiments and, if so, how you dealt with it. I am almost tempted to just resume training from an earlier checkpoint. It would be really helpful if you could guide me here.
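If I do resume from an earlier checkpoint, one rough heuristic (my own assumption, not something from the paper or repo) is to take the latest checkpoint saved at least some margin before the step where the loss started to blow up. The function name, step numbers, and `margin` below are hypothetical:

```python
from typing import List, Optional


def checkpoint_before(
    divergence_step: int, checkpoint_steps: List[int], margin: int = 0
) -> Optional[int]:
    """Latest checkpoint step that is at least `margin` steps before the divergence.

    Returns None if no checkpoint is early enough (hypothetical helper).
    """
    candidates = [s for s in checkpoint_steps if s <= divergence_step - margin]
    return max(candidates) if candidates else None


# Usage with made-up checkpoint steps and a divergence around step 120k.
ckpts = [50_000, 100_000, 110_000, 130_000]
print(checkpoint_before(120_000, ckpts, margin=5_000))   # 110000
print(checkpoint_before(120_000, ckpts, margin=15_000))  # 100000
```

Presumably the discriminator and both optimizer states would need to be restored along with the generator, so the GAN resumes in a consistent state.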

Thank you, and I appreciate the time you've taken to read this!
