First, run the following in your shell:
git clone https://github.com/mizoru/FastSpeech2.git
cd FastSpeech2
pip install -r requirements.txt
wget https://dagshub.com/mizoru/FastSpeech2/raw/be43b4f7d3be88e258e0cef8cdd68d587fff54e7/checkpoint1.pth
To synthesize audio for the sentences in texts.txt, run:
python test.py -m checkpoint1.pth -t texts.txt
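If you want to synthesize your own sentences, you need a texts.txt to pass via -t. The helper below is a minimal sketch that assumes test.py reads one sentence per line; that format is an assumption, not something stated here, so check test.py before relying on it. The function name write_texts is hypothetical.

```python
from pathlib import Path


def write_texts(sentences, path="texts.txt"):
    """Write sentences to `path`, one per line.

    Assumption: test.py -t expects plain UTF-8 text with one sentence per line.
    Blank entries are dropped and surrounding whitespace is stripped.
    """
    cleaned = [s.strip() for s in sentences if s.strip()]
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned
```

For example, write_texts(["The first sentence.", "The second sentence."]) overwrites texts.txt in the current directory, after which the test.py command above can be rerun unchanged.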
To resume training from the checkpoint, run:
python train.py -r checkpoint1.pth
The final inference-time predictions are in wavresults.zip.
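To get a quick overview of the archive without unpacking it, the sketch below lists every .wav file in it along with its duration, using only the standard library. It assumes the files are PCM WAV, which is what Python's wave module can read; if they were saved in another encoding (for example 32-bit float), wave will raise an error and a library such as soundfile would be needed instead. The function name list_wav_durations is hypothetical.

```python
import io
import wave
import zipfile


def list_wav_durations(zip_file="wavresults.zip"):
    """Return {member name: duration in seconds} for each .wav in the archive.

    `zip_file` may be a path or a binary file-like object.
    Assumption: the members are PCM WAV files readable by the `wave` module.
    """
    durations = {}
    with zipfile.ZipFile(zip_file) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".wav"):
                continue  # skip directories and non-audio files
            # Read the member into memory so `wave` gets a seekable file object.
            with wave.open(io.BytesIO(zf.read(name))) as wf:
                durations[name] = wf.getnframes() / wf.getframerate()
    return durations
```

Calling list_wav_durations() from the repository root prints nothing by itself; iterate over the returned dict to inspect the names and lengths of the generated samples.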
The training report is available here, and you can also look at my WandB logs.
B (beta) stands for pitch and G (gamma) for energy.