This is a PyTorch implementation of MultiSpeech: Multi-Speaker Text to Speech with Transformer
To train the model on your own data, follow the steps below:
- prepare your data and make sure it is in PSV (pipe-separated values) format as shown below, without a header row
speaker_id|audio_path|text|duration
0|file/to/file.wav|the text in that file|3.2
The speaker ID must be an integer, and IDs start from 0
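
Before training, it can help to check that every line of the data file matches the format above. The following is a minimal sketch and not part of this repository: the function names `validate_psv_line` and `validate_psv_file` are hypothetical, and the checks (four `|`-separated fields, a non-negative integer speaker ID, a positive numeric duration) are assumptions drawn from the example line.

```python
def validate_psv_line(line: str) -> tuple[int, str, str, float]:
    """Parse one 'speaker_id|audio_path|text|duration' line or raise ValueError.

    Hypothetical helper, not part of the repository.
    """
    fields = line.rstrip("\n").split("|")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")

    speaker_id_raw, audio_path, text, duration_raw = fields

    # int() raises ValueError if the speaker ID is not an integer.
    speaker_id = int(speaker_id_raw)
    if speaker_id < 0:
        raise ValueError(f"speaker ID must be >= 0, got {speaker_id}")

    # float() raises ValueError if the duration is not numeric.
    duration = float(duration_raw)
    if duration <= 0:
        raise ValueError(f"duration must be positive, got {duration}")

    if not audio_path.strip() or not text.strip():
        raise ValueError("audio path and text must not be empty")

    return speaker_id, audio_path, text, duration


def validate_psv_file(path: str) -> list[tuple[int, str, str, float]]:
    """Validate every non-empty line of a data file and return the parsed rows."""
    rows = []
    with open(path, encoding="utf-8") as data_file:
        for line_number, line in enumerate(data_file, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                rows.append(validate_psv_line(line))
            except ValueError as err:
                # Prefix the error with the file name and line number.
                raise ValueError(f"{path}:{line_number}: {err}") from err
    return rows
```

Calling `validate_psv_file("train_data.txt")` either returns the parsed rows or raises a `ValueError` naming the first offending line. Note that a transcript containing a `|` character would be rejected by this check, since it would split into more than four fields.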
- make sure the audio files are mono; convert any that are not
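
One possible way to do the mono conversion is sketched below, using only the Python standard library. This is not part of this repository: `convert_to_mono` is a hypothetical helper, and it assumes uncompressed 16-bit PCM WAV input, which it downmixes by averaging the channels of each frame. Other sample formats would need a different approach or an external audio tool.

```python
import array
import sys
import wave


def convert_to_mono(src_path: str, dst_path: str) -> None:
    """Downmix a 16-bit PCM WAV file to mono by averaging its channels.

    Hypothetical helper, not part of the repository.
    """
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM audio")

    # Interpret the raw bytes as signed 16-bit samples.
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    if n_channels == 1:
        mono = samples  # already mono, copy through unchanged
    else:
        # Samples are interleaved per frame; take the floored average
        # of each frame's channels.
        mono = array.array(
            "h",
            (
                sum(samples[i:i + n_channels]) // n_channels
                for i in range(0, len(samples), n_channels)
            ),
        )

    if sys.byteorder == "big":
        mono.byteswap()  # back to little-endian before writing

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(frame_rate)
        dst.writeframes(mono.tobytes())
```

For example, `convert_to_mono("stereo.wav", "mono.wav")` writes a single-channel copy at the same sample rate; remember to point the `audio_path` column of the data file at the converted files.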
- create a virtual environment
python -m venv env
- activate the environment
source env/bin/activate
- install the required dependencies
pip install -r requirements.txt
- update the config file if needed
- train the model
python train.py --train_path train_data.txt --test_path test_data.txt --checkpoint_dir outdir --epoch 100 --batch_size 64