Welcome to afrTTS! We implement two systems:
-NaiveTTS uses an existing, limited pronunciation dictionary and suffers from misalignment.
-G2PxTTS uses a grapheme-to-phoneme (G2P) conversion model to expand this dictionary, producing more coherent audio.
Installs
pip install librosa
pip install univoc
pip install tacotron
pip install omegaconf
pip install torch
Tacotron
A series of pretrained weights is available at https://github.com/JulianHerreilers/pantoffel_tacotron_models_storage; only the following two should be used:
-NaiveTTS: https://github.com/JulianHerreilers/pantoffel_tacotron_models_storage/releases/download/v0.190k-210k-230k-beta/model-230000.
-G2PxTTS: https://github.com/JulianHerreilers/pantoffel_tacotron_models_storage/releases/download/v1.120epoch/model-300000.pt
Tacotron can be trained with the following preprocessing and training steps. First, change the first argument on line 32 of utils/jsonmaker.py to metadata_incomplete.csv, then run:
python utils/jsonmaker.py
python preprocess.py afrZA datasets/afrZA
python train.py afrza afrZA/metadata_incomplete.csv datasets/afrZA
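The metadata CSV is assumed here to follow the common LJSpeech-style convention of pipe-separated `id|transcript` lines (an assumption; check utils/jsonmaker.py for the actual layout). A minimal sketch of reading such a file:

```python
import csv
import io

def read_metadata(f):
    """Parse pipe-separated metadata lines into (utterance_id, transcript) pairs.

    Assumes an LJSpeech-style `id|transcript` layout; adjust the delimiter
    and column handling if metadata_incomplete.csv differs.
    """
    reader = csv.reader(f, delimiter="|")
    return [(row[0], row[-1]) for row in reader if row]

# In-memory file standing in for metadata_incomplete.csv:
sample = io.StringIO("afr_0001|goeie more\nafr_0002|baie dankie\n")
entries = read_metadata(sample)
print(entries)  # [('afr_0001', 'goeie more'), ('afr_0002', 'baie dankie')]
```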
The G2P model can be trained, and then used to expand the pronunciation dictionary, entirely from the notebook at G2P/G2P_LSTM.ipynb.
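Conceptually, the expansion step keeps existing dictionary entries and falls back to the G2P model only for out-of-vocabulary words. A sketch of that logic, with a toy stand-in for the trained LSTM (all names here are illustrative, not the notebook's actual API):

```python
def expand_dictionary(words, pron_dict, g2p_predict):
    """Return a pronunciation dict covering every word in `words`.

    Known words keep their dictionary entry; out-of-vocabulary words are
    filled in by the G2P model (any callable mapping word -> phone list).
    """
    expanded = dict(pron_dict)
    for word in words:
        if word not in expanded:
            expanded[word] = g2p_predict(word)
    return expanded

# Toy stand-in for the trained G2P model: one phone per letter.
toy_g2p = lambda w: list(w)

pron_dict = {"kat": ["k", "a", "t"]}
expanded = expand_dictionary(["kat", "hond"], pron_dict, toy_g2p)
print(expanded["hond"])  # ['h', 'o', 'n', 'd']
```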
A demo notebook, afrTTS_demo.ipynb, can be used to test the two systems, provided that demo_utils.py, g2pmodel.py, and the two dictionaries (afr_za_dict.txt and rcrl_apd.1.4.1.txt) are in the same directory.
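Both dictionaries are assumed here to be plain-text files with one entry per line, the word followed by its phone sequence (verify against the actual files before relying on this). A minimal loader under that assumption:

```python
def load_pron_dict(lines):
    """Parse `word phone phone ...` lines into a {word: [phones]} mapping.

    Assumes whitespace-separated entries, one word per line; blank lines
    and lines starting with '#' are skipped.
    """
    pron_dict = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        word, *phones = line.split()
        pron_dict[word] = phones
    return pron_dict

entries = ["kat k a t", "hond h o n d"]
print(load_pron_dict(entries))  # {'kat': ['k', 'a', 't'], 'hond': ['h', 'o', 'n', 'd']}
```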
Further datasets and algorithms are available in utils/:
-split_num_letters.py converts all numbers in a sequence to their word equivalents.
-demo_sample_randomizer.ipynb was used to sort the demo samples for the subjective evaluation.
-check_valid_entries complete.py returns the subset of the dataset covered by the selected dictionary, whether afr_za_dict.txt or rcrl_apd.1.4.1.txt.
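As an illustration of the idea behind split_num_letters.py, here is a simplified digit-by-digit substitution using Afrikaans digit names. The real script likely handles full number grammar (e.g. "21" as "een-en-twintig" rather than "twee een"); this is only a sketch:

```python
# Afrikaans digit names (0-9). A sketch only: multi-digit numbers are
# spelled out digit by digit, not as grammatical Afrikaans numerals.
DIGITS = {
    "0": "nul", "1": "een", "2": "twee", "3": "drie", "4": "vier",
    "5": "vyf", "6": "ses", "7": "sewe", "8": "agt", "9": "nege",
}

def spell_out_digits(text):
    """Replace every digit in `text` with its Afrikaans word equivalent."""
    out = []
    for ch in text:
        out.append(DIGITS[ch] if ch.isdigit() and ch in DIGITS else ch)
    return "".join(out)

print(spell_out_digits("3 katte"))  # drie katte
```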
Acknowledgements:
-https://github.com/bshall/Tacotron
-https://github.com/bshall/UniversalVocoding
-Computations were performed using the University of Stellenbosch's HPC1 (Rhasatsha): http://www.sun.ac.za/hpc
Please note: this was my first proper exposure to PyTorch and deep models, and I would probably approach it very differently with the knowledge I now have (but hey, that's learning :)). I tried to document it as well as possible to explain my thought process, but use at your own risk.