This repository is the official implementation of "Meta-TTS: Meta-Learning for Few-shot Speaker-Adaptive Text-to-Speech".
Here is how I built my environment; you do not need to replicate it exactly:
- Sign up for Comet.ml, find your workspace and API key via www.comet.ml/api/my/settings, and fill them into `.comet.config`. The Comet logger is used throughout the train/val/test stages.
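A minimal `.comet.config` sketch (the values below are placeholders, and the key names follow Comet's standard INI-style config file; double-check them against your Comet version):

```ini
[comet]
# Placeholder values; replace with your own API key and workspace.
api_key = YOUR_API_KEY
workspace = YOUR_WORKSPACE
```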
- [Optional] Install pyenv for Python version control, and switch to Python 3.8.6.
```bash
# After downloading and installing pyenv:
pyenv install 3.8.6
pyenv local 3.8.6
```
- [Optional] Install pyenv-virtualenv as a pyenv plugin for a clean virtual environment.
```bash
# After installing pyenv-virtualenv:
pyenv virtualenv meta-tts
pyenv activate meta-tts
```
- Install learn2learn from source.
```bash
# Install Cython first:
pip install cython
# Then install learn2learn from source:
git clone https://github.com/learnables/learn2learn.git
cd learn2learn
pip install -e .
```
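Optionally, you can sanity-check that the editable install is the one being picked up (this simply prints the location of the imported package):

```bash
python -c "import learn2learn as l2l; print(l2l.__file__)"
```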
- Install requirements:
```bash
pip install -r requirements.txt
```
First, download LibriTTS and VCTK, then change the dataset paths in `config/LibriTTS/preprocess.yaml` and `config/VCTK/preprocess.yaml` to point to your local copies.
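As a rough sketch of what to edit (the key names here are assumptions based on typical FastSpeech2-style configs; follow the actual keys in your copy of `preprocess.yaml`):

```yaml
dataset: "LibriTTS"
path:
  corpus_path: "/path/to/LibriTTS"                    # point this at the downloaded corpus
  raw_path: "./raw_data/LibriTTS"                     # assumed intermediate directory
  preprocessed_path: "./preprocessed_data/LibriTTS"   # matches the paths used below
```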
Then run

```bash
python3 prepare_align.py config/LibriTTS/preprocess.yaml
python3 prepare_align.py config/VCTK/preprocess.yaml
```

to prepare the data for alignment.
The alignments of LibriTTS are provided here, and the alignments of VCTK are provided here.
You have to unzip the files into `preprocessed_data/LibriTTS/TextGrid/` and `preprocessed_data/VCTK/TextGrid/`.
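For example (the archive names below are hypothetical; substitute the files you actually downloaded):

```bash
mkdir -p preprocessed_data/LibriTTS/TextGrid/ preprocessed_data/VCTK/TextGrid/
# Hypothetical archive names; use whatever the downloaded alignment archives are called.
unzip LibriTTS_TextGrid.zip -d preprocessed_data/LibriTTS/TextGrid/
unzip VCTK_TextGrid.zip -d preprocessed_data/VCTK/TextGrid/
```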
Then run the preprocessing script:
```bash
python3 preprocess.py config/LibriTTS/preprocess.yaml
# Copy stats from LibriTTS to VCTK so that pitch/energy normalization uses the same shift and bias.
cp preprocessed_data/LibriTTS/stats.json preprocessed_data/VCTK/
python3 preprocess.py config/VCTK/preprocess.yaml
```
To train the model(s) in the paper, run this command:
```bash
python3 train.py -a <algorithm>
```
Available algorithms:
- `base_emb_vad`, `base_emb_va`, `base_emb_d`, `base_emb`
  - Baseline with embedding table.
- `meta_emb_vad`, `meta_emb_va`, `meta_emb_d`, `meta_emb`
  - Meta-TTS with embedding table.
- `base_emb1_vad`, `base_emb1_va`, `base_emb1_d`, `base_emb1`
  - Baseline with shared embedding.
- `meta_emb1_vad`, `meta_emb1_va`, `meta_emb1_d`, `meta_emb1`
  - Meta-TTS with shared embedding.

The suffix indicates which modules are fine-tuned:
- `*_vad`: fine-tune embedding + variance adaptor + decoder
- `*_va`: fine-tune embedding + variance adaptor
- `*_d`: fine-tune embedding + decoder
- no suffix: fine-tune embedding only
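For example, to train Meta-TTS with an embedding table, fine-tuning the embedding, variance adaptor, and decoder:

```bash
python3 train.py -a meta_emb_vad
```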