# Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech

This repository is the official implementation of "Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech".

> 📋 Optional: include a graphic explaining your approach/main result, bibtex entry, link to demos, blog posts and tutorials

## Requirements

This is how I built my environment; yours does not need to be exactly the same:

- Sign up for Comet.ml, find out your workspace and API key via www.comet.ml/api/my/settings, and fill them into `.comet.config` (see the sketch after this list). The Comet logger is used throughout the train/val/test stages.
- [Optional] Install pyenv for Python version control and switch to Python 3.8.6:

  ```bash
  # After downloading and installing pyenv:
  pyenv install 3.8.6
  pyenv local 3.8.6
  ```

- [Optional] Install pyenv-virtualenv as a pyenv plugin for a clean virtual environment:

  ```bash
  # After installing pyenv-virtualenv:
  pyenv virtualenv meta-tts
  pyenv activate meta-tts
  ```

- Install Cython and learn2learn from source:

  ```bash
  # Install Cython first:
  pip install cython

  # Then install learn2learn from source:
  git clone https://github.com/learnables/learn2learn.git
  cd learn2learn
  pip install -e .
  ```

- Install the remaining requirements:

  ```bash
  pip install -r requirements.txt
  ```
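
For reference, a minimal `.comet.config` sketch is shown below. It assumes Comet's standard INI-style config file with a `[comet]` section; replace the placeholder values with the API key and workspace shown on your settings page.

```bash
# Write a minimal .comet.config (placeholder values; copy the real API key
# and workspace name from www.comet.ml/api/my/settings):
cat > .comet.config << 'EOF'
[comet]
api_key=YOUR_API_KEY
workspace=YOUR_WORKSPACE
EOF
```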

## Preprocessing

First, download LibriTTS and VCTK, and update the dataset paths in `config/LibriTTS/preprocess.yaml` and `config/VCTK/preprocess.yaml`. Then run

```bash
python3 prepare_align.py config/LibriTTS/preprocess.yaml
python3 prepare_align.py config/VCTK/preprocess.yaml
```

to prepare the data for alignment.

Alignments for LibriTTS are provided here, and alignments for VCTK are provided here. Unzip the files into `preprocessed_data/LibriTTS/TextGrid/` and `preprocessed_data/VCTK/TextGrid/` respectively.
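
For example, assuming the downloaded alignment archives are named `LibriTTS.zip` and `VCTK.zip` (hypothetical names; use whatever you actually downloaded), the extraction could look like:

```bash
# Hypothetical archive names; the TextGrid files are assumed to sit at the
# top level of each archive.
mkdir -p preprocessed_data/LibriTTS/TextGrid/ preprocessed_data/VCTK/TextGrid/
unzip LibriTTS.zip -d preprocessed_data/LibriTTS/TextGrid/
unzip VCTK.zip -d preprocessed_data/VCTK/TextGrid/
```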

Then run the preprocessing script:

```bash
python3 preprocess.py config/LibriTTS/preprocess.yaml

# Copy stats from LibriTTS to VCTK so that pitch/energy normalization
# uses the same shift and bias for both datasets.
cp preprocessed_data/LibriTTS/stats.json preprocessed_data/VCTK/

python3 preprocess.py config/VCTK/preprocess.yaml
```
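
As an optional sanity check (not part of the pipeline above), you can confirm that both datasets now share the exact same normalization statistics:

```bash
# The copied stats.json should be byte-identical between the two datasets.
diff preprocessed_data/LibriTTS/stats.json preprocessed_data/VCTK/stats.json \
  && echo "stats match"
```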

## Training

To train the model(s) in the paper, run this command:

```bash
python3 train.py -a <algorithm>
```

Available algorithms:

- `base_emb_vad`, `base_emb_va`, `base_emb_d`, `base_emb`: baseline with embedding table.
- `meta_emb_vad`, `meta_emb_va`, `meta_emb_d`, `meta_emb`: Meta-TTS with embedding table.
- `base_emb1_vad`, `base_emb1_va`, `base_emb1_d`, `base_emb1`: baseline with shared embedding.
- `meta_emb1_vad`, `meta_emb1_va`, `meta_emb1_d`, `meta_emb1`: Meta-TTS with shared embedding.

The suffix indicates which modules are fine-tuned:

- `*_vad`: embedding + variance adaptor + decoder
- `*_va`: embedding + variance adaptor
- `*_d`: embedding + decoder
- no suffix: embedding only
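
For example, to train the Meta-TTS variant with an embedding table that fine-tunes the embedding, variance adaptor, and decoder:

```bash
python3 train.py -a meta_emb_vad
```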

## Evaluation


> 📋 Describe how to evaluate the trained models on benchmarks reported in the paper, give commands that produce the results (section below).

## Pre-trained Models

You can download pretrained models here:

> 📋 Give a link to where/how the pretrained models can be downloaded and how they were trained (if applicable). Alternatively you can have an additional column in your results table with a link to the models.

## Results


> 📋 Include a table of results from your paper, and link back to the leaderboard for clarity and context. If your main result is a figure, include that figure and link to the command or notebook to reproduce it.

## Contributing

> 📋 Pick a licence and describe how to contribute to your code repository.