These are the main dev plans for 🐸 TTS.
If you want to contribute to 🐸 TTS and don't know where to start, you can pick an item here and follow our Contribution Guideline. We're also always here to help.
Feel free to pick one or suggest a new one.
Contributions are always welcome 💪 .
v0.1.0 Milestones
- Better model config handling ([Discussion] Ideas for better model config management #21)
- TTS recipes for public datasets.
- TTS trainer API to unify all the model training scripts.
- TTS, Vocoder and SpeakerEncoder model abstractions and APIs.
- Documentation for
  - Implementing a new model using 🐸 TTS.
  - Training a model on a new dataset from gecko.
  - Using the `Synthesizer` interface on the CLI or Server (see the sketch after this list).
  - Extracting spectrograms for vocoder training.
  - Contributing a new pre-trained 🐸 TTS model.
  - Explanation of model config parameters.
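For the `Synthesizer` documentation item above, here is a minimal sketch of driving the interface from Python. The file paths are placeholders, and the constructor keywords may differ between 🐸 TTS versions, so treat this as an illustration rather than the reference usage:

```python
# Minimal sketch: synthesize speech with the Synthesizer class.
# Paths below are placeholders; check the docs of your 🐸 TTS version.
from TTS.utils.synthesizer import Synthesizer

synthesizer = Synthesizer(
    tts_checkpoint="path/to/tts_model.pth",        # placeholder path
    tts_config_path="path/to/tts_config.json",     # placeholder path
    vocoder_checkpoint="path/to/vocoder.pth",      # optional vocoder
    vocoder_config="path/to/vocoder_config.json",
)

wav = synthesizer.tts("Hello from 🐸 TTS!")  # list/array of audio samples
synthesizer.save_wav(wav, "output.wav")
```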
v0.2.0 Milestones
- In-house grapheme-to-phoneme conversion (thanks to gruut 👍; see the sketch after this list).
- Implement VITS model.
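A rough sketch of what gruut-based G2P looks like in isolation, assuming the gruut 2.x `sentences()` API (🐸 TTS wraps this inside its own phonemizer layer, so this is not how the library calls it internally):

```python
# Grapheme-to-phoneme conversion with gruut.
# Requires gruut and its English data to be installed.
from gruut import sentences

text = "Hello world, this is a test."
for sentence in sentences(text, lang="en-us"):
    for word in sentence:
        if word.phonemes:  # punctuation tokens carry no phonemes
            print(word.text, "->", " ".join(word.phonemes))
```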
v0.3.0 Milestones
- Implement the generic ForwardTTS API (see the length-regulator sketch after this list).
- Implement the FastSpeech model.
- Implement the FastPitch model.
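The ForwardTTS models (FastSpeech, FastPitch) share the same core mechanism: predict a duration per input token and expand the encoder outputs with a length regulator before decoding. Below is a toy sketch of that length-regulation step only, not the actual 🐸 TTS implementation:

```python
import torch

def length_regulate(encoder_out: torch.Tensor, durations: torch.Tensor) -> torch.Tensor:
    """Expand each encoder frame by its predicted duration.

    encoder_out: (T_in, C) hidden states for one utterance.
    durations:   (T_in,) integer frame counts per input token.
    Returns:     (sum(durations), C) decoder-rate sequence.
    """
    # repeat_interleave duplicates time step t exactly durations[t] times.
    return torch.repeat_interleave(encoder_out, durations, dim=0)

# Toy usage: 4 input tokens, 8-dim hidden states.
enc = torch.randn(4, 8)
durs = torch.tensor([2, 1, 3, 2])
expanded = length_regulate(enc, durs)
print(expanded.shape)  # torch.Size([8, 8])
```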
v0.4.0 Milestones
- Trainer API v2 - join the discussion
- Multi-speaker VCTK recipes for all the `TTS.tts` models.
v0.5.0 Milestones
- Support for multi-lingual models
- YourTTS release 🚀
v0.6.0 Milestones
- Add ESpeak support
- New Tokenizer and Phonemizer APIs (New Tokenizer API #937)
- New Model API (Update models (Rebased) #1078)
- Splitting the trainer out into a separate repo: 👟 Trainer
- Update VITS model API
- Gradient accumulation (Accumulate grads (larger batch size for low GPU memory) #560, now in 👟 Trainer; see the sketch after this list).
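Gradient accumulation boils down to the standard PyTorch pattern: scale the loss, call `backward()` on every batch, but step the optimizer only every N batches so the effective batch size grows without extra GPU memory. A generic, self-contained sketch (not the 👟 Trainer code):

```python
import torch
from torch import nn

# Toy setup: a tiny model and random mini-batches to illustrate the pattern.
model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
data = [torch.randn(8, 10) for _ in range(16)]  # 16 mini-batches of size 8

accum_steps = 4  # effective batch size = 8 * 4 = 32

optimizer.zero_grad()
for i, batch in enumerate(data):
    loss = model(batch).pow(2).mean()   # dummy loss
    (loss / accum_steps).backward()     # accumulate scaled gradients
    if (i + 1) % accum_steps == 0:      # step only every `accum_steps` batches
        optimizer.step()
        optimizer.zero_grad()
```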
v0.7.0 Milestones
- Implement Capacitron 👑 @a-froghyar 👑 @WeberJulian
- Release pretrained Capacitron
v0.8.0 Milestones
- Separate numpy transforms
- Better data sampling for VITS (see the sampler sketch after this list).
- New Thorsten DE models 👑 @thorstenMueller
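"Better data sampling for VITS" is about balancing batches over speakers (and languages) instead of drawing items uniformly. One simple way to approximate that is a weighted random sampler; the sketch below uses a made-up `speaker_ids` list and is only an illustration, the actual 🐸 TTS samplers are more involved:

```python
from collections import Counter
from torch.utils.data import WeightedRandomSampler

# Hypothetical speaker label per dataset item.
speaker_ids = ["p225", "p226", "p225", "p227", "p225", "p226"]

counts = Counter(speaker_ids)
weights = [1.0 / counts[s] for s in speaker_ids]  # rarer speakers are drawn more often

sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
# Pass `sampler=sampler` to the DataLoader instead of `shuffle=True`.
```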
🏃‍♀️ Milestones along the way
- Implement an end-to-end training API for ForwardTTS models and a vocoder (ForwardTTSE2E implementations and related API changes #1510).
- Implement a Python voice synthesis API.
- Inject phonemes into the input text at inference (SSML support #1452).
- AdaSpeech1/2 https://arxiv.org/pdf/2104.09715 and https://arxiv.org/abs/2103.00993
- Let the user pass a custom text cleaner function (see the cleaner sketch after this list).
- Refactor the text cleaners for a more flexible and transparent API.
- Implement HifiGAN2 (not the vocoder)
- Implement emotion and style adaptation.
- Implement FastSpeech2 (https://arxiv.org/abs/2006.04558).
- AutoTTS 🤖 (👑 @loganhart420)
- Watermarking TTS outputs to help detect deepfakes.
- Implement SSML v0.0.1
- ONNX and TorchScript model exports.
- TensorFlow run-time for training models.
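The custom text cleaner items above are about letting users plug their own normalization into the text pipeline. The function below is a toy example of what such a cleaner could look like; how it gets passed into 🐸 TTS is exactly the open part of this milestone, so no library API is assumed here:

```python
import re

def my_cleaner(text: str) -> str:
    """Toy text cleaner: lowercase, expand a few abbreviations, squeeze spaces."""
    text = text.lower()
    text = re.sub(r"\bdr\.", "doctor", text)
    text = re.sub(r"\bmr\.", "mister", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text

print(my_cleaner("Dr. Smith met  Mr. Jones."))  # -> "doctor smith met mister jones."
```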
🤖 New TTS models
- AlignTTS (@erogol)
- HiFiGAN (Hifigan Vocoder #16 👑 @rishikksh20 and @erogol)
- UnivNet vocoder (👑 @rishikksh20)
- VITS paper
- FastPitch source
- Alignment Network paper
- End-to-end TTS combining aligner + TTS + vocoder.
- Multi-lingual TTS (Add multilingual support #11, 👑 @WeberJulian)
- ParallelTacotron paper (open for contribution)
- Efficient TTS paper (open for contribution)
- Gaussian length regulator from https://arxiv.org/pdf/2010.04301.pdf (open for contribution)
- LightSpeech from https://arxiv.org/pdf/2102.04040.pdf (open for contribution)
- AdaSpeech1/2 https://arxiv.org/pdf/2104.09715 and https://arxiv.org/abs/2103.00993