AlexTTS is a passion project: a text-to-speech model built from the ground up. I wanted to understand, at a low level, what an end-to-end deep learning project really involves. I had some experience with autoregressive text generation models, but it was mostly limited to fine-tuning pre-existing architectures. I had the privilege of speaking with Eli, an ML researcher at Cartesia. Our conversation left me with a single, compelling thought:
Why not build my own unique text-to-speech model?
See my blog for more information! My files are stored in apps/tts.
This repository is forked from Meta Lingua, a minimal and fast LLM training and inference library designed for research. A huge thanks to:
Mathurin Videau*, Badr Youbi Idrissi*, Daniel Haziza, Luca Wehrstedt, Jade Copet, Olivier Teytaud, David Lopez-Paz. *Equal and main contribution
Meta Lingua and AlexTTS are licensed under the BSD-3-Clause license. Refer to the LICENSE file in the top-level directory.