tts
Inference and training library for high-quality TTS models.
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Faster Whisper transcription with CTranslate2
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
E2E TTS using Conditional Flow Matching (Experimental*)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
A Conversational Speech Generation Model
Instant voice cloning by MIT and MyShell. Audio foundation model.
Zero-shot voice conversion and singing voice conversion, with real-time support
TTSFM is a reverse-engineered API server that mirrors OpenAI's TTS service, providing a compatible interface for text-to-speech conversion with multiple voice options.
You can find the speech algorithms you want here
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
🔥🔥 Kokoro in Rust. https://huggingface.co/hexgrad/Kokoro-82M Insanely fast, real-time, high-quality TTS.