Your words, any voice.
Voice cloning and text-to-speech with multiple TTS engines. Clone any voice from a short audio sample and generate speech in that voice.
- Voice Cloning - Clone voices from 5-30 second audio samples
- Multiple Engines - Coqui XTTS v2 (multilingual) and Chatterbox (fast/expressive)
- 16 Languages - English, Spanish, French, German, Chinese, Japanese, and more
- Three Interfaces - GUI application, CLI tool, and Python API
- Expressive Speech - Paralinguistic tags for laughs, sighs, gasps (Chatterbox Turbo)
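The 5-30 second guideline for reference samples can be checked before a file is handed to any engine. The sketch below is not part of the project; `wav_duration_seconds` and `is_usable_sample` are hypothetical helper names, and it assumes the sample is an uncompressed WAV file that the standard-library `wave` module can open.

```python
import wave

# Bounds taken from the "5-30 second audio samples" feature note.
MIN_SECONDS = 5.0
MAX_SECONDS = 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_usable_sample(path: str) -> bool:
    """Return True if the sample length is within the 5-30 second range."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

A call such as `is_usable_sample("voice.wav")` only inspects the length; it says nothing about noise or recording quality.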
# Clone repository
git clone https://github.com/luongnv89/voice-cast.git
cd voice-cast
# Create virtual environment
python3.10 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# Install dependencies
pip install -e .
# Optional: Chatterbox engine
pip install -e ".[chatterbox]"
GUI Application:
python voice_cloning_app.py
Command Line:
python vcloner.py -i voice.wav -t "Hello world" -o output.wav
Python API:
from voice_cloner import VoiceCloner
cloner = VoiceCloner(speaker_wav="./voice-samples/speaker.wav")
cloner.say("Hello, this is my cloned voice!", save_audio=True, output_file="output.wav")
| Engine | Languages | Speed | Best For |
|---|---|---|---|
| Coqui XTTS v2 | 16 | Medium | Multilingual, quality |
| Chatterbox Turbo | English | Fast | Rapid iteration, expressions |
| Chatterbox Standard | English | Medium | Production quality |
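The table above can also be read as a small lookup: given a target language and whether paralinguistic tags are needed, which engines remain? The following is a minimal sketch of that reading, not the project's own selection logic. `EngineInfo`, `ENGINES`, and `candidate_engines` are hypothetical names, the `"en"` language code is an assumption, and a non-English language is assumed to be one of the 16 that XTTS v2 covers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineInfo:
    name: str
    multilingual: bool          # False means English only, per the table
    speed: str
    paralinguistic_tags: bool   # the feature list attributes tags to Turbo


# Rows transcribed from the engine table.
ENGINES = [
    EngineInfo("Coqui XTTS v2", True, "Medium", False),
    EngineInfo("Chatterbox Turbo", False, "Fast", True),
    EngineInfo("Chatterbox Standard", False, "Medium", False),
]


def candidate_engines(language: str = "en", need_tags: bool = False) -> list[str]:
    """Return engine names (in table order) that satisfy the constraints."""
    result = []
    for engine in ENGINES:
        if language.lower() != "en" and not engine.multilingual:
            continue  # English-only engines cannot serve other languages
        if need_tags and not engine.paralinguistic_tags:
            continue  # only Turbo is listed as supporting tags
        result.append(engine.name)
    return result
```

An empty result, for example for Spanish text with tags, signals that no single engine in the table covers both requirements.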
Expressive speech with Chatterbox Turbo:
cloner.say("That's hilarious [laugh]! I can't believe it [gasp]!")
Tags: [laugh], [chuckle], [cough], [sigh], [gasp], [yawn]
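A misspelled tag is easy to miss in a long script, so it can help to scan the text before synthesis. The sketch below is a hypothetical pre-check, not something the library is known to provide; `unknown_tags` is an illustrative name, and it assumes tags must match the six listed spellings exactly.

```python
import re

# The six tags listed above for Chatterbox Turbo.
SUPPORTED_TAGS = {"laugh", "chuckle", "cough", "sigh", "gasp", "yawn"}

# Matches any bracketed token such as [laugh]; nested brackets are not handled.
_TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def unknown_tags(text: str) -> list[str]:
    """Return bracketed tags in `text` that are not in SUPPORTED_TAGS."""
    return [tag for tag in _TAG_PATTERN.findall(text) if tag not in SUPPORTED_TAGS]
```

If `unknown_tags(script)` returns a non-empty list, the script probably contains a typo or a tag the engine does not recognize.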
| Document | Description |
|---|---|
| API Reference | Complete Python API documentation |
| CLI Reference | Command-line interface guide |
| GUI Guide | Desktop application user manual |
| Engines Guide | TTS engine comparison and parameters |
| Architecture | System design and patterns |
| Development | Contributing and setup guide |
| Troubleshooting | Common issues and solutions |
- Python 3.10+
- 8GB RAM (16GB recommended)
- NVIDIA GPU with CUDA (optional, for faster processing)
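Part of these requirements can be verified from Python itself. The sketch below checks the interpreter version and, if PyTorch is already installed, whether CUDA is visible. `check_environment` is a hypothetical helper, not part of the project, and RAM is not checked because the standard library has no portable call for it.

```python
import sys


def check_environment(min_python: tuple = (3, 10)) -> dict:
    """Report whether the Python version and (optionally) CUDA look usable."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "cuda_available": None,  # None means PyTorch is not installed yet
    }
    try:
        import torch  # imported lazily so the check works before installation
    except ImportError:
        return report
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    print(check_environment())
```

A `cuda_available` value of `False` is not an error, since the GPU is listed as optional; it only means generation will run on the CPU.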
MIT License - see LICENSE file.
- Coqui TTS - XTTS v2 model
- Chatterbox - Fast TTS by Resemble AI
- PyTorch - Deep learning framework
