VoiceCast

Your words, any voice.

Voice cloning and text-to-speech with multiple TTS engines. Clone any voice from a short audio sample and generate speech in that voice.

VoiceCast GUI

Features

  • Voice Cloning - Clone voices from 5-30 second audio samples
  • Multiple Engines - Coqui XTTS v2 (multilingual) and Chatterbox (fast/expressive)
  • 16 Languages - English, Spanish, French, German, Chinese, Japanese, and more
  • Three Interfaces - GUI application, CLI tool, and Python API
  • Expressive Speech - Paralinguistic tags for laughs, sighs, gasps (Chatterbox Turbo)
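
The 5-30 second range for voice samples can be checked before a sample is handed to an engine. The following is a minimal sketch, not part of VoiceCast: the helper names `wav_duration_seconds` and `is_usable_sample` are hypothetical, and it assumes the sample is an uncompressed PCM WAV file, which is what Python's standard `wave` module can read.

```python
import wave

# Bounds taken from the feature list above (5-30 second samples).
MIN_SECONDS = 5.0
MAX_SECONDS = 30.0


def wav_duration_seconds(path):
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_usable_sample(path):
    """Return True if the sample length falls inside the 5-30 second range."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

Other formats such as MP3 or FLAC would need a different reader; this sketch only covers WAV.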

Quick Start

Installation

# Clone repository
git clone https://github.com/luongnv89/voice-cast.git
cd voice-cast

# Create virtual environment
python3.10 -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
pip install -e .

# Optional: Chatterbox engine
pip install -e ".[chatterbox]"

Usage

GUI Application:

python voice_cloning_app.py

Command Line:

python vcloner.py -i voice.wav -t "Hello world" -o output.wav

Python API:

from voice_cloner import VoiceCloner

cloner = VoiceCloner(speaker_wav="./voice-samples/speaker.wav")
cloner.say("Hello, this is my cloned voice!", save_audio=True, output_file="output.wav")
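
To generate several lines with the same cloned voice, the `say` call shown above can be wrapped in a loop. This is a sketch under assumptions: `render_script` and the `line_NN.wav` naming scheme are hypothetical, and the only VoiceCast call it relies on is `say(text, save_audio=True, output_file=...)` exactly as it appears in the example above.

```python
from pathlib import Path


def render_script(cloner, lines, out_dir="output"):
    """Render each line of text to its own numbered WAV file.

    `cloner` is any object with the say() method shown in the README
    example, for instance a VoiceCloner instance.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    for index, text in enumerate(lines, start=1):
        target = out_path / f"line_{index:02d}.wav"
        cloner.say(text, save_audio=True, output_file=str(target))
        written.append(target)
    return written


# Possible usage, assuming VoiceCast is installed:
# from voice_cloner import VoiceCloner
# cloner = VoiceCloner(speaker_wav="./voice-samples/speaker.wav")
# render_script(cloner, ["First line.", "Second line."])
```

Taking the cloner as a parameter keeps the model loaded once for the whole script rather than once per line.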

TTS Engines

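| Engine              | Languages | Speed  | Best For                     |
|---------------------|-----------|--------|------------------------------|
| Coqui XTTS v2       | 16        | Medium | Multilingual, quality        |
| Chatterbox Turbo    | English   | Fast   | Rapid iteration, expressions |
| Chatterbox Standard | English   | Medium | Production quality           |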
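
The engine comparison can be written down as data to narrow the choice for a given job. This is only an illustrative sketch: `ENGINES` and `candidate_engines` are hypothetical and not part of VoiceCast. It assumes the requested language is one of the 16 that XTTS v2 covers, and that paralinguistic tags are available only in Chatterbox Turbo, as the feature list states.

```python
# Engine properties transcribed from the README; field names are hypothetical.
ENGINES = [
    {"name": "Coqui XTTS v2", "english_only": False, "speed": "Medium", "tags": False},
    {"name": "Chatterbox Turbo", "english_only": True, "speed": "Fast", "tags": True},
    {"name": "Chatterbox Standard", "english_only": True, "speed": "Medium", "tags": False},
]


def candidate_engines(language, need_tags=False):
    """Return the names of engines that fit the language and tag requirements."""
    matches = []
    for engine in ENGINES:
        # Chatterbox engines are listed as English-only.
        if engine["english_only"] and language.lower() != "english":
            continue
        # Expressive tags are listed for Chatterbox Turbo only.
        if need_tags and not engine["tags"]:
            continue
        matches.append(engine["name"])
    return matches
```

An empty result, for example French with expressive tags, means no listed engine covers that combination.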

Expressive speech with Chatterbox Turbo:

cloner.say("That's hilarious [laugh]! I can't believe it [gasp]!")

Tags: [laugh], [chuckle], [cough], [sigh], [gasp], [yawn]
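
Since only the six tags listed above are documented, it can help to check text for anything else in square brackets before generating speech. The sketch below is an assumption-laden helper, not VoiceCast code: `unknown_tags` and `strip_tags` are hypothetical names, and tag matching is taken to be exact and lowercase because the README does not say otherwise.

```python
import re

# The six tags listed in the README for Chatterbox Turbo.
SUPPORTED_TAGS = {"laugh", "chuckle", "cough", "sigh", "gasp", "yawn"}

# Matches any bracketed token such as [laugh].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def unknown_tags(text):
    """Return bracketed tags in the text that are not in the documented list."""
    return [tag for tag in TAG_PATTERN.findall(text) if tag not in SUPPORTED_TAGS]


def strip_tags(text):
    """Remove all bracketed tags and tidy the spacing left behind."""
    without_tags = TAG_PATTERN.sub("", text)
    collapsed = re.sub(r"\s{2,}", " ", without_tags)
    return re.sub(r"\s+([!?.,])", r"\1", collapsed).strip()
```

`strip_tags` could be applied when the same text is sent to an engine that does not interpret the tags.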

Documentation

| Document        | Description                          |
|-----------------|--------------------------------------|
| API Reference   | Complete Python API documentation    |
| CLI Reference   | Command-line interface guide         |
| GUI Guide       | Desktop application user manual      |
| Engines Guide   | TTS engine comparison and parameters |
| Architecture    | System design and patterns           |
| Development     | Contributing and setup guide         |
| Troubleshooting | Common issues and solutions          |

System Requirements

  • Python 3.10+
  • 8GB RAM (16GB recommended)
  • NVIDIA GPU with CUDA (optional, for faster processing)
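
Two of these requirements can be checked from Python before installing or running anything heavy. This is a rough sketch with hypothetical helper names. The CUDA check assumes the engines run on PyTorch, as the Coqui TTS and Chatterbox packages do, and it simply reports `False` if PyTorch is not installed. RAM is not checked because the standard library has no portable way to query it.

```python
import sys

# Minimum interpreter version from the requirements list.
MIN_PYTHON = (3, 10)


def python_ok(version_info=sys.version_info):
    """Return True if the interpreter is Python 3.10 or newer."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def cuda_available():
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    print("CUDA available:", cuda_available())
```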

License

MIT License - see LICENSE file.

Acknowledgments
