VoiceCast

Your words, any voice.

Voice cloning and text-to-speech with multiple TTS engines. Clone any voice from a short audio sample and generate speech in that voice.

Features

Voice Cloning - Clone voices from 5-30 second audio samples
Multiple Engines - Coqui XTTS v2 (multilingual) and Chatterbox (fast/expressive)
16 Languages - English, Spanish, French, German, Chinese, Japanese, and more
Three Interfaces - GUI application, CLI tool, and Python API
Expressive Speech - Paralinguistic tags for laughs, sighs, gasps (Chatterbox Turbo)

Quick Start

Installation

# Clone repository
git clone https://github.com/luongnv89/voice-cast.git
cd voice-cast

# Create virtual environment
python3.10 -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
pip install -e .

# Optional: Chatterbox engine
pip install -e ".[chatterbox]"

Usage

GUI Application:

python voice_cloning_app.py

Command Line:

python vcloner.py -i voice.wav -t "Hello world" -o output.wav

Python API:

from voice_cloner import VoiceCloner

cloner = VoiceCloner(speaker_wav="./voice-samples/speaker.wav")
cloner.say("Hello, this is my cloned voice!", save_audio=True, output_file="output.wav")
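
The Features list gives 5-30 seconds as the sample length for cloning. Here is a minimal sketch for checking a WAV file against that range before passing it as speaker_wav. The helper names (wav_duration_seconds, is_usable_sample) are hypothetical and not part of VoiceCast. The sketch uses only the standard-library wave module, so it assumes an uncompressed PCM WAV file.

```python
import wave

MIN_SECONDS = 5.0   # lower bound taken from the Features list
MAX_SECONDS = 30.0  # upper bound taken from the Features list


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_usable_sample(path):
    """Hypothetical helper: True if the sample lasts between 5 and 30 seconds."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

For example, is_usable_sample("./voice-samples/speaker.wav") could be called before constructing VoiceCloner. Whether VoiceCast itself enforces the range is not stated here.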

TTS Engines

Engine	Languages	Speed	Best For
Coqui XTTS v2	16	Medium	Multilingual, quality
Chatterbox Turbo	English	Fast	Rapid iteration, expressions
Chatterbox Standard	English	Medium	Production quality

Expressive speech with Chatterbox Turbo:

cloner.say("That's hilarious [laugh]! I can't believe it [gasp]!")

Tags: [laugh], [chuckle], [cough], [sigh], [gasp], [yawn]
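
Since only the six tags above are listed, it may help to catch typos before synthesis. The following is a sketch of a standalone check, not a VoiceCast feature: unknown_tags is a hypothetical helper, and how the engine treats an unrecognized tag is not documented here.

```python
import re

# Tags listed in this README for Chatterbox Turbo.
KNOWN_TAGS = {"laugh", "chuckle", "cough", "sigh", "gasp", "yawn"}

# Matches any non-nested [bracketed] token and captures its contents.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def unknown_tags(text):
    """Return bracketed tags not in KNOWN_TAGS, in order of appearance."""
    return [tag for tag in TAG_PATTERN.findall(text) if tag not in KNOWN_TAGS]
```

An empty result means every bracketed token in the text is one of the listed tags.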

Documentation

Document	Description
API Reference	Complete Python API documentation
CLI Reference	Command-line interface guide
GUI Guide	Desktop application user manual
Engines Guide	TTS engine comparison and parameters
Architecture	System design and patterns
Development	Contributing and setup guide
Troubleshooting	Common issues and solutions

System Requirements

Python 3.10+
8GB RAM (16GB recommended)
NVIDIA GPU with CUDA (optional, for faster processing)
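
The Python and GPU requirements can be checked from the interpreter. Here is a small sketch: python_ok and cuda_available are hypothetical helper names, and the CUDA check assumes PyTorch is installed as a dependency. If PyTorch is missing, the sketch reports no GPU instead of failing.

```python
import sys

MIN_PYTHON = (3, 10)  # from the requirements list above


def python_ok(version_info=sys.version_info):
    """True if the given interpreter version meets the Python 3.10+ requirement."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def cuda_available():
    """True if PyTorch is importable and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()
```

Calling python_ok() and cuda_available() inside the activated virtual environment shows whether the interpreter is recent enough and whether GPU acceleration is likely to be used.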

License

MIT License - see LICENSE file.

Acknowledgments

Coqui TTS - XTTS v2 model
Chatterbox - Fast TTS by Resemble AI
PyTorch - Deep learning framework
