Try the beta with the VoxCPM1.5 model!

```bash
uv pip install -U --pre voxcpmane
```
VoxCPM TTS model with an Apple Neural Engine (ANE) backend server. CoreML models are available in the Hugging Face repository.
- 🎤 Voice Cloning: Support for custom voice prompts and cached voices
- 📡 Streaming Support: Real-time audio streaming for low latency
- 🎧 Server-side Playback: Direct audio playback on the server
- 🌐 Web Interface: Interactive playground for testing
Voice cloning demo: `VoiceCloning.mp4`

Included voices (listen to samples): `IncludedVoices.mp4`
- macOS with Apple Silicon for ANE acceleration
- Python 3.9-3.12
- uv package manager (recommended)
`pydub` is required for audio formats other than `wav` in the `/speech` endpoint.
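If you plan to request formats other than `wav`, install `pydub` alongside the server (shown with `uv` here; a plain `pip install pydub` works as well):

```bash
uv pip install pydub
```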
Install with either:

```bash
uv pip install voxcpmane
```

or

```bash
pip install voxcpmane
```

Then launch the server (see below); it will start on http://localhost:8000 by default, and you can access the web playground at the root URL.
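A minimal launch command, using the `voxcpmane-server` entry point shown in the options section below:

```bash
uv run voxcpmane-server
```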
```bash
uv run voxcpmane-server --help
```

- `--host`: Host to bind the server to (default: `0.0.0.0`)
- `--port`: Port to run the server on (default: `8000`)
- `--cache-dir`: Directory for custom voice caches (default: `~/.cache/ane_tts`)
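For example, to bind to localhost on a different port with a project-local cache directory (the flag values here are purely illustrative):

```bash
uv run voxcpmane-server --host 127.0.0.1 --port 9000 --cache-dir ./voices
```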
You can create reusable cached voices in two ways:
- Via the Web Playground/API: Use the "Create Voice" tab or the `POST /v1/voices` endpoint (see the sketch after the example below).
- Startup Compilation: Place pairs of audio files (e.g., `.wav`, `.mp3`) and transcriptions (`.txt`) in the custom cache directory. The server will automatically compile them into voice caches (`.npy`) on startup.
Example:
If you place `myvoice.mp3` and `myvoice.txt` in the cache directory, the server will generate `myvoice.npy` on start, making "myvoice" available for generation.
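For the API route, here is a rough sketch of creating a cached voice with curl; the field names (`name`, `audio`, `text`) are assumptions, not the documented schema, so check docs/API.md for the actual request format.

```bash
# Hypothetical request shape; consult docs/API.md for the real field names.
curl -X POST http://localhost:8000/v1/voices \
  -F "name=myvoice" \
  -F "audio=@myvoice.mp3" \
  -F "text=$(cat myvoice.txt)"
```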
The full API documentation is available in docs/API.md.
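As a quick smoke test, here is a hedged sketch of generating audio with a cached voice via the `/speech` endpoint; the JSON fields (`text`, `voice`, `format`) are assumptions rather than the documented parameters, so refer to docs/API.md.

```bash
# Hypothetical payload; see docs/API.md for the actual parameters.
curl -X POST http://localhost:8000/speech \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from VoxCPM on the Neural Engine.", "voice": "myvoice", "format": "wav"}' \
  --output hello.wav
```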
- Added support for creating custom voices
- Automatic prompt caching
- Chunked long audio generation
- Custom voices
- VoxCPM - Original TTS model