VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning
Updated Jun 10, 2026 - Python
TTS-Story is a web-based multi-voice TTS studio for turning tagged scripts into audiobooks. It features full speaker management, chunk review/regeneration, a job queue and library system, and local GPU or API backends, including the Kokoro, Chatterbox, VoxCPM, Pocket-TTS, Kitten-TTS, IndexTTS-2, Qwen3-TTS, and Omnivoice engines.
Talk to 峰哥: clone anyone's voice and personality for real-time voice conversation, with engineering latency under 1 second.
A clean, efficient ComfyUI custom node for VoxCPM text-to-speech (TTS). This implementation provides high-quality speech generation and voice cloning using the VoxCPM 1.5 model.
DubCue is a local AI dubbing director built on VoxCPM2, with semantic segmentation, editable direction, voice continuity, and a Tauri desktop workspace.
Build voice apps fast. Unified API for speech recognition & synthesis with streaming, WebSocket, and multi-engine support.
Standalone C++ inference project for VoxCPM models, built on top of ggml, with an API frontend.
Multi-engine TTS server (Qwen3-TTS + VoxCPM2): MLX & PyTorch backends, voice cloning, incremental PCM streaming, one-model-in-VRAM manager with idle eviction.
One-click Pinokio launcher for VoxCPM2 Portable. Multilingual TTS (30 languages), Voice Design, Voice Cloning, end-to-end LoRA fine-tuning. Cross-platform (Win/Linux/macOS, NVIDIA/AMD/CPU).
One-click Pinokio installer for VoxCPM2 with a voice cloning API and prompt memory.
⚡ A talking desktop Pikachu that runs 100% on-device — MiniCPM5-1B brain + VoxCPM voice + Nemotron ears, every model ≤1B params. No cloud, works with the Wi-Fi off.
Production-ready multi-agent framework with VoxCPM orchestration, Ollama local LLMs, and Streamlit UI.
Voice cloning software for multi-person dialogue synthesis.
Generate high-quality synthetic TTS training audio from text datasets.
Reverse-Turing webcam interrogation game. An AI interrogator (A.M.N.) uses webcam, microphone, pulse detection, and micro-expression analysis to determine if you're human.