
feat(tts): add Qwen3-TTS as third TTS backend#311

Open
basnijholt wants to merge 18 commits into main from feat/qwen-tts-backend

Conversation

@basnijholt (Owner)

Summary

  • Add Qwen3-TTS as a third TTS backend for the TTS server, alongside Kokoro and Piper
  • Qwen3-TTS is a multilingual neural TTS model with 1.7B parameters supporting 10+ languages
  • Uses subprocess isolation pattern (like Kokoro) for clean memory management
  • OpenAI-compatible API with voice name mapping (alloy, echo, etc. → Qwen speakers)
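A minimal sketch of the voice name mapping described above. Only the alloy → Vivian pairing is confirmed by the test plan; the remaining pairings and the passthrough fallback are assumptions, not the merged code.

```python
# Hypothetical OpenAI -> Qwen speaker mapping (only alloy -> Vivian is
# confirmed by the PR's test plan; the rest are illustrative guesses).
OPENAI_TO_QWEN_VOICES = {
    "alloy": "Vivian",
    "echo": "Ryan",
    "fable": "Serena",
    "onyx": "Dylan",
    "nova": "Eric",
    "shimmer": "Aiden",
}


def resolve_voice(name: str) -> str:
    """Map an OpenAI voice name to a Qwen speaker, passing unknown names through."""
    return OPENAI_TO_QWEN_VOICES.get(name, name)
```

Passing unknown names through lets callers address Qwen speakers directly ("Vivian") while OpenAI clients keep using their familiar names.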

Features

  • Multilingual: Auto-detects language or supports explicit language selection
  • Multiple voices: 6 default voices (Vivian, Ryan, Serena, etc.) plus custom voice cloning
  • GPU acceleration: Supports CUDA, MPS (Apple Silicon), and CPU fallback
  • OpenAI compatible: Works with existing OpenAI TTS clients
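The CUDA → MPS → CPU fallback order can be sketched as a pure function; feeding it `torch.cuda.is_available()` and `torch.backends.mps.is_available()` is the assumed wiring, not necessarily how the PR implements it.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then MPS (Apple Silicon), then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# Assumed call site in the backend:
#   device = pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
```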

Test plan

  • Run pre-commit hooks
  • Test TTS server startup with --backend qwen
  • Test synthesis via curl to OpenAI-compatible endpoint
  • Verify voice mapping (alloy → Vivian)
  • Verify audio output (MP3 format, correct duration)
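The curl step above can also be exercised from Python. The port, model name, and payload shape here follow the standard OpenAI `/v1/audio/speech` schema and are assumptions about this server's defaults, not confirmed values from the PR.

```python
def build_speech_request(
    text: str,
    voice: str = "alloy",
    base_url: str = "http://localhost:8000",  # assumed default port
) -> tuple[str, dict]:
    """Build an OpenAI-style speech request for the TTS server (sketch)."""
    url = f"{base_url}/v1/audio/speech"
    payload = {
        "model": "tts-1",          # field required by the OpenAI schema; name assumed
        "input": text,
        "voice": voice,            # "alloy" should map to Qwen's "Vivian"
        "response_format": "mp3",  # the test plan verifies MP3 output
    }
    return url, payload


# Send with e.g. requests.post(url, json=payload) and write response.content to a file.
```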

basnijholt and others added 8 commits January 25, 2026 05:13
Add Qwen3-TTS as a new TTS backend option alongside Kokoro and Piper.

Features:
- High-quality multilingual neural TTS with 10+ language support
- 9 built-in voices (Vivian, Ryan, Serena, Dylan, Eric, Aiden, etc.)
- OpenAI voice name mapping (alloy, echo, fable, etc. → Qwen speakers)
- Subprocess isolation for clean memory management on TTL unload
- GPU acceleration (CUDA/MPS) with CPU fallback

Usage:
  pip install "agent-cli[qwen-tts]"
  agent-cli server tts --backend qwen

The model (~3GB) auto-downloads from HuggingFace on first use.

- Include qwen-tts in @requires_extras decorator so users can install
  only qwen-tts without piper or kokoro
- Remove unused QWEN_SAMPLE_RATE constant (sample rate comes from model)
- Add TestQwenBackend test class with 7 tests covering initialization,
  voice mapping, language support, and streaming status
- Add test_create_qwen_backend to TestBackendFactory
- Regenerate auto-generated docs to show qwen in --backend options
- Pass cache_dir to Qwen3TTSModel.from_pretrained() so --cache-dir works
- Remove no-op try/except around flash attention dict assignment
- Remove unused SUPPORTED_LANGUAGES list and associated test

The Qwen backend gets sample rate dynamically from model output,
so this constant was never used.
if backend == "qwen":
    from huggingface_hub import snapshot_download  # noqa: PLC0415

    from agent_cli.server.tts.backends.base import (  # noqa: PLC0415
@basnijholt (Owner, Author)

First-party imports at top
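The comment refers to the hunk above. A sketch of the layout it asks for, with the truncated import left truncated (not valid standalone code, just the suggested shape):

```python
# First-party import moves to the top of the module:
from agent_cli.server.tts.backends.base import (
    ...  # import list elided in the diff hunk
)

def _fetch_model(backend: str) -> None:  # hypothetical enclosing function
    if backend == "qwen":
        # Heavy optional third-party dependency stays lazily imported:
        from huggingface_hub import snapshot_download  # noqa: PLC0415
        ...
```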

audio_int16 = (audio * 32767).astype(np.int16)

buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
@basnijholt (Owner, Author)

There are existing helpers for this (pcm_to_wav)
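For reference, the float-to-int16 WAV conversion in the hunk above can be done by one self-contained helper. This is a stdlib-only stand-in; the repo's actual pcm_to_wav helper may have a different signature.

```python
import io
import struct
import wave


def pcm_float_to_wav(samples: list[float], sample_rate: int) -> bytes:
    """Convert float samples in [-1, 1] to mono 16-bit WAV bytes.

    Stand-in for the repo's pcm_to_wav helper; signature is assumed.
    """
    # Clamp to [-1, 1], scale to int16 range, pack little-endian.
    pcm = struct.pack(
        f"<{len(samples)}h",
        *(int(max(-1.0, min(1.0, s)) * 32767) for s in samples),
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```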

"vad": ["Voice Activity Detection (silero-vad)", ["silero_vad"]],
"faster-whisper": ["Whisper ASR (CUDA/CPU)", ["faster_whisper"]],
"mlx-whisper": ["Whisper ASR (Apple Silicon)", ["mlx_whisper"]]
"audio": [
@basnijholt (Owner, Author)

Revert formatting

basnijholt and others added 10 commits January 25, 2026 06:17
- Use existing pcm_to_wav helper instead of duplicating WAV conversion
- Remove redundant librosa/soundfile deps (already pulled in transitively by qwen-tts)
- Restore compact JSON format for _extras.json
- Remove unused io/wave imports from qwen.py

Restore original format for _extras.json - just add qwen-tts entry
without changing existing entries or ordering.

Adds torch.compile() with reduce-overhead mode for 20-30% faster
inference on CUDA. First few inferences will be slower due to JIT
compilation warmup.

Based on optimizations from Qwen3-TTS-Openai-Fastapi repo.
Automatically enables Flash Attention 2 if the flash-attn package is
installed, providing ~10% additional speedup on top of torch.compile().

Install with: pip install flash-attn
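The two optimization commits above can be sketched together. `torch.compile(model, mode="reduce-overhead")` and transformers-style `attn_implementation="flash_attention_2"` are real APIs, but wiring them into `Qwen3TTSModel` this way is a guess at the PR's approach, not the merged code.

```python
import importlib.util


def flash_attn_available() -> bool:
    """True when the optional flash-attn package is importable."""
    return importlib.util.find_spec("flash_attn") is not None


def model_load_kwargs() -> dict:
    """Extra kwargs for from_pretrained(); enable Flash Attention 2 only if present."""
    kwargs = {}
    if flash_attn_available():
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


# Illustrative wiring (requires torch and the qwen-tts extra):
#   model = Qwen3TTSModel.from_pretrained(model_id, **model_load_kwargs())
#   model = torch.compile(model, mode="reduce-overhead")  # JIT warmup slows first runs
```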