
feat(tts): add Qwen3-TTS as third TTS backend#311

Open
basnijholt wants to merge 18 commits into main from feat/qwen-tts-backend

Conversation

@basnijholt (Owner)

Summary

  • Add Qwen3-TTS as a third TTS backend for the TTS server, alongside Kokoro and Piper
  • Qwen3-TTS is a multilingual neural TTS model with 1.7B parameters supporting 10+ languages
  • Uses subprocess isolation pattern (like Kokoro) for clean memory management
  • OpenAI-compatible API with voice name mapping (alloy, echo, etc. → Qwen speakers)
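A minimal sketch of the voice name mapping described above. Only the alloy → Vivian pairing is confirmed by the test plan; the remaining pairings and the passthrough fallback are assumptions, not the merged code.

```python
# Hypothetical OpenAI -> Qwen speaker mapping (only alloy -> Vivian is
# confirmed by the PR's test plan; the rest are illustrative guesses).
OPENAI_TO_QWEN_VOICES = {
    "alloy": "Vivian",
    "echo": "Ryan",
    "fable": "Serena",
    "onyx": "Dylan",
    "nova": "Eric",
    "shimmer": "Aiden",
}


def resolve_voice(name: str) -> str:
    """Map an OpenAI voice name to a Qwen speaker, passing unknown names through."""
    return OPENAI_TO_QWEN_VOICES.get(name, name)
```

Passing unknown names through lets callers address Qwen speakers directly ("Vivian") while OpenAI clients keep using their familiar names.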

Features

  • Multilingual: Auto-detects language or supports explicit language selection
  • Multiple voices: 6 default voices (Vivian, Ryan, Serena, etc.) plus custom voice cloning
  • GPU acceleration: Supports CUDA, MPS (Apple Silicon), and CPU fallback
  • OpenAI compatible: Works with existing OpenAI TTS clients
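The CUDA → MPS → CPU fallback order can be sketched as a pure function; feeding it `torch.cuda.is_available()` and `torch.backends.mps.is_available()` is the assumed wiring, not necessarily how the PR implements it.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then MPS (Apple Silicon), then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# Assumed call site in the backend:
#   device = pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
```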

Test plan

  • Run pre-commit hooks
  • Test TTS server startup with --backend qwen
  • Test synthesis via curl to OpenAI-compatible endpoint
  • Verify voice mapping (alloy → Vivian)
  • Verify audio output (MP3 format, correct duration)
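The curl step above can also be exercised from Python. The port, model name, and payload shape here follow the standard OpenAI `/v1/audio/speech` schema and are assumptions about this server's defaults, not confirmed values from the PR.

```python
def build_speech_request(
    text: str,
    voice: str = "alloy",
    base_url: str = "http://localhost:8000",  # assumed default port
) -> tuple[str, dict]:
    """Build an OpenAI-style speech request for the TTS server (sketch)."""
    url = f"{base_url}/v1/audio/speech"
    payload = {
        "model": "tts-1",          # field required by the OpenAI schema; name assumed
        "input": text,
        "voice": voice,            # "alloy" should map to Qwen's "Vivian"
        "response_format": "mp3",  # the test plan verifies MP3 output
    }
    return url, payload


# Send with e.g. requests.post(url, json=payload) and write response.content to a file.
```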

basnijholt and others added 8 commits January 25, 2026 05:13
Add Qwen3-TTS as a new TTS backend option alongside Kokoro and Piper.

Features:
- High-quality multilingual neural TTS with 10+ language support
- 9 built-in voices (Vivian, Ryan, Serena, Dylan, Eric, Aiden, etc.)
- OpenAI voice name mapping (alloy, echo, fable, etc. → Qwen speakers)
- Subprocess isolation for clean memory management on TTL unload
- GPU acceleration (CUDA/MPS) with CPU fallback

Usage:
  pip install "agent-cli[qwen-tts]"
  agent-cli server tts --backend qwen

The model (~3GB) auto-downloads from HuggingFace on first use.

- Include qwen-tts in @requires_extras decorator so users can install
  only qwen-tts without piper or kokoro
- Remove unused QWEN_SAMPLE_RATE constant (sample rate comes from model)
- Add TestQwenBackend test class with 7 tests covering initialization,
  voice mapping, language support, and streaming status
- Add test_create_qwen_backend to TestBackendFactory
- Regenerate auto-generated docs to show qwen in --backend options
- Pass cache_dir to Qwen3TTSModel.from_pretrained() so --cache-dir works
- Remove no-op try/except around flash attention dict assignment
- Remove unused SUPPORTED_LANGUAGES list and associated test

The Qwen backend gets sample rate dynamically from model output,
so this constant was never used.
if backend == "qwen":
    from huggingface_hub import snapshot_download  # noqa: PLC0415

    from agent_cli.server.tts.backends.base import (  # noqa: PLC0415
@basnijholt (Owner, Author)

First-party imports at top
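The comment refers to the hunk above. A sketch of the layout it asks for, with the truncated import left truncated (not valid standalone code, just the suggested shape):

```python
# First-party import moves to the top of the module:
from agent_cli.server.tts.backends.base import (
    ...  # import list elided in the diff hunk
)

def _fetch_model(backend: str) -> None:  # hypothetical enclosing function
    if backend == "qwen":
        # Heavy optional third-party dependency stays lazily imported:
        from huggingface_hub import snapshot_download  # noqa: PLC0415
        ...
```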

audio_int16 = (audio * 32767).astype(np.int16)

buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
@basnijholt (Owner, Author)

There are existing helpers for this (pcm_to_wav)
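For reference, the float-to-int16 WAV conversion in the hunk above can be done by one self-contained helper. This is a stdlib-only stand-in; the repo's actual pcm_to_wav helper may have a different signature.

```python
import io
import struct
import wave


def pcm_float_to_wav(samples: list[float], sample_rate: int) -> bytes:
    """Convert float samples in [-1, 1] to mono 16-bit WAV bytes.

    Stand-in for the repo's pcm_to_wav helper; signature is assumed.
    """
    # Clamp to [-1, 1], scale to int16 range, pack little-endian.
    pcm = struct.pack(
        f"<{len(samples)}h",
        *(int(max(-1.0, min(1.0, s)) * 32767) for s in samples),
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```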

"vad": ["Voice Activity Detection (silero-vad)", ["silero_vad"]],
"faster-whisper": ["Whisper ASR (CUDA/CPU)", ["faster_whisper"]],
"mlx-whisper": ["Whisper ASR (Apple Silicon)", ["mlx_whisper"]]
"audio": [
@basnijholt (Owner, Author)

Revert formatting

basnijholt and others added 10 commits January 25, 2026 06:17
- Use existing pcm_to_wav helper instead of duplicating WAV conversion
- Remove redundant librosa/soundfile deps (already pulled in transitively by qwen-tts)
- Restore compact JSON format for _extras.json
- Remove unused io/wave imports from qwen.py

Restore original format for _extras.json - just add qwen-tts entry
without changing existing entries or ordering.

Adds torch.compile() with reduce-overhead mode for 20-30% faster
inference on CUDA. First few inferences will be slower due to JIT
compilation warmup.

Based on optimizations from Qwen3-TTS-Openai-Fastapi repo.
Automatically enables Flash Attention 2 if the flash-attn package is
installed, providing ~10% additional speedup on top of torch.compile().

Install with: pip install flash-attn
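The two optimization commits above can be sketched together. `torch.compile(model, mode="reduce-overhead")` and transformers-style `attn_implementation="flash_attention_2"` are real APIs, but wiring them into `Qwen3TTSModel` this way is a guess at the PR's approach, not the merged code.

```python
import importlib.util


def flash_attn_available() -> bool:
    """True when the optional flash-attn package is importable."""
    return importlib.util.find_spec("flash_attn") is not None


def model_load_kwargs() -> dict:
    """Extra kwargs for from_pretrained(); enable Flash Attention 2 only if present."""
    kwargs = {}
    if flash_attn_available():
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


# Illustrative wiring (requires torch and the qwen-tts extra):
#   model = Qwen3TTSModel.from_pretrained(model_id, **model_load_kwargs())
#   model = torch.compile(model, mode="reduce-overhead")  # JIT warmup slows first runs
```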