Add Qwen3-ASR support with OpenAI-compatible transcriptions endpoint#6
Conversation
- Add `/v1/audio/transcriptions` endpoint for speech-to-text
- Support Qwen3-ASR-0.6B and Qwen3-ASR-1.7B models
- Add `ASR_MODEL_PATH` environment variable
- Rename env vars: `CUSTOMVOICE_MODEL_PATH` -> `TTS_CUSTOMVOICE_MODEL_PATH`, `BASE_MODEL_PATH` -> `TTS_BASE_MODEL_PATH`
- Update Docker image name to `qwen3-audio-api`
- Add ffmpeg dependency for audio format conversion
- Fix Docker CMD to use venv python directly (avoid `uv sync` on start)
- Suppress nagisa SyntaxWarning via `PYTHONWARNINGS` env var
- Add CI tests for ASR (Phase 4) and TTS+ASR round-trip (Phase 5)
- Update documentation for ASR usage

Co-Authored-By: Claude <noreply@anthropic.com>
Pull request overview
This PR adds Qwen3-ASR (Automatic Speech Recognition) support to the existing Qwen3-TTS API server, transforming it into a comprehensive audio processing API that provides both text-to-speech and speech-to-text capabilities through OpenAI-compatible endpoints.
Changes:
- Implements `/v1/audio/transcriptions` endpoint following the OpenAI API format for speech-to-text functionality (see the sketch after this list)
- Renames environment variables for clarity (`BASE_MODEL_PATH` → `TTS_BASE_MODEL_PATH`, `CUSTOMVOICE_MODEL_PATH` → `TTS_CUSTOMVOICE_MODEL_PATH`) and adds `ASR_MODEL_PATH`
- Adds new dependencies (qwen-asr, python-multipart, av, nagisa, soynlp) and includes ffmpeg for audio format conversion
- Updates Docker configuration to suppress nagisa SyntaxWarning and prevent ruff download at container startup
- Expands CI test coverage with Phase 4 (ASR-only) and Phase 5 (TTS→ASR round-trip) tests
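For orientation, a minimal sketch of what an OpenAI-compatible transcriptions endpoint looks like in FastAPI. The form-field names follow the OpenAI spec, but the handler body, the default model name, and the `transcribe` stub are illustrative assumptions, not the PR's actual implementation in `python/main.py`:

```python
# Hypothetical sketch of an OpenAI-compatible /v1/audio/transcriptions
# endpoint; the real handler in this PR may differ.
from fastapi import FastAPI, File, Form, UploadFile
from fastapi.responses import PlainTextResponse

app = FastAPI()

def transcribe(audio_bytes: bytes, model: str) -> str:
    """Placeholder for the actual Qwen3-ASR inference call."""
    raise NotImplementedError

@app.post("/v1/audio/transcriptions")
async def create_transcription(
    file: UploadFile = File(...),                   # audio file to transcribe
    model: str = Form(default="qwen3-asr-0.6b"),    # assumed default name
    prompt: str | None = Form(default=None),        # OpenAI compatibility only
    response_format: str = Form(default="json"),
    temperature: float = Form(default=0.0),         # OpenAI compatibility only
):
    audio_bytes = await file.read()
    text = transcribe(audio_bytes, model)
    if response_format == "text":
        return PlainTextResponse(text)
    return {"text": text}
```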
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| python/uv.lock | Adds dependencies for ASR support (qwen-asr, av, nagisa, etc.) and pins transformers to 4.57.6 |
| python/pyproject.toml | Updates project description and adds qwen-asr, python-multipart dependencies with transformers override |
| python/main.py | Implements ASR endpoint with audio conversion helpers (sketched after this table) and model loading logic |
| python/TEST_PLAN.md | Adds Phase 4 (ASR-only) and Phase 5 (TTS→ASR round-trip) test scenarios |
| python/README.md | Documents new ASR features, endpoints, and usage examples |
| python/Dockerfile.cuda | Adds PYTHONWARNINGS environment variable and changes CMD to use venv python directly |
| python/Dockerfile | Same Docker improvements as CUDA version |
| README.md | Updates project description to include ASR capabilities |
| .github/workflows/ci.yml | Adds ffmpeg installation, ASR model downloads, and new test phases |
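As context for the `python/main.py` row, one common shape for an ffmpeg-based conversion helper — a sketch assuming uploads are normalized to 16 kHz mono WAV before inference. The function name and parameters are illustrative; the PR may instead use the `av` bindings it adds as a dependency:

```python
# Hypothetical helper: normalize an uploaded audio file to 16 kHz mono WAV
# via the ffmpeg CLI. Illustrative only; not taken from the PR.
import subprocess
import tempfile
from pathlib import Path

def convert_to_wav(audio_bytes: bytes, sample_rate: int = 16000) -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input"       # ffmpeg sniffs the container format
        dst = Path(tmp) / "output.wav"
        src.write_bytes(audio_bytes)
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(src),
             "-ac", "1",                # downmix to mono
             "-ar", str(sample_rate),   # resample for the ASR model
             str(dst)],
            check=True,                 # raise on conversion failure
            capture_output=True,        # keep ffmpeg logs out of stdout
        )
        return dst.read_bytes()
```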
```python
    prompt: str | None = Form(default=None),
    response_format: str = Form(default="json"),
    temperature: float = Form(default=0.0),
```
The parameters 'prompt' and 'temperature' are accepted in the create_transcription function but are not used in the implementation. While this is mentioned in the inline comment as "not currently used", accepting parameters without using them can be confusing for API consumers. Consider either implementing support for these parameters or documenting in the API reference that they are accepted for OpenAI compatibility but currently ignored by the Qwen3-ASR model.
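One way to act on this suggestion and make the compatibility-only parameters self-documenting — a sketch, not the PR's code; the dependency function, description strings, and warning are assumptions:

```python
# Sketch: surface that `prompt` and `temperature` are accepted only for
# OpenAI compatibility. The descriptions appear in the generated OpenAPI
# docs; the log line warns callers at runtime. Illustrative only.
import logging

from fastapi import Form

logger = logging.getLogger("transcriptions")

def compat_params(
    prompt: str | None = Form(
        default=None,
        description="Accepted for OpenAI compatibility; ignored by Qwen3-ASR.",
    ),
    temperature: float = Form(
        default=0.0,
        description="Accepted for OpenAI compatibility; ignored by Qwen3-ASR.",
    ),
) -> None:
    if prompt is not None or temperature != 0.0:
        logger.warning("prompt/temperature are ignored by Qwen3-ASR")
```

This could be wired into the endpoint as a FastAPI dependency (`Depends(compat_params)`), keeping the compatibility shims out of the handler signature.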
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Summary

- Add `/v1/audio/transcriptions` endpoint following OpenAI API format
- Rename env vars: `BASE_MODEL_PATH` → `TTS_BASE_MODEL_PATH`, `CUSTOMVOICE_MODEL_PATH` → `TTS_CUSTOMVOICE_MODEL_PATH`
- Rename Docker image from `qwen-tts-api` to `qwen3-audio-api`

Test plan

- `docker build -t qwen3-audio-api .`
- `curl -X POST http://localhost:8000/v1/audio/speech -H "Content-Type: application/json" -d '{"input":"Hello world","voice":"Vivian"}' --output test.wav`
- `curl -X POST http://localhost:8000/v1/audio/transcriptions -F file=@test.wav`

🤖 Generated with Claude Code
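A scripted version of the TTS→ASR round-trip from the test plan above might look like this — a sketch assuming the server runs on localhost:8000. The payloads mirror the curl commands; the substring assertion is an illustrative pass criterion, not necessarily the CI's actual Phase 5 check:

```python
# Hypothetical round-trip check: synthesize speech, then transcribe it
# and verify the original text survives. Mirrors the curl-based test plan.
import requests

BASE = "http://localhost:8000"
TEXT = "Hello world"

# 1. TTS: text -> WAV bytes
tts = requests.post(
    f"{BASE}/v1/audio/speech",
    json={"input": TEXT, "voice": "Vivian"},
    timeout=120,
)
tts.raise_for_status()

# 2. ASR: WAV bytes -> transcript
asr = requests.post(
    f"{BASE}/v1/audio/transcriptions",
    files={"file": ("test.wav", tts.content, "audio/wav")},
    timeout=120,
)
asr.raise_for_status()

# 3. Loose comparison; ASR output may differ in casing and punctuation
transcript = asr.json()["text"]
assert TEXT.lower() in transcript.lower().rstrip("."), transcript
print("round-trip OK:", transcript)
```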