
Add Qwen3-ASR support with OpenAI-compatible transcriptions endpoint#6

Merged
juntao merged 2 commits into main from add-asr-support
Jan 30, 2026
Conversation


@juntao juntao commented Jan 30, 2026

Summary

  • Add Qwen3-ASR model support with /v1/audio/transcriptions endpoint following OpenAI API format
  • Rename environment variables for clarity: BASE_MODEL_PATH → TTS_BASE_MODEL_PATH, CUSTOMVOICE_MODEL_PATH → TTS_CUSTOMVOICE_MODEL_PATH
  • Rename Docker image from qwen-tts-api to qwen3-audio-api
  • Add ffmpeg dependency for audio format conversion
  • Fix nagisa SyntaxWarning and prevent ruff download at container startup
  • Add Phase 4 (ASR-only) and Phase 5 (TTS→ASR round-trip) tests to TEST_PLAN.md and CI

Test plan

  • Build Docker image with docker build -t qwen3-audio-api .
  • Run container with TTS and ASR models mounted
  • Test TTS endpoint: curl -X POST http://localhost:8000/v1/audio/speech -H "Content-Type: application/json" -d '{"input":"Hello world","voice":"Vivian"}' --output test.wav
  • Test ASR endpoint: curl -X POST http://localhost:8000/v1/audio/transcriptions -F file=@test.wav
  • Verify round-trip: the original TTS input text should approximately match the ASR transcription
  • Verify no nagisa SyntaxWarning on startup
  • Verify no ruff download on container start
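
The "approximately match" check in the round-trip step can be automated. A minimal sketch, assuming a simple normalization (lowercasing, stripping punctuation) is good enough; the helper names are hypothetical and real ASR output may also differ in wording, not just punctuation:

```python
import re

def normalize(text: str) -> str:
    """Lowercase and strip punctuation/extra whitespace so that minor
    formatting differences don't fail the round-trip comparison."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def round_trip_matches(tts_input: str, asr_output: str) -> bool:
    """True if the ASR transcription approximately matches the TTS input."""
    return normalize(tts_input) == normalize(asr_output)
```

For stricter checks, an edit-distance threshold (e.g. word error rate) would tolerate small recognition errors that exact comparison does not.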

🤖 Generated with Claude Code

- Add /v1/audio/transcriptions endpoint for speech-to-text
- Support Qwen3-ASR-0.6B and Qwen3-ASR-1.7B models
- Add ASR_MODEL_PATH environment variable
- Rename env vars: CUSTOMVOICE_MODEL_PATH -> TTS_CUSTOMVOICE_MODEL_PATH,
  BASE_MODEL_PATH -> TTS_BASE_MODEL_PATH
- Update Docker image name to qwen3-audio-api
- Add ffmpeg dependency for audio format conversion
- Fix Docker CMD to use venv python directly (avoid uv sync on start)
- Suppress nagisa SyntaxWarning via PYTHONWARNINGS env var
- Add CI tests for ASR (Phase 4) and TTS+ASR round-trip (Phase 5)
- Update documentation for ASR usage

Co-Authored-By: Claude <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

This PR adds Qwen3-ASR (Automatic Speech Recognition) support to the existing Qwen3-TTS API server, turning it into a combined audio API that offers both text-to-speech and speech-to-text through OpenAI-compatible endpoints.

Changes:

  • Implements /v1/audio/transcriptions endpoint following OpenAI API format for speech-to-text functionality
  • Renames environment variables for clarity (BASE_MODEL_PATH → TTS_BASE_MODEL_PATH, CUSTOMVOICE_MODEL_PATH → TTS_CUSTOMVOICE_MODEL_PATH) and adds ASR_MODEL_PATH
  • Adds new dependencies (qwen-asr, python-multipart, av, nagisa, soynlp) and includes ffmpeg for audio format conversion
  • Updates Docker configuration to suppress nagisa SyntaxWarning and prevent ruff download at container startup
  • Expands CI test coverage with Phase 4 (ASR-only) and Phase 5 (TTS→ASR round-trip) tests
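
The ffmpeg dependency suggests incoming audio is converted to a format the ASR model accepts before inference. A sketch of how such a conversion command might be assembled; the 16 kHz mono WAV target, the function name, and the flag choices are assumptions about typical ASR preprocessing, not the PR's actual code:

```python
def ffmpeg_args(src: str, dst: str, rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that converts any input audio file to a
    single-channel WAV at the given sample rate, overwriting the target.

    -y  overwrite output without prompting
    -i  input file
    -ar output sample rate in Hz
    -ac output channel count (1 = mono)
    """
    return ["ffmpeg", "-y", "-i", src, "-ar", str(rate), "-ac", "1", dst]
```

The argv list would then be run with `subprocess.run(ffmpeg_args(src, dst), check=True)`; building the list separately keeps the command easy to test without ffmpeg installed.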

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
python/uv.lock Adds dependencies for ASR support (qwen-asr, av, nagisa, etc.) and pins transformers to 4.57.6
python/pyproject.toml Updates project description and adds qwen-asr, python-multipart dependencies with transformers override
python/main.py Implements ASR endpoint with audio conversion helpers and model loading logic
python/TEST_PLAN.md Adds Phase 4 (ASR-only) and Phase 5 (TTS→ASR round-trip) test scenarios
python/README.md Documents new ASR features, endpoints, and usage examples
python/Dockerfile.cuda Adds PYTHONWARNINGS environment variable and changes CMD to use venv python directly
python/Dockerfile Same Docker improvements as CUDA version
README.md Updates project description to include ASR capabilities
.github/workflows/ci.yml Adds ffmpeg installation, ASR model downloads, and new test phases


Comment on lines +486 to +488
prompt: str | None = Form(default=None),
response_format: str = Form(default="json"),
temperature: float = Form(default=0.0),

Copilot AI Jan 30, 2026


The parameters 'prompt' and 'temperature' are accepted in the create_transcription function but are not used in the implementation. While this is mentioned in the inline comment as "not currently used", accepting parameters without using them can be confusing for API consumers. Consider either implementing support for these parameters or documenting in the API reference that they are accepted for OpenAI compatibility but currently ignored by the Qwen3-ASR model.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@juntao juntao merged commit 74600b3 into main Jan 30, 2026
2 checks passed
@juntao juntao deleted the add-asr-support branch January 30, 2026 07:24