Support loading both CustomVoice and Base model families. The Base model enables voice cloning from a reference audio sample via the new audio_sample and audio_sample_text request parameters. At least one of CUSTOMVOICE_MODEL_PATH or BASE_MODEL_PATH must be set at startup. Co-Authored-By: Claude <noreply@anthropic.com>
Add --project python to uv commands and use python/main.py path so the workflow runs correctly from the repository root. Co-Authored-By: Claude <noreply@anthropic.com>
Superseded by ci.yml. Co-Authored-By: Claude <noreply@anthropic.com>
The audio_sample field now expects base64-encoded audio data so that clients can talk to a remote server. The underlying model already supports base64 strings natively. Co-Authored-By: Claude <noreply@anthropic.com>
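A minimal client-side sketch of the base64 encoding described above. The field names `input`, `audio_sample`, and `audio_sample_text` come from this PR; the helper name `build_clone_payload` is hypothetical, and the real endpoint may require other fields that are not shown here.

```python
import base64
import json
from typing import Optional


def build_clone_payload(audio_bytes: bytes, text: str,
                        sample_text: Optional[str] = None) -> str:
    """Return a JSON request body with the reference audio base64-encoded."""
    payload = {
        "input": text,
        # base64 output is pure ASCII, so it is safe to embed in JSON.
        "audio_sample": base64.b64encode(audio_bytes).decode("ascii"),
    }
    # The transcript of the reference audio is optional.
    if sample_text is not None:
        payload["audio_sample_text"] = sample_text
    return json.dumps(payload)


# Placeholder bytes stand in for the contents of a real WAV file.
body = build_clone_payload(b"RIFF....", "Hello there.", "Reference transcript.")
```

Building the body in Python (or in a file) keeps the large base64 string out of shell arguments.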
Pull request overview
Adds voice cloning support by allowing the server to load and route requests between CustomVoice and Base Qwen3-TTS model families, and updates documentation/CI accordingly.
Changes:
- Load CustomVoice and/or Base models at startup via CUSTOMVOICE_MODEL_PATH and BASE_MODEL_PATH (requires at least one).
- Add voice cloning request parameters (audio_sample, audio_sample_text) and route /v1/audio/speech accordingly.
- Replace the prior GitHub Actions workflow with a multi-phase integration CI workflow that uploads generated audio artifacts.
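The "requires at least one" startup rule could be checked along these lines. This is a sketch under assumptions, not the code in python/main.py: the function name `resolve_model_paths` and the returned dictionary keys are hypothetical, and only the two environment variable names come from the PR.

```python
from typing import Mapping, Optional


def resolve_model_paths(env: Mapping[str, str]) -> dict[str, Optional[str]]:
    """Read both model paths and fail fast if neither is configured."""
    paths = {
        # An empty string is treated the same as an unset variable.
        "customvoice": env.get("CUSTOMVOICE_MODEL_PATH") or None,
        "base": env.get("BASE_MODEL_PATH") or None,
    }
    if not any(paths.values()):
        raise ValueError(
            "Set at least one of CUSTOMVOICE_MODEL_PATH or BASE_MODEL_PATH."
        )
    return paths


# In a server this would typically be called with os.environ.
paths = resolve_model_paths({"BASE_MODEL_PATH": "/models/base"})
```

Taking the environment as a parameter keeps the check easy to exercise for each of the three server configurations without touching the real process environment.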
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| python/main.py | Adds dual-model loading and request routing for voice cloning vs. preset voices. |
| python/README.md | Documents model families, new env vars, and new request parameters with examples. |
| python/TEST_PLAN.md | Adds an integration test plan describing the 3 server configurations and expected behaviors. |
| python/EXAMPLE_CC_SESSION.md | Adds a transcript-style session log describing how changes were produced/tested. |
| .github/workflows/test-tts-api.yml | Removes the previous single-scenario API workflow. |
| .github/workflows/ci.yml | Adds a new multi-phase CI workflow that downloads models, runs integration calls, and uploads wav artifacts. |
    on:
      push:
      pull_request:
This workflow now runs on every push/PR (no branch or path filtering). Given it downloads multi-GB model weights and does long-running inference, it should be scoped (e.g., to main and/or python/**) to avoid expensive CI runs on unrelated changes.
    - uses: astral-sh/setup-uv@v6
      with:
        version: "latest"
setup-uv is configured without a pinned Python version. Since ubuntu-latest can change Python versions over time, explicitly set python-version: "3.12" (matching the project requirement) to prevent CI breakage when the runner image updates.
    - version: "latest"
    + version: "latest"
    + python-version: "3.12"
        version: "latest"

    - name: Install dependencies
      run: uv sync --project python
CI installs dependencies with uv sync --project python without --frozen (and without excluding dev deps). For reproducible builds and faster CI, consider using uv sync --project python --frozen --no-dev, which installs exactly what uv.lock records and skips dev-only packages. Note that --frozen does not check the lockfile against pyproject.toml; use --locked instead if the workflow should fail when the two drift.
    - run: uv sync --project python
    + run: uv sync --project python --frozen --no-dev
python/EXAMPLE_CC_SESSION.md
Outdated
    Me

    ```
    Review the README.md file. Install dependencies. Download the 0.6B CustomVoice model (exclude it from git).
    Start an API server. Make API requests to generate chinese and english audio files (also exclude the WAV files
    from git).
    ```
This file appears to be a raw Claude Code session transcript (and includes outdated guidance like MODEL_PATH=...). It’s likely not suitable to keep in-repo; consider removing it or converting the relevant parts into durable documentation (e.g., README/TEST_PLAN) instead of a chat log.
python/main.py
Outdated
    if request.audio_sample:
        base_model: Qwen3TTSModel | None = app.state.base_model
        if base_model is None:
            raise HTTPException(
                status_code=400,
                detail=(
                    "audio_sample requires a base model. "
                    "Set BASE_MODEL_PATH to enable voice cloning."
                ),
            )
        use_icl = request.audio_sample_text is not None
        with _inference_lock:
            wavs, sr = base_model.generate_voice_clone(
                text=request.input,
                language=request.language,
                ref_audio=request.audio_sample,
                ref_text=request.audio_sample_text,
audio_sample_text is accepted but will be silently ignored when audio_sample is omitted (request routes to CustomVoice branch). Also, if request.audio_sample: treats an empty string as "not provided". Consider validating the parameter combination explicitly (e.g., reject audio_sample_text without audio_sample, and enforce non-empty audio_sample/audio_sample_text when provided) so clients get a clear 400 instead of unexpected routing.
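One way the explicit validation suggested here might look, as a framework-free sketch. The function name `validate_clone_params` is hypothetical; in the FastAPI handler the `ValueError` would presumably be translated into an `HTTPException` with status 400, which is not shown.

```python
from typing import Optional


def validate_clone_params(audio_sample: Optional[str],
                          audio_sample_text: Optional[str]) -> bool:
    """Return True when the request should use the voice-cloning branch.

    Raises ValueError for parameter combinations that would otherwise be
    silently ignored or routed to the wrong model.
    """
    # A provided-but-empty value is an error, not "not provided".
    if audio_sample is not None and not audio_sample.strip():
        raise ValueError("audio_sample must not be empty when provided.")
    if audio_sample_text is not None and not audio_sample_text.strip():
        raise ValueError("audio_sample_text must not be empty when provided.")
    # A transcript without reference audio has nothing to describe.
    if audio_sample is None and audio_sample_text is not None:
        raise ValueError("audio_sample_text requires audio_sample.")
    return audio_sample is not None
```

Comparing against `None` rather than relying on truthiness is what separates an omitted field from an empty one.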
Pipe base64 output into jq to build the JSON payload in a file, then pass it to curl via -d @file. This avoids expanding the large base64 string as a shell argument. Co-Authored-By: Claude <noreply@anthropic.com>
The endpoint now handles both JSON and multipart/form-data. Use multipart with curl -F to upload audio_sample as a binary file, avoiding base64 encoding and shell argument length limits. Co-Authored-By: Claude <noreply@anthropic.com>
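For clients that are not curl, a multipart/form-data body equivalent to `curl -F` can be assembled with the standard library alone. This is an illustrative sketch: the helper name `encode_multipart` is hypothetical, the form field names `input` and `audio_sample` are taken from this PR, and whether the server expects any other form fields is not confirmed here.

```python
import uuid


def encode_multipart(fields: dict[str, str], file_field: str,
                     filename: str, file_bytes: bytes) -> tuple[str, bytes]:
    """Return (Content-Type header value, request body) for one file upload."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    # Plain text fields come first, one part each.
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    # The file part carries the raw bytes, so no base64 step is needed.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    # Closing delimiter.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)


content_type, body = encode_multipart(
    {"input": "Hello there."}, "audio_sample", "ref.wav", b"\x00\x01RIFF"
)
```

The returned header value and body can be passed to any HTTP client, for example as the `Content-Type` header and `data` of a `urllib.request.Request`.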
Summary
- audio_sample and audio_sample_text request parameters
- CUSTOMVOICE_MODEL_PATH or BASE_MODEL_PATH at startup

Test plan
- CUSTOMVOICE_MODEL_PATH set
- BASE_MODEL_PATH set
- audio_sample parameter (Base model)

🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>