Add voice cloning support via Base model #5

Merged

juntao merged 8 commits into main from add-voice-cloning on Jan 28, 2026

@juntao (Member) commented Jan 27, 2026

Summary

  • Add support for loading both CustomVoice and Base model families
  • Enable voice cloning from reference audio via audio_sample and audio_sample_text request parameters
  • Require at least one of CUSTOMVOICE_MODEL_PATH or BASE_MODEL_PATH at startup
  • Add GitHub Actions CI workflow
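The startup requirement in the summary can be sketched as a small check that runs before any model is loaded. This is a minimal illustration, not the code in python/main.py: `resolve_model_paths` and `ModelPaths` are hypothetical names, and treating an empty variable as unset is an assumption of the sketch.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelPaths:
    """Model directories resolved at startup; either one may be None."""

    customvoice: Optional[str]
    base: Optional[str]


def resolve_model_paths(env: Mapping[str, str] = os.environ) -> ModelPaths:
    """Read both model-path variables and require at least one of them.

    Hypothetical helper. An empty value (e.g. BASE_MODEL_PATH=) is treated
    as unset, which is an assumption of this sketch.
    """
    customvoice = env.get("CUSTOMVOICE_MODEL_PATH") or None
    base = env.get("BASE_MODEL_PATH") or None
    if customvoice is None and base is None:
        # Refuse to start: there would be no model to serve any request.
        raise RuntimeError(
            "Set at least one of CUSTOMVOICE_MODEL_PATH or BASE_MODEL_PATH."
        )
    return ModelPaths(customvoice=customvoice, base=base)


if __name__ == "__main__":
    # Example: a server configured for voice cloning only.
    print(resolve_model_paths({"BASE_MODEL_PATH": "/models/base"}))
```

Failing fast here means a misconfigured deployment is caught at boot rather than on the first request.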

Test plan

  • Verify server starts with only CUSTOMVOICE_MODEL_PATH set
  • Verify server starts with only BASE_MODEL_PATH set
  • Verify server starts with both model paths set
  • Verify server fails to start with neither model path set
  • Test speech generation with a predefined voice (CustomVoice model)
  • Test voice cloning with audio_sample parameter (Base model)
  • Verify HTTP 400 when requesting voice cloning without Base model loaded
  • Verify HTTP 400 when requesting predefined voice without CustomVoice model loaded
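The two HTTP 400 cases in the test plan follow from a simple routing rule, sketched below under stated assumptions. It mirrors the `if request.audio_sample:` branch in the python/main.py excerpt under review, but `choose_model` and `RoutingError` are hypothetical names, and the CustomVoice error text is invented for illustration (only the Base-model message appears in the source).

```python
from typing import Optional


class RoutingError(ValueError):
    """Hypothetical error type; an endpoint would map it to HTTP 400."""


def choose_model(
    has_customvoice: bool,
    has_base: bool,
    audio_sample: Optional[str],
) -> str:
    """Return which model family should serve a /v1/audio/speech request."""
    if audio_sample:
        # Voice cloning from reference audio needs the Base model.
        if not has_base:
            raise RoutingError(
                "audio_sample requires a base model. "
                "Set BASE_MODEL_PATH to enable voice cloning."
            )
        return "base"
    # Without a reference sample, the request asks for a predefined voice.
    if not has_customvoice:
        # Hypothetical message text for this sketch.
        raise RoutingError(
            "Predefined voices require a CustomVoice model. "
            "Set CUSTOMVOICE_MODEL_PATH."
        )
    return "customvoice"
```

Because the check is a truthiness test, an empty `audio_sample` string falls through to the CustomVoice branch; the review comment on python/main.py raises exactly this point.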

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

juntao and others added 4 commits January 27, 2026 23:49
Support loading both CustomVoice and Base model families. The Base model
enables voice cloning from a reference audio sample via the new
audio_sample and audio_sample_text request parameters. At least one of
CUSTOMVOICE_MODEL_PATH or BASE_MODEL_PATH must be set at startup.

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Add --project python to uv commands and use python/main.py path
so the workflow runs correctly from the repository root.

Co-Authored-By: Claude <noreply@anthropic.com>
Superseded by ci.yml.

Co-Authored-By: Claude <noreply@anthropic.com>
The audio_sample field now expects base64-encoded audio data so that
clients can talk to a remote server. The underlying model already
supports base64 strings natively.

Co-Authored-By: Claude <noreply@anthropic.com>
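A client-side sketch of the base64 request body this commit describes. It assumes the JSON field names match the request attributes seen in python/main.py (`input`, `language`, `audio_sample`, `audio_sample_text`); `build_clone_payload` is a hypothetical helper, and the accepted values for `language` are not specified here.

```python
import base64
import json
from pathlib import Path
from typing import Optional


def build_clone_payload(
    text: str,
    audio_path: Path,
    audio_sample_text: Optional[str] = None,
    language: Optional[str] = None,
) -> str:
    """Build a JSON body for /v1/audio/speech with a base64 reference clip.

    Hypothetical helper; field names are assumed from the request model.
    """
    # Encode the raw audio bytes so they survive transport inside JSON.
    audio_b64 = base64.b64encode(audio_path.read_bytes()).decode("ascii")
    body = {"input": text, "audio_sample": audio_b64}
    # Optional fields are omitted rather than sent as null.
    if audio_sample_text is not None:
        body["audio_sample_text"] = audio_sample_text
    if language is not None:
        body["language"] = language
    return json.dumps(body)
```

Writing the returned string to a file and posting it with `curl -d @file` keeps the large base64 value out of the shell's argument list, which is the approach a later commit in this PR adopts.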
Copilot AI (Contributor) left a comment

Pull request overview

Adds voice cloning support by allowing the server to load and route requests between CustomVoice and Base Qwen3-TTS model families, and updates documentation/CI accordingly.

Changes:

  • Load CustomVoice and/or Base models at startup via CUSTOMVOICE_MODEL_PATH and BASE_MODEL_PATH (requires at least one).
  • Add voice cloning request parameters (audio_sample, audio_sample_text) and route /v1/audio/speech accordingly.
  • Replace the prior GitHub Actions workflow with a multi-phase integration CI workflow that uploads generated audio artifacts.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| python/main.py | Adds dual-model loading and request routing for voice cloning vs. preset voices. |
| python/README.md | Documents model families, new env vars, and new request parameters with examples. |
| python/TEST_PLAN.md | Adds an integration test plan describing the 3 server configurations and expected behaviors. |
| python/EXAMPLE_CC_SESSION.md | Adds a transcript-style session log describing how changes were produced/tested. |
| .github/workflows/test-tts-api.yml | Removes the previous single-scenario API workflow. |
| .github/workflows/ci.yml | Adds a new multi-phase CI workflow that downloads models, runs integration calls, and uploads wav artifacts. |

Comment on lines +3 to +6

```yaml
on:
  push:
  pull_request:
```

Copilot AI Jan 28, 2026

This workflow now runs on every push/PR (no branch or path filtering). Given it downloads multi-GB model weights and does long-running inference, it should be scoped (e.g., to main and/or python/**) to avoid expensive CI runs on unrelated changes.

Copilot uses AI. Check for mistakes.

```yaml
- uses: astral-sh/setup-uv@v6
  with:
    version: "latest"
```

Copilot AI Jan 28, 2026

setup-uv is configured without a pinned Python version. Since ubuntu-latest can change Python versions over time, explicitly set python-version: "3.12" (matching the project requirement) to prevent CI breakage when the runner image updates.

Suggested change

```diff
-    version: "latest"
+    version: "latest"
+    python-version: "3.12"
```


```yaml
- name: Install dependencies
  run: uv sync --project python
```

Copilot AI Jan 28, 2026

CI installs dependencies with uv sync --project python without --frozen (and without excluding dev deps). For reproducible builds and faster CI, consider using uv sync --project python --frozen --no-dev so the workflow fails if uv.lock/pyproject.toml drift and avoids installing unnecessary packages.

Suggested change

```diff
-  run: uv sync --project python
+  run: uv sync --project python --frozen --no-dev
```

Comment on lines 1 to 7
Me

```
Review the README.md file. Install dependencies. Download the 0.6B CustomVoice model (exclude it from git).
Start an API server. Make API requests to generate chinese and english audio files (also exclude the WAV files
from git).
```

Copilot AI Jan 28, 2026

This file appears to be a raw Claude Code session transcript (and includes outdated guidance like MODEL_PATH=...). It’s likely not suitable to keep in-repo; consider removing it or converting the relevant parts into durable documentation (e.g., README/TEST_PLAN) instead of a chat log.

python/main.py Outdated
Comment on lines 290 to 306
```python
if request.audio_sample:
    base_model: Qwen3TTSModel | None = app.state.base_model
    if base_model is None:
        raise HTTPException(
            status_code=400,
            detail=(
                "audio_sample requires a base model. "
                "Set BASE_MODEL_PATH to enable voice cloning."
            ),
        )
    use_icl = request.audio_sample_text is not None
    with _inference_lock:
        wavs, sr = base_model.generate_voice_clone(
            text=request.input,
            language=request.language,
            ref_audio=request.audio_sample,
            ref_text=request.audio_sample_text,
```

Copilot AI Jan 28, 2026

audio_sample_text is accepted but silently ignored when audio_sample is omitted (the request routes to the CustomVoice branch). Also, the truthiness check `if request.audio_sample:` treats an empty string as "not provided". Consider validating the parameter combination explicitly (e.g., reject audio_sample_text without audio_sample, and require non-empty audio_sample/audio_sample_text when provided) so clients get a clear 400 instead of unexpected routing.
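A minimal sketch of the explicit validation suggested above, meant to run before routing. `validate_clone_params` is a hypothetical name; returning a message (which the endpoint would turn into an HTTP 400) instead of raising is a design choice of this sketch, not something the PR specifies.

```python
from typing import Optional


def validate_clone_params(
    audio_sample: Optional[str], audio_sample_text: Optional[str]
) -> Optional[str]:
    """Return an error message for an invalid combination, or None if valid.

    Hypothetical helper following the review suggestion:
      * audio_sample_text is only meaningful together with audio_sample;
      * a field that is provided must not be empty or whitespace-only.
    """
    # Compare against None, not truthiness, so "" is rejected, not ignored.
    if audio_sample is not None and not audio_sample.strip():
        return "audio_sample must not be empty when provided."
    if audio_sample_text is not None:
        if audio_sample is None:
            return "audio_sample_text requires audio_sample."
        if not audio_sample_text.strip():
            return "audio_sample_text must not be empty when provided."
    return None
```

Distinguishing `None` from `""` is the key point: with a plain `if request.audio_sample:` check, both fall through to the CustomVoice branch without any error.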

juntao and others added 3 commits January 28, 2026 00:22
Pipe base64 output into jq to build the JSON payload in a file,
then pass it to curl via -d @file. This avoids expanding the
large base64 string as a shell argument.

Co-Authored-By: Claude <noreply@anthropic.com>
The endpoint now handles both JSON and multipart/form-data. Use
multipart with curl -F to upload audio_sample as a binary file,
avoiding base64 encoding and shell argument length limits.

Co-Authored-By: Claude <noreply@anthropic.com>
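To illustrate what the multipart path sends, the sketch below builds a multipart/form-data body similar to what `curl -F` produces, using only the standard library. It assumes the server reads form fields named `input` and `audio_sample_text` plus a file part named `audio_sample`, matching the JSON parameter names; `encode_multipart` is a hypothetical helper and does not escape quotes in names or filenames.

```python
import uuid
from typing import Dict, Tuple


def encode_multipart(
    fields: Dict[str, str],
    files: Dict[str, Tuple[str, bytes, str]],
) -> Tuple[str, bytes]:
    """Encode text fields and file parts as a multipart/form-data body.

    Hypothetical helper. `files` maps a form field name to
    (filename, content, content_type). Returns the Content-Type header
    value and the raw body. Names must not contain double quotes.
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii") + b"\r\n"
    chunks = []
    for name, value in fields.items():
        # Text part: disposition header, blank line, UTF-8 value.
        chunks.append(delimiter)
        chunks.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        chunks.append(value.encode("utf-8") + b"\r\n")
    for name, (filename, content, content_type) in files.items():
        # File part carries raw bytes, so no base64 step is needed.
        disposition = f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
        chunks.append(delimiter)
        chunks.append(disposition.encode())
        chunks.append(f"Content-Type: {content_type}\r\n\r\n".encode())
        chunks.append(content + b"\r\n")
    # The closing delimiter (boundary followed by "--") ends the body.
    chunks.append(b"--" + boundary.encode("ascii") + b"--\r\n")
    return f"multipart/form-data; boundary={boundary}", b"".join(chunks)


if __name__ == "__main__":
    content_type, body = encode_multipart(
        {"input": "Hello from a cloned voice.", "audio_sample_text": "Reference words."},
        {"audio_sample": ("ref.wav", b"RIFF....WAVE", "audio/wav")},
    )
    print(content_type)
    print(f"{len(body)} bytes")
```

Sending the audio as a binary part avoids the roughly one-third size overhead of base64 and keeps the payload off the command line.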
Co-Authored-By: Claude <noreply@anthropic.com>
@juntao juntao merged commit cd34d97 into main Jan 28, 2026
2 checks passed
@juntao juntao deleted the add-voice-cloning branch January 28, 2026 02:07