[codex] add long-form generation mode by Lee-take · Pull Request #313 · OpenBMB/VoxCPM

Lee-take · 2026-05-23T11:59:10Z

Summary

add VoxCPM.generate_long_form() for long scripts by splitting text into shorter segments
reuse the first generated segment as a fixed continuation prompt when no external reference is provided
keep an explicit external reference audio as the stable reference voice when the caller supplies one
ensure the seed prompt_text matches the spoken transcript, excluding voice-design control text that is not spoken
expose the mode through CLI flags: --long-form, --long-form-max-chars, and --long-form-silence-ms
document Python and CLI usage in the English and Chinese READMEs
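The segmentation and joining steps behind --long-form-max-chars and --long-form-silence-ms can be sketched as below. This is a minimal sketch, not the PR's code. The helper names `split_long_text` and `join_with_silence` are hypothetical, and so are the sentence-punctuation set and the word-boundary fallback. Audio is modeled as a plain list of float samples so the example has no dependencies.

```python
import re
from typing import List, Sequence

# Sentence-ending punctuation (ASCII and CJK) used as preferred split points.
# This set is an assumption for illustration, not taken from the PR.
_SENTENCE_END = re.compile(r"(?<=[.!?;。！？；])\s*")


def split_long_text(text: str, max_chars: int) -> List[str]:
    """Hypothetical helper: pack whole sentences into segments of at most max_chars."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    segments: List[str] = []
    current = ""
    for sentence in sentences:
        # A single over-long sentence is cut at the last space within the limit,
        # or hard-cut at max_chars when it has no space (e.g. CJK text).
        while len(sentence) > max_chars:
            if current:
                segments.append(current)
                current = ""
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars
            segments.append(sentence[:cut].rstrip())
            sentence = sentence[cut:].lstrip()
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments


def join_with_silence(
    chunks: Sequence[Sequence[float]], sample_rate: int, silence_ms: int
) -> List[float]:
    """Hypothetical helper: concatenate per-segment audio with a fixed gap of zeros."""
    gap = [0.0] * (sample_rate * silence_ms // 1000)
    joined: List[float] = []
    for index, chunk in enumerate(chunks):
        if index > 0:
            joined.extend(gap)  # silence only between segments, not at the ends
        joined.extend(chunk)
    return joined
```

Packing whole sentences up to the limit keeps each generation pass short. A sentence is only split when it exceeds the limit on its own.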

Why

Very long single-pass generation can accumulate autoregressive drift. The long-form path keeps each generation pass short and re-anchors later segments with stable prompt context, which is an API/CLI-level mitigation that does not change model weights or decoder internals.

The generated seed segment is no longer duplicated as both reference audio and prompt audio. If no external reference is supplied, the long-form path uses prompt continuation only; if the caller supplies an external reference, it is preserved as reference conditioning. This avoids over-conditioning on the same generated seed audio while still keeping a stable continuation anchor.
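The conditioning rules described above can be sketched as a small planning function plus a fixed-anchor loop. This sketch reads the summary literally. When an external reference is supplied, it stays as reference conditioning. Without one, the generated seed becomes prompt audio only and is never also passed as reference. The PR text does not show whether the real implementation also passes the seed as a prompt alongside an external reference. `Conditioning`, its field names, `generate_fn`, and `save_fn` are hypothetical stand-ins for the model call and for writing the seed WAV. They are not VoxCPM's actual API.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass(frozen=True)
class Conditioning:
    # Hypothetical field names; not the real VoxCPM keyword arguments.
    prompt_audio: Optional[str] = None     # continuation prompt audio
    prompt_text: Optional[str] = None      # exact spoken transcript of prompt_audio
    reference_audio: Optional[str] = None  # caller-supplied reference voice


def plan_conditioning(
    seed_audio: str, seed_transcript: str, external_reference: Optional[str] = None
) -> Conditioning:
    if external_reference is not None:
        # A caller-supplied reference stays the stable voice anchor.
        return Conditioning(reference_audio=external_reference)
    # No external reference: the seed is used as prompt audio only,
    # never duplicated as reference audio as well.
    return Conditioning(prompt_audio=seed_audio, prompt_text=seed_transcript)


def generate_segments(
    segments: List[str],
    generate_fn: Callable[[str, Conditioning], Any],
    save_fn: Callable[[Any], str],
    external_reference: Optional[str] = None,
) -> List[Any]:
    if not segments:
        return []
    # First pass: only the external reference (if any) conditions the seed.
    seed = generate_fn(segments[0], Conditioning(reference_audio=external_reference))
    anchor = plan_conditioning(save_fn(seed), segments[0], external_reference)
    outputs = [seed]
    for text in segments[1:]:
        # Every later pass reuses the same fixed anchor instead of chaining
        # on the previous, possibly drifted, segment.
        outputs.append(generate_fn(text, anchor))
    return outputs
```

Reusing one fixed anchor, rather than conditioning each segment on the one before it, is what keeps drift from compounding across passes.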

The seed prompt transcript must match the seed prompt audio exactly. Voice-design control text guides the first generation pass, but it is not part of the spoken audio; including it in later continuation prompt text can make the continuation conditioning inconsistent and degrade long-form output.
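The next sketch shows one way to derive a seed prompt_text that contains only the spoken words. It assumes that voice-design control text is written as a leading parenthesized description, as in `(warm, low female voice)Hello.`, with either ASCII or full-width parentheses. That format and the helper names `split_control_text` and `seed_prompt_text` are assumptions for illustration, not code from the PR.

```python
import re
from typing import Optional, Tuple

# Assumption: control text is a single leading "(...)" or "（...）" block.
_CONTROL_PREFIX = re.compile(r"^\s*[(（]([^)）]*)[)）]\s*")


def split_control_text(text: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: separate leading control text from the spoken text."""
    match = _CONTROL_PREFIX.match(text)
    if match is None:
        return None, text.strip()
    return match.group(1).strip(), text[match.end():].strip()


def seed_prompt_text(first_segment: str) -> str:
    """The transcript paired with the seed audio: spoken words only."""
    _, spoken = split_control_text(first_segment)
    return spoken
```

The full first segment, control text included, still drives the first generation pass. Only the spoken part is carried forward as the continuation prompt_text.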

Tests

python -m pytest tests/test_core_long_form.py tests/test_cli.py -q
python -m compileall src tests -q
local checks against an installed VoxCPM2: python tests/test_voxcpm_long_form.py and python tests/test_voxcpm2_voice_anchor.py
local long-form smoke run with VoxCPM2 CUDA runtime produced a 108.14s WAV at 48kHz mono using fixed seed-prompt continuation and shorter default segments
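A smoke-run output like the one above can be checked from its WAV header with the standard-library `wave` module. At 48,000 samples per second, 108.14 s is roughly 5.19 million samples per channel. This sketch assumes the file is integer PCM, because `wave` does not read IEEE-float WAV files. The helper name `wav_summary` is hypothetical.

```python
import wave
from typing import BinaryIO, Dict, Union


def wav_summary(source: Union[str, BinaryIO]) -> Dict[str, float]:
    """Hypothetical helper: duration, sample rate, and channel count of a PCM WAV."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "seconds": frames / rate,
            "sample_rate": rate,
            "channels": wav.getnchannels(),
        }
```

For example, `wav_summary("long_form.wav")` on a hypothetical output file should report `sample_rate` 48000 and `channels` 1 for the run described above.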

Not run: the full model-dependent upstream test suite, because the local PR environment had only lightweight unit-test dependencies installed and no full torch/model runtime.

Lee-take added 4 commits May 23, 2026 19:58

add long-form generation mode (2fb15cd)
stabilize long-form segment anchoring (bddc8a2)
fix long-form seed prompt transcript (24c5310)
use prompt-only seed for long-form defaults (4225e1d)

Lee-take marked this pull request as ready for review May 24, 2026 02:47

Lee-take mentioned this pull request May 25, 2026 in #302 (Open): Long-form voice cloning drift: prompt/reference conditioning may become self-generated latent conditioning

Lee-take wants to merge 4 commits into OpenBMB:main from Lee-take:codex/long-form-generation
