
Fix ACE-Step 1.5: max_tokens typo and lyrics embedding truncation #12529

Open
fishmongr wants to merge 1 commit into Comfy-Org:master from fishmongr:fix/ace15-max-tokens-and-lyrics

@fishmongr

Summary

Two bugs in comfy/text_encoders/ace15.py encode_token_weights():

  • max_tokens receives the min_tokens value: max_tokens=lm_metadata["min_tokens"] should be max_tokens=lm_metadata["max_tokens"]. This caps the LM at the minimum number of audio-code tokens, so it always generates minimum-length audio codes regardless of the requested duration.
  • lyrics_embeds[:, 0] truncates the lyric sequence: only the first token embedding was passed to the diffusion model's lyric encoder instead of the full sequence, which discards all lyrics conditioning beyond the first token.
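The effect of the first bug can be sketched with a hypothetical stand-in for the LM's length limits. `generate_audio_codes` and `requested_tokens` are illustrative names, not functions or parameters from ace15.py, and the token counts are made-up values. The sketch only assumes that generation cannot stop before `min_tokens` and is cut off at `max_tokens`.

```python
def generate_audio_codes(requested_tokens: int, min_tokens: int, max_tokens: int) -> int:
    """Hypothetical stand-in for the LM: returns how many audio-code tokens it emits."""
    # Generation cannot stop before min_tokens and is cut off at max_tokens.
    return max(min_tokens, min(requested_tokens, max_tokens))


lm_metadata = {"min_tokens": 150, "max_tokens": 750}  # illustrative values only
requested = 600  # hypothetical token budget implied by the requested duration

# Bug: max_tokens is fed the min_tokens value, so the cap equals the floor.
buggy = generate_audio_codes(requested, min_tokens=lm_metadata["min_tokens"],
                             max_tokens=lm_metadata["min_tokens"])
# Fix: max_tokens gets its own value, so the requested length fits under the cap.
fixed = generate_audio_codes(requested, min_tokens=lm_metadata["min_tokens"],
                             max_tokens=lm_metadata["max_tokens"])

print(buggy, fixed)  # 150 600
```

With the cap equal to the floor, every request collapses to the same minimum length, which matches the symptom described above.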

Test plan

  • Generate music with lyrics and verify that vocal quality improves (the full lyric sequence now reaches the diffusion model)
  • Generate tracks of different durations (e.g. 30s vs 120s) and verify the output length matches the request (max_tokens fix)
  • Generate music without lyrics to verify no regression
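For the duration check in the test plan, a small helper like the one below could compare the decoded audio length against the request. `duration_matches`, the 48 kHz sample rate, and the 1-second tolerance are assumptions for illustration, not values taken from ACE-Step or ComfyUI.

```python
def duration_matches(num_samples: int, sample_rate: int,
                     requested_s: float, tol_s: float = 1.0) -> bool:
    """Return True if the audio length is within tol_s seconds of the request."""
    actual_s = num_samples / sample_rate
    return abs(actual_s - requested_s) <= tol_s


SAMPLE_RATE = 48_000  # assumed output sample rate, for illustration only

# A 120 s request that really produced 120 s of audio passes...
print(duration_matches(120 * SAMPLE_RATE, SAMPLE_RATE, 120.0))  # True
# ...while a 120 s request that came back much shorter fails.
print(duration_matches(30 * SAMPLE_RATE, SAMPLE_RATE, 120.0))   # False
```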

🤖 Generated with Claude Code

Two bugs in ace15.py encode_token_weights():

1. The max_tokens parameter received the min_tokens value:
   `max_tokens=lm_metadata["min_tokens"]` → `max_tokens=lm_metadata["max_tokens"]`
   This caused the LM to always generate minimum-length audio codes
   regardless of the requested duration.

2. `lyrics_embeds[:, 0]` discarded the full lyric sequence, passing only
   the first token embedding to the diffusion model's lyric encoder.
   Changed to pass the full `lyrics_embeds` tensor for proper lyrics
   conditioning.
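A minimal NumPy sketch of why `[:, 0]` truncates: on a 3-D array, an integer index on the second axis selects a single position and drops that axis entirely (PyTorch tensors behave the same way for this kind of indexing). The `(batch, seq_len, hidden_dim)` layout and the sizes are assumptions consistent with the description above, not read from ace15.py.

```python
import numpy as np

# Assumed layout: (batch, seq_len, hidden_dim). Sizes are arbitrary.
batch, seq_len, hidden_dim = 1, 12, 8
lyrics_embeds = np.random.default_rng(0).standard_normal((batch, seq_len, hidden_dim))

# Buggy: keeps only token position 0 and removes the sequence axis.
truncated = lyrics_embeds[:, 0]
# Fixed: pass every token embedding through.
full = lyrics_embeds

print(truncated.shape)  # (1, 8)
print(full.shape)       # (1, 12, 8)
```

Everything after the first lyric token (11 of the 12 positions here) never reaches the lyric encoder in the buggy version.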

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>