Improvements to ACE-Steps 1.5 text encoding #12283
comfyanonymous merged 2 commits into Comfy-Org:master
|
The main problem I have discovered so far is that too many words go missing. Sometimes a whole paragraph is gone. Some tags, such as male voice, have no effect. Another issue is that it's rather slow, and the time cannot be set to auto yet. The sound quality is fine. |
|
The issues are still the same; in my tests, nothing changed. A whole paragraph can still go missing. A lot of words are missing. [Male Vocal] still has no effect. |
|
In my tests, prompt adherence has mixed results between v0.12.2 and v0.12.3. I'm still playing with both versions to figure out the nuances, especially because the text encoder for Ace has gained some additional settings (previously not exposed?). One example: in v0.12.2 I could start a song with a tag like [Saxophone Intro] and it would work. Using identical settings (incl. kSampler), v0.12.3 ignores it completely. At the same time, I get the feeling that v0.12.3 has improved prompt adherence in other areas. However, when it comes to taste, for my kind of prompting v0.12.2 produces the better results overall. Sound quality seems to have improved, at least in v0.12.3+. For reference: ace-step-v1.5-sft + dual clip with 0.6b and 1.7b. |
|
Very interesting: without the LLM it is better, better than 1.5, even 4B; at least the male voice can show up 😄
ACE-Steps 1.5 text encoding is pretty broken right now, unfortunately. I am not positive these changes are perfect or a complete fix. I didn't have a whole lot of time to work on it, so this is also very lightly tested.
These changes seem to produce much better output. Here's something fun to listen to: https://voca.ro/11TKOC8Jgebf
Given a caption "Blah blah user caption" and lyrics "[Instrumental]", this is debug output from the current implementation:
Raw debug output
That's pretty hard to read, so here it is printed out as strings:
lm_prompt
Serious issues here:
# Lyric section, the lyrics are just run together with the caption.
Refs:
lm_prompt_negative
lyrics
Serious issues here:
qwen3_06b
Serious issues here:
175 seconds, not just a bare number.
Refs:
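To make the two formatting points above concrete (lyrics kept in their own # Lyric section rather than run together with the caption, and the duration written with its unit), here is a minimal sketch. It is not the code from this pull or the official ACE-Step template: `format_duration`, `build_prompt`, the `duration:` key, and the section order are all hypothetical and only illustrate the shape of the fix.

```python
def format_duration(seconds):
    """Render a duration with its unit, e.g. 175 -> '175 seconds' (hypothetical helper)."""
    return f"{int(seconds)} seconds"


def build_prompt(caption, lyrics, duration):
    """Assemble a prompt with one headed section per field.

    The layout here is an assumption for illustration only; the real template
    lives in the ACE-Step repository and may differ.
    """
    sections = [
        ("# Caption", caption.strip()),
        ("# Metas", f"duration: {format_duration(duration)}"),  # hypothetical key name
        ("# Lyric", lyrics.strip()),
    ]
    # Each section gets its own header line, so lyrics never run into the caption.
    return "\n\n".join(f"{header}\n{body}" for header, body in sections)


prompt = build_prompt("Blah blah user caption", "[Instrumental]", 175)
print(prompt)
```

With the caption and lyrics from the example above, the lyrics end up under their own header and the duration reads as a number plus unit instead of a bare number.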
With this pull, we get the output:
Raw debug output
lm_prompt
lm_prompt_negative
lyrics
qwen3_06b
No newline between the # Caption and # Metas section looks a bit weird, however the official template would have the same result if there wasn't a trailing newline: https://github.com/ace-step/ACE-Step-1.5/blob/eafcc2098696c60fb9e35d91813d84282d78959a/acestep/constants.py#L101
Potential issues this pull doesn't address:
<|endoftext|> tokens at the ends of some of the prompts. Also one has a newline in between the tokens, one doesn't.
seconds in the places it appears. I only fixed the one place I was pretty sure about.
# Caption and # Metas sections in the SFT prompt would be a good idea.
Unfortunately, I don't really have time at the moment to do more than whine about those other issues.
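For anyone who wants to look into the <|endoftext|> inconsistency, a small check on the decoded prompt strings might help compare them. This is only a sketch: `count_trailing_end_tokens` and `has_newline_between_end_tokens` are hypothetical helpers that operate on plain strings, and they assume the prompts have already been decoded to text as in the debug output.

```python
END_TOKEN = "<|endoftext|>"


def count_trailing_end_tokens(text, token=END_TOKEN):
    """Count how many copies of `token` end `text`, ignoring whitespace between them."""
    count = 0
    rest = text.rstrip()
    while rest.endswith(token):
        count += 1
        rest = rest[: -len(token)].rstrip()
    return count


def has_newline_between_end_tokens(text, token=END_TOKEN):
    """Return True if two end tokens are separated by exactly one newline."""
    return (token + "\n" + token) in text


# Made-up sample strings, not real debug output from the encoder.
with_newline = "some prompt" + END_TOKEN + "\n" + END_TOKEN
without_newline = "some prompt" + END_TOKEN + END_TOKEN

print(count_trailing_end_tokens(with_newline), has_newline_between_end_tokens(with_newline))
print(count_trailing_end_tokens(without_newline), has_newline_between_end_tokens(without_newline))
```

Running both helpers over each decoded prompt would show which ones carry extra end tokens and which ones separate them with a newline.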