Re-enable CFG? #119
I did try it at the start (by accident, mostly) and the results were underwhelming then, and it's twice as slow to sample. There are two different model configs, and this one is loaded as cfg-distilled by default, so while you can use CFG it's not really meant to be used like that; their demo also uses cfg 1.0, which disables it by default. What I'm now wondering is how the LLM deals with negative prompts anyway. There's a default template that wouldn't make much sense for negatives: are you using the video prompt template, something custom, or none at all? Of course an optional node wouldn't hurt anyone (besides me, for having to maintain that whole sampling path).
I'm using the video template. I think it makes sense for negatives, since the negative noise prediction process is exactly the same as the positive one, just with a different prompt. I think the reason for the templates in the first place is just to "prime" the text encoder. It's an instruction-tuned model, so with that template prefix the LLM "thinks" it's predicting text that describes an image or video in detail, so the hidden states include richer information that makes predicting the next word easier. Those hidden states then become the text embeddings; that's why they do it like that. I would also point out that when CFG=1, the code doesn't compute negative noise predictions at all, so it's just as fast. I'm seeing much better results with LoRAs in some cases with something like CFG=4 and embedded_guidance_scale=2, so I personally would like the option. I could always just keep the changes locally, but others might want it too. I do understand it might be a bit confusing for users if the default is CFG=1 while there's still a negative prompt text box; I don't know if there's a good way to address that.
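Roughly what I mean (a minimal sketch, not the wrapper's actual code; `model`, `pos_embeds` and `neg_embeds` are placeholder names):

```python
def predict_noise(model, latents, t, pos_embeds, neg_embeds,
                  cfg_scale=1.0, embedded_guidance_scale=6.0):
    # Embedded (distilled) guidance is passed straight into the model.
    noise_pos = model(latents, t, pos_embeds, guidance=embedded_guidance_scale)
    if cfg_scale == 1.0:
        # No negative pass needed, so CFG=1 costs nothing extra.
        return noise_pos
    noise_neg = model(latents, t, neg_embeds, guidance=embedded_guidance_scale)
    # Standard classifier-free guidance combination.
    return noise_neg + cfg_scale * (noise_pos - noise_neg)
```

With CFG=4 and embedded_guidance_scale=2, this just means the second model call runs on every step, which is where the 2x sampling time comes from when CFG is actually enabled.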
Considering the amount of issues and confusion already, that's exactly why I wanted to simplify it in the end. But it could be implemented as an optional extra node or something. Did you try STG, though? I didn't really find the best settings for it, but some people posted quite promising comparisons. It has the same issue with speed, so I added time step scheduling for it so it only runs for a few steps, and that seemed to give most of the benefit. An additional issue with it is increased memory use, so ultimately most people would need to run it with block swapping, which then should also be scheduled... CFG scheduling might also make sense then. I do understand your point about LoRAs.
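The step scheduling is essentially just gating the extra pass on sampling progress, something like this (a rough sketch with assumed parameter names, not the node's real option names):

```python
def stg_active(step_idx, total_steps, stg_start_percent=0.0, stg_end_percent=0.3):
    # Only run the slower, more memory-hungry STG pass for a fraction of the
    # denoising schedule; running it for only a few steps seemed to give
    # most of the benefit.
    progress = step_idx / max(total_steps - 1, 1)
    return stg_start_percent <= progress <= stg_end_percent
```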
Making it two nodes sounds like it would help with the confusion. You could make them share a common flow in the code (there are very few differences between CFG and not). I haven't tried STG yet, but I'll give it a shot. For what it's worth, what I'm seeing with LoRAs and CFG I've also seen with Flux dev: prompt adherence and the perceived "strength" of the LoRA are much better with CFG, but with Flux you also have to use one of those anti-burn techniques. Luckily here you don't need anything special; for some reason HunyuanVideo burns much less for me than Flux with CFG.
What do STG-A and STG-R do, though?
I really appreciate this CFG discussion and the "make it two nodes" option.
What I'm thinking of is just making it an extra node, like STG is now; it wouldn't change anything for users who don't use it.
Does CFG work now? I mean, isn't CFG being distilled out? I saw you adding the CFG option back.
I have tested this new module a lot, and although I'm not entirely sure it's meant for negative prompts, I assume that's the intended use, even though it's not directly specified.
I uncommented some of the code for CFG and negative prompts and have been playing around with it a bit. This model works just fine with CFG, as long as embedded_guidance_scale is 1 or another low value. Both in general use and especially for LoRAs, it can improve prompt adherence. I think there's no reason to have CFG disabled; the model works like Flux dev, so you can use CFG as long as you have the settings right.
Also, with CFG and negative prompting it would be nice to have an option to compute the positive and negative noise predictions separately, rather than batched together. It might take a bit longer but it wouldn't use more VRAM that way, I think. But this could be done later.
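To illustrate the two strategies (a hedged sketch with a placeholder `model` call, not the actual sampling code): batching the positive and negative passes roughly doubles activation memory, while running them sequentially keeps peak VRAM close to a single pass.

```python
import torch

def cfg_passes_batched(model, latents, t, pos_embeds, neg_embeds):
    # One forward pass over a doubled batch: faster, but ~2x activation memory.
    both = model(torch.cat([latents, latents]), t,
                 torch.cat([pos_embeds, neg_embeds]))
    noise_pos, noise_neg = both.chunk(2)
    return noise_pos, noise_neg

def cfg_passes_sequential(model, latents, t, pos_embeds, neg_embeds):
    # Two forward passes: a bit slower, but peak VRAM stays near a single pass.
    noise_pos = model(latents, t, pos_embeds)
    noise_neg = model(latents, t, neg_embeds)
    return noise_pos, noise_neg
```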