[Core] Add PAG support for PixArtSigma #8921
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```diff
@@ -258,3 +258,91 @@ def pag_attn_processors(self):
         if proc.__class__ in (PAGCFGIdentitySelfAttnProcessor2_0, PAGIdentitySelfAttnProcessor2_0):
             processors[name] = proc
     return processors


+class PixArtPAGMixin(PAGMixin):
```
Let me know if I should copy-paste the other methods from `PAGMixin` and not make it a subclass of `PAGMixin`. `PixArtPAGMixin` differs only in the methods I implemented here.
I'm also curious about this. A subclass looks neater than copy-paste.
I would prefer not to have a Mixin nested into another Mixin if possible. We can either:
- copy-paste the common methods from `PAGMixin` (maybe we can rename `PAGMixin` to `SDPAGMixin` in that case), or
- keep a common `PAGMixin` and move the methods that are potentially model-specific to pipeline methods.
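To make the two options concrete, here is a minimal sketch in plain Python. The class and method bodies are illustrative stand-ins, not the actual diffusers code; the layer-naming scheme (`transformer_blocks.{i}`) is an assumption based on PixArt's transformer module names.

```python
# Option 2 (one common mixin): shared PAG plumbing lives in PAGMixin,
# and anything model-specific moves into the pipeline itself.
class PAGMixin:
    def set_pag_applied_layers(self, pag_applied_layers):
        # common logic: remember which layers PAG should perturb
        self.pag_applied_layers = list(pag_applied_layers)


# Option 1 (copy-paste, what the PR ended up doing): the PixArt mixin
# stands alone and duplicates the common logic, adding PixArt-specific
# translation of integer indices to transformer block names.
class PixArtPAGMixin:
    def set_pag_applied_layers(self, pag_applied_layers):
        # hypothetical naming scheme for PixArt transformer blocks
        self.pag_applied_layers = [
            f"transformer_blocks.{i}" for i in pag_applied_layers
        ]
```

The trade-off is the usual one: option 1 avoids a mixin inheriting from a mixin at the cost of duplicated code that `# Copied from` comments have to keep in sync.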
I will go with the first one. I will defer any refactoring to @yiyixuxu if she feels the need to do one.
Done in 27fceb3. LMK.
cc @a-r-r-o-w here too, I think we can reuse this one for hunyuan too
Maybe we can try the robot preparing a meal example? https://huggingface.co/docs/diffusers/main/en/using-diffusers/pag#general-tasks I cannot tell if there is an improvement in the example provided (but I'm also not very good at judging this, so cc @asomoza for help here too).
Yeah, I am not good at that either, so I requested @asomoza to chime in :D Here are the insect cooking meal results.
This one is the most interesting of the new models. PAG definitely helps with the generation: it cleans the image and fixes errors, but as with Kolors, it takes away some details, so it depends on what you want. For example, in this image it fixes the robot's hands and the shape of the bowls.
Also, I found time to play with the model, and it's a really good model; I'm impressed with the generations it can do. Anyway, what I found interesting is how the PAG layers behave. They affect the generation a lot, for example:
As with SDXL, I can see some relation between the layers and the generations, but I'll need to play with it more to be sure. For example, layers 19 and 21 seem to affect the background more, and some others seem to affect the shape or the colors more. Also, layers 1 and 6 can be used for the composition with a second pass over them.
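The layer-by-layer behavior above comes from which attention modules `pag_applied_layers` ends up selecting. Here is an illustrative sketch (not the actual diffusers implementation) of the selection mechanism: patterns are matched against dotted module names, so a pattern like `blocks.19.` picks out only that block's self-attention. The module names below are assumed to follow PixArt's `transformer_blocks.{i}.attn1` style.

```python
import re


def select_pag_layers(module_names, pag_applied_layers):
    """Return the module names matched by any of the layer patterns."""
    selected = []
    for name in module_names:
        if any(re.search(pattern, name) for pattern in pag_applied_layers):
            selected.append(name)
    return selected


# Hypothetical self-attention module names in the style of a PixArt transformer.
names = [f"transformer_blocks.{i}.attn1" for i in range(28)]
print(select_pag_layers(names, [r"blocks\.19\.", r"blocks\.21\."]))
# -> ['transformer_blocks.19.attn1', 'transformer_blocks.21.attn1']
```

Swapping only the matched modules' processors for the PAG identity processors is what localizes the effect to, say, background versus composition.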
Thank you very much, Alvaro! So, what I am hearing is that it makes sense to have PAG supported for PixArt Sigma, yeah? Cc @lawrence-cj too; you might find it nice.
@asomoza very nice! thanks!
@yiyixuxu can I add tests and docs and get it ready for final review?
@sayakpaul sure! Looks good to me.
```python
params = TEXT_TO_IMAGE_PARAMS.union({"pag_scale", "pag_adaptive_scale"})
params = set(params)
params.remove("cross_attention_kwargs")
```
LMK if this needs to be handled differently. PixArt doesn't have `cross_attention_kwargs` and doesn't need it yet.
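For context, here is a self-contained sketch of what the test-parameter snippet above is doing. The small frozenset stands in for diffusers' real `TEXT_TO_IMAGE_PARAMS` (which contains more keys); the point is the set algebra, including why the intermediate `set(...)` call is needed.

```python
# Stand-in for diffusers' TEXT_TO_IMAGE_PARAMS; the real constant has more
# keys, but like this one it includes "cross_attention_kwargs".
TEXT_TO_IMAGE_PARAMS = frozenset(
    {"prompt", "negative_prompt", "guidance_scale", "cross_attention_kwargs"}
)

# Add the PAG-specific call arguments the new pipeline accepts.
params = TEXT_TO_IMAGE_PARAMS.union({"pag_scale", "pag_adaptive_scale"})

# union() on a frozenset returns a frozenset, which has no .remove();
# converting to a mutable set first is why the original test does this.
params = set(params)
params.remove("cross_attention_kwargs")  # PixArt doesn't accept this kwarg
```

So the PAG PixArt test class advertises the PAG scales as supported parameters while dropping the one common parameter the pipeline doesn't take.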
ok!
```python
# Because the PAG PixArt Sigma has `pag_applied_layers`.
# Also, we shouldn't be doing `set_default_attn_processor()` after loading
# the pipeline with `pag_applied_layers`.
```
Let me know if this and subsequent tests should be handled differently.
looks good!
@yiyixuxu I have left some questions for you regarding the tests. LMK.
@sayakpaul looks good to me! Feel free to merge once the conflicts are resolved.
* feat: add pixart sigma pag.
* inits.
* fixes
* fix
* remove print.
* copy paste methods to the pixart pag mixin
* fix-copies
* add documentation.
* add tests.
* remove correction file.
* remove pag_applied_layers
* empty
What does this PR do?
Part of #8785.
I think PixArt Sigma is way more popular than PixArt Alpha (longer sequence length, better quality in general, etc.), so it makes sense to just add PAG to PixArt Sigma. If there's a need, I think we could open it up to the community.
TODO
Code
A bit of ablation study is in the comment below.
@asomoza would appreciate it if you could run some experiments with it :)
@sunovivid for awareness.