[Core] LoRA improvements pt. 3 #4842
Conversation
I think that is not ideal. Experimenting with the pipe loaded and changing the LoRA weight at inference time to find the sweet spot is a common use case in the community: many LoRA authors suggest a range, and users usually try to find the best scale within that range for their prompt. Slider LoRAs are also a growing use case, and they require different values during inference, since users are expected to play with different scale values to see how the scale influences the results. E.g., the curly hair slider LoRA will produce subjects with super straight hair if the LoRA scale is …
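For context, a minimal sketch of that workflow with diffusers; the base checkpoint is the usual SDXL one, while the slider-LoRA repo id is a made-up placeholder:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
# Placeholder repo id for a slider-style LoRA.
pipe.load_lora_weights("some-user/curly-hair-slider-lora")

prompt = "portrait photo of a person, detailed hair"
# Try several scales at inference time to find the sweet spot for this prompt.
for scale in (0.2, 0.6, 1.0):
    image = pipe(prompt, cross_attention_kwargs={"scale": scale}).images[0]
    image.save(f"sample_scale_{scale}.png")
```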
Let me proceed with this one and see how it works out.
I am still working on it. Please don't commit anything to it yet.
src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py
Super nice! Left mostly nits.
- It would be nice to call the `_adjust_lora_scale_text_encoder` part directly in the `encode_prompt` function.
- In addition to the tests that have been added, can we also test that fused LoRA with scales gives the same result as normal LoRA with scales? (One way such a check could look is sketched right below.)
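A rough sketch of such an equivalence check (this is not the exact test that ended up in the PR; the checkpoint and LoRA ids are placeholders, and it assumes the `fuse_lora(lora_scale=...)` argument this PR introduces):

```python
import numpy as np
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("some-user/some-lora")  # placeholder

scale = 0.5

# Un-fused path: scale applied at runtime via cross_attention_kwargs.
generator = torch.manual_seed(0)
out_unfused = pipe(
    "a photo of a dog", generator=generator, num_inference_steps=5,
    cross_attention_kwargs={"scale": scale}, output_type="np",
).images

# Fused path: the same scale baked into the weights.
pipe.fuse_lora(lora_scale=scale)
generator = torch.manual_seed(0)
out_fused = pipe(
    "a photo of a dog", generator=generator, num_inference_steps=5,
    output_type="np",
).images

# The two paths should agree up to numerical precision.
assert np.allclose(out_unfused, out_fused, atol=1e-3)
```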
src/diffusers/pipelines/versatile_diffusion/modeling_text_unet.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Very nice! Good to merge for me whenever :-)
@patrickvonplaten there are still some problems (specifically the test case you mentioned here), and I am currently debugging them. As for the failing tests, I think they're acting flaky because without the … I will request another review from you when this PR is ready. Thanks for your reviews so far!
@patrickvonplaten in e60f450, I added a test to check for the equivalence (as mentioned in #4842). However, it currently fails. I looked into the potential suspects, but they seem alright to me (as indicated by the print statements one can see). If you have some time, it would be great to get some suggestions here. In the meantime, I am shifting to #4855.
Back at it.
There's something wrong with our LoRA attention processor, likely (maybe due to the changes introduced in #4833). I say this because if we load non-diffusers LoRA checkpoints, the effects of … So, this probably narrows down the area we need to look into to solve the current bug. Looking into it.
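A quick sanity check of the kind described above might look like this; the LoRA file name is a placeholder for a non-diffusers (e.g. Kohya-style) checkpoint. If the two outputs are nearly identical, the `scale` kwarg is being silently ignored somewhere:

```python
import numpy as np
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# Placeholder file name for a non-diffusers LoRA checkpoint in the current directory.
pipe.load_lora_weights(".", weight_name="some_kohya_lora.safetensors")

def run(scale):
    generator = torch.manual_seed(0)
    return pipe(
        "a photo of a dog", generator=generator, num_inference_steps=5,
        cross_attention_kwargs={"scale": scale}, output_type="np",
    ).images

# A large difference means the scale is being honoured; ~0 means it is ignored.
print("max abs diff:", np.abs(run(0.0) - run(1.0)).max())
```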
```diff
 with torch.no_grad():
     for parameter in lora_attn_parameters:
         if randn_weight:
-            parameter[:] = torch.randn_like(parameter)
+            parameter[:] = torch.randn_like(parameter) * var
```
We should allow smaller LoRA weights so that we can keep reasonable precision in the tests.
@sayakpaul I think your PR was just fine. I'm quite certain that the test only failed due to numerical precision issues, which were fixed by passing LoRA layers with less extreme values (e.g., distributed as N(0, 0.1) instead of N(0, 1)). I also ran the slow tests locally and they all passed. Think we're good here. Nice job!
@sayakpaul Thank you for the PR. I was confused about why you add the `self.lora_scale = lora_scale` at the end of …
Precisely.
* throw warning when more than one lora is attempted to be fused. * introduce support of lora scale during fusion. * change test name * changes * change to _lora_scale * lora_scale to call whenever applicable. * debugging * lora_scale additional. * cross_attention_kwargs * lora_scale -> scale. * lora_scale fix * lora_scale in patched projection. * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * styling. * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * remove unneeded prints. * remove unneeded prints. * assign cross_attention_kwargs. * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * clean up. * refactor scale retrieval logic a bit. * fix nonetypw * fix: tests * add more tests * more fixes. * figure out a way to pass lora_scale. * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * unify the retrieval logic of lora_scale. * move adjust_lora_scale_text_encoder to lora.py. * introduce dynamic adjustment lora scale support to sd * fix up copies * Empty-Commit * add: test to check fusion equivalence on different scales. * handle lora fusion warning. * make lora smaller * make lora smaller * make lora smaller --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
What does this PR do?
Currently, using different scales during LoRA fusion is supported, but how to best support it for non-fusion workflows is yet to be brainstormed.
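For reference, a small usage sketch of the already-supported scaled-fusion path (the checkpoint and LoRA ids are placeholders):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("some-user/some-lora")  # placeholder

# Bake the LoRA into the base weights at half strength.
pipe.fuse_lora(lora_scale=0.5)
image = pipe("a photo of a dog").images[0]

# Undo the fusion to restore the original weights.
pipe.unfuse_lora()
```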
From here:
- Do we do this when a pipeline call is made: if a `LoRACompatible*` layer has its `lora_layer` set, we multiply the scale with the output of the `lora_layer`? The `scale`, I think, still needs to be retrieved from `cross_attention_kwargs`.
- Or do we set a `lora_scale` attribute in the LoRALinear and LoRAConv layers when they're initialized? But doing so will not allow having different `scale` values during inference, IIUC.
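To make the trade-off concrete, here is a simplified, self-contained toy sketch of the two options; the class and attribute names are illustrative only and do not mirror the actual diffusers implementation:

```python
import torch
import torch.nn as nn

class ToyLoRALayer(nn.Module):
    """A minimal rank-decomposed residual, standing in for a real LoRA layer."""
    def __init__(self, in_features, out_features, rank=4):
        super().__init__()
        self.down = nn.Linear(in_features, rank, bias=False)
        self.up = nn.Linear(rank, out_features, bias=False)

    def forward(self, x):
        return self.up(self.down(x))

class RuntimeScaledLinear(nn.Linear):
    """Option 1: the scale is passed at call time (e.g. pulled from
    cross_attention_kwargs), so it can change on every pipeline call."""
    def __init__(self, in_features, out_features, lora_layer=None, **kwargs):
        super().__init__(in_features, out_features, **kwargs)
        self.lora_layer = lora_layer

    def forward(self, x, scale=1.0):
        out = super().forward(x)
        if self.lora_layer is not None:
            out = out + scale * self.lora_layer(x)
        return out

class InitScaledLinear(nn.Linear):
    """Option 2: the scale is fixed when the layer is built, so every call
    uses the same value unless the attribute is mutated by hand."""
    def __init__(self, in_features, out_features, lora_layer=None, lora_scale=1.0, **kwargs):
        super().__init__(in_features, out_features, **kwargs)
        self.lora_layer = lora_layer
        self.lora_scale = lora_scale

    def forward(self, x):
        out = super().forward(x)
        if self.lora_layer is not None:
            out = out + self.lora_scale * self.lora_layer(x)
        return out

x = torch.randn(2, 16)
opt1 = RuntimeScaledLinear(16, 16, lora_layer=ToyLoRALayer(16, 16))
print(opt1(x, scale=0.5).shape)  # scale chosen per call
opt2 = InitScaledLinear(16, 16, lora_layer=ToyLoRALayer(16, 16), lora_scale=0.5)
print(opt2(x).shape)             # scale fixed at construction
```

Judging from the discussion above, the PR leans toward the first, call-time pattern, with the scale retrieved from `cross_attention_kwargs`.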