
Feature IP Adapter Xformers Attention Processor #9881

Merged

merged 10 commits into huggingface:main on Nov 9, 2024

Conversation

elismasilva
Contributor

@elismasilva elismasilva commented Nov 6, 2024

What does this PR do?

This PR fixes the incorrect attention processor being set when xFormers attention is enabled after the IP adapter has been loaded and its scale set.

The solution was described in #8872.

Fixes #8863, #8872

Test Code

import numpy as np
import torch
from diffusers import AutoPipelineForText2Image
from transformers import CLIPVisionModelWithProjection
from diffusers.utils.loading_utils import load_image

MAX_SEED = np.iinfo(np.int32).max
base_model_path = "stabilityai/stable-diffusion-xl-base-1.0"
device = "cuda"
seed = 42

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", subfolder="models/image_encoder", torch_dtype=torch.float16
).to(device)

# load SDXL pipeline
pipe = AutoPipelineForText2Image.from_pretrained(
    base_model_path,    
    torch_dtype=torch.float16,
    image_encoder=image_encoder
).to(device)

#DEFAULT RUNNING ATTENTION IS 2.0
pipe.enable_vae_tiling() 
pipe.enable_model_cpu_offload()
#pipe.enable_xformers_memory_efficient_attention() #UNCOMMENT this line to run WITH XFORMERS before the IP adapter is loaded.

# load ip-adapter
pipe.load_ip_adapter("h94/IP-Adapter",
    subfolder="sdxl_models",    
    #weight_name="ip-adapter-plus-face_sdxl_vit-h.bin",
    weight_name="ip-adapter-plus_sdxl_vit-h.bin",
    image_encoder_folder=None,
)

# configure ip-adapter scales.
scale = {
    #"down": {"block_2": [0.0, 1.0]}, #composition
    "up": {"block_0": [0.0, 1.0, 0.0]}, #style
}

pipe.set_ip_adapter_scale(scale)
pipe.enable_xformers_memory_efficient_attention() #UNCOMMENT this line to run WITH XFORMERS after the IP adapter is loaded.

#pipe.disable_xformers_memory_efficient_attention() #UNCOMMENT this line to DISABLE XFORMERS and back to Attention 2.0.

generator = torch.Generator(device).manual_seed(seed)
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")
image = image.resize((512, 512))

# generate image
# DO BREAKPOINT HERE AND CHECK ATTN PROCESSORS IN PIPE.UNET COMPONENT
images = pipe(
    prompt="a cat, masterpiece, best quality, high quality",
    negative_prompt= "text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
    ip_adapter_image=image,        
    guidance_scale=5,
    height=1024,
    width=1024,    
    num_inference_steps=30,     
    generator=generator
).images[0]

images.save("./data/result_1_diff.png")
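
Instead of setting a breakpoint, the active processors can be inspected directly (a small sketch, not part of the PR):

# Print the set of attention processor classes currently installed on the UNet.
# With xFormers enabled, the IP adapter layers should report
# IPAdapterXFormersAttnProcessor and the remaining layers XFormersAttnProcessor.
print({proc.__class__.__name__ for proc in pipe.unet.attn_processors.values()})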


Who can review?

requested by @a-r-r-o-w
@yiyixuxu and @asomoza

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@yiyixuxu yiyixuxu left a comment

thanks for the PR!
I left some questions, mostly on the IP adapter loading part. If you can help me understand why you added each change (e.g. the use cases you had in mind when adding them), it would be really helpful!!

IPAdapterAttnProcessor2_0 if hasattr(F, "scaled_dot_product_attention") else IPAdapterAttnProcessor
)
if ('XFormers' not in str(self.attn_processors[name].__class__)):
attn_processor_class = (
Collaborator

can you explain this code change here

  • previously, if I understand the code correctly, we kept the original attention processor for motion modules (did not change it to an IP adapter attention processor)
  • now, we change to the default attention processor when it is not an xFormers processor?

Contributor Author

Hi, how are you? Help me check whether my reasoning is correct. This condition is true when "cross_attention_dim" is None or when "motion_modules" is in the name; I only paid attention to the scenario where "cross_attention_dim" is None. Is there any attention class specific to models with "motion_modules" other than AttnProcessor and AttnProcessor2_0? If there were, I would just move "motion_modules" into an "elif". This part of the code comes from a first solution I implemented some time ago, before I had implemented the processor replacement in the "set_use_memory_efficient_attention_xformers" method of the "Attention" class. Back then, while testing several adapters and combined adapters, I probably ran into a situation that made me force this xFormers check here. Now that you mention it, I commented this part out and ran some more tests, and it seems this modification is no longer necessary since "set_use_memory_efficient_attention_xformers" was implemented. At least for now, I haven't hit any errors when loading.
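
For reference, a minimal sketch of the branch being discussed (an assumed simplification for illustration, not the exact diffusers source; choose_processor is a hypothetical helper):

import torch.nn.functional as F
from diffusers.models.attention_processor import IPAdapterAttnProcessor, IPAdapterAttnProcessor2_0

def choose_processor(name, cross_attention_dim, current_processor):
    # Self-attention layers (cross_attention_dim is None) and motion modules keep
    # their current processor class; cross-attention layers get an IP adapter
    # processor, picked according to torch's SDPA availability.
    if cross_attention_dim is None or "motion_modules" in name:
        return current_processor.__class__
    if hasattr(F, "scaled_dot_product_attention"):
        return IPAdapterAttnProcessor2_0
    return IPAdapterAttnProcessor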

Contributor Author

If you agree, I will commit an update that removes this verification and restores the original code.

Collaborator

for this, can you provide a code example that would fail without this change?

Contributor Author

for this, can you provide a code example that would fail without this change?

Yes, see my test code in the PR: there are two lines with #pipe.enable_xformers_memory_efficient_attention(), and you can remove the # to enable xFormers either before or after the IP adapter is loaded. I put one line before and one after loading the model.

Contributor Author

I will commit my latest code with some fixes for the quality check.

attn_procs[name] = attn_processor_class()
else:
attn_procs[name] = self.attn_processors[name]
else:
Collaborator

can you explain the change here?

Contributor Author

Here in the else it seems like a chicken-and-egg situation. When we do not call pipe.enable_xformers_memory_efficient_attention(), the loader by default sets IPAdapterAttnProcessor2_0 or IPAdapterAttnProcessor for the IP adapter. So when you call pipe.enable_xformers_memory_efficient_attention() before loading the IP adapter, all attention processors are already set to XFormersAttnProcessor, and when the IP adapter modules are loaded after that call we need to check whether the active mechanism is xFormers in order to apply the new IPAdapterXFormersAttnProcessor class. However, when you call pipe.enable_xformers_memory_efficient_attention() after loading the IP adapter modules, those modules have already been set to IPAdapterAttnProcessor2_0 or IPAdapterAttnProcessor, and the set_use_memory_efficient_attention_xformers method of the Attention class only knew how to set everything to XFormersAttnProcessor, which produced the error reported in the open issue. With the implementation I added to that class, the method now also recognizes IPAdapterAttnProcessor2_0 / IPAdapterAttnProcessor in the modules and correctly replaces them with the new class; it can only do that because those processors were set when the module was loaded. So these checks are necessary on both sides, depending on whether pipe.enable_xformers_memory_efficient_attention() is called before or after loading the modules.
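
In other words, these are the two call orders the PR has to handle (a sketch reusing the pipeline and adapter from the test code above; run one order per session):

# Order 1: xFormers first, then the IP adapter.
# load_ip_adapter must notice the existing XFormersAttnProcessor and install
# IPAdapterXFormersAttnProcessor on the cross-attention layers.
pipe.enable_xformers_memory_efficient_attention()
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="sdxl_models",
    weight_name="ip-adapter-plus_sdxl_vit-h.bin", image_encoder_folder=None,
)

# Order 2: IP adapter first, then xFormers.
# Attention.set_use_memory_efficient_attention_xformers must recognize
# IPAdapterAttnProcessor / IPAdapterAttnProcessor2_0 and swap them for
# IPAdapterXFormersAttnProcessor instead of a plain XFormersAttnProcessor.
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="sdxl_models",
    weight_name="ip-adapter-plus_sdxl_vit-h.bin", image_encoder_folder=None,
)
pipe.enable_xformers_memory_efficient_attention()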

Collaborator

Can you confirm that you added this change in order to be able to handle this?

pipe.enable_xformers_memory_efficient_attention()
pipe.load_ip_adapter()

Contributor Author

Yes, this is to handle that order, and vice versa. The test code I provided in this PR simulates both scenarios.

@yiyixuxu
Collaborator

yiyixuxu commented Nov 8, 2024

cc @fabiorigano here too if you have time to give this a review!

Collaborator

@yiyixuxu yiyixuxu left a comment

thanks for making the code simpler! I left a few more questions

@@ -369,7 +369,20 @@ def set_use_memory_efficient_attention_xformers(
)
processor = XFormersAttnAddedKVProcessor(attention_op=attention_op)
else:
processor = XFormersAttnProcessor(attention_op=attention_op)
processor = self.processor
if isinstance(self.processor, (IPAdapterAttnProcessor, IPAdapterAttnProcessor2_0)):
Collaborator

let's add an is_ip_adapter flag similar to is_custom_diffusion etc.

is_ip_adapter = hasattr(self, "processor") and isinstance(
    self.processor, (IPAdapterAttnProcessor, IPAdapterAttnProcessor2_0)
)

Contributor Author

@elismasilva elismasilva Nov 8, 2024

Yes, that is perfectly possible, but it will have to be like the snippet below, so that modules that have already been switched to the xFormers IP adapter class are not replaced again with XFormersAttnProcessor in the final else during the method's recursion.

is_ip_adapter = hasattr(self, "processor") and isinstance(
    self.processor,
    (IPAdapterAttnProcessor, IPAdapterAttnProcessor2_0, IPAdapterXFormersAttnProcessor),
)

Collaborator

yes, the code you show here is ok!
We just want to keep a consistent style, that's all :)

scale=self.processor.scale,
num_tokens=self.processor.num_tokens,
attention_op=attention_op)
processor.load_state_dict(self.processor.state_dict())
Collaborator

why do we need to load_state_dict again here?

Contributor Author

@elismasilva elismasilva Nov 8, 2024

Well, I couldn't identify exactly why, but if I don't reload the state_dict here after assigning the new class, the IP adapter has no effect on the final image. I don't know if it's because the call to "pipe.enable_xformers_memory_efficient_attention()" happens after the IP adapter weights have already been loaded, so it's as if the adapter were not being used. I saw that some manipulation is done while loading the IP adapter weights, but I don't think it makes sense to replicate that logic here, and I don't know whether that's the reason. See one final image generated without the state dict reload and another with it. I noticed that something similar is done for custom diffusion, so for practicality I did the same. If you have a better solution I would like to try it.

Without load_state_dict: (image: result_1_diff)

With load_state_dict: (image: result_1_diff)

Collaborator

oh yeah, it comes with weights:

self.to_k_ip = nn.ModuleList(

(I had forgotten about that, sorry! lol)
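
For context, the swap being discussed looks roughly like this inside set_use_memory_efficient_attention_xformers (a hedged sketch based on this PR's diff, not a verbatim copy):

processor = IPAdapterXFormersAttnProcessor(
    hidden_size=self.processor.hidden_size,
    cross_attention_dim=self.processor.cross_attention_dim,
    scale=self.processor.scale,
    num_tokens=self.processor.num_tokens,
    attention_op=attention_op,
)
# Copy the already-loaded to_k_ip / to_v_ip weights into the new processor;
# without this the IP adapter has no effect on the output (the difference shown
# in the two images above).
processor.load_state_dict(self.processor.state_dict())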

num_tokens=self.processor.num_tokens,
attention_op=attention_op)
processor.load_state_dict(self.processor.state_dict())
if len(self.processor._modules) > 0:
Collaborator

what does this section of code do?

Contributor Author

This passes the initialization parameters that were already defined while loading the IP adapter model on to the new xFormers attention class. After that I reload the already-loaded state_dict into the new object, as explained in the previous answer. Then I make sure the weights keep the same device and dtype they had before, because reloading the state_dict leaves them on "cpu" with dtype "float32".
I changed the initial if statement on line 380 to check for existing modules, just to avoid unexpected errors.

if hasattr(self.processor, "_modules") and len(self.processor._modules) > 0:

Collaborator

ok, I think we can simplify the code here a little bit: because we are inside an if statement, we already know the processor will be either IPAdapterAttnProcessor, IPAdapterAttnProcessor2_0 or IPAdapterXFormersAttnProcessor -- in all three cases it will have a to_k_ip and a to_v_ip layer, so maybe we can just get the device info from self.to_k_ip[0].device

Contributor Author

Oh yes, I had done it in an agnostic way, because I wasn't sure those modules would always exist in every model that might arrive there. I'll change it to your solution.
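
A sketch of that simplification (assumed for illustration; to_k_ip is an nn.ModuleList of Linear layers on the IP adapter processors, so its first entry carries the device and dtype of the loaded weights):

ip_weight = self.processor.to_k_ip[0].weight
# Move the freshly built xFormers processor onto the same device/dtype as the
# already-loaded IP adapter weights instead of probing arbitrary submodules.
processor.to(device=ip_weight.device, dtype=ip_weight.dtype)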

src/diffusers/models/attention_processor.py (comment thread resolved)
Contributor

@fabiorigano fabiorigano left a comment

@elismasilva great work! I left one comment. I agree with YiYi's feedback on src/diffusers/models/attention_processor.py, though I think it's fine to keep the class name as originally defined.
@yiyixuxu thank you for letting me review this PR :)

def __init__(self, hidden_size, cross_attention_dim=None, num_tokens=(4,), scale=1.0, attention_op: Optional[Callable] = None):
super().__init__()

if not hasattr(F, "scaled_dot_product_attention"):
Contributor

this check can be removed because this class uses xformers.ops.memory_efficient_attention instead of torch.nn.functional.scaled_dot_product_attention

Contributor

@fabiorigano fabiorigano left a comment

just adding one more consideration

Comment on lines 4677 to 4681
query = attn.head_to_batch_dim(query)
key = attn.head_to_batch_dim(key)
value = attn.head_to_batch_dim(value)

hidden_states = self._memory_efficient_attention_xformers(query, key, value, attention_mask)
Contributor

Suggested change:

Before:
query = attn.head_to_batch_dim(query)
key = attn.head_to_batch_dim(key)
value = attn.head_to_batch_dim(value)
hidden_states = self._memory_efficient_attention_xformers(query, key, value, attention_mask)

After:
query = attn.head_to_batch_dim(query).contiguous()
key = attn.head_to_batch_dim(key).contiguous()
value = attn.head_to_batch_dim(value).contiguous()
hidden_states = xformers.ops.memory_efficient_attention(query, key, value, attn_bias=attention_mask, op=self.attention_op)

Just another observation: if we make the tensors contiguous here, we can avoid multiple calls to query.contiguous() later in the code (every time self._memory_efficient_attention_xformers is called, query is reused).
This way we can call xformers.ops.memory_efficient_attention directly.

Comment on lines 4741 to 4744
ip_key = attn.head_to_batch_dim(ip_key)
ip_value = attn.head_to_batch_dim(ip_value)

_current_ip_hidden_states = self._memory_efficient_attention_xformers(query, ip_key, ip_value, None)
Contributor

Suggested change:

Before:
ip_key = attn.head_to_batch_dim(ip_key)
ip_value = attn.head_to_batch_dim(ip_value)
_current_ip_hidden_states = self._memory_efficient_attention_xformers(query, ip_key, ip_value, None)

After:
ip_key = attn.head_to_batch_dim(ip_key).contiguous()
ip_value = attn.head_to_batch_dim(ip_value).contiguous()
_current_ip_hidden_states = xformers.ops.memory_efficient_attention(query, ip_key, ip_value, op=self.attention_op)

same as before

Comment on lines 4762 to 4765
ip_key = attn.head_to_batch_dim(ip_key)
ip_value = attn.head_to_batch_dim(ip_value)

current_ip_hidden_states = self._memory_efficient_attention_xformers(query, ip_key, ip_value, None)
Contributor

Suggested change:

Before:
ip_key = attn.head_to_batch_dim(ip_key)
ip_value = attn.head_to_batch_dim(ip_value)
current_ip_hidden_states = self._memory_efficient_attention_xformers(query, ip_key, ip_value, None)

After:
ip_key = attn.head_to_batch_dim(ip_key).contiguous()
ip_value = attn.head_to_batch_dim(ip_value).contiguous()
current_ip_hidden_states = xformers.ops.memory_efficient_attention(query, ip_key, ip_value, op=self.attention_op)

Contributor Author

Thank you! Done!

@elismasilva
Contributor Author

thanks for making the code simpler! I left a few more questions

Thanks for the reviews guys, I will test the proposed scenarios and come back with comments and changes later.

@elismasilva
Contributor Author

Well, I made all the changes and everything is working. I will wait for your replies before committing my changes.

@elismasilva
Contributor Author

Well, I made all the changes and everything is working. I will wait for your replies before committing my changes.

@yiyixuxu @fabiorigano changes committed!

Collaborator

@yiyixuxu yiyixuxu left a comment

thanks

@yiyixuxu
Collaborator

yiyixuxu commented Nov 8, 2024

can you run make style?

@elismasilva
Contributor Author

can you run make style?

Can I run it only for these 3 files? Because it changes a huge number of files.

@yiyixuxu
Collaborator

yiyixuxu commented Nov 8, 2024

it should not change any files that this PR does not touch if you set up our dev environment correctly! https://huggingface.co/docs/diffusers/en/conceptual/contribution#how-to-open-a-pr

happy to help make style too if you need it

@elismasilva
Contributor Author

it should not change any files that this PR does not touch if you set up our dev environment correctly! https://huggingface.co/docs/diffusers/en/conceptual/contribution#how-to-open-a-pr

happy to help make style too if you need it

Yep, I followed this guide but it's changing about 374 files. I am running on Windows; I will test it on WSL-2, it could be an environment problem.

@yiyixuxu
Collaborator

yiyixuxu commented Nov 8, 2024

ohh actually I cannot push to the PR so I cannot help; maybe try to only add the changes from these 3 files to see if CI passes

cc @asomoza for tips with windows

@yiyixuxu
Collaborator

yiyixuxu commented Nov 8, 2024

you can cherry-pick this commit here if you want c2d1531

but it is better if you are able to figure out what's wrong with your environment so that in the future we will be able to merge your PRs!

@elismasilva
Contributor Author

you can cherry-pick this commit here if you want c2d1531

but it is better if you are able to figure out what's wrong with your environment so that in the future we will be able to merge your PRs!

On my WSL it is working; I will push the changes again.

@elismasilva
Contributor Author

@yiyixuxu I've implemented and tested the missing part: when xFormers is disabled, the processors need to revert to Attention 2.0. It was in my last commit. I think we are done now.
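
A quick sanity check for that revert (a small sketch, run against the test code above):

pipe.disable_xformers_memory_efficient_attention()
# The IP adapter layers should be back on IPAdapterAttnProcessor2_0 (or
# IPAdapterAttnProcessor on older torch) and the remaining layers on AttnProcessor2_0.
print({proc.__class__.__name__ for proc in pipe.unet.attn_processors.values()})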

@elismasilva elismasilva requested a review from yiyixuxu November 8, 2024 22:52
Collaborator

@yiyixuxu yiyixuxu left a comment

thanks!

@yiyixuxu yiyixuxu merged commit dac623b into huggingface:main Nov 9, 2024
15 checks passed
sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
* Feature IP Adapter Xformers Attention Processor: this fix error loading incorrect attention processor when setting Xformers attn after load ip adapter scale, issues: #8863 #8872
Successfully merging this pull request may close these issues:

AttributeError: 'tuple' object has no attribute 'shape'