add PAG support for SD Controlnet Img2Img #8864

Open
wants to merge 23 commits into
base: main

Bhavay-2001
Contributor

@Bhavay-2001 Bhavay-2001 commented Jul 14, 2024

What does this PR do?

Part of #8710

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Bhavay-2001 Bhavay-2001 changed the title from "Sd control net pag img2 img" to "add PAG support for SD Controlnet Img2Img" on Jul 14, 2024
@Bhavay-2001
Contributor Author

Hi @a-r-r-o-w, I am trying out this code to produce samples with StableDiffusionControlNetPagImg2ImgPipeline, but it's giving me an error. Could you please check this code sample?

import numpy as np
import torch
import cv2
from PIL import Image

from diffusers import AutoPipelineForImage2Image, ControlNetModel
from diffusers.utils import load_image

# download an image
image = load_image(
    "https://hf.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png"
)
image = np.array(image)

# get canny image
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)

# load control net and stable diffusion v1-5
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = AutoPipelineForImage2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16, variant="fp16", enable_pag=True
).to("cuda")

# generate image
generator = torch.manual_seed(0)
image_out = pipe(
    "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting",
    num_inference_steps=20,
    generator=generator,
    guidance_scale=2.0,
    image=canny_image,
    pag_scale=3.0,
).images[0]

Error - ValueError: AutoPipeline can't find a pipeline linked to StableDiffusionControlNetPAGPipeline for stable-diffusion-controlnet-pag.

I think this error occurs mainly because the changes I have made haven't been integrated into the library yet. Could you please help me out here?

Member

@a-r-r-o-w a-r-r-o-w left a comment

It is starting to look better and has many of the correct PAG-related changes compared to the previous PR. Thanks!

Some thoughts and corrections:

  • The goal of an image-to-image controlnet pipeline is to take in a text prompt, a control image and an input image, and generate a new image with similar style and structure. The example code you provided doesn't use an input image at all, and the control image is incorrectly passed as the input image (image=canny_image).
  • Your code doesn't have to be integrated in the library for it to work. You can install your own diffusers branch with your changes using pip install -e . in the root diffusers directory.
  • Please take a look at the implementation of the original StableDiffusionControlNetImg2ImgPipeline and use the example code present there. Make sure your PAG implementation is runnable with AutoPipelineForImage2Image.from_pretrained(..., controlnet=controlnet, enable_pag=True).
  • Ensure that things like IP Adapters work because the code path for processing ip adapter image/embeds is significantly altered after introducing PAG-related changes. Also ensure that single controlnet as well as multiple controlnets work. The Diffusers documentation/PRs will have abundant examples of how to make this possible.
  • It is not feasible for us to help with debugging, unfortunately. Most, if not all, of the pipelines here should be runnable in a free-tier colab, so please try to debug it there. Make sure to enable optimizations and run in fp16 if you're facing OOM. We can only assist with a final check unless the observed behaviour is really bizarre. Clearly, there are a few bugs that are obvious at first glance, but I'd be happy to help once this has reached a more complete state.
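
To make the first point concrete, here is a minimal sketch of how the two images map onto call arguments. It assumes the PAG pipeline keeps the convention of the existing StableDiffusionControlNetImg2ImgPipeline, where `image` is the picture to transform and `control_image` is the conditioning map. `call_controlnet_img2img` is a hypothetical helper, and `pipe` is any callable pipeline object.

```python
def call_controlnet_img2img(pipe, prompt, init_image, control_image, **extra):
    """Call an img2img ControlNet pipeline with each image in its own slot.

    Hypothetical helper: `image` carries the picture to be transformed and
    `control_image` carries the conditioning map (e.g. canny edges).
    """
    if init_image is None or control_image is None:
        raise ValueError("img2img ControlNet needs both an init image and a control image")
    # Extra keyword arguments (pag_scale, guidance_scale, ...) pass through unchanged.
    return pipe(prompt=prompt, image=init_image, control_image=control_image, **extra)
```

With the code sample from the earlier comment, the downloaded picture would go in as `init_image` and `canny_image` as `control_image`, instead of `canny_image` being passed as `image`.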

Comment on lines 1192 to 1196
latent_model_input = (
    torch.cat([latents] * (prompt_embeds.shape[0] // latents.shape[0]))
    if self.do_classifier_free_guidance
    else latents
)
Member

This is incorrect. Please refer to other PAG pipelines to see how it's done
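
A minimal sketch of why the conditional can go wrong, using plain batch sizes instead of tensors. It rests on an assumption that this thread does not confirm: that the merged PAG pipelines always expand the latents to match `prompt_embeds`, because PAG adds a perturbed branch even when classifier-free guidance is off. All function names are hypothetical.

```python
def prompt_batch(batch_size, do_cfg, do_pag):
    """Batch size of prompt_embeds: one conditional branch, plus one
    unconditional branch for CFG and one perturbed branch for PAG."""
    return batch_size * (1 + int(do_cfg) + int(do_pag))


def latent_batch_conditional(latent_batch, embeds_batch, do_cfg):
    """Mirrors the flagged snippet: expand only when CFG is enabled."""
    return latent_batch * (embeds_batch // latent_batch) if do_cfg else latent_batch


def latent_batch_unconditional(latent_batch, embeds_batch):
    """Assumed fix: always expand latents to the prompt_embeds batch."""
    return latent_batch * (embeds_batch // latent_batch)


# PAG enabled, CFG disabled: the embeddings hold two branches.
embeds = prompt_batch(batch_size=1, do_cfg=False, do_pag=True)
print(embeds, latent_batch_conditional(1, embeds, do_cfg=False))  # 2 1 (mismatch)
print(embeds, latent_batch_unconditional(1, embeds))              # 2 2 (consistent)
```

When both CFG and PAG are on, the two variants agree (batch of 3), which is why the problem would only surface with CFG disabled.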

Comment on lines 1173 to 1177
added_cond_kwargs = (
    {"image_embeds": image_embeds}
    if ip_adapter_image is not None or ip_adapter_image_embeds is not None
    else None
)
Member

This is incorrect because image_embeds is no longer assigned anywhere. It must be ip_adapter_image_embeds because that is what's prepared above. Please refer to other PAG PRs carefully.
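
The correction described here could look roughly like the sketch below. `build_added_cond_kwargs` is a hypothetical standalone helper (in the pipeline this is an inline expression), and it assumes `ip_adapter_image_embeds` already holds the prepared embeddings by the time it is called.

```python
def build_added_cond_kwargs(ip_adapter_image, ip_adapter_image_embeds):
    """Return UNet conditioning kwargs, or None when no IP-Adapter input is given.

    Assumes `ip_adapter_image_embeds` was prepared earlier, so it is the
    variable that actually exists here (unlike the stale `image_embeds`).
    """
    if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
        return {"image_embeds": ip_adapter_image_embeds}
    return None
```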

@Bhavay-2001
Contributor Author

Bhavay-2001 commented Jul 16, 2024

Hi, I made the changes you suggested. While debugging, I found that the value of the controlnet_conditioning_scale parameter is a list instead of a float. I tried to find out why but couldn't. It works fine with the non-PAG pipeline, though.

@Bhavay-2001
Contributor Author

Hi @tolgacangoz, I know this is not related to you, but could you please look through this PR once? I don't know why this is happening, but before the pipeline calls the self.check_inputs function, the value of the controlnet_conditioning_scale parameter is what we set it to.

But inside check_inputs, the value of the parameter changes. I cannot figure out where the code goes wrong. Could you please help me figure it out?

Thanks

Member

@a-r-r-o-w a-r-r-o-w left a comment

heading to bed and will do a deeper review later, but this might be the bug. callback and callback_steps were removed so no need to check them here
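
One way such a symptom can arise, sketched with simplified and hypothetical signatures (the thread does not show the offending code, so this is an assumption, not the confirmed cause): if `check_inputs` still lists `callback_steps` but the call site passes arguments positionally without it, every later argument shifts one slot, and `controlnet_conditioning_scale` receives the list meant for `control_guidance_start`.

```python
def check_inputs_stale(prompt, image, callback_steps,
                       controlnet_conditioning_scale=1.0,
                       control_guidance_start=0.0,
                       control_guidance_end=1.0,
                       callback_on_step_end_tensor_inputs=None):
    """Hypothetical signature that still lists the removed callback_steps."""
    return controlnet_conditioning_scale


def check_inputs_fixed(prompt, image,
                       controlnet_conditioning_scale=1.0,
                       control_guidance_start=0.0,
                       control_guidance_end=1.0,
                       callback_on_step_end_tensor_inputs=None):
    """Same signature with callback_steps dropped."""
    return controlnet_conditioning_scale


# The caller no longer passes callback_steps; the guidance bounds are lists.
args = ("a prompt", "image", 1.0, [0.0], [1.0], ["latents"])
print(check_inputs_stale(*args))  # [0.0]
print(check_inputs_fixed(*args))  # 1.0
```

Passing these arguments by keyword would turn the silent shift into an immediate TypeError about the missing `callback_steps`, which is easier to spot.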

@a-r-r-o-w
Member

a-r-r-o-w commented Jul 28, 2024

thanks Bhavay, awesome work! it looks good to me now and i don't see anything incorrect from a glance. could you post some results with the reproducible code too please, like #8861? no need for an ablation of layer-wise applying PAG but general outputs for same seed/prompt, different pag and cfg scales would be cool

i think you might need to run make style and make fix-copies for the tests to pass
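
A minimal sketch of the kind of comparison requested here: same prompt and seed, varying only the PAG and CFG scales. `sweep_scales` is a hypothetical helper; `pipe` is any callable pipeline and `make_generator` is supplied by the caller, so the sketch itself does not depend on diffusers or torch.

```python
from itertools import product


def sweep_scales(pipe, make_generator, prompt, pag_scales, cfg_scales, seed=0, **extra):
    """Run the same prompt and seed across a grid of PAG and CFG scales.

    `make_generator` builds a fresh RNG from the seed, for example
    lambda s: torch.Generator().manual_seed(s).
    """
    results = {}
    for pag_scale, guidance_scale in product(pag_scales, cfg_scales):
        # Re-seed for every run so the scales are the only thing that varies.
        generator = make_generator(seed)
        results[(pag_scale, guidance_scale)] = pipe(
            prompt=prompt,
            generator=generator,
            pag_scale=pag_scale,
            guidance_scale=guidance_scale,
            **extra,
        )
    return results
```

The returned dictionary is keyed by `(pag_scale, guidance_scale)`, which makes it straightforward to lay the outputs out as a grid when posting results.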

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Bhavay-2001
Contributor Author

> thanks Bhavay, awesome work! it looks good to me now and i don't see anything incorrect from a glance. could you post some results with the reproducible code too please, like #8861? no need for an ablation of layer-wise applying PAG but general outputs for same seed/prompt, different pag and cfg scales would be cool
>
> i think you might need to run make style and make fix-copies for the tests to pass

I tried running this example in a Kaggle notebook and I am getting an error with the controlnet input. I am trying to figure out the error there.

@a-r-r-o-w
Member

> I tried running this example in a Kaggle notebook and I am getting an error with the controlnet input. I am trying to figure out the error there.

unable to access, could you make it public?

@Bhavay-2001
Contributor Author

I made it public. If you still face an issue, I'll upload the notebook here.

@a-r-r-o-w
Member

> I made it public. If you still face an issue, I'll upload the notebook here.

i can access it now. few suggestions on how to debug:

  • it says that the dimensions do not match in dim 0. this means it expected a batch size of 3 but only got 2. it is quite easy to see why. when you call prepare_control_image to prepare the controlnet image, it performs torch.cat([image] * 2) if classifier-free guidance is enabled - in this case, it is. for perturbed attention guidance, the expected batch size is 3 - which you have already handled for the other required embeddings and inputs
  • you will have to prepare the control_image accordingly as well. we have many PRs related to PAG open already - all that's required is to thoroughly understand the code. please take a good look at: https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pag/pipeline_pag_controlnet_sd.py#L1193
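
The batch-size mismatch described above can be reproduced with a small sketch. Lists stand in for the batch dimension, so list repetition plays the role of `torch.cat`. `prepare_control_image` below is only a stand-in for the behaviour described in the first bullet, and `expand_for_guidance` is a hypothetical fix, not the code from the linked file.

```python
def prepare_control_image(image, do_cfg):
    """Stand-in for the described behaviour: duplicate the batch under CFG."""
    return image * 2 if do_cfg else image


def expand_for_guidance(control, do_cfg, do_pag):
    """Assumed fix: recover one copy, then repeat it once per guidance branch
    (conditional, plus unconditional for CFG, plus perturbed for PAG)."""
    single = control[: len(control) // 2] if do_cfg else control
    branches = 1 + int(do_cfg) + int(do_pag)
    return single * branches


control = prepare_control_image(["canny"], do_cfg=True)
print(len(control))                                                 # 2
print(len(expand_for_guidance(control, do_cfg=True, do_pag=True)))  # 3
```

With CFG and PAG both enabled, the other inputs carry three branches per sample, so the control image has to as well.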

it is hard for us to help with debugging every small issue because it is just not viable and there isn't enough time. in the previous version of this PR and this PR, many basic changes for PAG were missing. i understand navigating a large codebase can be difficult, but in this case the changes were as simple as doing a diff between the old pipelines that we have and the new PAG pipelines that have been merged, to find all the required changes that need to be made. that said, not all required changes can be pinpointed like this, otherwise how would someone wanting to learn the codebase or contribute, learn? maybe after applying the change mentioned here, it works, or maybe it does not since something else is missing. there are many other folks opening PAG PRs too who've been successful in integrating it in a short time - so I'd recommend taking a good look at the changes and really understanding them FWIW.

@a-r-r-o-w
Member

Hi @Bhavay-2001, thanks for your work here. I think only a few more changes are required here and it should be good to merge :) Could you address the comment above and apply the relevant changes?

@yiyixuxu yiyixuxu added the PAG label Sep 4, 2024
@yiyixuxu yiyixuxu mentioned this pull request Sep 20, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Sep 29, 2024
@yiyixuxu yiyixuxu removed the stale Issues that haven't received updates label Sep 30, 2024
@github-actions github-actions bot added the stale Issues that haven't received updates label Oct 26, 2024
@a-r-r-o-w a-r-r-o-w removed the stale Issues that haven't received updates label Oct 27, 2024
@github-actions github-actions bot added the stale Issues that haven't received updates label Nov 20, 2024
@yiyixuxu yiyixuxu added close-to-merge and removed stale Issues that haven't received updates labels Dec 3, 2024
@yiyixuxu
Collaborator

yiyixuxu commented Dec 3, 2024

hi @Bhavay-2001
it looks like we are really close to finishing :)
let us know if you'll be able to complete it; if not, we can ask others to help :)
