add PAG support for SD Controlnet Img2Img #8864
Merge branch 'SD_ControlNet_PAG_Img2Img' of https://github.com/Bhavay-2001/diffusers into SD_ControlNet_PAG_Img2Img :wq :wq! !wq :wq! :!wq :q! :wq
Hi @a-r-r-o-w, I am trying out this code to produce samples:
import numpy as np
import torch
import cv2
from PIL import Image
from diffusers import AutoPipelineForImage2Image, ControlNetModel
from diffusers.utils import load_image
# download an image
image = load_image(
    "https://hf.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png"
)
image = np.array(image)
# get canny image
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
# load control net and stable diffusion v1-5
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = AutoPipelineForImage2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16, variant="fp16", enable_pag=True
).to("cuda")
# generate image
generator = torch.manual_seed(0)
image_out = pipe(
    "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting",
    num_inference_steps=20,
    generator=generator,
    guidance_scale=2.0,
    image=canny_image,
    pag_scale=3.0,
).images[0]
Error - I think this error is mainly because the changes I have made haven't been integrated into the library yet, so that's why it is showing me this. Can you please help me out here?
It is starting to look better and has many of the correct PAG-related changes as compared to previous PR. Thanks!
Some thoughts and corrections:
- The goal of an image-to-image controlnet pipeline is to be able to take in a text prompt, control image and input image for generating a new image with similar style and structure. The example code that you provide doesn't make use of an input image and control_image is incorrectly passed to input image.
- Your code doesn't have to be integrated in the library for it to work. You can install your own diffusers branch with your changes using
pip install -e .
in the root diffusers directory.
- Please take a look at the implementation of the original StableDiffusionControlNetImg2ImgPipeline here and use the example code present there:
EXAMPLE_DOC_STRING = """
AutoPipelineForImageToImage(..., controlnet=controlnet, enable_pag=True)
- Ensure that things like IP Adapters work because the code path for processing ip adapter image/embeds is significantly altered after introducing PAG-related changes. Also ensure that single controlnet as well as multiple controlnets work. The Diffusers documentation/PRs will have abundant examples of how to make this possible.
- It is not feasible for us to help with debugging unfortunately. Most, if not all, the pipelines here should be runnable in a free-tier colab so please try to debug it there. Make sure to enable optimizations and run in fp16 if you're facing OOM. We can only assist with a final check unless the observed behaviour is really bizarre. Clearly, there are a few bugs that are obvious from first glance but I'd be happy to help once this has reached a more complete state.
latent_model_input = (
    torch.cat([latents] * (prompt_embeds.shape[0] // latents.shape[0]))
    if self.do_classifier_free_guidance
    else latents
)
This is incorrect. Please refer to other PAG pipelines to see how it's done
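One plausible reading of this comment, offered as a sketch rather than the reviewer's stated fix: with PAG enabled, the prompt embeddings are assumed to be batched with an extra perturbed copy even when classifier-free guidance is off, so guarding the latent expansion on do_classifier_free_guidance would leave the latents one copy short. The helper names below are hypothetical and plain lists stand in for tensors.

```python
def expected_batch_copies(do_cfg: bool, do_pag: bool) -> int:
    """Copies of each prompt embedding in one UNet call: the conditional one,
    plus an unconditional copy (CFG) and a perturbed copy (PAG) when enabled.
    This batching convention is an assumption, not taken from the PR."""
    return 1 + int(do_cfg) + int(do_pag)


def expand_guarded(latents: list, copies: int, do_cfg: bool) -> list:
    """Mirrors the snippet under review: expand only when CFG is on."""
    return latents * copies if do_cfg else latents


def expand_always(latents: list, copies: int) -> list:
    """Expand by the embedding-to-latent ratio regardless of CFG."""
    return latents * copies


latents = ["z"]  # stand-in for a batch holding one latent tensor
copies = expected_batch_copies(do_cfg=False, do_pag=True)  # PAG without CFG

guarded = expand_guarded(latents, copies, do_cfg=False)
always = expand_always(latents, copies)
print(len(guarded), len(always), copies)
```

Under these assumptions the guarded version yields a batch that no longer matches the embeddings whenever PAG runs without CFG, which is why comparing against an already merged PAG pipeline is the safer check.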
added_cond_kwargs = (
    {"image_embeds": image_embeds}
    if ip_adapter_image is not None or ip_adapter_image_embeds is not None
    else None
)
This is incorrect because image_embeds is no longer assigned anywhere. It must be ip_adapter_image_embeds because that is what's prepared above. Please refer to other PAG PRs carefully.
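A minimal sketch of the correction described in this comment, assuming any raw ip_adapter_image has already been encoded and stored back into ip_adapter_image_embeds earlier in the call. The function name build_added_cond_kwargs is hypothetical and the list is a placeholder for the prepared embedding tensors.

```python
from typing import Any, Optional


def build_added_cond_kwargs(
    ip_adapter_image: Optional[Any],
    ip_adapter_image_embeds: Optional[Any],
) -> Optional[dict]:
    """Return the extra UNet conditioning, or None without IP-Adapter input.

    Assumption: by the time this runs, `ip_adapter_image_embeds` holds the
    prepared embeddings, whether they were passed in or computed from
    `ip_adapter_image`.
    """
    if ip_adapter_image is None and ip_adapter_image_embeds is None:
        return None
    # Use the variable that is actually assigned, not the stale `image_embeds`.
    return {"image_embeds": ip_adapter_image_embeds}


prepared = ["embeds-for-adapter-0"]  # placeholder for prepared tensors
with_adapter = build_added_cond_kwargs(None, prepared)
without_adapter = build_added_cond_kwargs(None, None)
```

Keeping the lookup in one small function makes it easy to confirm that no leftover reference to an unassigned name survives the refactor.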
Hi, I made the required changes that you suggested. I tried to debug the code and it turns out that the value of
Hi @tolgacangoz, I know this is not related to you but could you please just look through this PR once? I don't know why this is happening but before the pipeline calls But inside that function Thanks
heading to bed and will do a deeper review later, but this might be the bug. callback and callback steps were removed so no need to check them here
src/diffusers/pipelines/pag/pipeline_pag_controlnet_sd_img2img.py
thanks Bhavay, awesome work! it looks good to me now and i don't see anything incorrect from a glance. could you post some results with the reproducible code too please, like #8861? no need for an ablation of layer-wise applying PAG but general outputs for same seed/prompt, different pag and cfg scales would be cool. i think you might need to run
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I tried running this example in a Kaggle notebook and I am getting an error with the controlnet input. I am trying to figure out the error there.
unable to access, could you make it public?
I made it public. If you still face issues, I'll upload the notebook here.
i can access it now. few suggestions on how to debug:
it is hard for us to help with debugging every small issue because it is just not viable and there isn't enough time. in the previous version of this PR and this PR, many basic changes for PAG were missing. i understand navigating a large codebase could be difficult, but in this case the changes were as simple as doing a diff between old pipelines that we have, and the new PAG pipelines that have been merged - to find all the required changes that need to be made.
that said, not all required changes can be pin-pointed like this, otherwise how would someone wanting to learn the codebase or contribute, learn? maybe after applying the change mentioned here, it works, or maybe it does not since something else is missing. there are many other folks opening PAG PRs too who've been successful in integrating it in a short time - so I'd recommend taking a good look at the changes and really understanding it FWIW.
Hi @Bhavay-2001, thanks for your work here. I think only a few more changes are required here and it should be good to merge :) Could you address the comment above and apply the relevant changes?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
hi @Bhavay-2001
What does this PR do?
Part of #8710
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.