[Kolors] Add IP Adapter #8901

Merged · 14 commits into huggingface:main · Jul 26, 2024

Conversation

@asomoza (Member) commented Jul 19, 2024

What does this PR do?

Adds the newly released IP Adapter to the Kolors pipelines.

Note: I'll add img2img support after the initial review and after #8856 is merged.

How to test

T2I

```python
import torch
from transformers import CLIPVisionModelWithProjection

from diffusers import DPMSolverMultistepScheduler, KolorsPipeline
from diffusers.utils import load_image

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "Kwai-Kolors/Kolors-IP-Adapter-Plus",
    subfolder="image_encoder",
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
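    # refs/pr/4 pins the open Hub PR that, at the time of writing, provides these files in diffusers format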
    revision="refs/pr/4",
)

pipe = KolorsPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers", image_encoder=image_encoder, torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)

pipe.load_ip_adapter(
    "Kwai-Kolors/Kolors-IP-Adapter-Plus",
    subfolder="",
    weight_name="ip_adapter_plus_general.safetensors",
    revision="refs/pr/4",
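    # None: the image encoder was already passed to the pipeline above, so skip loading it again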
    image_encoder_folder=None,
)
pipe.enable_model_cpu_offload()

ipa_image = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/kolors/cat_square.png")

image = pipe(
    prompt="best quality, high quality",
    negative_prompt="",
    guidance_scale=6.5,
    num_inference_steps=25,
    ip_adapter_image=ipa_image,
).images[0]

image.save("kolors_ipa_result.png")
source result 1 result 2
cat_square kolors_20240718215114_3149038746 kolors_20240718215201_2410121219

IMG2IMG

```python
import math

import torch
from transformers import CLIPVisionModelWithProjection

from diffusers import DPMSolverMultistepScheduler, KolorsImg2ImgPipeline
from diffusers.utils import load_image

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "Kwai-Kolors/Kolors-IP-Adapter-Plus",
    subfolder="image_encoder",
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    revision="refs/pr/4",
)

pipe = KolorsImg2ImgPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers", image_encoder=image_encoder, torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)

pipe.load_ip_adapter(
    "Kwai-Kolors/Kolors-IP-Adapter-Plus",
    subfolder="",
    weight_name="ip_adapter_plus_general.safetensors",
    revision="refs/pr/4",
    image_encoder_folder=None,
)
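# lower the IP Adapter influence so the text prompt still drives the result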
pipe.set_ip_adapter_scale(0.4)

pipe.enable_model_cpu_offload()

source_image = load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/kolors/capyrabbit.png?download=true"
)
ipa_image = load_image(
    "https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/kolors/ip_image.png?download=true"
)

prompt = "a capybara wearing sunglasses. In the background of the image there are trees, poles, grass and other objects. At the bottom of the object there is the road., 8k, highly detailed."

strength = 0.5
steps = 25
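# img2img runs only ~num_inference_steps * strength denoising steps,
# so scale the count up to keep ~25 effective steps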
num_inference_steps = math.ceil(steps / strength)

image = pipe(
    prompt=prompt,
    image=source_image,
    negative_prompt="",
    guidance_scale=6.5,
    num_inference_steps=num_inference_steps,
    strength=strength,
    ip_adapter_image=ipa_image,
).images[0]

image.save("kolors_img2img_ipa_result.png")
original ip image result style
capyrabbit ip_image 20240720025802_1449721953 20240720031546_3711734297

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@yiyixuxu

asomoza requested a review from yiyixuxu on July 19, 2024 02:05
@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yiyixuxu (Collaborator) left a comment

Also, don't forget to set it back in unload_ip_adapter:

def unload_ip_adapter(self):

asomoza requested a review from yiyixuxu on July 20, 2024 07:24
@yiyixuxu (Collaborator) left a comment

thank you!

asomoza requested a review from stevhliu on July 26, 2024 11:00
@asomoza (Member, Author) commented Jul 26, 2024

@stevhliu can you please review the documentation?

@stevhliu (Member) left a comment

Very nice! Just a few comments to improve clarity 😄

asomoza merged commit 73acebb into huggingface:main on Jul 26, 2024 · 15 checks passed
asomoza deleted the kolors-ip-adapter branch on July 26, 2024 18:25
@e1ijah1 commented Oct 10, 2024

Hello, and thank you for this excellent PR! I'm currently working with a model that's encountering memory issues when loading the IP-Adapter on a single GPU. I noticed that the load_ip_adapter method doesn't seem to support specifying different devices directly.
I'm wondering if you could provide some guidance on how to effectively utilize multiple GPUs when loading the model and IP-Adapter?

@asomoza (Member, Author) commented Oct 10, 2024

AFAIK we don't have a way to keep the IP Adapter on a device separate from the model; what you can do is compute the image embeddings before inference.

You can also split the pipeline modules across different devices; this guide should apply to this use case as well.

You can even compute the text embeddings separately, as shown in that guide, so the bigger win here is freeing the text encoder's VRAM rather than moving the IP Adapter to another device.
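
For reference, here is a minimal sketch of the embeddings-first approach. It assumes the Kolors pipeline exposes the same prepare_ip_adapter_image_embeds helper as the SDXL pipelines it is modeled on, and it reuses pipe and ipa_image from the T2I example above; treat it as an illustration rather than the definitive method:

```python
import torch

# Pre-compute the IP Adapter image embeddings so the image encoder
# does not need to stay in VRAM during denoising.
image_embeds = pipe.prepare_ip_adapter_image_embeds(
    ip_adapter_image=ipa_image,
    ip_adapter_image_embeds=None,
    device="cuda",
    num_images_per_prompt=1,
    do_classifier_free_guidance=True,  # guidance_scale > 1 in the examples above
)

# Drop the encoder and reclaim its memory; it is never called again
# because the embeddings are passed directly.
pipe.image_encoder = None
torch.cuda.empty_cache()

image = pipe(
    prompt="best quality, high quality",
    negative_prompt="",
    guidance_scale=6.5,
    num_inference_steps=25,
    ip_adapter_image_embeds=image_embeds,  # embeddings instead of the raw image
).images[0]
```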

sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
* initial draft

* apply suggestions

* fix failing test

* added ipa to img2img

* add docs

* apply suggestions