Image2ID is a new state-of-the-art tuning-free method for ID-preserving generation from only a single image, supporting various downstream tasks.
We have adapted Image2ID for Kolors. Leveraging Kolors' robust text generation capabilities 👍👍👍, Image2ID can be integrated with Kolors to generate ID-consistent images and text simultaneously.
You can download the models directly from Hugging Face, or download them in a Python script:
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="ImageX/Image2ID", filename="ControlNetModel/config.json", local_dir="./checkpoints")
hf_hub_download(repo_id="ImageX/Image2ID", filename="ControlNetModel/diffusion_pytorch_model.safetensors", local_dir="./checkpoints")
hf_hub_download(repo_id="ImageX/Image2ID", filename="ip-adapter.bin", local_dir="./checkpoints")
Or run the following command to download all models:
pip install -r gradio_demo/requirements.txt
python gradio_demo/download_models.py
If you cannot access Hugging Face, you can use hf-mirror to download the models.
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download --resume-download ImageX/Image2ID --local-dir checkpoints --local-dir-use-symlinks False
For the face encoder, you need to download it manually via this URL to models/antelopev2, as the default link is invalid. Once you have prepared all models, the folder tree should look like this:
.
├── models
├── checkpoints
├── ip_adapter
├── pipeline_stable_diffusion_xl_image2id.py
└── README.md
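As a quick sanity check before running inference, the following minimal sketch (not part of the repository; the file list is assumed from the download commands above) verifies that the expected checkpoints are in place:
import os

# Paths assumed from the download commands above; adjust if you used different directories.
required_paths = [
    "./checkpoints/ControlNetModel/config.json",
    "./checkpoints/ControlNetModel/diffusion_pytorch_model.safetensors",
    "./checkpoints/ip-adapter.bin",
    "./models/antelopev2",  # face encoder, downloaded manually
]

missing = [p for p in required_paths if not os.path.exists(p)]
if missing:
    print("Missing files or directories:", missing)
else:
    print("All required models are in place.")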
To reproduce the results in the paper, please refer to the code in infer_full.py. If you want to compare results with other methods, we recommend using this code even if you do not use the depth ControlNet.
If you are pursuing better results, we recommend following Image2ID-Rome.
The following code👇 comes from infer.py; refer to it if you want to quickly try out Image2ID.
# !pip install opencv-python transformers accelerate insightface
import diffusers
from diffusers.utils import load_image
from diffusers.models import ControlNetModel
import cv2
import torch
import numpy as np
from PIL import Image
from insightface.app import FaceAnalysis
from pipeline_stable_diffusion_xl_image2id import StableDiffusionXLImage2IDPipeline, draw_kps
# prepare 'antelopev2' under ./models
app = FaceAnalysis(name='antelopev2', root='./', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))
# prepare models under ./checkpoints
face_adapter = f'./checkpoints/ip-adapter.bin'
controlnet_path = f'./checkpoints/ControlNetModel'
# load IdentityNet
controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16)
base_model = 'xmg2024/YamerMIX_v8' # from https://civitai.com/models/84040?modelVersionId=196039
pipe = StableDiffusionXLImage2IDPipeline.from_pretrained(
base_model,
controlnet=controlnet,
torch_dtype=torch.float16
)
pipe.cuda()
# load adapter
pipe.load_ip_adapter_image2id(face_adapter)
Then, you can customize generation with your own face images.
# load an image
face_image = load_image("./examples/yann-lecun_resize.jpg")
# prepare face emb
face_info = app.get(cv2.cvtColor(np.array(face_image), cv2.COLOR_RGB2BGR))
face_info = sorted(face_info, key=lambda x:(x['bbox'][2]-x['bbox'][0])*(x['bbox'][3]-x['bbox'][1]))[-1] # only use the maximum face
face_emb = face_info['embedding']
face_kps = draw_kps(face_image, face_info['kps'])
# prompt
prompt = "film noir style, ink sketch|vector, male man, highly detailed, sharp focus, ultra sharpness, monochrome, high contrast, dramatic shadows, 1940s style, mysterious, cinematic"
negative_prompt = "ugly, deformed, noisy, blurry, low contrast, realism, photorealistic, vibrant, colorful"
# generate image
image = pipe(
prompt,
negative_prompt=negative_prompt,
image_embeds=face_emb,
image=face_kps,
controlnet_conditioning_scale=0.8,
ip_adapter_scale=0.8,
).images[0]
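The pipeline returns a standard PIL image, so you can save or display it directly (the output filename below is only an example):
# save the generated image (filename is arbitrary)
image.save("result.jpg")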
To save VRAM, you can enable CPU offloading:
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()
Our work is compatible with LCM-LoRA. First, download the model:
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="latent-consistency/lcm-lora-sdxl", filename="pytorch_lora_weights.safetensors", local_dir="./checkpoints")
To use it, you just need to load it and run inference with a small num_inference_steps. Note that it is recommended to set guidance_scale within [0, 1].
from diffusers import LCMScheduler
lcm_lora_path = "./checkpoints/pytorch_lora_weights.safetensors"
pipe.load_lora_weights(lcm_lora_path)
pipe.fuse_lora()
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
num_inference_steps = 10
guidance_scale = 0
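With the LCM-LoRA loaded, inference uses the same pipeline call as before, just with the reduced step count and low guidance scale set above. This is a sketch reusing the face embedding and keypoints prepared earlier; the exact values are suggestions, not requirements:
# run inference with the LCM settings defined above
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image_embeds=face_emb,
    image=face_kps,
    controlnet_conditioning_scale=0.8,
    ip_adapter_scale=0.8,
    num_inference_steps=num_inference_steps,
    guidance_scale=guidance_scale,
).images[0]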
To start a local Gradio demo, run the following command:
python gradio_demo/app.py
or the MultiControlNet version:
python gradio_demo/app-multicontrolnet.py
Usage tips:
- For higher similarity, increase the weights of controlnet_conditioning_scale (IdentityNet) and ip_adapter_scale (Adapter); see the tuning sketch after this list.
- For over-saturation, decrease ip_adapter_scale. If that does not work, decrease controlnet_conditioning_scale.
- For stronger text (prompt) control, decrease ip_adapter_scale.
- For specific styles, choosing a corresponding base model makes a difference.
- Multi-person generation is not supported yet; only the largest face is used as the reference for facial landmarks.
- We provide a style template for reference.
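To make the trade-offs above concrete, here is a small hypothetical sweep over the two scales, reusing the pipeline call from infer.py (the value grid is only an example):
# Hypothetical sweep: higher scales -> stronger identity preservation,
# lower scales -> more prompt/text control.
for cn_scale, ip_scale in [(0.8, 0.8), (0.6, 0.6), (0.8, 0.4)]:
    image = pipe(
        prompt,
        negative_prompt=negative_prompt,
        image_embeds=face_emb,
        image=face_kps,
        controlnet_conditioning_scale=cn_scale,
        ip_adapter_scale=ip_scale,
    ).images[0]
    image.save(f"result_cn{cn_scale}_ip{ip_scale}.jpg")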

