Conversation

@CalamitousFelicitousness commented Nov 29, 2025

What does this PR do?

This PR adds an img2img pipeline for Z-Image. A summary of the changes is below:

  • Updated the pipeline structure to include ZImageImg2ImgPipeline alongside ZImagePipeline.
  • Implemented the ZImageImg2ImgPipeline class.
  • Mapped the new ZImageImg2ImgPipeline in auto_pipeline for image-to-image tasks (see the sketch below).
  • Added unit tests for ZImageImg2ImgPipeline.
  • Updated dummy objects to include ZImageImg2ImgPipeline for testing.

Closes issue #12752
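With the auto_pipeline mapping in place, the new pipeline should also be reachable through the generic img2img entry point. A minimal sketch, assuming the mapping registers the checkpoint's architecture (the hub id below is illustrative):

import torch
from diffusers import AutoPipelineForImage2Image

# AutoPipelineForImage2Image resolves the checkpoint to the registered
# img2img pipeline class, here ZImageImg2ImgPipeline.
pipe = AutoPipelineForImage2Image.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",  # illustrative model id
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")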

Tested using a simple script:

#!/usr/bin/env python
"""Test script for ZImage img2img support (without LoRA)."""

import sys

# Point Python at the local diffusers checkout containing this PR's changes.
sys.path.insert(0, '/home/ohiom/diffusers/src')

import torch
from PIL import Image
from diffusers import ZImageImg2ImgPipeline

# Paths
MODEL_PATH = "database/models/huggingface/models--Tongyi-MAI--Z-Image-Turbo/snapshots/78771b7e11b922c868dd766476bda1f4fc6bfc96"
INPUT_IMAGE_PATH = "aline_1024.jpg"  # Use existing image as input

print("Loading ZImageImg2ImgPipeline...")
pipe = ZImageImg2ImgPipeline.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
pipe.to("cuda")
print("Pipeline loaded.")

# Load input image
print(f"\nLoading input image from {INPUT_IMAGE_PATH}...")
input_image = Image.open(INPUT_IMAGE_PATH).convert("RGB")
print(f"Input image size: {input_image.size}")

# Generate an image
prompt = "a woman sitting under a tree, oil painting style, impressionist, vibrant colors"
strength = 0.6  # 0.0 = no change, 1.0 = full transformation

print(f"\nGenerating image with prompt: {prompt}")
print(f"Strength: {strength}")

image = pipe(
    prompt=prompt,
    image=input_image,
    strength=strength,
    num_inference_steps=8,
    guidance_scale=3.0,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]

output_path = "test_zimage_img2img_output.png"
image.save(output_path)
print(f"\nImage saved to {output_path}")

Prompt: a woman sitting in a dark room, oil painting style, impressionist, vibrant colors

[output image]
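A note on strength: in diffusers-style img2img, strength also determines how many of the requested denoising steps actually run, because the input image is noised to an intermediate timestep and only denoised from there. A minimal sketch of the usual relationship (illustrative, not this pipeline's exact code):

num_inference_steps = 8
strength = 0.6

# The first (1 - strength) portion of the schedule is skipped: the input
# image is noised to the corresponding intermediate timestep and denoising
# starts from there.
init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
t_start = max(num_inference_steps - init_timestep, 0)
print(num_inference_steps - t_start)  # 4 -> only 4 of the 8 requested steps run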

LoRA functionality depends on my other PR #12750, so they will have to be merged sequentially. I did not think there was much point in leaving it out.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul @asomoza

@CalamitousFelicitousness (Author)
For some reason, VAE tiling couldn't meet the 0.2 diff threshold, so my test ups it to 0.3. I'm not sure whether further investigation is warranted.
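For context, the tiling test compares a run with enable_vae_tiling() against a plain run and asserts the maximum pixel difference stays under a threshold. A minimal sketch of that style of check (the inputs are illustrative, and both runs need the same seeded generator to be comparable):

import numpy as np
import torch

# input_image is assumed to be a PIL image, as in the script above.
inputs = dict(
    prompt="a photo", image=input_image, strength=0.6,
    num_inference_steps=2, output_type="np",
)

image_normal = pipe(**inputs, generator=torch.manual_seed(0)).images[0]

pipe.enable_vae_tiling()  # decode the latents tile by tile
image_tiled = pipe(**inputs, generator=torch.manual_seed(0)).images[0]

# The Z-Image img2img test raises this threshold from 0.2 to 0.3.
assert np.abs(image_normal - image_tiled).max() < 0.3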

@asomoza (Member) left a comment

Thanks a lot again! For this one we should probably wait for the LoRA one to be merged. I left a few comments.

)
self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor * 2)

def encode_prompt(
@asomoza (Member):
Can this function also be marked with # Copied from?
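For reference, this is the diffusers convention of marking duplicated methods with a # Copied from comment so that CI keeps them in sync with the original. A sketch, assuming the source method lives on ZImagePipeline (the exact dotted path and signature are assumptions):

# Copied from diffusers.pipelines.z_image.pipeline_z_image.ZImagePipeline.encode_prompt
def encode_prompt(self, prompt, device, num_images_per_prompt=1):
    ...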

negative_prompt_embeds = []
return prompt_embeds, negative_prompt_embeds

def _encode_prompt(
@asomoza (Member):
Same as before: this one can be # Copied from too, no?

)
from .wan import WanImageToVideoPipeline, WanPipeline, WanVideoToVideoPipeline
from .wuerstchen import WuerstchenCombinedPipeline, WuerstchenDecoderPipeline
from .z_image import ZImageImg2ImgPipeline
@asomoza (Member):
Since you're adding this pipeline here, can you also add the t2i one too?
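For reference, this is the kind of entry involved in auto_pipeline.py (these mapping names exist in diffusers; the "z-image" key is an assumption):

AUTO_TEXT2IMAGE_PIPELINES_MAPPING = OrderedDict(
    [
        # ... existing entries ...
        ("z-image", ZImagePipeline),
    ]
)

AUTO_IMAGE2IMAGE_PIPELINES_MAPPING = OrderedDict(
    [
        # ... existing entries ...
        ("z-image", ZImageImg2ImgPipeline),
    ]
)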

@asomoza (Member) commented Dec 1, 2025

This model is really finicky with img2img, but it seems to be working OK.

[source image: dog_plushie; img2img output: z_image_turbo_output_i2i]

@CalamitousFelicitousness (Author)

@asomoza I just thought: I have an inpainting PR lined up. Do you think keeping this one img2img-only and doing inpainting separately afterwards is the better approach, to keep review easier? Or is it less work for you if I also merge it into this PR?

@asomoza (Member) commented Dec 1, 2025

I prefer to keep them separate. I'm not really sure inpainting can be good with this model, so I want to test it first; maybe we can add something like differential diffusion as a switch to make it better.

@CalamitousFelicitousness (Author) commented Dec 1, 2025

Alrighty, that's how I felt as well.

Inpainting seems alright:
[inpainting output image: test_zimage_inpaint_output-1.png]
