Conversation

@Pfannkuchensack (Contributor)

Add comprehensive support for Z-Image-Turbo (S3-DiT) models including:

Backend:

  • New BaseModelType.ZImage in taxonomy
  • Z-Image model config classes (ZImageTransformerConfig, Qwen3TextEncoderConfig)
  • Model loader for Z-Image transformer and Qwen3 text encoder
  • Z-Image conditioning data structures
  • Step callback support for Z-Image with FLUX latent RGB factors

Invocations:

  • z_image_model_loader: Load Z-Image transformer and Qwen3 encoder
  • z_image_text_encoder: Encode prompts using Qwen3 with chat template
  • z_image_denoise: Flow matching denoising with time-shifted sigmas
  • z_image_image_to_latents: Encode images to 16-channel latents
  • z_image_latents_to_image: Decode latents using FLUX VAE
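The z_image_text_encoder step wraps the prompt in a chat template before tokenization. As a rough illustration only — the actual template is produced by the tokenizer's apply_chat_template in transformers, and the exact role markers below are an assumption based on the Qwen family's usual format:

```python
def wrap_prompt_for_qwen3(prompt: str) -> str:
    # Hypothetical sketch of a Qwen-style chat template: the raw prompt is
    # wrapped in role markers before tokenization. The real template comes
    # from the tokenizer itself and may differ in detail.
    return (
        "<|im_start|>user\n"
        f"{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```

The point is that the encoder does not see the bare prompt string; it sees a conversation-shaped input, which is why a chat-capable LLM tokenizer is required here rather than a plain CLIP/T5 tokenizer.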

Frontend:

  • Z-Image graph builder for text-to-image generation
  • Model picker and validation updates for z-image base type
  • CFG scale now allows 0 (required for Z-Image-Turbo)
  • Clip skip disabled for Z-Image (uses Qwen3, not CLIP)
  • Optimal dimension settings for Z-Image (1024x1024)
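Why a CFG scale of 0 matters can be sketched in a few lines. This is an illustrative assumption about the graph's behavior, not the actual denoise code: the idea is that a distilled Turbo model already bakes guidance in, so a scale of 0 bypasses the classifier-free-guidance combination entirely.

```python
def guided_noise_pred(cond: float, uncond: float, cfg_scale: float) -> float:
    # Assumption: cfg_scale == 0 means "no guidance" — return the
    # conditional prediction directly, as a distilled Turbo model expects.
    # Otherwise apply the standard classifier-free guidance combination.
    if cfg_scale == 0:
        return cond
    return uncond + cfg_scale * (cond - uncond)
```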

Technical details:

  • Uses Qwen3 text encoder (not CLIP/T5)
  • 16 latent channels with FLUX-compatible VAE
  • Flow matching scheduler with dynamic time shift
  • 8 inference steps recommended for Turbo variant
  • bfloat16 inference dtype
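The "flow matching scheduler with dynamic time shift" can be sketched as follows. This follows the common FLUX-style remapping of a linear sigma schedule; the shift value of 3.0 is an illustrative assumption, not the model's actual default:

```python
import numpy as np

def time_shifted_sigmas(num_steps: int = 8, shift: float = 3.0) -> np.ndarray:
    # Linear sigma schedule from 1.0 down to 0.0, remapped by a shift
    # factor so that more of the few Turbo steps are spent at high noise
    # levels, where the distilled model does most of its work.
    sigmas = np.linspace(1.0, 0.0, num_steps + 1)
    return shift * sigmas / (1.0 + (shift - 1.0) * sigmas)
```

With 8 steps this yields 9 sigma boundaries, monotonically decreasing from 1.0 to 0.0, with the interior values pushed toward the high-noise end.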

Summary

Related Issues / Discussions

QA Instructions

  • Install a Z-Image-Turbo model (e.g., from HuggingFace)
  • Select the model in the Model Picker
  • Generate a text-to-image with:
      • CFG Scale: 0
      • Steps: 8
      • Resolution: 1024x1024
  • Verify the generated image is coherent (not noise)

Merge Plan

Standard merge, no special considerations needed.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

@github-actions bot added the api, python, Root, invocations, backend, frontend, and python-deps labels on Nov 30, 2025
Add comprehensive LoRA support for Z-Image models including:

Backend:
- New Z-Image LoRA config classes (LoRA_LyCORIS_ZImage_Config, LoRA_Diffusers_ZImage_Config)
- Z-Image LoRA conversion utilities with key mapping for transformer and Qwen3 encoder
- LoRA prefix constants (Z_IMAGE_LORA_TRANSFORMER_PREFIX, Z_IMAGE_LORA_QWEN3_PREFIX)
- LoRA detection logic to distinguish Z-Image from Flux models
- Layer patcher improvements for proper dtype conversion and parameter
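The dtype-conversion part of the layer patcher can be sketched like this. The function name and NumPy stand-in are illustrative (the real loader works on bfloat16 torch tensors; float16 stands in for it here) — the point is computing the low-rank delta at higher precision, then casting back to the base parameter's dtype:

```python
import numpy as np

def apply_lora_delta(base: np.ndarray, up: np.ndarray, down: np.ndarray,
                     scale: float) -> np.ndarray:
    # Compute the low-rank LoRA delta in float32 for precision, then cast
    # the patched result back to the base parameter's (lower) dtype.
    delta = (up.astype(np.float32) @ down.astype(np.float32)) * scale
    return (base.astype(np.float32) + delta).astype(base.dtype)
```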
@lstein (Collaborator)

lstein commented Dec 2, 2025

Very impressive. The model is working with acceptable performance even on my 12 GB card.

I notice the following message in the error log:

[2025-12-01 20:50:58,822]::[ModelManagerService]::WARNING --> [MODEL CACHE] Failed to calculate model size for unexpected model type: <class 'transformers.models.qwen2.tokenization_qwen2.Qwen2Tokenizer'>. The model will be treated as having size 0.
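For context, that warning comes from model-cache size accounting. A minimal sketch of the fallback (names and structure are illustrative, not the actual ModelManagerService code): objects that expose parameters() are measured by tensor size, while objects like a Qwen2Tokenizer, which hold no tensors, are reported as size 0 rather than raising.

```python
def calc_model_size_bytes(model: object) -> int:
    # Sum tensor sizes for parameterized models; fall back to 0 for
    # non-tensor objects such as tokenizers, matching the warning above.
    params = getattr(model, "parameters", None)
    if callable(params):
        return sum(p.numel() * p.element_size() for p in params())
    return 0
```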

Would it be possible to add support for the quantized models, e.g. T5B/Z-Image-Turbo-FP8 or jayn7/Z-Image-Turbo-GGUF?

@Pfannkuchensack (Contributor, Author)

I'll take a look at it and report back.

@lstein (Collaborator)

lstein commented Dec 2, 2025

I tried two Hugging Face LoRAs that claim to be based on Z-Image, but they were detected as Flux LyCORIS models:

reverentelusarca/elusarca-anime-style-lora-z-image-turbo
tarn59/pixel_art_style_lora_z_image_turbo

…ntification

Move Flux layer structure check before metadata check to prevent misidentifying Z-Image LoRAs (which use `diffusion_model.layers.X`) as Flux AI Toolkit format. Flux models use `double_blocks` and `single_blocks` patterns which are now checked first regardless of metadata presence.
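The ordering described in that commit can be sketched as follows. The key patterns come from the commit message itself; the metadata field name is an illustrative assumption. Checking structural FLUX markers first means a Z-Image LoRA that happens to carry AI Toolkit-style metadata is no longer misclassified:

```python
def classify_lora(keys: list[str], metadata: dict) -> str:
    # Order matters: structural checks come before metadata checks.
    # FLUX LoRAs target double_blocks / single_blocks; Z-Image (S3-DiT)
    # LoRAs target diffusion_model.layers.N.
    if any("double_blocks" in k or "single_blocks" in k for k in keys):
        return "flux"
    if any(k.startswith("diffusion_model.layers.") for k in keys):
        return "z-image"
    if metadata.get("software") == "ai-toolkit":  # hypothetical field
        return "flux"
    return "unknown"
```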
…ibility

Add comprehensive support for GGUF quantized Z-Image models and improve component flexibility:

Backend:
- New Main_GGUF_ZImage_Config for GGUF quantized Z-Image transformers
- Z-Image key detection (_has_z_image_keys) to identify S3-DiT models
- GGUF quantization detection and sidecar LoRA patching for quantized models
- Qwen3Encoder_Qwen3Encoder_Config for standalone Qwen3 encoder models

Model Loader:
- Split Z-Image model
@Pfannkuchensack (Contributor, Author)

I tried both of the LoRAs and both of them were imported as Z-Image LoRAs.

@Pfannkuchensack marked this pull request as ready for review on December 4, 2025 at 23:46
@lstein (Collaborator)

lstein commented Dec 5, 2025

When running upscaling, diffusers 0.36.0.dev0 dies because the diffusers.models.controlnet module has been renamed to diffusers.models.controlnets.controlnet. I suggest applying this patch to fix the issue:

diff --git a/invokeai/backend/util/hotfixes.py b/invokeai/backend/util/hotfixes.py
index 7e258b8779..1609fe12c4 100644
--- a/invokeai/backend/util/hotfixes.py
+++ b/invokeai/backend/util/hotfixes.py
@@ -5,7 +5,6 @@ import torch
 from diffusers.configuration_utils import ConfigMixin, register_to_config
 from diffusers.loaders.single_file_model import FromOriginalModelMixin
 from diffusers.models.attention_processor import AttentionProcessor, AttnProcessor
-from diffusers.models.controlnet import ControlNetConditioningEmbedding, ControlNetOutput, zero_module
 from diffusers.models.embeddings import (
     TextImageProjection,
     TextImageTimeEmbedding,
@@ -13,6 +12,7 @@ from diffusers.models.embeddings import (
     TimestepEmbedding,
     Timesteps,
 )
+from diffusers.models.controlnets.controlnet import ControlNetConditioningEmbedding, ControlNetOutput, zero_module
 from diffusers.models.modeling_utils import ModelMixin
 from diffusers.models.unets.unet_2d_blocks import (
     CrossAttnDownBlock2D,
@@ -777,7 +777,7 @@ class ControlNetModel(ModelMixin, ConfigMixin, FromOriginalModelMixin):
 
 
 diffusers.ControlNetModel = ControlNetModel
-diffusers.models.controlnet.ControlNetModel = ControlNetModel
+diffusers.models.controlnets.controlnet.ControlNetModel = ControlNetModel
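A more version-tolerant alternative to hard-switching the import path is to try the new module location first and fall back to the old one, so the hotfix works on both sides of the rename. A generic sketch of that pattern (the helper name is mine, not part of the codebase):

```python
import importlib

def import_first_available(*module_paths: str):
    # Try each module path in order and return the first that imports.
    # Useful when a library renames a module between versions, e.g.
    # diffusers.models.controlnet -> diffusers.models.controlnets.controlnet.
    for path in module_paths:
        try:
            return importlib.import_module(path)
        except ImportError:
            continue
    raise ImportError(f"none of {module_paths} could be imported")
```

This trades a one-line diff for compatibility with both old and new diffusers releases, which matters while 0.36 is still a dev version.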
