Conversation

@sayakpaul (Member) commented on Nov 28, 2025

What does this PR do?

  • Introduces a dedicated test suite for the Z-Image DiT.
  • Adds is_flaky decorator to test_inference() in the Z-Image pipeline test suite.
  • Adds a return_dict argument to the forward() of Z-Image DiT, following other models in the library.
    • As a consequence, I followed the usual return pattern, i.e., the model returns a Transformer2DModelOutput when return_dict=True and a tuple like (out,) otherwise (a short sketch follows below).
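
For reference, a minimal sketch of the return pattern this follows (the body of forward() is elided, _compute_sample is a hypothetical placeholder, and the import path is the usual diffusers one):

    from diffusers.models.modeling_outputs import Transformer2DModelOutput

    def forward(self, x, t, cap_feats, return_dict: bool = True):
        # hypothetical helper standing in for the actual DiT computation
        out = self._compute_sample(x, t, cap_feats)
        if not return_dict:
            # tuple return, matching the rest of the library
            return (out,)
        return Transformer2DModelOutput(sample=out)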

Notes

  • The model accepts the hidden states as a list[torch.Tensor], which differs from other models, and the output follows the same type. This is why I had to modify a couple of tests (where it was reasonably easy) to allow this. Tests where it was not relatively easy were skipped (such as test_training, test_ema_training, etc.).
  • The repeated block in this model is ZImageTransformerBlock, which is used for noise_refiner, context_refiner, and layers. As a consequence, the inputs recorded for the block vary during compilation, and full compilation with fullgraph=True triggers recompilation at least thrice.
  • Some of the group offloading tests were skipped because of states that interfered in between the tests (as also noted here).
  • The x_pad_token and cap_pad_token params within the DiT are initialized with torch.empty(), possibly for memory efficiency, but they interfere with the tests in very weird ways because torch.empty() can yield NaNs. To prevent this from creeping into the tests, I tried adding is_flaky() to some of the affected tests, but that didn't help (see this). @JerryWu-code, would it be safe to initialize x_pad_token and cap_pad_token deterministically, maybe with something like torch.ones() (a sketch follows right after this list)? Or do you think it would have memory implications?
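
For clarity, this is the kind of change I mean. The shapes, the dim value, and the use of nn.Parameter are assumptions on my part, not the model's actual code:

    import torch
    import torch.nn as nn

    class PadTokens(nn.Module):  # stand-in for the DiT's __init__, not the real class
        def __init__(self, dim: int = 64):
            super().__init__()
            # current pattern: torch.empty() leaves the memory uninitialized, so the
            # starting values (possibly NaN) are arbitrary:
            # self.x_pad_token = nn.Parameter(torch.empty(1, dim))
            # deterministic alternative: same shape and dtype, reproducible values
            self.x_pad_token = nn.Parameter(torch.ones(1, dim))
            self.cap_pad_token = nn.Parameter(torch.ones(1, dim))

As far as I can tell, torch.ones() allocates the same amount of memory as torch.empty(); the only difference is a negligible fill at init time, and the values are overwritten anyway once a checkpoint is loaded.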

Minor nits

  • We usually avoid raw assert statements inside model implementations in favor of properly raised errors. Should we follow something similar here, too? (A sketch follows at the end of this section.)
  • There is a self.scheduler.sigma_min = 0.0 assignment inside the Z-Image pipeline. Maybe I am missing something, but that seems like an antipattern to me.
  • The signature of forward() of the DiT uses shorthand variable names (x, t, cap_feats) instead of hidden_states, timestep, and encoder_hidden_states.
  • Should _cfg_normalization and _cfg_truncation inside the pipeline be turned into properties like guidance_scale?

Maybe we could consider revisiting them (though perhaps not a priority).
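
To illustrate the first nit, the condition below is made up, but the pattern would be something like:

    # instead of:
    assert len(x) == len(cap_feats), "mismatched lengths"

    # prefer an explicit error, which survives `python -O` and gives a clearer message:
    if len(x) != len(cap_feats):
        raise ValueError(
            f"`x` and `cap_feats` must have the same length, but got {len(x)} and {len(cap_feats)}."
        )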

Cc: @JerryWu-code

@sayakpaul sayakpaul requested review from dg845 and yiyixuxu November 28, 2025 13:59
Comment on lines -661 to +636
- return x, {}
+ if not return_dict:
+     return (x,)
+
+ return Transformer2DModelOutput(sample=x)
sayakpaul (Member, Author):
Should be a very safe change?

Collaborator:
Yes indeed. Is this the only actual change in this file? (The others seem to be just formatting changes.)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul (Member, Author) commented on Nov 28, 2025

The tests that fail in "Fast tests for PRs / Fast PyTorch Models & Schedulers CPU tests (pull_request)" pass even when run locally with CUDA_VISIBLE_DEVICES="" pytest tests/models/transformers/test_models_transformer_z_image.py.

Edit: the failure likely reproduces when CUDA_VISIBLE_DEVICES="" pytest tests/models/ is run.

@yiyixuxu (Collaborator) left a comment:
thanks!

@dg845 (Collaborator) commented on Dec 2, 2025

Not blocking, but I think we should think about how to best support and test the List[torch.Tensor] hidden_states pattern going forward.

My current understanding is that the motivation for using a List[torch.Tensor] rather than a single batched torch.Tensor is to support ragged tensors. This makes it easier to support a batch of hidden_states that would naturally have different shapes (for example, images of different resolutions) without needing extra memory and logic to pad them to the same shape. I'm not sure I fully understand the use cases and advantages/disadvantages for this pattern, so I would greatly appreciate it if someone could shed some more light on it. In particular, what is seen as the main reason(s) to use this pattern for current models?

I could see several support strategies here:

  1. Insist that all model inputs are single batched tensors.
  2. Allow lists of tensors, and try to support them as much as possible within the existing model tests (e.g. ModelTesterMixin).
  3. Create a separate test suite to handle models with List[torch.Tensor] inputs (e.g. a ragged tensor version of ModelTesterMixin).

My current personal preference is for (3), as it allows us to tailor the tests to actual List[torch.Tensor] use cases. (1) seems unnecessarily restrictive, and (2) seems like it might struggle to handle truly ragged inputs; I'm also wary that it would make the shared test suite more complicated and result in a lot of skipping for tests in which lists of tensors are hard to support with the current implementation. A downside of (3) is that it likely requires the most work of the three.

It seems like PyTorch will probably eventually support ragged tensors with torch.nested. It's not obvious to me which solution is most "future-proof", but that may be a consideration as well.
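
To make the ragged case concrete, here is a rough sketch (shapes are illustrative and this is not the actual Z-Image calling convention) contrasting the list-of-tensors pattern with padding, plus the torch.nested direction:

    import torch

    # Ragged "batch": two token sequences of different lengths (e.g., images at
    # different resolutions after patchification), kept as a plain Python list.
    hidden_states = [
        torch.randn(256, 64),   # 256 tokens, 64 channels
        torch.randn(1024, 64),  # 1024 tokens, 64 channels
    ]

    # Padded alternative: pad to the longest sequence and carry a mask around.
    padded = torch.zeros(2, 1024, 64)
    padded[0, :256] = hidden_states[0]
    padded[1] = hidden_states[1]
    mask = torch.zeros(2, 1024, dtype=torch.bool)
    mask[0, :256] = True
    mask[1, :] = True

    # torch.nested can express the same ragged batch without manual padding; the
    # jagged layout packs the data and tracks per-sample lengths (still a prototype
    # feature, so operator coverage is limited).
    nested = torch.nested.nested_tensor(hidden_states, layout=torch.jagged)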

super().test_group_offloading_with_disk()


class Flux2TransformerCompileTests(TorchCompileTesterMixin, unittest.TestCase):
Collaborator:

Suggested change:
- class Flux2TransformerCompileTests(TorchCompileTesterMixin, unittest.TestCase):
+ class ZImageTransformerCompileTests(TorchCompileTesterMixin, unittest.TestCase):

nit: naming

Comment on lines +2165 to +2166
elif self.model_class.__name__ == "ZImageTransformer2DModel":
    recompile_limit = 3
Collaborator:

Not blocking, but I think it makes more sense to refactor this test to have a recompile_limit argument:

    def test_torch_compile_repeated_blocks(self, recompile_limit: int = 1):
        ...

and then override the test as follows:

    def test_torch_compile_repeated_blocks(self):
        super().test_torch_compile_repeated_blocks(recompile_limit=3)

IMO it's more clear this way that the Z-Image model is using special testing logic.

Collaborator:

Or perhaps making recompile_limit a class attribute might make sense, since it seems like we could reuse it in test_torch_compile_recompilation_and_graph_break. Unless I'm missing something, if self.recompile_limit > 1, test_torch_compile_recompilation_and_graph_break should always fail due to

torch._dynamo.config.patch(error_on_recompile=True).
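
A rough sketch of what the class-attribute variant could look like (the mixin internals are elided; only the structure is shown, and the class names follow the existing test classes):

    import unittest


    class TorchCompileTesterMixin:
        # number of distinct compilations tolerated for the repeated block
        recompile_limit = 1

        def test_torch_compile_repeated_blocks(self):
            # ... compile the repeated block, run the forward passes, and assert that
            # the number of observed recompilations stays within self.recompile_limit
            ...

        def test_torch_compile_recompilation_and_graph_break(self):
            # could consult self.recompile_limit here as well instead of relying
            # solely on torch._dynamo.config.patch(error_on_recompile=True)
            ...


    class ZImageTransformerCompileTests(TorchCompileTesterMixin, unittest.TestCase):
        # ZImageTransformerBlock backs noise_refiner, context_refiner, and layers,
        # so up to three specializations are expected during compilation.
        recompile_limit = 3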

def prepare_dummy_input(self, height, width):
    return ZImageTransformerTests().prepare_dummy_input(height=height, width=width)

@unittest.skip("Fullgraph is broken")
Collaborator:

Should the skip reason here reflect this?

The repeated block in this model is ZImageTransformerBlock, which is used for noise_refiner, context_refiner, and layers. As a consequence, the inputs recorded for the block vary during compilation, and full compilation with fullgraph=True triggers recompilation at least thrice.

Or is there another reason why we would expect fullgraph=True to fail?
