
Partially enabled FP16 inference in the Qwen Image Models #12522

Open
Icbears wants to merge 2 commits into Comfy-Org:master from Icbears:master

Conversation


Icbears commented Feb 19, 2026

The original inference code for the Qwen Image models cannot run correctly in float16 precision, which results in completely black outputs:
#10800
#10668
#10751

Because of this limitation, GPUs without native bfloat16 support are forced to run the model in float32, leading to noticeably slower generation.
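The dtype selection this PR aims for can be sketched as follows. This is a hypothetical helper, not ComfyUI's actual API: it prefers native bfloat16 where the GPU supports it, and otherwise falls back to float16 instead of forcing the whole model into float32.

```python
import torch

def pick_attention_dtype() -> torch.dtype:
    # Hypothetical helper (not ComfyUI's actual code): use native
    # bfloat16 on GPUs that support it (e.g. Ampere and newer);
    # otherwise fall back to float16 for the attention components
    # that tolerate it, rather than running everything in float32.
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
        return torch.bfloat16
    return torch.float16
```

On a Tesla V100, which lacks native bfloat16 support, this would select float16.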

The updated inference code in this PR enables parts of the attention computation in the Qwen Image models to run safely in float16 without producing black images. Specifically, the following components can now use float16: image K, image V, text Q, text V, and the joint cross-attention. When bfloat16 is selected, the code still uses native bfloat16 on supported GPUs.

I tested this change on a Tesla V100 with the Qwen Image Edit 2511 model and a 4-step LoRA, editing a 512×512 image. The second generation ran in roughly half the time compared with the original implementation: approximately 41 seconds with the original float32-only code versus about 23 seconds with the proposed float16-enabled version.

