
MPDynamic: force load flux img_in weight (Fixes flux1 canny+depth lora crash)#12446

Merged
comfyanonymous merged 2 commits into Comfy-Org:master from rattus128:prs/dynamic-vram-fixes/flux-img-in
Feb 16, 2026

Conversation

@rattus128
Contributor

This weight is a bit special, in that the lora changes its geometry. This situation is unique: it's not handled by the existing estimates, and it works for neither offloading nor dynamic_vram.

This fixes dynamic_vram by handling the weight as a special case. Ideally we would fully precalculate these lora geometry changes at load time, but this gets these models working first.
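To illustrate the special case (a hedged sketch, not ComfyUI's actual code; all names below are hypothetical): a streamed weight is normally copied in place with `copy_`, which requires identical shapes, so a lora that changes a weight's geometry has to be detected and force-loaded in full instead.

```python
# Illustrative sketch: a geometry-changing lora (e.g. flux img_in growing
# from 64 to 128 input channels for the canny/depth loras) cannot be
# copied into the original storage in place, so the loader must fully
# load that weight up front. These helpers are hypothetical.

def patched_shape(base_shape, patch_shape):
    """Shape the weight will have after patching (None = shape unchanged)."""
    return patch_shape if patch_shape is not None else base_shape

def must_force_load(base_shape, patch_shape):
    # An in-place copy_ needs identical shapes; a geometry-changing
    # patch therefore forces a full (non-streamed) load of this weight.
    return patched_shape(base_shape, patch_shape) != base_shape

# flux img_in with a depth/canny lora: 64 -> 128 input channels
assert must_force_load((3072, 64), (3072, 128))
# an ordinary lora leaves the geometry untouched
assert not must_force_load((3072, 64), None)
```

Without such a check, the streamed path hits exactly the `copy_` size-mismatch error shown in the "Before" traceback below.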

Example Test Conditions:

RTX5090, linux, --fast dynamic_vram, Flux1 + flux1-depth-dev-lora


Before:

  File "/home/rattus/ComfyUI/comfy/ops.py", line 353, in forward_comfy_cast_weights
    weight, bias, offload_stream = cast_bias_weight(self, input, offloadable=True)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/ops.py", line 216, in cast_bias_weight
    return cast_bias_weight_with_vbar(s, dtype, device, bias_dtype, non_blocking, compute_dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/ops.py", line 190, in cast_bias_weight_with_vbar
    weight = post_cast(s, "weight", weight, dtype, resident, update_weight)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/ops.py", line 183, in post_cast
    orig.copy_(y)
RuntimeError: The size of tensor a (64) must match the size of tensor b (128) at non-singleton dimension 1

After:

model_type FLUX
Requested to load Flux
Model Flux prepared for dynamic VRAM loading. 22699MB Staged. 626 patches attached.
100%|██████████| 20/20 [00:10<00:00,  1.90it/s]                                  
0 models unloaded.
Model AutoencodingEngine prepared for dynamic VRAM loading. 159MB Staged. 0 patches attached.
Prompt executed in 15.10 seconds

self.partially_unload_ram(1e32)
self.partially_unload(None, 1e32)

keys = list(self.backup.keys())


Why do you create a list?

Contributor Author


I think it came with the copy paste TBH
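For context (an aside, not from the PR itself): the `list(...)` snapshot is genuinely needed whenever the loop body deletes entries from the dict, because mutating a dict while iterating over its live keys view raises a RuntimeError in CPython. A minimal sketch:

```python
# Iterating over a snapshot of the keys lets the loop delete entries safely.
backup = {"img_in.weight": 1, "img_in.bias": 2}  # illustrative keys
for key in list(backup.keys()):
    del backup[key]
assert backup == {}

# Iterating over the live keys view while deleting fails.
backup = {"img_in.weight": 1, "img_in.bias": 2}
raised = False
try:
    for key in backup.keys():
        del backup[key]
except RuntimeError:  # "dictionary changed size during iteration"
    raised = True
assert raised
```

So if the loop only reads the dict the copy is redundant, but it is required (not just copy-paste habit) as soon as the loop mutates it.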

This lets the loader know if a lora will change the shape of a weight
so it can take appropriate action.
This weight is a bit special, in that the lora changes its geometry.
This is rather unique: it's not handled by existing estimates and doesn't
work for either offloading or dynamic_vram.

Fix for dynamic_vram as a special case. Ideally we can fully precalculate
these lora geometry changes at load time, but just get these models
working first.
@rattus128 rattus128 force-pushed the prs/dynamic-vram-fixes/flux-img-in branch from 0c75f1b to 017d0b4 Compare February 15, 2026 13:08
@comfyanonymous comfyanonymous merged commit c037004 into Comfy-Org:master Feb 16, 2026
12 checks passed
