
Conversation

@rattus128
Contributor

@rattus128 rattus128 commented Nov 8, 2025

This fixes a particular VRAM OOM in a range of workflows, in particular flows that re-use a model for upscaling.

See below for root cause and fix.

Example test case:
upscale_oom.json
WAN 128x128x181f > x8 upscale (1024x1024x181f) > Same WAN model
RTX5090

To see the GUI go to: http://0.0.0.0:8188
To see the GUI go to: http://[::]:8188
got prompt
Using scaled fp8: fp8 matrix mult: True, scale input: True
model weight dtype torch.float16, manual cast: None
model_type FLOW
Using scaled fp8: fp8 matrix mult: False, scale input: False
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load WanTEModel
loaded completely; 30235.05 MB usable, 6419.48 MB loaded, full load: True
Requested to load WAN21
loaded completely; 23645.45 MB usable, 13629.08 MB loaded, full load: True
100%|██████████| 1/1 [00:00<00:00,  9.40it/s]
0 models unloaded.
  0%|          | 0/1 [00:00<?, ?it/s]
!!! Exception during processing !!! Allocation on device 
Traceback (most recent call last):
  File "/home/rattus/ComfyUI/execution.py", line 510, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/execution.py", line 324, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                    ...
  File "/home/rattus/ComfyUI/comfy/ldm/wan/model.py", line 78, in forward
    q = qkv_fn_q(x)
        ^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/ldm/wan/model.py", line 69, in qkv_fn_q
    return apply_rope1(q, freqs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/ldm/flux/math.py", line 33, in apply_rope1
    x_out = freqs_cis[..., 0] * x_[..., 0]
            ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
torch.OutOfMemoryError: Allocation on device 

Got an OOM, unloading all loaded models.
Prompt executed in 10.47 seconds

With this fix:

To see the GUI go to: http://0.0.0.0:8188
To see the GUI go to: http://[::]:8188
[DEPRECATION WARNING] Detected import of deprecated legacy API: /scripts/ui.js. This is likely caused by a custom node extension using outdated APIs. Please update your extensions or contact the extension author for an updated version.
[DEPRECATION WARNING] Detected import of deprecated legacy API: /extensions/core/groupNode.js. This is likely caused by a custom node extension using outdated APIs. Please update your extensions or contact the extension author for an updated version.
[DEPRECATION WARNING] Detected import of deprecated legacy API: /scripts/ui/components/button.js. This is likely caused by a custom node extension using outdated APIs. Please update your extensions or contact the extension author for an updated version.
[DEPRECATION WARNING] Detected import of deprecated legacy API: /scripts/ui/components/buttonGroup.js. This is likely caused by a custom node extension using outdated APIs. Please update your extensions or contact the extension author for an updated version.
got prompt
Using scaled fp8: fp8 matrix mult: True, scale input: True
model weight dtype torch.float16, manual cast: None
model_type FLOW
Using scaled fp8: fp8 matrix mult: False, scale input: False
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load WanTEModel
loaded completely; 30109.99 MB usable, 6419.48 MB loaded, full load: True
Requested to load WAN21
loaded completely; 23520.39 MB usable, 13629.08 MB loaded, full load: True
100%|██████████| 1/1 [00:00<00:00,  1.56it/s]
Unloading WanTEModel
1 idle models unloaded.
Unloading WAN21
1 active models unloaded for increased offloading.
loaded partially; 128.00 MB usable, 124.27 MB loaded, 13504.81 MB offloaded, lowvram patches: 0
100%|██████████| 1/1 [05:00<00:00, 300.66s/it]
Prompt executed in 314.41 seconds


git commit message

In some workflows, it's possible for a model to be used twice but with different requirements for the inference VRAM.

Currently, once a model is loaded at a certain level of offload, it is preserved at that level of offload if it is used again. This will OOM if there is a major change in the size of the inference VRAM, which happens in the classic latent upscaling workflow where the same model is used twice, to generate and then to upscale.

This is very noticeable for WAN in particular.

Fix this by two-passing the model VRAM unload process: first try with the existing list of idle models, then try again with the models that are about to be loaded added to the list. This implements the partial offload of the hot-in-VRAM model needed to make space for the larger inference.

Also improve the info messages about any unloads performed.
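
For illustration, here is a minimal sketch of the two-pass idea. `LoadedModel`, its fields, and `free_memory` are hypothetical stand-ins for this explanation, not ComfyUI's actual model_management API:

```python
# Minimal sketch of the two-pass unload described above. LoadedModel and
# free_memory are hypothetical stand-ins, not ComfyUI's model_management API.
from dataclasses import dataclass

@dataclass
class LoadedModel:
    name: str
    loaded_mb: float   # VRAM currently held by this model
    in_use: bool       # True if the model is about to be (re)used

def free_memory(required_mb: float, resident: list[LoadedModel],
                to_load: list[LoadedModel]) -> float:
    """Free at least required_mb of VRAM, preferring idle models."""
    freed = 0.0
    # Pass 1: fully unload idle models (anything not needed by this run).
    for m in resident:
        if freed >= required_mb:
            break
        if not m.in_use and m not in to_load:
            freed += m.loaded_mb
            m.loaded_mb = 0.0
    # Pass 2: still short? Partially offload the models that are about to be
    # used, but only by the remaining deficit, so as much as possible stays
    # resident in VRAM.
    for m in to_load:
        if freed >= required_mb:
            break
        offload = min(m.loaded_mb, required_mb - freed)
        m.loaded_mb -= offload
        freed += offload
    return freed
```

In the "with this fix" log above, this corresponds to pass 1 unloading the idle WanTEModel and pass 2 partially offloading the active WAN21 so the larger upscale inference fits.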

@rattus128 rattus128 marked this pull request as draft November 8, 2025 08:25
@rattus128 rattus128 force-pushed the prs/model-reuse-oom branch from 6468c4c to ca73329 Compare November 8, 2025 08:30
@rattus128 rattus128 marked this pull request as ready for review November 8, 2025 08:42
@rattus128
Contributor Author

This also reproduced on a flow I was sent here:

city96/ComfyUI-GGUF#357 (comment)

This was a case of model reuse with a LoRA interposed between the two uses.

@comfyanonymous
Owner

This is slightly incorrect behavior. If you run a workflow with a text encoder that does not fit completely in memory, I see it unload completely between the positive prompt and the negative prompt. What should happen is a small partial unload instead.
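
For illustration only (hypothetical helper and numbers, not ComfyUI code): a "small partial unload" means offloading just the shortfall between what the next pass needs and what is currently free, rather than evicting the whole text encoder.

```python
# Hypothetical helper illustrating "a small partial unload": offload only the
# shortfall for the next pass, not the entire model. Numbers are made up.
def partial_offload_mb(required_mb: float, free_mb: float) -> float:
    """How many MB must be offloaded to satisfy the next request."""
    return max(0.0, required_mb - free_mb)

# e.g. the negative-prompt pass needs 1200 MB but only 1000 MB is free:
print(partial_offload_mb(1200.0, 1000.0))  # -> 200.0, not the full encoder size
```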

@rattus128
Contributor Author

This is slightly incorrect behavior. If you run a workflow with a text encoder that does not fit completely in memory, I see it unload completely between the positive prompt and the negative prompt. What should happen is a small partial unload instead.

I'll take a look at this case. Thanks.

@comfyanonymous
Owner

I didn't test it that much but this might be a better way: #10690

@rattus128
Contributor Author

I didn't test it that much but this might be a better way: #10690

I tested this and it looks good so far. Closing this one.

@rattus128 rattus128 closed this Nov 9, 2025
@rattus128
Contributor Author

^^

