
Dynamic VRAM fixes - Ace 1.5 performance + a VRAM leak#12368

Merged
comfyanonymous merged 4 commits into Comfy-Org:master from rattus128:prs/dynamic-vram-fixes/ace-llm-perf on Feb 9, 2026

Conversation

@rattus128 (Contributor) commented Feb 9, 2026

Effectively fully load non-comfy weights by using a new Aimdo lower-watermark feature, which allows the non-comfy caster to skip the deep-copy and unpin extra steps. On top of that, there is a significant speedup from not calling aimdo_to_tensor() in the critical path, as this only needs to be done once.
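The critical-path win above can be sketched as a plain caching pattern: do the expensive conversion once and reuse the result on every subsequent step. This is an illustrative sketch only, with hypothetical names (`CachedWeightView`, `expensive_convert`), not the actual ComfyUI or aimdo code.

```python
# Illustrative sketch (hypothetical names): hoist a one-time conversion out
# of the per-step critical path by caching its result, in the spirit of
# calling an aimdo_to_tensor()-style function once instead of every step.

class CachedWeightView:
    """Wraps an expensive raw->tensor conversion and performs it only once."""
    def __init__(self, raw_weight, convert):
        self._raw = raw_weight
        self._convert = convert   # stand-in for the real conversion function
        self._cached = None
        self.conversions = 0      # counts actual conversions, for illustration

    def tensor(self):
        # Critical path: after the first call this is just an attribute read.
        if self._cached is None:
            self._cached = self._convert(self._raw)
            self.conversions += 1
        return self._cached

def expensive_convert(raw):
    # Stand-in for the costly part (deep copy, pinning, etc.)
    return list(raw)

view = CachedWeightView([1.0, 2.0, 3.0], expensive_convert)
for _ in range(8):            # e.g. 8 sampling steps
    w = view.tensor()
assert view.conversions == 1  # converted once, not once per step
```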

My performance gains are only moderate on my setup; however, this closes the gap to non-dynamic_vram for my hardware.

Bump to the new aimdo version to pick up the feature.

Fix a VRAM leak found in community testing this morning. Thanks to TK3R from Discord for the joint debug session.

Example test conditions - Ace step:

Ace Step 1.5 AIO. RTX 5090 Linux --fast dynamic_vram


Before:

got prompt
model weight dtype torch.bfloat16, manual cast: None
model_type FLOW
VAE load device: cuda:0, offload device: cpu, dtype: torch.float16
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load ACE15TEModel_
Model ACE15TEModel_ prepared for dynamic VRAM loading. 4673MB Staged. 0 patches attached.
Requested to load ACEStep15
Model ACEStep15 prepared for dynamic VRAM loading. 4565MB Staged. 0 patches attached.
100%|██████████| 8/8 [00:00<00:00, 12.44it/s]                                   
Requested to load AudioOobleckVAE
loaded completely;  321.70 MB loaded, full load: True
Prompt executed in 10.60 seconds
got prompt
Model ACE15TEModel_ prepared for dynamic VRAM loading. 4673MB Staged. 0 patches attached.
Model ACEStep15 prepared for dynamic VRAM loading. 4565MB Staged. 0 patches attached.
100%|██████████| 8/8 [00:00<00:00, 14.05it/s]                                   
Prompt executed in 9.29 seconds
got prompt
Model ACE15TEModel_ prepared for dynamic VRAM loading. 4673MB Staged. 0 patches attached.
Model ACEStep15 prepared for dynamic VRAM loading. 4565MB Staged. 0 patches attached.
100%|██████████| 8/8 [00:00<00:00, 14.04it/s]                                   
Prompt executed in 9.35 seconds

After:

To see the GUI go to: http://0.0.0.0:8188
got prompt
model weight dtype torch.bfloat16, manual cast: None
model_type FLOW
VAE load device: cuda:0, offload device: cpu, dtype: torch.float16
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load ACE15TEModel_
Model ACE15TEModel_ prepared for dynamic VRAM loading. 4673MB Staged. 0 patches attached.
Requested to load ACEStep15
Model ACEStep15 prepared for dynamic VRAM loading. 4565MB Staged. 0 patches attached.
100%|██████████| 8/8 [00:00<00:00, 12.47it/s]                                   
Requested to load AudioOobleckVAE
loaded completely;  321.70 MB loaded, full load: True
Prompt executed in 10.39 seconds
got prompt
Model ACE15TEModel_ prepared for dynamic VRAM loading. 4673MB Staged. 0 patches attached.
Model ACEStep15 prepared for dynamic VRAM loading. 4565MB Staged. 0 patches attached.
100%|██████████| 8/8 [00:00<00:00, 14.07it/s]                                   
Prompt executed in 8.95 seconds
got prompt
Model ACE15TEModel_ prepared for dynamic VRAM loading. 4673MB Staged. 0 patches attached.
Model ACEStep15 prepared for dynamic VRAM loading. 4565MB Staged. 0 patches attached.
100%|██████████| 8/8 [00:00<00:00, 14.05it/s]                                   
Prompt executed in 8.89 seconds

No dynamic VRAM:

got prompt
model weight dtype torch.bfloat16, manual cast: None
model_type FLOW
VAE load device: cuda:0, offload device: cpu, dtype: torch.float16
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load ACE15TEModel_
loaded completely; 30235.42 MB usable, 4673.04 MB loaded, full load: True
Requested to load ACEStep15
loaded completely; 25390.26 MB usable, 4565.35 MB loaded, full load: True

[rgthree-comfy] Loaded 48 magnificent nodes. 🎉

OpenCV not installed

Initializing ControlAltAI Nodes
100%|██████████| 8/8 [00:00<00:00, 14.27it/s]
Requested to load AudioOobleckVAE
loaded completely;  321.70 MB loaded, full load: True
Prompt executed in 10.76 seconds
got prompt
100%|██████████| 8/8 [00:00<00:00, 15.41it/s]
Prompt executed in 8.93 seconds
got prompt
100%|██████████| 8/8 [00:00<00:00, 15.39it/s]
Prompt executed in 8.98 seconds
got prompt
100%|██████████| 8/8 [00:00<00:00, 15.38it/s]
Prompt executed in 8.94 seconds

VRAM Leak Example test conditions:

WAN 2.2 14B GGUF Q8 low noise into FP16 high noise. RTX 5090 Linux --fast dynamic_vram

Before:

[VRAM usage graph]

^^ That dip is inference VRAM being released by the GGUF low-noise model, but the model itself is never unloaded. The high-noise model then dynamically offloads (the climbing RAM usage is the pinning).

Requested to load WAN21
loaded completely; 22504.73 MB usable, 14825.46 MB loaded, full load: True
100%|██████████| 2/2 [00:19<00:00,  9.84s/it]                                   
model weight dtype torch.float16, manual cast: None
model_type FLOW
Requested to load WAN21
Model WAN21 prepared for dynamic VRAM loading. 27252MB Staged. 400 patches attached.
100%|██████████| 2/2 [00:21<00:00, 10.59s/it]                                   
Model WanVAE prepared for dynamic VRAM loading. 484MB Staged. 0 patches attached.
Prompt executed in 85.25 seconds

After:

[VRAM usage graph]

^^ The dip is the GGUF model getting properly released from VRAM; the high-noise model then fully loads on the 5090 as expected.

Requested to load WAN21
100%|██████████| 2/2 [00:19<00:00,  9.78s/it]                                   
model weight dtype torch.float16, manual cast: None
model_type FLOW
Requested to load WAN21
Model WAN21 prepared for dynamic VRAM loading. 27252MB Staged. 400 patches attached.
100%|██████████| 2/2 [00:19<00:00,  9.89s/it]                                   
Model WanVAE prepared for dynamic VRAM loading. 484MB Staged. 0 patches attached.
Prompt executed in 75.30 seconds
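The class of leak fixed above can be sketched in plain Python: a loader keeps a stray reference to a finished model, so its memory is never returned even though the per-inference memory was released. This is an illustrative sketch only (the names `Loader`/`FakeModel` are hypothetical), not the actual ComfyUI model-management code.

```python
# Illustrative sketch (hypothetical names): the leak is a lingering reference
# to the finished low-noise model; the fix is to actually drop it so the
# allocator can reclaim the memory before the next model loads.
import gc

class FakeModel:
    freed = 0
    def __del__(self):
        FakeModel.freed += 1  # stands in for VRAM actually being returned

class Loader:
    def __init__(self):
        self._current = None

    def load(self, model):
        self._current = model

    def unload(self):
        # The fix, in spirit: drop the reference instead of holding it.
        self._current = None
        gc.collect()

loader = Loader()
loader.load(FakeModel())   # low-noise GGUF model
loader.unload()            # without this, the reference (and VRAM) lingers
assert FakeModel.freed == 1
loader.load(FakeModel())   # high-noise FP16 model now has room to fully load
```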

@rattus128 force-pushed the prs/dynamic-vram-fixes/ace-llm-perf branch from 4ead8ae to 7346af0 on February 9, 2026 07:50
@comfyanonymous comfyanonymous merged commit 62315fb into Comfy-Org:master Feb 9, 2026
12 checks passed
luna-niemitalo pushed a commit to luna-niemitalo/ComfyUI that referenced this pull request Feb 11, 2026
* revert threaded model loader change

This change was only needed to get around the pytorch 2.7 mempool bugs,
and should have been reverted along with Comfy-Org#12260. This fixes a different
memory leak where pytorch gets confused about cache emptying.

* load non comfy weights

* MPDynamic: Pre-generate the tensors for vbars

Apparently this is an expensive operation that slows down things.

* bump to aimdo 1.8

New features:
watermark limit feature
logging enhancements
-O2 build on linux