Dynamic VRAM fixes - Ace 1.5 performance + a VRAM leak#12368
Merged
comfyanonymous merged 4 commits into Comfy-Org:master on Feb 9, 2026
Conversation
This change was only needed to get around the pytorch 2.7 mempool bugs, and should have been reverted along with Comfy-Org#12260. This fixes a different memory leak where pytorch gets confused about cache emptying.
Apparently this is an expensive operation that slows things down.
New features: watermark limit feature, logging enhancements, -O2 build on Linux.
4ead8ae to 7346af0 (Compare)
luna-niemitalo pushed a commit to luna-niemitalo/ComfyUI that referenced this pull request on Feb 11, 2026
* revert threaded model loader change

  This change was only needed to get around the pytorch 2.7 mempool bugs, and should have been reverted along with Comfy-Org#12260. This fixes a different memory leak where pytorch gets confused about cache emptying.

* load non comfy weights

* MPDynamic: Pre-generate the tensors for vbars

  Apparently this is an expensive operation that slows things down.

* bump to aimdo 1.8

  New features: watermark limit feature, logging enhancements, -O2 build on Linux.
Effectively fully load non-comfy weights by using a new Aimdo lower-watermark feature, which lets the non-comfy caster skip the deep-copy and unpin extra steps. On top of that, there is a significant speedup from simply not calling aimdo_to_tensor() in the critical path, as it only needs to be done once.
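The "only needs to be done once" optimization above can be sketched as a convert-once cache that keeps the expensive conversion out of the per-step hot path. All names here (aimdo_to_tensor, VBar) are hypothetical stand-ins for illustration, not the actual ComfyUI/Aimdo API:

```python
# Sketch of hoisting an expensive conversion out of the critical path.
# aimdo_to_tensor and VBar are illustrative stand-ins, not real API names.

calls = {"n": 0}  # count how many times the expensive conversion runs


def aimdo_to_tensor(raw):
    """Stand-in for the expensive per-step conversion."""
    calls["n"] += 1
    return list(raw)  # pretend this builds a tensor


class VBar:
    """Holds a weight buffer; generates its tensor form exactly once."""

    def __init__(self, raw):
        self.raw = raw
        self._tensor = None

    @property
    def tensor(self):
        # Before: the conversion ran inside the critical path every step.
        # After: convert on first access, then reuse the cached result.
        if self._tensor is None:
            self._tensor = aimdo_to_tensor(self.raw)
        return self._tensor


vbar = VBar([1, 2, 3])
for _ in range(1000):  # simulate many sampling steps hitting the hot path
    _ = vbar.tensor
```

After 1000 accesses the conversion has still run only once; every later access is a cheap attribute read, which is where the critical-path speedup comes from.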
My performance gains are only moderate on my setup; however, it does close the gap to non-dynamic_vram on my hardware.
Bump to the new aimdo version to pick up the feature.
Fix a VRAM leak found during community testing this morning. Thanks to TK3R from Discord for the joint debug session.
Example test conditions - Ace step:
Ace Step 1.5 AIO. RTX 5090 Linux --fast dynamic_vram
Before:
After:
No dynamic VRAM:
VRAM Leak Example test conditions:
WAN 2.2 14B GGUF Q8 low noise into FP16 high noise. RTX 5090 Linux --fast dynamic_vram
Before:
^^ That dip is the inference VRAM being released after the GGUF low-noise pass, but the model itself is never unloaded. The high-noise model then dynamically offloads (the climbing RAM usage is the pinned memory).
After:
^^ The dip is the GGUF model being properly released from VRAM; the high-noise model then fully loads on the 5090 as expected.
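The leak pattern above (a model's VRAM surviving past its useful life because something still holds a reference) can be sketched with plain Python objects in place of CUDA tensors. The mock allocator, tensor class, and cache below are purely illustrative, assuming a refcount-style release like PyTorch's caching allocator:

```python
# Sketch of the leak: memory is only reclaimed once the last reference to
# the model's tensors drops. MockAllocator/MockTensor are illustrative only.
import gc


class MockAllocator:
    """Tracks total 'VRAM' currently held, like a caching allocator would."""

    def __init__(self):
        self.allocated = 0


class MockTensor:
    def __init__(self, allocator, nbytes):
        self.allocator = allocator
        self.nbytes = nbytes
        allocator.allocated += nbytes

    def __del__(self):
        # Memory returns to the pool only when the last reference dies.
        self.allocator.allocated -= self.nbytes


alloc = MockAllocator()

# "Low noise" model loads: VRAM climbs by 8 MiB.
low_noise = [MockTensor(alloc, 1 << 20) for _ in range(8)]

# The leak: a stray cache entry keeps the model alive after sampling ends.
stray_cache = {"low_noise": low_noise}
low_noise = None
gc.collect()
leaked = alloc.allocated  # still 8 MiB held, mirroring the pre-fix graph

# The fix: drop the stray reference; only then is the VRAM really released.
stray_cache.clear()
gc.collect()
freed = alloc.allocated  # back to 0, matching the post-fix dip
```

This mirrors the before/after graphs: pre-fix, the low-noise model's memory plateaus instead of dipping because a reference outlives the pass; post-fix, releasing the last reference lets the allocator reclaim it before the high-noise model loads.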