Conversation

strint commented Dec 12, 2025

When --mmap-torch-files is enabled, ComfyUI loads .ckpt and .pt files using mmap, significantly reducing CPU memory usage during file loading.

However, during load_model_weights in UNetLoader, the state dict is normally copied from the memory-mapped file into standard CPU memory, negating the benefit for model weights.

By using assign=True, the loader reuses the underlying tensor storage directly from the memory-mapped state dict. This avoids unnecessary copies and preserves the memory savings when loading large models via mmap.

With this improvement, ComfyUI can load multiple large models without causing CPU memory OOM.
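For illustration, a minimal pure-PyTorch sketch of the difference (the module and file path are placeholders, not ComfyUI code; `mmap=` and `assign=` require PyTorch >= 2.1):

```python
import torch
import torch.nn as nn

model = nn.Linear(4096, 4096)  # placeholder for the real diffusion model

# mmap=True keeps tensor storage backed by the checkpoint file
# instead of reading everything into anonymous CPU memory.
sd = torch.load("weights.pt", mmap=True, weights_only=True)

# Default path: every tensor is copied out of the mmapped storage
# into the module's preallocated parameters, materializing the full
# model in regular CPU RAM and negating the mmap savings.
model.load_state_dict(sd)

# assign=True instead rebinds the module's parameters to the
# mmapped tensors directly, so no extra CPU copy is made.
model.load_state_dict(sd, assign=True)
```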

rattus128 (Contributor) commented

This is an awesome potential change for performance and I have had it on the radar for a while.

How does this interact with pinned memory? My understanding is that mmapped memory cannot be pinned in place using the approach currently taken in pin_memory (cuda_host_register). Does this just error out and fall back to no pinning?
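For context, a hedged sketch of the in-place pinning pattern being asked about; the helper name and the flag value 0 (cudaHostRegisterDefault) are illustrative, not ComfyUI's actual code:

```python
import torch

def try_pin_in_place(t: torch.Tensor) -> bool:
    # cudaHostRegister page-locks a tensor's existing pages in place.
    # For file-backed mmapped pages this is expected to fail on many
    # OS/driver combinations, so any error is treated as "not pinned"
    # rather than fatal.
    try:
        err = torch.cuda.cudart().cudaHostRegister(
            t.data_ptr(), t.numel() * t.element_size(), 0)
        return int(err) == 0  # 0 == cudaSuccess
    except Exception:
        return False
```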

This is a disruptive change and the community has actually gone to the effort of explicitly offloading mmaps in the past due to poor OS support.

https://github.com/city96/ComfyUI-GGUF/blob/main/nodes.py#L98
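(For readers: "offloading the mmap" there amounts to forcing a copy so the tensors no longer reference file-backed pages; roughly this, though not the linked code verbatim:)

```python
# force each tensor off the file-backed mapping into ordinary RAM
state_dict = {k: v.clone() for k, v in state_dict.items()}
```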

This should probably spend some time behind a --fast startup argument for stabilization.

strint commented Dec 15, 2025

> How does this interact with pinned memory? My understanding is that mmapped memory cannot be pinned in place using the approach currently taken in pin_memory (cuda_host_register). Does this just error out and fall back to no pinning?

At the diffusion model loading node, it appears that pinned memory is not used. load_torch_file loads the model file from disk into a tensor dictionary; load_model_weights then copies that dictionary into the model's state dict; finally, the KSampler node transfers the state dict to GPU VRAM through load_models_gpu. In this flow, the model parameters never reside in pinned memory.
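A condensed sketch of that flow (names and signatures simplified, not ComfyUI's exact APIs); note that no step below allocates pinned memory:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # stand-in for the UNet

sd = torch.load("model.pt", mmap=True, weights_only=True)  # load_torch_file: disk -> tensor dict
model.load_state_dict(sd)                                  # load_model_weights: dict -> module params
model.to("cuda")                                           # load_models_gpu: CPU -> VRAM at sampling time
```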

strint commented Dec 15, 2025

> This is a disruptive change and the community has actually gone to the effort of explicitly offloading mmaps in the past due to poor OS support.
>
> https://github.com/city96/ComfyUI-GGUF/blob/main/nodes.py#L98
>
> This should probably spend some time behind a --fast startup argument for stabilization.

You are correct. Safetensors loading uses mmap by default, and there is an argument --disable-mmap to turn this off.

Meanwhile, when mmap loading is enabled, there is no need to copy the tensor dictionary into regular CPU memory before the KSampler node uses it, which saves a significant amount of CPU RAM.

To make this compatible, I added the following check:

delay_copy_with_assign = utils.MMAP_TORCH_FILES or not utils.DISABLE_MMAP
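which could then gate the load roughly like this (a sketch continuing from the check above; model and sd come from the surrounding loader, and the PR's exact diff differs):

```python
if delay_copy_with_assign:
    # reuse mmapped storage directly; parameters stay file-backed
    model.load_state_dict(sd, assign=True)
else:
    # classic path: copy tensors into preallocated CPU parameters
    model.load_state_dict(sd)
```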
