Unload to mmap when the CPU mem is low #11289
base: master
Conversation
…refine_offload
* allow offload quant
* rm cuda
* refine and pass test
Would it not be better to just drop the memory and reload tensors from the model file on disk that you already have? This looks like a manual implementation of a swap file; I don't think it's really helpful to write things to the disk. If you're going to do that, it will be easier and probably more efficient to just add swap space. I would like it if ComfyUI had the ability to drop tensors from memory when RAM runs low and reload them on demand, but if it involves disk writes, it's not going to be more efficient than OS swapping. I guess if you could load a checkpoint using mmap in a way that allows you to "return" mmapped memory to the disk (optimally letting the OS decide whether it's actually in memory or on disk), that would solve the problem pretty neatly.
Agree. This is the same behaviour as swap. I actually have some work-in-progress for what @asagi4 describes, by just going back to the model file in this scenario.
If I may interject: there is also the scenario in which the model safetensors sit on a disk slower than the one holding the swap file. For example, I keep the swap file on an NVMe SSD while the model is on a SATA SSD, so I prefer to reload the data from the swap file (NVMe speed) rather than from the model (SATA speed). Not to mention that keeping all the data in "RAM" (meaning actual RAM + swap) allows Windows to use memory compression wherever it can apply it (for example, my compressed area reaches about 17+ GB in Wan 2.2 workflows).
Yeah, I've thought about this case too. I think this setup is the exception rather than the rule, though. The majority of users will have either no swap or a same-disk swap, and they have a lot to gain by ditching that write-on-unload completely. Note that even if this change were made, your use case could still be handled by configuring your NVMe as a vanilla disk cache on top of your model library.
This is something I don't know how to do. Can it be done now, or will it be a startup argument in the future? Anyway, even if I can assign a ComfyUI model cache to the fastest drive, it will still be a cache separate from the Windows swap. Maybe I'm wrong or stubborn, but I would still like the option to let Windows manage its "continuous" RAM + swap space if the future modifications don't prove beneficial for my case. I bought the NVMe for the single purpose of making it some sort of "RAM expansion"; it was the best I could think of after I maxed out what my motherboard can accommodate (64 GB of RAM).
Just use --cache-ram x.x. It defaults to 4.0, I think. Let's say you have 32 GB of RAM: 4.0 would mean that if offloading were going to push usage past 28 GB, ComfyUI dumps cached items by priority, and you simply reload them if they're needed again. What you're trying to implement, as others have stated, is essentially yet another page file. That means more wear on SSDs, and it would likely be much slower than just reloading the model again, even counting all the initial processing of loading the model weights. And you'll see your drive's health rapidly decline when you're constantly writing tens of gigabytes.
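For readers unfamiliar with the flag, here is a rough sketch of the headroom logic described above (the names and check below are made up for illustration; this is not ComfyUI's actual implementation):

```python
import psutil

headroom_bytes = 4.0 * 1024**3  # hypothetical: the value passed via --cache-ram

def should_evict_cache() -> bool:
    # With 32 GB of total RAM and --cache-ram 4.0, eviction would start once
    # usage passes ~28 GB, i.e. once available RAM drops below 4 GB.
    return psutil.virtual_memory().available < headroom_bytes
```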
ComfyUI’s unload and load operations work on the parameters and buffers of an nn.Module, and typically only a subset of these tensors is unloaded. Releasing and restoring selected tensors from disk is complex and affects many parts of the existing unload/load implementation. In contrast, a tensor backed by an mmap storage behaves like a regular CPU tensor and can be moved to CPU or CUDA easily using the standard Tensor.to operation. This simplicity is the primary reason for using mmap-backed tensors.
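As a minimal illustration of that point (not the PR's actual code; the file name and sizes are invented), `torch.from_file` yields a CPU tensor backed by a memory-mapped file, and the standard `Tensor.to` moves it to the GPU like any other tensor:

```python
import torch

n = 4 * 1024 * 1024  # hypothetical element count
# shared=True maps the file read-write and creates it if needed.
t = torch.from_file("offload.bin", shared=True, size=n, dtype=torch.float16)

t.fill_(0.5)  # writes land in file-backed pages; the OS pages them in and out

if torch.cuda.is_available():
    on_gpu = t.to("cuda")  # an ordinary host-to-device copy, no special reload path
```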
You are right that both mmap and swap operate on VM pages, but swap is managed by the OS and only handles anonymous memory. In ComfyUI, most model weights are file-backed or explicitly managed tensors. By unloading them manually into a dedicated mmap file, we can control what gets evicted and when, instead of relying on global OS heuristics across all processes. This becomes important because CPU OOM often crashes the entire ComfyUI process. The mmap offload provides a predictable, controlled way to prevent such crashes, which swap alone may not reliably avoid in these scenarios. When CPU RAM approaches its limit, a GPU tensor will be offloaded to a dedicated mmap file on disk instead of causing an OOM in CPU or GPU memory.
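A hedged sketch of that decision, assuming a psutil-based free-RAM check and a per-tensor backing file (the helper name and threshold are illustrative, not the PR's code):

```python
import psutil
import torch

LOW_RAM = 5 * 1024**3  # e.g. the 5 GB threshold mentioned in this PR

def unload(t: torch.Tensor, path: str) -> torch.Tensor:
    """Move a GPU tensor to CPU RAM if there is room, otherwise into a
    dedicated mmap-backed file (hypothetical helper for illustration)."""
    needed = t.numel() * t.element_size()
    if psutil.virtual_memory().available > LOW_RAM + needed:
        return t.cpu()
    # Low RAM: land the copy in file-backed pages rather than anonymous
    # memory, so the OS can drop or write them back without touching swap.
    backing = torch.from_file(path, shared=True, size=t.numel(), dtype=t.dtype)
    mapped = backing.view(t.shape)
    mapped.copy_(t)  # direct device-to-mmap copy
    return mapped
```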
Going back to the original model file is indeed the most straightforward approach for full unloads. However, when only part of the model's parameters or buffers needs to be unloaded, wouldn't selectively loading them back from the model file become complicated?
When CPU memory is limited, unloading a GPU tensor back to CPU RAM may trigger OOM and crash the ComfyUI process.
This PR adds support for unloading models to an MMAP-backed disk file instead of CPU memory. By offloading weights to disk, it prevents CPU OOM conditions and avoids ComfyUI crashes during model unloading.
MMAP-backed tensors can be moved to GPU using the standard `to("cuda")` operation, making them straightforward to reload.

Usage
If available CPU memory falls below the configured threshold (e.g., 5 GB), ComfyUI will offload model weights to an MMAP-backed (disk-based) file rather than CPU RAM during unloading, thereby avoiding CPU memory exhaustion.