Fix safetensor cause python crash on some windows platform #12611

roj234 wants to merge 3 commits into Comfy-Org:master
Conversation
📝 Walkthrough: Added CLI flag.
🚥 Pre-merge checks: ✅ 4 passed, ❌ 1 failed (warning).
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
comfy/utils.py (1)
116-123: ⚠️ Potential issue | 🟠 Major

`DISABLE_MMAP` still uses mmap internally and ignores the `device` parameter.

When `DISABLE_MMAP` is set, `load_safetensors()` is called, which explicitly uses `mmap.mmap()` to read the file and creates CPU tensors via `torch.frombuffer()`. This means the flag contradicts its intent (it doesn't actually disable mmap) and also bypasses the `device` parameter entirely. The else branch respects `device=device.type`, so tensors loaded with `DISABLE_MMAP` stay on CPU and require an extra move to reach the target device. Consider either implementing a true non-mmap path for `DISABLE_MMAP` that respects device placement, or documenting that this flag has different semantics than its name suggests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@comfy/utils.py` around lines 116 - 123, DISABLE_MMAP currently calls load_safetensors (which uses mmap and creates CPU tensors) and ignores the requested device; update the DISABLE_MMAP branch so it either (A) implements a true non-mmap read path that loads tensors directly onto the requested device (respecting the device parameter) or (B) clearly documents the differing semantics and moves tensors to device after load — modify the DISABLE_MMAP branch around load_safetensors(ckpt) in comfy/utils.py to avoid mmap-backed reads and to honor device (or add an explicit torch.tensor(..., device=device) placement step) and ensure return_metadata handling remains correct when return_metadata is False; reference DISABLE_MMAP, load_safetensors, safetensors.safe_open, device, and return_metadata when making the change.
This problem persists in safetensors 0.7.0. By the way, to describe the 'crash' behavior: the Python process terminates abruptly without a Python-level backtrace. It triggers the Windows WerFault.exe handler and often generates a python.exe.dmp file.
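Since a native crash like this bypasses Python's exception machinery, one general debugging aid (not something this PR uses) is the standard-library `faulthandler` module, which can sometimes print a Python-level stack before WerFault takes over:

```python
import faulthandler
import sys

# Enable the fault handler early, before loading the model. On a fatal
# native error (e.g. an access violation inside an extension module),
# Python will try to dump each thread's traceback to stderr before the
# process dies, which is often the only Python-level clue you get.
faulthandler.enable(file=sys.stderr, all_threads=True)

# ...then run the code that crashes, e.g. (hypothetical call):
# sd = comfy.utils.load_torch_file("ltx-2-19b-distilled-fp8.safetensors")
```

This only helps when the faulting thread can still run the handler; for hard crashes the python.exe.dmp file opened in a debugger remains the authoritative source.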
⚠️ Outside diff range comments (1)
comfy/utils.py (1)
99-107: ⚠️ Potential issue | 🟠 Major

Empty tensors are never moved to the target device, producing a device mismatch.

The `start == end` branch (lines 99-100) creates a CPU tensor unconditionally and has no corresponding `.to(device)`. When `device` is non-CPU (e.g., `cuda`), non-empty tensors are moved to the device while empty tensors stay on CPU, producing a mixed-device state dict that will fault at inference time.

Additionally, the `sd[name].to(device)` call (line 107) is nested inside `warnings.catch_warnings()`, whose sole purpose is suppressing the `torch.frombuffer` "not writable" warning. The `.to()` call doesn't trigger that warning and can be moved outside.

🐛 Proposed fix

```diff
     if start == end:
-        sd[name] = torch.empty(info["shape"], dtype=_TYPES[info["dtype"]])
+        t = torch.empty(info["shape"], dtype=_TYPES[info["dtype"]])
     else:
         with warnings.catch_warnings():
             # We are working with read-only RAM by design
             warnings.filterwarnings("ignore", message="The given buffer is not writable")
-            sd[name] = torch.frombuffer(mv[start:end], dtype=_TYPES[info["dtype"]]).view(info["shape"])
-            if (device != 'cpu' if isinstance(device, str) else device.type != 'cpu'):
-                sd[name] = sd[name].to(device)
+            t = torch.frombuffer(mv[start:end], dtype=_TYPES[info["dtype"]]).view(info["shape"])
+    if (device != 'cpu' if isinstance(device, str) else device.type != 'cpu'):
+        t = t.to(device)
+    sd[name] = t
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@comfy/utils.py` around lines 99 - 107, The empty-tensor branch creates CPU tensors with torch.empty and never moves them to the target device, and the sd[name].to(device) call is unnecessarily inside the warnings.catch_warnings block; update the code so that after creating sd[name] in both branches (the torch.empty(...) path and the torch.frombuffer(...) path) you apply the same device move using the existing device-check conditional ((device != 'cpu' if isinstance(device, str) else device.type != 'cpu')) and move the sd[name] = sd[name].to(device) call out of the warnings.catch_warnings block so both empty and non-empty tensors end up on the correct device.
🧹 Nitpick comments (1)
comfy/utils.py (1)
85-85: `load_safetensors` uses Python's `mmap.mmap` internally and may not solve all Windows crash scenarios.

The condition change (line 119) bypasses `safetensors.safe_open`'s Rust mmap, but `load_safetensors` itself calls `mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)` (line 85) using Python's own mmap module. For the CPU device path (the default), returned tensors remain mmap-backed via this Python mmap and are not cloned to regular memory.

The fix works for the specific Windows crash reported, and the TODO comment already flags the approach's uncertainty. However, Python's mmap on Windows also lacks structured exception handling for certain storage configurations and may crash on unusual hardware setups.

For guaranteed mmap-free loading on fragile systems (per issue #10896), the CPU device branch would need `sd[name].clone()` after `torch.frombuffer` to fully detach from the memory mapping.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@comfy/utils.py` at line 85, load_safetensors currently uses Python's mmap (mapping = mmap.mmap(...)) and returns mmap-backed tensors when using the CPU device path; to guarantee mmap-free tensors on fragile Windows setups, after creating tensors from the buffer (via torch.frombuffer) in the CPU branch, explicitly detach by calling .clone() on each tensor (e.g., sd[name] = sd[name].clone()) so returned tensors are not memory-mapped; update the CPU device path logic in load_safetensors to clone tensors after torch.frombuffer creation to fully remove mmap backing.
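The suggested `.clone()` can be sketched in isolation. The snippet below is an illustrative stand-alone example (the temp file and variable names are made up, not ComfyUI code) showing that after cloning, the tensor owns its own storage and the mapping can be closed safely:

```python
import mmap
import os
import struct
import tempfile
import warnings

import torch

# Write 4 float32 values to a temp file so we have something to map.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))
tmp.close()

with open(tmp.name, "rb") as f:
    mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    mv = memoryview(mapping)
    with warnings.catch_warnings():
        # torch.frombuffer warns because the read-only mapping is not writable
        warnings.filterwarnings("ignore", message="The given buffer is not writable")
        t_mapped = torch.frombuffer(mv, dtype=torch.float32).view([4])
    t = t_mapped.clone()  # copy into ordinary RAM, detaching from the mmap
    del t_mapped          # drop the last reference to the mapped buffer
    mv.release()
    mapping.close()       # safe now: t no longer touches the mapping

os.remove(tmp.name)
print(t.tolist())
```

Without the `clone()`, every later read of `t` would fault back into the page cache through the mapping, which is exactly the path suspected of crashing on fragile Windows storage setups.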
```python
sd, metadata = load_safetensors(ckpt)
# TODO: Not sure if this is the best way to bypass the mmap issues
if DISABLE_MMAP or enables_dynamic_vram():
    sd, metadata = load_safetensors(ckpt, device)
```
There was a problem hiding this comment.
This seems a little strange, in that `load_safetensors` is very deliberately an mmap: the main difference between `load_safetensors` and the sft package is the mmap open flags, so what you are doing here is a different kind of mmap.

You need to be careful here in that you are adding a path where non-dynamic is using this zero-copy sft loader, which is incompatible with the non-dynamic memory pinner.

Try a workflow with `--disable-mmap --novram` and I'm suspicious you will see "Pin error".
Log: Copy tensor:

```
got prompt
E:\AI\ComfyUI>
```

Log: No copy tensor:

```
got prompt
Found quantization metadata version 1
Using MixedPrecisionOps for text encoder
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load LTXAVTEModel_
Pin error. [repeated N times]
loaded partially; 8902.00 MB usable, 1649.94 MB loaded, 10073.95 MB offloaded, 7252.06 MB buffer reserved, lowvram patches: 0
Error running sage attention: Input tensors must be in dtype of torch.float16 or torch.bfloat16, using pytorch attention instead.
Error running sage attention: Input tensors must be in dtype of torch.float16 or torch.bfloat16, using pytorch attention instead.
Error running sage attention: Input tensors must be in dtype of torch.float16 or torch.bfloat16, using pytorch attention instead.
Error running sage attention: Input tensors must be in dtype of torch.float16 or torch.bfloat16, using pytorch attention instead.
Found quantization metadata version 1
Detected mixed precision quantization
Using mixed precision operations
model weight dtype torch.bfloat16, manual cast: torch.bfloat16
model_type FLUX
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
no CLIP/text encoder weights in checkpoint, the text encoder model will not be loaded.
Requested to load LTXAV
Pin error. [repeated N times]
loaded partially; 9551.67 MB usable, 9463.08 MB loaded, 12428.51 MB offloaded, 112.04 MB buffer reserved, lowvram patches: 0
100%|████████████████████████████████████████| 8/8 [00:33<00:00, 4.18s/it]
Unloaded partially: 976.66 MB freed, 8486.42 MB remains loaded, 112.04 MB buffer reserved, lowvram patches: 0
0 models unloaded.
Unloaded partially: 1508.48 MB freed, 6977.94 MB remains loaded, 368.54 MB buffer reserved, lowvram patches: 0
100%|████████████████████████████████████████| 3/3 [00:22<00:00, 7.44s/it]
Requested to load AudioVAE
loaded completely; 2571.74 MB usable, 415.20 MB loaded, full load: True
Requested to load VideoVAE
0 models unloaded.
loaded partially; 0.00 MB usable, 0.00 MB loaded, 2331.69 MB offloaded, 648.02 MB buffer reserved, lowvram patches: 0
Prompt executed in 95.03 seconds
```
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy/cli_args.py`:
- Line 170: Rename the CLI flag and its internal attribute from sft to
safetensors for clarity: change the parser.add_argument("--sft-alt-loader", ...)
to use "--safetensors-alt-loader" and update any uses of args.sft_alt_loader to
args.safetensors_alt_loader (notably the reference in comfy/utils.py where
args.sft_alt_loader is read). Ensure the help text is expanded to something
explicit like "Use alternate loader for safetensors (fixes Windows crashes with
large models)" and update any tests or docs that reference the old flag name.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (2)
- comfy/cli_args.py
- comfy/utils.py

🚧 Files skipped from review as they are similar to previous changes (1)
- comfy/utils.py
```python
parser.add_argument("--mmap-torch-files", action="store_true", help="Use mmap when loading ckpt/pt files.")
parser.add_argument("--disable-mmap", action="store_true", help="Don't use mmap when loading safetensors.")
parser.add_argument("--sft-alt-loader", action="store_true", help="Use alternate method to load safetensors.")
```
There was a problem hiding this comment.
🧩 Analysis chain

🏁 Script executed:

```shell
rg -n "sft_alt_loader|sft-alt-loader|USE_ALT_SFT_LOADER" --type py -A2 -B2
```

Repository: Comfy-Org/ComfyUI
Length of output: 1160
Help text lacks clarity; the `sft` abbreviation is inconsistent with neighboring flags.

The flag `--sft-alt-loader` with help text "Use alternate method to load safetensors." is too vague: users encountering Windows crashes with large models won't know this is what they need. Additionally, the abbreviated `sft` is inconsistent with neighboring flags like `--disable-mmap` and `--mmap-torch-files`, which use full words.

Renaming to `--safetensors-alt-loader` is feasible because this is a newly added flag with no existing users. If renamed, update the attribute reference in `comfy/utils.py` line 40 from `args.sft_alt_loader` to `args.safetensors_alt_loader`.
✏️ Suggested improvement

```diff
-parser.add_argument("--sft-alt-loader", action="store_true", help="Use alternate method to load safetensors.")
+parser.add_argument("--safetensors-alt-loader", action="store_true", help="Use an alternate (non-mmap) loader for safetensors files. Recommended on Windows when loading large models causes crashes or when --disable-mmap alone is insufficient.")
```

Then in comfy/utils.py line 40:

```diff
-USE_ALT_SFT_LOADER = args.sft_alt_loader
+USE_ALT_SFT_LOADER = args.safetensors_alt_loader
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.


Problem

Python always crashes when loading `ltx-2-19b-distilled-fp8.safetensors`, even though I have enough (48 GB) memory. Then I found `utils.load_torch_file` and enabled the `--disable-mmap` option, but the problem persisted. Later, I discovered that Python crashed before the first tensor was loaded (or, before the first `f.get_tensor` call returned).

Solution

By simply calling `utils.load_safetensors`, I am able to load and run inference without a crash. However, without changing the code, you must sacrifice inference speed by specifying `--fast dynamic_vram`. So I made this PR, changing the `enables_dynamic_vram()` check to `DISABLE_MMAP or enables_dynamic_vram()`, and hopefully this will fix #10896.

Tested with `ltx-2-19b-distilled-fp8.safetensors` on Python 3.11, safetensors 0.6.2, Windows 10, with `--disable-mmap`.
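The control-flow change the PR describes can be sketched with stubs. Everything below except the condition itself is a stand-in: `DISABLE_MMAP`, `enables_dynamic_vram`, and `load_safetensors` are real ComfyUI names, but the stub bodies and `load_with_safe_open` are invented just to make the gating runnable:

```python
# Stubs standing in for ComfyUI internals, only to illustrate the gating.
DISABLE_MMAP = True  # set by the --disable-mmap CLI flag

def enables_dynamic_vram():
    return False  # set by --fast dynamic_vram (stub)

def load_safetensors(ckpt, device="cpu"):
    return {"loader": "alt"}, None  # alternate Python-mmap loader (stub)

def load_with_safe_open(ckpt, device="cpu"):
    return {"loader": "safe_open"}, None  # safetensors.safe_open path (stub)

def load_torch_file(ckpt, device="cpu"):
    # Before the PR the condition was just enables_dynamic_vram(); the PR
    # widens it so --disable-mmap also routes to the alternate loader.
    if DISABLE_MMAP or enables_dynamic_vram():
        return load_safetensors(ckpt, device)
    return load_with_safe_open(ckpt, device)

sd, metadata = load_torch_file("model.safetensors")
print(sd["loader"])  # alt
```

With this change, `--disable-mmap` alone selects the alternate loader, without also paying the `--fast dynamic_vram` inference-speed cost.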