Fix safetensor cause python crash on some windows platform #12611

Open
roj234 wants to merge 3 commits into Comfy-Org:master from roj234:mmap_option

Conversation


@roj234 roj234 commented Feb 24, 2026

Problem

Python always crashes when loading ltx-2-19b-distilled-fp8.safetensors, even though I have enough (48 GB) memory.
I then found utils.load_torch_file and enabled the --disable-mmap option, but the problem persisted.
Later, I discovered that Python crashes before the first tensor is loaded (that is, before the first f.get_tensor call returns).

Solution

By simply calling utils.load_safetensors, I am able to load and run inference without a crash.
However, without a code change, you must sacrifice inference speed by specifying --fast dynamic_vram.
So I made this PR, changing the enables_dynamic_vram() check to DISABLE_MMAP or enables_dynamic_vram(); hopefully this will fix #10896.

Tested with ltx-2-19b-distilled-fp8.safetensors on Python 3.11, safetensors 0.6.2, Windows 10.

  • Before: crashed consistently, even with --disable-mmap.
  • After: loads and runs inference successfully with --disable-mmap.
  • I'm re-downloading the weights (as I'm currently using GGUF) and will test with the latest safetensors 0.7.0 now.
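For context on what any alternate loader has to deal with: a .safetensors file is an 8-byte little-endian header length, then a JSON table mapping each tensor name to its dtype, shape, and byte offsets, then the raw tensor data. The sketch below (stdlib only; the helper name is mine, not from the PR) parses that header without the safetensors package:

```python
import json
import struct

def read_safetensors_header(path):
    """Parse a .safetensors header: an 8-byte little-endian length prefix,
    followed by a JSON table of {name: {"dtype", "shape", "data_offsets"}}."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        table = json.loads(f.read(header_len))
        data_start = 8 + header_len  # tensor byte offsets are relative to here
    table.pop("__metadata__", None)  # optional free-form metadata block
    return table, data_start
```

A loader like utils.load_safetensors then reads (or maps) the bytes at data_start plus each tensor's offsets and wraps them with torch.frombuffer.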


coderabbitai bot commented Feb 24, 2026

📝 Walkthrough

Added CLI flag --sft-alt-loader and global USE_ALT_SFT_LOADER. load_safetensors signature changed to accept device and callers (e.g., load_torch_file) now pass that device. Safetensors loading decision now checks USE_ALT_SFT_LOADER (or enables_dynamic_vram() / DISABLE_MMAP) to choose an alternate in-memory loader. When using the alternate loader, tensors are conditionally moved to the target device after reading instead of always performing in-place conversions.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Title check | ✅ Passed | Title accurately describes the main objective: fixing a safetensor-related Python crash on Windows platforms, which is the core problem addressed by the PR changes.
Description check | ✅ Passed | Description is well-related to the changeset, explaining the crash problem, the solution approach, and testing results on Windows with specific safetensors files.
Linked Issues check | ✅ Passed | Changes implement the core solution from issue #10896: adding a DISABLE_MMAP check to prevent double memory allocation by using the non-mmap safetensors loader when mmap is disabled, directly addressing the linked issue requirements.
Out of Scope Changes check | ✅ Passed | All changes are directly scoped to the objectives: adding the --sft-alt-loader flag, modifying the load_safetensors signature to accept a device parameter, and updating the loader logic to check DISABLE_MMAP. No unrelated changes detected.


@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
comfy/utils.py (1)

116-123: ⚠️ Potential issue | 🟠 Major

DISABLE_MMAP still uses mmap internally and ignores the device parameter.

When DISABLE_MMAP is set, load_safetensors() is called, which explicitly uses mmap.mmap() to read the file and creates CPU tensors via torch.frombuffer(). This means the flag contradicts its intent—it doesn't actually disable mmap—and also bypasses the device parameter entirely. The else branch respects device=device.type, so tensors loaded with DISABLE_MMAP stay on CPU and require an extra move to reach the target device. Consider either implementing a true non-mmap path for DISABLE_MMAP that respects device placement, or document that this flag has different semantics than its name suggests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/utils.py` around lines 116 - 123, DISABLE_MMAP currently calls
load_safetensors (which uses mmap and creates CPU tensors) and ignores the
requested device; update the DISABLE_MMAP branch so it either (A) implements a
true non-mmap read path that loads tensors directly onto the requested device
(respecting the device parameter) or (B) clearly documents the differing
semantics and moves tensors to device after load — modify the DISABLE_MMAP
branch around load_safetensors(ckpt) in comfy/utils.py to avoid mmap-backed
reads and to honor device (or add an explicit torch.tensor(..., device=device)
placement step) and ensure return_metadata handling remains correct when
return_metadata is False; reference DISABLE_MMAP, load_safetensors,
safetensors.safe_open, device, and return_metadata when making the change.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 048dd2f and 6cfe0cf.

📒 Files selected for processing (1)
  • comfy/utils.py

roj234 (Author) commented Feb 24, 2026

This problem persists with safetensors 0.7.0.

By the way, to describe the "crash" behavior: the Python process terminates abruptly without a Python-level backtrace. It triggers the Windows WerFault.exe handler and (often) generates a python.exe.dmp file.

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
comfy/utils.py (1)

99-107: ⚠️ Potential issue | 🟠 Major

Empty tensors are never moved to the target device — device mismatch.

The start == end branch (line 99-100) creates a CPU tensor unconditionally and has no corresponding .to(device). When device is non-CPU (e.g., cuda), non-empty tensors are moved to the device while empty tensors stay on CPU, producing a mixed-device state dict that will fault at inference time.

Additionally, the sd[name].to(device) call (line 107) is nested inside warnings.catch_warnings(), whose sole purpose is suppressing the torch.frombuffer "not writable" warning. The .to() call doesn't trigger that warning and can be moved outside.

🐛 Proposed fix
     if start == end:
-        sd[name] = torch.empty(info["shape"], dtype =_TYPES[info["dtype"]])
+        t = torch.empty(info["shape"], dtype=_TYPES[info["dtype"]])
     else:
         with warnings.catch_warnings():
            # We are working with read-only RAM by design
             warnings.filterwarnings("ignore", message="The given buffer is not writable")
-            sd[name] = torch.frombuffer(mv[start:end], dtype=_TYPES[info["dtype"]]).view(info["shape"])
-            if (device != 'cpu' if isinstance(device, str) else device.type != 'cpu'):
-                sd[name] = sd[name].to(device)
+            t = torch.frombuffer(mv[start:end], dtype=_TYPES[info["dtype"]]).view(info["shape"])
+    if (device != 'cpu' if isinstance(device, str) else device.type != 'cpu'):
+        t = t.to(device)
+    sd[name] = t
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/utils.py` around lines 99 - 107, The empty-tensor branch creates CPU
tensors with torch.empty and never moves them to the target device, and the
sd[name].to(device) call is unnecessarily inside the warnings.catch_warnings
block; update the code so that after creating sd[name] in both branches (the
torch.empty(...) path and the torch.frombuffer(...) path) you apply the same
device move using the existing device-check conditional ((device != 'cpu' if
isinstance(device, str) else device.type != 'cpu')) and move the sd[name] =
sd[name].to(device) call out of the warnings.catch_warnings block so both empty
and non-empty tensors end up on the correct device.
🧹 Nitpick comments (1)
comfy/utils.py (1)

85-85: load_safetensors uses Python's mmap.mmap internally — may not solve all Windows crash scenarios.

The condition change (line 119) bypasses safetensors.safe_open's Rust mmap, but load_safetensors itself calls mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) (line 85) using Python's own mmap module. For the CPU device path (the default), returned tensors remain mmap-backed via this Python mmap and are not cloned to regular memory.

The fix works for the specific Windows crash reported, and the TODO comment already flags the approach's uncertainty. However, Python's mmap on Windows also lacks structured exception handling for certain storage configurations and may crash on unusual hardware setups.

For guaranteed mmap-free loading on fragile systems (per issue #10896), the CPU device branch would need sd[name].clone() after torch.frombuffer to fully detach from the memory mapping.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/utils.py` at line 85, load_safetensors currently uses Python's mmap
(mapping = mmap.mmap(...)) and returns mmap-backed tensors when using the CPU
device path; to guarantee mmap-free tensors on fragile Windows setups, after
creating tensors from the buffer (via torch.frombuffer) in the CPU branch,
explicitly detach by calling .clone() on each tensor (e.g., sd[name] =
sd[name].clone()) so returned tensors are not memory-mapped; update the CPU
device path logic in load_safetensors to clone tensors after torch.frombuffer
creation to fully remove mmap backing.
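The "mmap-backed" point is easy to see with the stdlib alone: a zero-copy view over an mmap keeps the mapping alive (the mapping cannot even be closed while the view exists), whereas copying the bytes detaches them. torch.frombuffer over an mmap behaves the same way, which is why a .clone() is what fully detaches a tensor from the file.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes(range(8)))

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

view = memoryview(mm)       # zero-copy: still backed by the file mapping
try:
    mm.close()              # refused while an exported view exists
except BufferError:
    pass

detached = bytes(view)      # a real copy in ordinary RAM
view.release()              # drop the zero-copy view...
mm.close()                  # ...and now the mapping can be closed
os.remove(path)             # `detached` outlives the mapping and the file
```

Replacing memoryview/bytes with torch.frombuffer/.clone() gives the same lifetime behavior for tensors.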

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6cfe0cf and 664069c.

📒 Files selected for processing (1)
  • comfy/utils.py

sd, metadata = load_safetensors(ckpt)
# TODO: Not sure if this is the best way to bypass the mmap issues
if DISABLE_MMAP or enables_dynamic_vram():
    sd, metadata = load_safetensors(ckpt, device)
Contributor commented:
This seems a little strange, in that load_safetensors is very deliberately an mmap. The main difference between load_safetensors and the sft package is the mmap open flags, so what you are doing here is a different kind of mmap.

You need to be careful here: you are adding a path where non-dynamic mode uses this zero-copy sft loader, which is incompatible with the non-dynamic memory pinner.

Try a workflow with --disable-mmap --novram and I'm suspicious you will see "Pin error."

roj234 (Author) replied:

Yes, "Pin error." messages are shown, but unfortunately ComfyUI reports that the Default CPU Allocator cannot allocate after I add tensor.to(device=device, copy=True). Here is a process-manager comparison.

No copy tensor: (screenshot, 2026-2-25 13-54-7)

Copy tensor: (screenshot, 2026-2-25 13-52-4)

roj234 (Author) replied:

Log: Copy tensor:

got prompt

E:\AI\ComfyUI>

Log: No copy tensor

got prompt
Found quantization metadata version 1
Using MixedPrecisionOps for text encoder
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load LTXAVTEModel_

Pin error. repeat N times

loaded partially; 8902.00 MB usable, 1649.94 MB loaded, 10073.95 MB offloaded, 7252.06 MB buffer reserved, lowvram patches: 0
Error running sage attention: Input tensors must be in dtype of torch.float16 or torch.bfloat16, using pytorch attention instead.
Error running sage attention: Input tensors must be in dtype of torch.float16 or torch.bfloat16, using pytorch attention instead.
Error running sage attention: Input tensors must be in dtype of torch.float16 or torch.bfloat16, using pytorch attention instead.
Error running sage attention: Input tensors must be in dtype of torch.float16 or torch.bfloat16, using pytorch attention instead.
Found quantization metadata version 1
Detected mixed precision quantization
Using mixed precision operations
model weight dtype torch.bfloat16, manual cast: torch.bfloat16
model_type FLUX
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
no CLIP/text encoder weights in checkpoint, the text encoder model will not be loaded.
Requested to load LTXAV

Pin error. repeat N times

loaded partially; 9551.67 MB usable, 9463.08 MB loaded, 12428.51 MB offloaded, 112.04 MB buffer reserved, lowvram patches: 0
100%|████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:33<00:00,  4.18s/it]
Unloaded partially: 976.66 MB freed, 8486.42 MB remains loaded, 112.04 MB buffer reserved, lowvram patches: 0
0 models unloaded.
Unloaded partially: 1508.48 MB freed, 6977.94 MB remains loaded, 368.54 MB buffer reserved, lowvram patches: 0
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:22<00:00,  7.44s/it]
Requested to load AudioVAE
loaded completely; 2571.74 MB usable, 415.20 MB loaded, full load: True
Requested to load VideoVAE
0 models unloaded.
loaded partially; 0.00 MB usable, 0.00 MB loaded, 2331.69 MB offloaded, 648.02 MB buffer reserved, lowvram patches: 0
Prompt executed in 95.03 seconds

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 664069c and 8301dcf.

📒 Files selected for processing (2)
  • comfy/cli_args.py
  • comfy/utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • comfy/utils.py


parser.add_argument("--mmap-torch-files", action="store_true", help="Use mmap when loading ckpt/pt files.")
parser.add_argument("--disable-mmap", action="store_true", help="Don't use mmap when loading safetensors.")
parser.add_argument("--sft-alt-loader", action="store_true", help="Use alternate method to load safetensors.")

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n "sft_alt_loader|sft-alt-loader|USE_ALT_SFT_LOADER" --type py -A2 -B2

Repository: Comfy-Org/ComfyUI

Length of output: 1160


Help text lacks clarity; sft abbreviation is inconsistent with neighboring flags.

The flag --sft-alt-loader with help text "Use alternate method to load safetensors." is too vague—users encountering Windows crashes with large models won't know this is what they need. Additionally, the abbreviated sft is inconsistent with neighboring flags like --disable-mmap and --mmap-torch-files, which use full words.

Renaming to --safetensors-alt-loader is feasible because this is a newly added flag with no existing users. If renamed, update the attribute reference in comfy/utils.py line 40 from args.sft_alt_loader to args.safetensors_alt_loader.

✏️ Suggested improvement
-parser.add_argument("--sft-alt-loader", action="store_true", help="Use alternate method to load safetensors.")
+parser.add_argument("--safetensors-alt-loader", action="store_true", help="Use an alternate (non-mmap) loader for safetensors files. Recommended on Windows when loading large models causes crashes or when --disable-mmap alone is insufficient.")

Then in comfy/utils.py line 40:

-USE_ALT_SFT_LOADER = args.sft_alt_loader
+USE_ALT_SFT_LOADER = args.safetensors_alt_loader
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/cli_args.py` at line 170, Rename the CLI flag and its internal
attribute from sft to safetensors for clarity: change the
parser.add_argument("--sft-alt-loader", ...) to use "--safetensors-alt-loader"
and update any uses of args.sft_alt_loader to args.safetensors_alt_loader
(notably the reference in comfy/utils.py where args.sft_alt_loader is read).
Ensure the help text is expanded to something explicit like "Use alternate
loader for safetensors (fixes Windows crashes with large models)" and update any
tests or docs that reference the old flag name.



Development

Successfully merging this pull request may close these issues.

Loading .safetensors files requires double memory on DGX Spark

2 participants