Description
Updated: A custom loader node for ComfyUI is now available:
https://github.com/phaserblast/ComfyUI-DGXSparkSafetensorsLoader
Your question
Loading a single .safetensors file on DGX Spark causes a problem because of the mmap strategy used by the model loader. This happens even when the --disable-mmap option is used. I have been testing FLUX.2-dev and cannot load the FP16 model, despite the DGX Spark having plenty of RAM for both the model and the text encoder. mmap appears to be a disaster on DGX Spark because of its coherent (unified) memory implementation: the safetensors loader effectively loads the model twice, first into "RAM" and then as a copy into "VRAM," which obviously fails since there is no separate RAM/VRAM, so we run out of memory loading large models.
This is also a problem with llama-server and LM Studio: with mmap enabled, llama-server first loads GGUF models into "RAM" and then copies them to "VRAM." Disabling mmap solves the problem; models are loaded directly into memory once, without the extra move from "RAM" to "VRAM." A similar workaround in the safetensors model loader would be great and would save time.
Update:
Everything works perfectly and as expected with the BF16 GGUF version of FLUX.2-dev from here:
https://huggingface.co/city96/FLUX.2-dev-gguf
The model loads without doubling the RAM/VRAM requirement, holding just under 90GB with the text encoder also loaded. So the problem is somewhere in the .safetensors loader.
Update 2:
I figured out a way to prevent the memory ballooning when loading a .safetensors file the normal way. It requires an edit to ComfyUI/comfy/utils.py. The relevant lines are:

```python
if DISABLE_MMAP: # TODO: Not sure if this is the best way to bypass the mmap issues
    tensor = tensor.to(device=device, copy=True)
```

Changing copy=True to copy=False allows the model to load without the machine running out of memory:

```python
    tensor = tensor.to(device=device, copy=False) # For DGX Spark
```
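The effect of that one-word change can be seen in plain PyTorch, independent of ComfyUI: Tensor.to with copy=True always allocates fresh storage, while copy=False returns the original tensor untouched whenever no device or dtype conversion is needed. A minimal illustration:

```python
import torch

t = torch.ones(1024)

# copy=False: no conversion is needed, so the original storage is reused.
aliased = t.to(device=t.device, copy=False)

# copy=True: forces a fresh allocation even though nothing changed.
# On a unified-memory machine, this is the duplicate that exhausts RAM.
duplicated = t.to(device=t.device, copy=True)

print(aliased.data_ptr() == t.data_ptr())      # True: same storage
print(duplicated.data_ptr() == t.data_ptr())   # False: new allocation
```

Note this only avoids the copy when source and destination resolve to the same memory, which is presumably why it works on DGX Spark's coherent memory but is not the default on machines with discrete VRAM.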
Total memory usage is a bit higher than with the GGUF version, but inference runs slightly faster. Both models generate identical output.
Remember to launch ComfyUI with the --disable-mmap option.
Logs
Other
No response