
GGUF + --fast pinned_memory = CUDA crash #10601

@kaptainkory

Description


Custom Node Testing

Expected Behavior

GGUF Qwen models (e.g., Q4_K_M) should run with the --fast argument and not crash.

Actual Behavior

Even smaller GGUF Qwen models (e.g., Q4_K_M) that previously ran fine now produce the following error when launched with --fast or --fast pinned_memory:

KSampler

CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I'm aware the --fast argument "enables some untested and potentially quality deteriorating optimizations". The culprit appears to be the pinned_memory optimization.
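For context, here is the generic PyTorch pinned-memory upload pattern that an optimization like this presumably builds on. This is a minimal sketch of standard PyTorch usage, not ComfyUI's actual implementation; the tensor shape and names are placeholders.

```python
import torch

# Generic pinned-memory upload pattern (plain PyTorch, not ComfyUI's code).
# Pinning page-locks the host buffer so the host-to-device copy can run
# asynchronously on a CUDA stream.
cpu_weight = torch.randn(4096, 4096)               # stand-in for a model weight
pinned = cpu_weight.pin_memory()                   # copy into page-locked host memory
gpu_weight = pinned.to("cuda", non_blocking=True)  # asynchronous H2D copy
torch.cuda.synchronize()                           # deferred CUDA errors surface here

# Assumption, not traced: if this path is fed the GGUF loader's quantized
# tensor objects rather than plain torch tensors, a bad argument in the copy
# may only be reported later, at a kernel launch inside KSampler.
```

Because the copy is asynchronous, a failure in this chain can be reported at an unrelated later call, which matches the "asynchronously reported" warning in the traceback above. Running with CUDA_LAUNCH_BLOCKING=1, as the error message suggests, should make the stack trace point at the call that actually failed.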

Steps to Reproduce

1. Launch ComfyUI with the --fast or --fast pinned_memory argument.
2. Run a simple workflow that includes a GGUF Unet loader node.
3. Observe the (likely) CUDA crash at the KSampler step.
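To help narrow down whether pinning itself is the problem, here is a minimal isolation sketch that pins and copies a raw tensor read directly from the .gguf file, outside ComfyUI. It assumes the gguf-py package (gguf.GGUFReader), which GGUF loader nodes typically depend on; the file path and tensor index are placeholders.

```python
import sys
import torch
from gguf import GGUFReader  # gguf-py; assumed available alongside the GGUF node

# Isolation check: does pin_memory + async copy work on a raw GGUF tensor
# outside ComfyUI?  Usage: python pin_check.py /path/to/model-Q4_K_M.gguf
reader = GGUFReader(sys.argv[1])
raw = reader.tensors[0].data      # numpy view onto the memory-mapped file
t = torch.from_numpy(raw.copy())  # plain, writable CPU tensor

try:
    pinned = t.pin_memory()                  # page-locked host copy
    pinned.to("cuda", non_blocking=True)     # asynchronous host-to-device copy
    torch.cuda.synchronize()                 # flush any deferred CUDA error
    print("pin_memory + async copy succeeded")
except RuntimeError as err:
    print("pin_memory path failed:", err)
```

If this succeeds on the same model file, the crash is more likely in how the pinned_memory optimization interacts with the GGUF loader's tensor wrappers than in pinning as such.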

Debug Logs

--

Other

No response

Labels: Potential Bug (user is reporting a bug; this should be tested)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions