
fix: skip main_gpu validation when no gpus are available #23405

Open
Dev-iL wants to merge 1 commit into ggml-org:master from SummitSG-LLC:2605/fix_sm


Dev-iL commented May 20, 2026

Overview

Setting --split-mode none on a CPU-only build causes model loading to fail (see the trace below): main_gpu defaults to 0, and the bounds check fires against an empty device list. The accompanying warning already states that the split mode has no effect without GPU support, so this PR skips the GPU filtering block entirely when no GPU devices are available.
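To make the control flow concrete, here is a minimal, self-contained sketch of the idea. It is not the actual llama.cpp code: `split_mode`, `device`, and `select_gpu_devices` are hypothetical stand-ins, and treating a negative `main_gpu` as out of range is an assumption made only to keep the example short. It shows an early return on an empty device list, so the bounds check never runs on a CPU-only build.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not llama.cpp's real types.
enum class split_mode { none, layer, row };

struct device {
    std::string name;
};

// Returns the GPU devices that should be used for offload.
std::vector<device> select_gpu_devices(const std::vector<device> & gpus,
                                       split_mode mode,
                                       int main_gpu) {
    // No GPU devices: the split mode has no effect, so skip the
    // filtering block (and the main_gpu bounds check) entirely.
    if (gpus.empty()) {
        return {};
    }

    if (mode == split_mode::none) {
        // Only validate main_gpu when there is a non-empty list to index.
        // Assumption: negative values are simply rejected in this sketch.
        if (main_gpu < 0 || static_cast<std::size_t>(main_gpu) >= gpus.size()) {
            throw std::invalid_argument(
                "invalid value for main_gpu: " + std::to_string(main_gpu) +
                " (available devices: " + std::to_string(gpus.size()) + ")");
        }
        // Keep only the selected device.
        return { gpus[static_cast<std::size_t>(main_gpu)] };
    }

    // Other split modes keep every device in this sketch.
    return gpus;
}
```

With this shape, calling `select_gpu_devices({}, split_mode::none, 0)` returns an empty list instead of raising the "invalid value for main_gpu" error, while an out-of-range index on a non-empty list is still rejected.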

Additional information

[52521] warning: llama.cpp was compiled without support for GPU offload. Setting the split mode has no effect.
[52521] 0.00.024.764 I log_info: verbosity = 3 (adjust with the `-lv N` CLI arg)
[52521] 0.00.024.768 I device_info:
[52521] 0.00.024.788 I   - CPU     : AMD EPYC 9745 128-Core Processor (1030363 MiB, 1030363 MiB free)
[52521] 0.00.024.842 I system_info: n_threads = 256 (n_threads_batch = 256) / 512 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
[52521] 0.00.024.847 I srv          main: n_parallel is set to auto, using n_parallel = 4 and kv_unified = true
[52521] 0.00.024.889 I srv          init: running without SSL
[52521] 0.00.024.916 I srv          init: using 511 threads for HTTP server
[52521] 0.00.025.001 W srv          main: -----------------
[52521] 0.00.025.003 W srv          main: Built-in tools are enabled, do not expose server to untrusted environments
[52521] 0.00.025.003 W srv          main: This feature is EXPERIMENTAL and may be changed in the future
[52521] 0.00.025.003 W srv          main: -----------------
[52521] 0.00.025.006 I srv         start: binding port with default address family
[52521] 0.00.026.184 I srv          main: loading model
[52521] 0.00.026.193 I srv    load_model: loading model '/llm-models/Qwen3-Coder-Next_UD-Q6_K/Qwen3-Coder-Next-UD-Q6_K-00001-of-00003.gguf'
[52521] 0.00.026.232 I common_init_result: fitting params to device memory ...
[52521] 0.00.026.234 I common_init_result: (for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on)
[52521] 0.00.076.447 E llama_prepare_model_devices: invalid value for main_gpu: 0 (available devices: 0)
[52521] 0.00.077.830 E llama_model_load_from_file_impl: failed to load model
[52521] 0.00.077.880 E common_fit_params: encountered an error while trying to fit params to free device memory: failed to load model
[52521] 0.00.125.935 E llama_prepare_model_devices: invalid value for main_gpu: 0 (available devices: 0)
[52521] 0.00.127.042 E llama_model_load_from_file_impl: failed to load model
[52521] 0.00.127.047 E common_init_from_params: failed to load model '/llm-models/Qwen3-Coder-Next_UD-Q6_K/Qwen3-Coder-Next-UD-Q6_K-00001-of-00003.gguf'
[52521] 0.00.127.050 E srv    load_model: failed to load model, '/llm-models/Qwen3-Coder-Next_UD-Q6_K/Qwen3-Coder-Next-UD-Q6_K-00001-of-00003.gguf'
[52521] 0.00.127.057 I srv    operator(): operator(): cleaning up before exit...
[52521] 0.00.138.291 E srv          main: exiting due to model loading error

Requirements

  • I have read and agree with the contributing guidelines - YES
  • AI usage disclosure: YES - AI was used to diagnose the issue and verify the correctness of the fix.

Dev-iL requested a review from ggerganov as a code owner May 20, 2026 09:31
Dev-iL force-pushed the 2605/fix_sm branch 2 times, most recently from 3d14742 to 2fea69b May 20, 2026 13:12