Qwen3-Coder-Next (80B Q4_K_M) on 16GB GPU: CUDA graphs disabled, CPU-bound inference (gpulayers limited to 15) #1964

@Iyanzl

Description

I'm running the multi-part GGUF model Qwen3-Coder-Next-Q4_K_M-00001-of-00004.gguf (80B params, Q4_K_M quantization) with KoboldCpp v1.107 on an NVIDIA RTX 3070 (16GB VRAM), and I'm seeing severe performance problems under the following constraints:

  1. GPU layer limit: Setting --gpulayers higher than 15 causes out-of-memory (OOM) errors on my 16GB GPU, so I have to keep it at 15.
  2. Batch size adjustment: I originally used --batchsize 512, then reduced it to 32 to mitigate issues, but the core problems remain.
  3. CUDA graphs disabled: Logs consistently show record_update: disabling CUDA graphs due to too many consecutive updates.
  4. CPU-dominant inference: The model runs almost entirely on CPU (only 15 of 48 layers are on the GPU), so inference is extremely slow, including when processing 6144 tokens.
  5. Frequent state writes: I see repeated state_write_data: writing state / writing memory module logs during inference, which further degrades performance.
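
As a rough sanity check on the 15-layer limit in item 1, here is a back-of-the-envelope estimate of how much VRAM the offloaded weights alone would need. It is only a sketch under simplifying assumptions that are mine, not measured: weights are spread evenly across the 48 layers, the 4.86 BPW figure applies to all 80B parameters, 1 GB is taken as 10^9 bytes, and the KV cache, compute buffers, and embeddings are ignored. The function names are made up for illustration.

```python
# Rough estimate of weight memory per offloaded layer.
# Assumptions (not measured): weights are evenly split across layers,
# 1 GB = 1e9 bytes, KV cache and compute buffers are ignored.

PARAMS = 80e9           # total parameters, from the model description
BITS_PER_WEIGHT = 4.86  # average bits per weight reported for Q4_K_M
N_LAYERS = 48           # total layers, from the "15/48" figure
VRAM_GB = 16.0          # GPU memory available


def weights_gb(params: float, bpw: float) -> float:
    """Total size of the quantized weights in GB."""
    return params * bpw / 8 / 1e9


def layers_gb(n: int, total_gb: float, n_layers: int) -> float:
    """Approximate size of n layers, assuming an even split."""
    return n * total_gb / n_layers


total_gb = weights_gb(PARAMS, BITS_PER_WEIGHT)

for n in (15, 16):
    need = layers_gb(n, total_gb, N_LAYERS)
    verdict = "fits" if need < VRAM_GB else "exceeds"
    print(f"{n} layers: ~{need:.1f} GB of weights ({verdict} {VRAM_GB:.0f} GB)")
```

Under these assumptions the full model is about 48.6 GB, or roughly 1 GB per layer, so 15 layers already take about 15.2 GB and a 16th layer would push past 16 GB. That is in line with the OOM I see above 15 layers, although the real per-layer size and the extra memory for context are likely to differ from this estimate.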

Environment

  • GPU: NVIDIA RTX 3070 (16GB VRAM, compute capability 8.6)
  • KoboldCpp version: 1.107
  • OS: Windows 11 64-bit
  • Model: Qwen3-Coder-Next-Q4_K_M (multi-part GGUF, 4.86 BPW, 80B parameters)

Question

Given my hardware constraint (16GB VRAM, gpulayers can only be set to 15 without OOM), could you guide me on how to configure KoboldCpp parameters (including but not limited to batch size, context size, CUDA-related flags, memory optimization settings) to:

  1. Fix the "CUDA graphs disabled due to too many consecutive updates" issue?
  2. Reduce CPU usage and maximize GPU utilization within the 15 gpulayers limit?
  3. Eliminate frequent state_write_data logs and improve inference speed for this Qwen3-Coder-Next model?

Any general tuning principles or parameter strategies for 80B Q4_K_M models on 16GB GPUs would be highly appreciated.
