
Fix CUDA KV cache clear on non-host buffers #4

Open
double-thinker wants to merge 1 commit into andrijdavid:main from double-thinker:fix/cuda-kv-cache-clear-host-access

@double-thinker

Note: This PR was assisted by AI but has been human-reviewed.

This fixes a CUDA crash when running with --gpu cuda.

The crash was caused by host-side memset/memmove calls on the KV cache tensors (kv_self_k, kv_self_v), which can live in non-host backend buffers (CUDA/Vulkan/Metal). In that case, the pointer returned by ggml_get_data() may not be host-accessible, and writing through it from the CPU can segfault.

Crash path (reproducible on GTX 1080 Ti, sm_61):

  • voxtral_transcribe_from_audio()
  • clear_kv_cache()
  • memset(ggml_get_data(kv_self_*), ...)

gdb points to:

  • src/voxtral.cpp:1082 (clear_kv_cache)

Fix

  • In clear_kv_cache():
    • Detect whether KV tensors are host-backed via ggml_backend_buffer_is_host.
    • For non-host buffers, clear using backend-safe operations:
      • ggml_backend_buffer_clear(ctx->buf_persistent, 0) when available.
      • Otherwise, fall back to chunked ggml_backend_tensor_set(...) calls with zeroed data.
    • Keep direct memset only for host buffers.
  • In kv_cache_shift_left():
    • Detect non-host KV buffers and avoid host memmove/memset.
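
The host/non-host split described above can be illustrated with a small self-contained sketch. It is not the code from this PR: `MockKvTensor`, `clear_kv`, and `shift_left_kv` are hypothetical names, and the mock's `is_host`, `tensor_set`, and `tensor_get` members only stand in for `ggml_backend_buffer_is_host`, `ggml_backend_tensor_set`, and `ggml_backend_tensor_get`. The staged shift for non-host buffers is an assumption about one possible approach (the description does not say how the shift is performed), and the cache is treated as a flat byte range for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for a KV tensor and its backend buffer.
struct MockKvTensor {
    bool is_host;                  // true if the CPU may touch `storage` directly
    std::vector<uint8_t> storage;  // simulates host or device memory
    std::size_t set_calls;         // counts simulated host-to-device uploads

    MockKvTensor(bool host, std::vector<uint8_t> data)
        : is_host(host), storage(std::move(data)), set_calls(0) {}

    // Simulated upload of `size` bytes from host memory to `offset`.
    void tensor_set(const void * src, std::size_t offset, std::size_t size) {
        assert(offset + size <= storage.size());
        std::memcpy(storage.data() + offset, src, size);
        ++set_calls;
    }

    // Simulated download of `size` bytes starting at `offset` into host memory.
    void tensor_get(void * dst, std::size_t offset, std::size_t size) const {
        assert(offset + size <= storage.size());
        std::memcpy(dst, storage.data() + offset, size);
    }
};

// Zero the tensor: direct memset for host buffers, otherwise chunked uploads
// from a zero-filled staging buffer of at most `chunk_bytes` bytes.
void clear_kv(MockKvTensor & t, std::size_t chunk_bytes) {
    assert(chunk_bytes > 0);
    const std::size_t total = t.storage.size();
    if (total == 0) return;
    if (t.is_host) {
        std::memset(t.storage.data(), 0, total);
        return;
    }
    const std::vector<uint8_t> zeros(std::min(chunk_bytes, total), 0);
    for (std::size_t off = 0; off < total; off += zeros.size()) {
        const std::size_t n = std::min(zeros.size(), total - off);
        t.tensor_set(zeros.data(), off, n);
    }
}

// Drop the first `shift` bytes, move the remainder to the front, zero the tail.
// Non-host path (assumed strategy): download the kept range into a zeroed
// staging buffer, then upload the whole buffer in one call.
void shift_left_kv(MockKvTensor & t, std::size_t shift) {
    const std::size_t total = t.storage.size();
    shift = std::min(shift, total);
    if (shift == 0) return;
    const std::size_t keep = total - shift;
    if (t.is_host) {
        std::memmove(t.storage.data(), t.storage.data() + shift, keep);
        std::memset(t.storage.data() + keep, 0, shift);
        return;
    }
    std::vector<uint8_t> staging(total, 0);  // bytes past `keep` stay zero
    if (keep > 0) t.tensor_get(staging.data(), shift, keep);
    t.tensor_set(staging.data(), 0, total);
}
```

In this sketch, clearing a 10-byte non-host tensor with a 4-byte chunk issues three uploads (4, 4, and 2 bytes), while the host path never calls `tensor_set`. A small staging chunk keeps host memory use bounded at the cost of more transfers; a real implementation would likely pick a much larger chunk size.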
