Fix CUDA KV cache clear on non-host buffers by double-thinker · Pull Request #4 · andrijdavid/voxtral.cpp

double-thinker · 2026-02-24T11:03:32Z

Note: This PR was assisted by AI but has been human-reviewed.

This fixes a CUDA crash when running with --gpu cuda.

The issue was caused by host-side memset/memmove calls on the KV cache tensors (kv_self_k, kv_self_v), which can live in non-host backend buffers (CUDA/Vulkan/Metal). In that case, the pointer returned by ggml_get_data() may not be host-addressable, and writing through it from the CPU can segfault.

Crash path (reproducible on GTX 1080 Ti, sm_61):

voxtral_transcribe_from_audio()
  -> clear_kv_cache()
    -> memset(ggml_get_data(kv_self_*), ...)

gdb points to:

src/voxtral.cpp:1082 (clear_kv_cache)

Fix

In clear_kv_cache():
- Detect whether KV tensors are host-backed via ggml_backend_buffer_is_host.
- For non-host buffers, clear using backend-safe operations:
  - ggml_backend_buffer_clear(ctx->buf_persistent, 0) when available.
  - fallback to chunked ggml_backend_tensor_set(...) with zeroed data.
- Keep direct memset only for host buffers.
In kv_cache_shift_left():
- Detect non-host KV buffers and avoid host memmove/memset.

Commit 3a194da: Fix CUDA KV cache clear on non-host buffers

double-thinker wants to merge 1 commit into andrijdavid:main from double-thinker:fix/cuda-kv-cache-clear-host-access