Deepseek2 does not support K-shift Denial-of-Service vulnerability #10380
Description
Long prompts/responses crash llama-server with "Deepseek2 does not support K-shift". When a prompt or response exceeds the context window, llama-server should return an error message or truncate the response; instead, GGML_ABORT is called, which crashes the server. I believe this is a Denial-of-Service vulnerability: a client should never be able to trigger GGML_ABORT.
The relevant line in the code is here:
Line 18032 in 9b75f03
I reported this security vulnerability almost three months ago here (link only visible to maintainers) but received no response. Since it is public knowledge now anyway, I also opened this issue to increase visibility.
Discussed in #9092
Originally posted by 99991 August 19, 2024
It is my understanding that llama.cpp shifts the key-value cache when generating more tokens than fit into the context window, which is not supported for DeepSeek Coder V2. To reproduce, start a server with this model:
./llama-server -m DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf -c 32 -ngl 999 --port 8080
and then request a prompt completion:
curl -H "Content-Type: application/json" --request POST --data '{"prompt": "Mergesort in Python:", "n_predict": 32}' http://127.0.0.1:8080/completion
This should trigger the error
src/llama.cpp:15646: Deepseek2 does not support K-shift
Aborted
with llama.cpp release b3600.
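Until the server handles this case gracefully, a client-side workaround is to keep the prompt length plus n_predict within the context window so the shift path is never reached. A minimal sketch of that budget check, where n_prompt stands in for a real tokenizer call (all names here are hypothetical, not llama.cpp API):

```cpp
#include <algorithm>

// Clamp the number of tokens to predict so that prompt + completion
// never exceeds the context window, avoiding the K-shift path entirely.
// n_prompt would come from tokenizing the prompt with the model's tokenizer.
static int safe_n_predict(int n_ctx, int n_prompt, int n_predict_requested) {
    const int budget = n_ctx - n_prompt;
    if (budget <= 0) {
        return 0; // the prompt alone already fills the context
    }
    return std::min(budget, n_predict_requested);
}
```

With the reproduction above (n_ctx = 32, n_predict = 32), any prompt longer than zero tokens overflows the window, so the request would have to shrink n_predict accordingly.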
The corresponding code in llama.cpp is here:
I believe that a saner approach would be to simply stop generating tokens instead of crashing the server. Is there some option that can be set to prevent clients from crashing the server?
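The pattern such a fix would need can be sketched without assuming anything about llama.cpp internals beyond the quoted abort site: the cache-shift path reports the unsupported case to its caller instead of killing the process, and the serving layer then ends generation for that one request. The names below (cache_status, kv_cache_shift) are hypothetical, not the actual llama.cpp code:

```cpp
// Hypothetical status code returned instead of aborting the process.
enum class cache_status { OK, ERR_KSHIFT_UNSUPPORTED };

// Sketch of the shift path: report the unsupported case to the caller
// rather than calling GGML_ABORT (which takes down the whole server).
static cache_status kv_cache_shift(bool model_supports_kshift) {
    if (!model_supports_kshift) {
        return cache_status::ERR_KSHIFT_UNSUPPORTED; // was: GGML_ABORT(...)
    }
    // ... perform the actual K-shift here ...
    return cache_status::OK;
}
```

The serving layer would check the returned status, stop generation for the offending request (or return an HTTP error), and keep the process alive for other clients.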