
Deepseek2 does not support K-shift Denial-of-Service vulnerability #10380

Closed
@99991

Description

Long prompts or responses crash llama-server with the error "Deepseek2 does not support K-shift". In this situation, llama-server should return an error message or truncate the response; instead, GGML_ABORT is called, which terminates the server process. I believe this is a Denial-of-Service vulnerability: a client should never be able to trigger GGML_ABORT.

The relevant line in the code is here:

GGML_ABORT("Deepseek2 does not support K-shift");
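
For illustration, here is a minimal sketch of the pattern I am asking for: the unsupported shift is reported as an error status that the caller can turn into an HTTP error response, instead of aborting the whole process. All names below (shift_result, try_kv_shift, model_supports_kshift) are hypothetical and are not part of llama.cpp's actual API.

// Hypothetical sketch: report the unsupported operation instead of aborting.
// None of these names exist in llama.cpp; they only illustrate the pattern.
#include <cstdio>

enum class shift_result {
    ok,
    unsupported,   // architecture (e.g. Deepseek2) cannot K-shift its cache
};

// Stand-in for the code path that currently calls GGML_ABORT.
shift_result try_kv_shift(bool model_supports_kshift) {
    if (!model_supports_kshift) {
        // Previously: GGML_ABORT("Deepseek2 does not support K-shift");
        return shift_result::unsupported;
    }
    // ... perform the actual shift here ...
    return shift_result::ok;
}

int main() {
    // The caller (e.g. the server loop) can now stop generation and
    // return an error to the client instead of taking the process down.
    if (try_kv_shift(/*model_supports_kshift=*/false) == shift_result::unsupported) {
        std::fprintf(stderr, "error: context shift not supported for this model, stopping generation\n");
        return 1;
    }
    return 0;
}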

I reported this security vulnerability almost three months ago here (link only visible to maintainers), but received no response. Since it is public knowledge by now anyway, I am also opening this issue to increase visibility.

Discussed in #9092

Originally posted by 99991 August 19, 2024
It is my understanding that llama.cpp shifts the key-value cache when generating more tokens than fit into the context window, which is not supported for DeepSeek Coder V2 (a conceptual sketch of this shift follows the error output below). To reproduce, start a server with this model

./llama-server -m DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf -c 32 -ngl 999 --port 8080

and then request a prompt completion:

curl -H "Content-Type: application/json" --request POST --data '{"prompt": "Mergesort in Python:", "n_predict": 32}' http://127.0.0.1:8080/completion

This should trigger the error

src/llama.cpp:15646: Deepseek2 does not support K-shift
Aborted

with llama.cpp release b3600.
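
For context, a context shift (K-shift) evicts the oldest cached tokens and re-numbers the positions of the remaining ones so that generation can continue. The toy sketch below only shows that positional bookkeeping on a plain vector; it is not llama.cpp code, and the real cache stores position-encoded (RoPE) keys that also need to be re-rotated, which as far as I understand is the part reported as unsupported for Deepseek2.

// Toy illustration of a context shift, not llama.cpp code.
// A "cache" of token positions is full; drop the oldest n_discard
// entries and shift the remaining positions down so new tokens fit.
#include <cstdio>
#include <vector>

int main() {
    const int n_ctx     = 8;   // tiny context window
    const int n_discard = 4;   // how many old tokens to evict

    std::vector<int> positions;
    for (int i = 0; i < n_ctx; ++i) positions.push_back(i);   // cache is full

    // Evict the oldest tokens and re-number what remains.
    positions.erase(positions.begin(), positions.begin() + n_discard);
    for (int & p : positions) p -= n_discard;

    for (int p : positions) std::printf("%d ", p);   // prints: 0 1 2 3
    std::printf("\n");

    // In the real cache the keys are position-encoded (RoPE), so shifting
    // also requires re-rotating them -- the operation that is reported as
    // unsupported for Deepseek2.
    return 0;
}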

The corresponding code in llama.cpp is here:

https://github.com/ggerganov/llama.cpp/blob/cfac111e2b3953cdb6b0126e67a2487687646971/src/llama.cpp#L15643C31-L15648C1

I believe a saner approach would be to simply stop generating tokens instead of crashing the server. Is there an option that can be set to prevent clients from crashing the server?
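
If no such option exists, one possible mitigation, sketched below under my own assumptions (it is not an existing llama-server feature), would be to reject or clamp requests up front so that the prompt plus n_predict can never exceed the context size, which avoids ever reaching the shift path.

// Hypothetical request guard, not part of llama-server.
// Clamp n_predict so that prompt tokens + generated tokens fit in n_ctx,
// and reject prompts that already exceed the context on their own.
#include <cstdio>

// Returns the number of tokens the server may safely generate,
// or -1 if the prompt alone does not fit into the context window.
int clamp_n_predict(int n_ctx, int n_prompt_tokens, int n_predict_requested) {
    if (n_prompt_tokens >= n_ctx) {
        return -1; // caller should answer with an HTTP error, not abort
    }
    const int n_room = n_ctx - n_prompt_tokens;
    return n_predict_requested < n_room ? n_predict_requested : n_room;
}

int main() {
    // Values from the reproduction above: -c 32 and n_predict = 32.
    const int n_ctx           = 32;
    const int n_prompt_tokens = 6;   // assumed token count of the prompt
    const int n_predict       = 32;

    const int allowed = clamp_n_predict(n_ctx, n_prompt_tokens, n_predict);
    if (allowed < 0) {
        std::printf("reject request: prompt does not fit into the context\n");
    } else {
        std::printf("generate at most %d tokens instead of %d\n", allowed, n_predict);
    }
    return 0;
}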
