
Cache and system prompt on server makes output non-deterministic #4902

Closed
@Andreybest

Description


Good day!

I was testing the system_prompt field on the server and tried to get the same answer from the raw variant (with the system prompt written directly into prompt) and from the system_prompt approach. (I assume system_prompt amounts to the concatenation system_prompt.prompt + prompt, with the system prompt part cached. If somebody can explain system_prompt, I would really appreciate it!)
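To make that assumption concrete, here is a sketch (hypothetical, my reading of the behaviour, not code taken from server.cpp) of what I believe the two fields reduce to:

# Assumption: the model effectively sees system_prompt.prompt followed by
# prompt, with the system part evaluated once and reused from the KV cache.
SYSTEM_PROMPT="[INST]\n<<SYS>>\nEnd each answer with a word 'amogus'\n<</SYS>>\n\n"
PROMPT="Tell a story about llama[/INST]\n"
# what the raw variant sends in a single prompt field:
EFFECTIVE_PROMPT="$SYSTEM_PROMPT$PROMPT"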

To make this test deterministic I used temperature: 0 (so answers should not be random), but I got random answers every time. I then removed all the other parameters I had been using and was left with this JSON:

curl --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data '{
    "prompt": "[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\nTell a story about llama[/INST]\n",
    "cache_prompt": true,
    "temperature": 0
}'

With that request I get random completions. But with the one below, without cache_prompt, I don't:

curl --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data '{
    "prompt": "[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\nTell a story about llama[/INST]\n",
    "temperature": 0
}'

So cache_prompt is the issue here.
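To verify, here is a minimal repro sketch that sends the identical request five times and hashes each completion. It assumes jq is installed and that /completion returns the completion text in a content field:

# Send the identical request 5 times and hash each completion.
for i in 1 2 3 4 5; do
curl -s --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data '{
    "prompt": "[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\nTell a story about llama[/INST]\n",
    "cache_prompt": true,
    "temperature": 0
}' | jq -r '.content' | shasum
done
# With temperature 0 all five hashes should match;
# with cache_prompt: true they do not.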

Returning to the system_prompt version, I get issues both with and without cache_prompt.

curl --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data '{
    "prompt": "Tell a story about llama[/INST]\n",
    "temperature": 0,
    "system_prompt": {
        "prompt": "[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\n"
    }
}'

With this one (without cache_prompt) I get the same completion on the second and subsequent runs (2nd, 3rd, 4th, ...), but it is not the same as the first completion.

curl --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data '{
    "prompt": "Tell a story about llama[/INST]\n",
    "temperature": 0,
    "cache_prompt": true,
    "system_prompt": {
        "prompt": "[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\n"
    }
}'

With this one I get the same behaviour as without cache_prompt (the 1st completion does not equal the 2nd, 3rd, 4th, ...), but the second and subsequent completions are not a story about a llama at all, just random questions...
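To show the 1st-vs-later difference concretely, here is a sketch that captures two consecutive completions and diffs them (same assumptions as the hashing loop above: jq installed, completion text in the content field):

# Capture the 1st and 2nd completions and compare them.
get_completion() {
curl -s --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data '{
    "prompt": "Tell a story about llama[/INST]\n",
    "temperature": 0,
    "cache_prompt": true,
    "system_prompt": {
        "prompt": "[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\n"
    }
}' | jq -r '.content'
}
get_completion > first.txt
get_completion > second.txt
diff first.txt second.txt   # non-empty output: the 1st completion differs from the 2nd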

So there are issues with both the system_prompt and cache_prompt fields.

TL;DR

I was testing to understand how system_prompt works in server.cpp.
During testing I found that cache_prompt makes completions random (even with temperature: 0).
If system_prompt is used (without cache_prompt), the 2nd and subsequent completions are identical to each other but differ from the 1st.
If system_prompt is used with cache_prompt, the 2nd and subsequent completions are identical, differ from the 1st, and do not answer the user's request.

System information

OS: macOS 13.4.0
llama.cpp: sha - e790eef
model: llama 2 7b chat Q6_K (TheBloke)
run command:

./server -t 10 -ngl 32 -m "models/llama-2-7b-chat.Q6_K.gguf" -c 4096
