Description
Good day!
I was testing the system_prompt field on the server, and tried to get the same answer from the raw variant (with the system prompt written directly into prompt) and from the system_prompt approach. (I suppose it is a concatenation of system_prompt.prompt + prompt plus caching of the system prompt; if somebody can explain system_prompt, I will really appreciate it!)
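To make that hypothesis concrete, this is the equivalence I expected (just a sketch of my assumption, not confirmed server behaviour; SYS and PROMPT are my own shell variables):

# Hypothesis only: the server concatenates system_prompt.prompt + prompt
# and keeps the system part cached across requests.
SYS='[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\n'
PROMPT='Tell a story about llama[/INST]\n'

# raw variant:
curl --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data "{\"prompt\": \"$SYS$PROMPT\", \"temperature\": 0}"

# system_prompt variant (assumed equivalent):
curl --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data "{\"prompt\": \"$PROMPT\", \"temperature\": 0, \"system_prompt\": {\"prompt\": \"$SYS\"}}"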
To make this test deterministic I used temperature 0 (to get non-random answers), but I got random answers all the time. Then I removed all the params I had used before and was left with this JSON:
curl --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data '{
"prompt": "[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\nTell a story about llama[/INST]\n",
"cache_prompt": true,
"temperature": 0
}'
With this request I get random completions.
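A quick way to confirm the randomness (a minimal sketch; jq is assumed to be installed, and md5 is the macOS hashing tool; the response's content field holds the completion text):

for i in 1 2 3; do
  curl --location -s 'http://localhost:8080/completion' \
  --header 'Content-Type: application/json' \
  --data '{"prompt": "[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\nTell a story about llama[/INST]\n", "cache_prompt": true, "temperature": 0}' \
  | jq -r '.content' | md5
done

With cache_prompt: true the three hashes come out different on every run.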
But on the one without "cache_prompt", I don't:
curl --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data '{
"prompt": "[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\nTell a story about llama[/INST]\n",
"temperature": 0
}'
So cache_prompt is the issue here.
Returning to the system_prompt version, I get issues both with and without cache_prompt.
curl --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data '{
"prompt": "Tell a story about llama[/INST]\n",
"temperature": 0,
"system_prompt": {
"prompt": "[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\n"
}
}'
On this one (without cache_prompt) I get the same completion on the second and further runs (2nd, 3rd, 4th, ...), but it is not the same completion as the first one.
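A quick way to see it (a sketch, again assuming jq; req is my own variable holding the same payload as above):

req='{"prompt": "Tell a story about llama[/INST]\n", "temperature": 0, "system_prompt": {"prompt": "[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\n"}}'
first=$(curl -s 'http://localhost:8080/completion' --header 'Content-Type: application/json' --data "$req" | jq -r '.content')
second=$(curl -s 'http://localhost:8080/completion' --header 'Content-Type: application/json' --data "$req" | jq -r '.content')
third=$(curl -s 'http://localhost:8080/completion' --header 'Content-Type: application/json' --data "$req" | jq -r '.content')
[ "$second" = "$third" ] && echo "2nd == 3rd"   # holds
[ "$first" = "$second" ] || echo "1st != 2nd"   # also holds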
curl --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data '{
"prompt": "Tell a story about llama[/INST]\n",
"temperature": 0,
"cache_prompt": true,
"system_prompt": {
"prompt": "[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\n"
}
}'
On this one I get the same behaviour as the one without cache_prompt (the 1st completion does not equal the 2nd, 3rd, 4th, ...), but here the second and further completions are not a story about a llama at all; they are random questions...
So there are issues with both the system_prompt and cache_prompt fields.
TL;DR
I was testing to understand how system_prompt works in server.cpp.
During testing I found that cache_prompt makes completions random (even with temperature: 0).
If system_prompt is used (without cache_prompt), the 2nd and further completions are the same, but differ from the 1st.
If system_prompt is used with cache_prompt, the 2nd and further completions are the same, differ from the 1st, and do not answer the user's request.
System information
OS: macOS 13.4.0
llama.cpp: sha - e790eef
model: llama 2 7b chat Q6_K (TheBloke)
run command:
./server -t 10 -ngl 32 -m "models/llama-2-7b-chat.Q6_K.gguf" -c 4096