Description
Good day!
I was testing the system_prompt field on the server, and tried to get the same answer from the raw variant (with the system prompt written directly into prompt) and from the system_prompt approach. (I suppose it is a concatenation of system_prompt.prompt + prompt plus caching of the system prompt; if somebody can explain system_prompt, I will really appreciate it!)
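To make that hypothesis concrete, this is the equivalence I expected (just a sketch of my assumption, not confirmed server behaviour; SYS and PROMPT are my own shell variables):

# Hypothesis only: the server concatenates system_prompt.prompt + prompt
# and keeps the system part cached across requests.
SYS='[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\n'
PROMPT='Tell a story about llama[/INST]\n'

# raw variant:
curl --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data "{\"prompt\": \"$SYS$PROMPT\", \"temperature\": 0}"

# system_prompt variant (assumed equivalent):
curl --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data "{\"prompt\": \"$PROMPT\", \"temperature\": 0, \"system_prompt\": {\"prompt\": \"$SYS\"}}"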
To make this test deterministic I used temperature 0 (to get non-random answers), but I got random answers all the time. Then I removed all the params I had used before and was left with this JSON:
curl --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data '{
"prompt": "[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\nTell a story about llama[/INST]\n",
"cache_prompt": true,
"temperature": 0
}'
With this request I get random completions.
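A quick way to confirm the randomness (a minimal sketch; jq is assumed to be installed, and md5 is the macOS hashing tool; the response's content field holds the completion text):

for i in 1 2 3; do
  curl --location -s 'http://localhost:8080/completion' \
  --header 'Content-Type: application/json' \
  --data '{"prompt": "[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\nTell a story about llama[/INST]\n", "cache_prompt": true, "temperature": 0}' \
  | jq -r '.content' | md5
done

With cache_prompt: true the three hashes come out different on every run.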
But on the one without "cache_prompt", I don't:
curl --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data '{
"prompt": "[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\nTell a story about llama[/INST]\n",
"temperature": 0
}'
So cache_prompt is the issue here.
Returning to the system_prompt version, I get issues both with and without cache_prompt.
curl --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data '{
"prompt": "Tell a story about llama[/INST]\n",
"temperature": 0,
"system_prompt": {
"prompt": "[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\n"
}
}'
On this one (without cache_prompt) I get the same completion on the second and further runs (2nd, 3rd, 4th, ...), but it is not the same completion as the first one.
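A quick way to see it (a sketch, again assuming jq; req is my own variable holding the same payload as above):

req='{"prompt": "Tell a story about llama[/INST]\n", "temperature": 0, "system_prompt": {"prompt": "[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\n"}}'
first=$(curl -s 'http://localhost:8080/completion' --header 'Content-Type: application/json' --data "$req" | jq -r '.content')
second=$(curl -s 'http://localhost:8080/completion' --header 'Content-Type: application/json' --data "$req" | jq -r '.content')
third=$(curl -s 'http://localhost:8080/completion' --header 'Content-Type: application/json' --data "$req" | jq -r '.content')
[ "$second" = "$third" ] && echo "2nd == 3rd"   # holds
[ "$first" = "$second" ] || echo "1st != 2nd"   # also holds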
curl --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data '{
"prompt": "Tell a story about llama[/INST]\n",
"temperature": 0,
"cache_prompt": true,
"system_prompt": {
"prompt": "[INST]\n<<SYS>>\nEnd each answer with a word '\''amogus'\''\n<</SYS>>\n\n"
}
}'
On this one I get the same behaviour as the one without cache_prompt (the 1st completion does not equal the 2nd, 3rd, 4th, ...), but here the second and further completions are not a story about a llama at all; they are random questions...
So there are issues with both the system_prompt and cache_prompt fields.
TL;DR
I was testing to understand how system_prompt works in server.cpp.
During testing I found that cache_prompt makes completions random (even with temperature: 0).
If system_prompt is used (without cache_prompt), the 2nd and further completions are the same, but differ from the 1st.
If system_prompt is used with cache_prompt, the 2nd and further completions are the same, differ from the 1st, and do not answer the user's request.
System information
OS: macOS 13.4.0
llama.cpp: sha - e790eef
model: llama 2 7b chat Q6_K (TheBloke)
run command:
./server -t 10 -ngl 32 -m "models/llama-2-7b-chat.Q6_K.gguf" -c 4096