
Description
As far as I can tell, Vicuna 1.1 uses `</s>` as the separator for dialogue responses. It's tokenized as the EOS token (2) when I try it in a Python script using Transformers, but it's tokenized as a normal string by llama.cpp. I got something like this instead:
```
main: prompt: ' </s>'
main: number of tokens in prompt = 4
1 -> ''
1533 -> ' </'
29879 -> 's'
29958 -> '>'
```
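For comparison, here's a minimal sketch of the Transformers-side check (the model path is just a placeholder for a local Vicuna 1.1 directory):

```python
from transformers import AutoTokenizer

# Placeholder path; point it at a local Vicuna 1.1 checkout.
tokenizer = AutoTokenizer.from_pretrained("./vicuna-7b-v1.1")

text = "ASSISTANT: Hello!</s>"
ids = tokenizer(text, add_special_tokens=False)["input_ids"]
print(ids)                     # the trailing </s> should come back as a single id
print(tokenizer.eos_token_id)  # 2 for the LLaMA/Vicuna tokenizer
```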
I found these docs while looking for a reference on what the prompt should look like for Vicuna 1.1, to check whether `</s>` should appear in the prompt:
https://github.com/lm-sys/FastChat/blob/7ae721fa3c881e1e24cf181305d127a316acd463/docs/vicuna_weights_version.md#example-prompt-weight-v11
```
A chat between a user and an assistant.
USER: Hello!
ASSISTANT: Hello!</s>
USER: How are you?
ASSISTANT: I am good.</s>
```
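If the literal `</s>` in the text isn't recognized as a special token, one workaround on the Transformers side is to append the EOS id between turns instead of embedding the string. This is only a rough sketch of the idea, not FastChat's actual prompt-building code, and the model path is a placeholder:

```python
from transformers import AutoTokenizer

# Placeholder path to a local Vicuna 1.1 checkout.
tokenizer = AutoTokenizer.from_pretrained("./vicuna-7b-v1.1")

system = "A chat between a user and an assistant."
turns = [("Hello!", "Hello!"), ("How are you?", "I am good.")]

# Encode the system prompt, then each turn, inserting the EOS id after
# every assistant reply instead of the literal "</s>" string.
ids = tokenizer(system, add_special_tokens=False)["input_ids"]
for user, assistant in turns:
    chunk = f" USER: {user} ASSISTANT: {assistant}"
    ids += tokenizer(chunk, add_special_tokens=False)["input_ids"]
    ids.append(tokenizer.eos_token_id)

print(tokenizer.decode(ids))
```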
At the end, the docs mention a `special_tokens_map.json` file containing something like this, but it doesn't seem to be used by convert.py:
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
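For illustration, here's a hypothetical helper (not something convert.py actually has) that reads the EOS string out of that file, showing the information that's currently being ignored:

```python
import json
from pathlib import Path
from typing import Optional

def read_eos_token(model_dir: str) -> Optional[str]:
    """Hypothetical helper (not part of convert.py): return the EOS string
    from special_tokens_map.json, or None if the file is missing."""
    path = Path(model_dir) / "special_tokens_map.json"
    if not path.exists():
        return None
    data = json.loads(path.read_text())
    eos = data.get("eos_token")
    # The entry may be a plain string or a dict with a "content" field.
    if isinstance(eos, dict):
        return eos.get("content")
    return eos

print(read_eos_token("./vicuna-7b-v1.1"))  # expected: "</s>"
```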