Description
In ./server, Continuation mode cannot be used correctly with Llama 3 70B, because the correct prompt template cannot be entered. The token <|eot_id|> is tokenized to zero tokens, even when it occurs midway through the prompt:
(In the screenshot above, I hit Start and looked at the number of tokens cached minus the number of tokens predicted: 402 - 400 = 2. That difference is the token count of the prompt I typed. It shows 2 where it should be 3, since <|eot_id|> should count as one token. I deleted the generated tokens before taking the screenshot, so only my original input is visible.)
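The same miscount can be checked without the UI by asking the server to tokenize a string directly. Below is a minimal sketch using the server's /tokenize endpoint, assuming a ./server instance on localhost:8080; the exact request fields and special-token handling may differ between builds:

```python
# Minimal sketch: count tokens returned by the server's /tokenize endpoint.
# Assumes ./server is listening on localhost:8080 (an assumption, adjust as needed).
import json
import urllib.request

def count_tokens(text: str) -> int:
    req = urllib.request.Request(
        "http://localhost:8080/tokenize",
        data=json.dumps({"content": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return len(json.loads(resp.read())["tokens"])

base = count_tokens("Hello there")
with_eot = count_tokens("Hello there<|eot_id|>")
# Expected: with_eot == base + 1 (the special token should count as one token).
# Observed: with_eot == base, i.e. <|eot_id|> is tokenized to zero tokens.
print(base, with_eot)
```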
The <|eot_id|> token is required multiple times by the prompt template, which looks like this:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
[system prompt goes here]<|eot_id|><|start_header_id|>user<|end_header_id|>
[user prompt goes here]<|eot_id|><|start_header_id|>assistant<|end_header_id|>
[ai response will go here]
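For reference, a Continuation-mode prompt following this template could be assembled as in the sketch below; the variable names are illustrative only and not part of ./server:

```python
# Sketch of a Llama 3 Instruct prompt built from the template above.
# system_prompt and user_prompt are placeholder names, not ./server options.
system_prompt = "You are a helpful assistant."
user_prompt = "Why is the sky blue?"

prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    f"{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)
# Each <|eot_id|> must survive tokenization as a single special token
# for the model to see the prompt in the format it was trained on.
print(prompt)
```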
Not adhering to the prompt template usually degrades the quality of the LLM's output.