
OpenAI-Compatible Chat Completions API Endpoint Responses include EOS / stop tokens #6859

Closed
@K-Mistele

Description

Commit: 4e96a81 (origin/master)

Expected Behavior: Chat completions returned from /v1/chat/completions should not include the stop token in the text sent back to the client.

Actual Behavior: The stop token is included in the response when using Mistral 7B Instruct v0.2 with either no chat template or the llama2 chat template.

Example of Broken Behavior

When I run inference with the server and mistral-7b-instruct-v0.2, I use the following command:

./server -m ~/Documents/AI/models/mistral-7b-instruct-v0.2.Q8_0.gguf -c 32768 -cb -np 1 -ngl -1 --host 0.0.0.0

The response from the OpenAI-compatible /v1/chat/completions endpoint, using TheBloke's quant of the model, includes the EOS string </s> in the output:

(Screenshot: chat completion response with the assistant message ending in </s>)
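
For reference, an equivalent request can be reproduced with a minimal sketch like the following (assumptions: the openai Python client pointed at the server's default port 8080, and a placeholder model name, since the server loads whatever GGUF was passed to ./server):

```python
# Repro sketch -- assumes the llama.cpp server is reachable on its default port 8080
# and that the openai Python client (>= 1.0) is installed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

resp = client.chat.completions.create(
    model="mistral-7b-instruct-v0.2",  # placeholder; the server uses the GGUF it was started with
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)

content = resp.choices[0].message.content
print(repr(content))  # on the affected build, the content ends with "</s>"
```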

This happens both when I omit the --chat-template option and when I pass --chat-template llama2 as indicated in this repository's wiki.

In the past, when I used ChatML fine-tunes of Mistral, I did not see a stop token at the end of the generated text.

However, now, using the ChatML-tuned Hermes 2 Pro Mistral 7B with:

./server -m ~/Documents/AI/models/optimal/Hermes-2-Pro-Mistral-7B.Q8_0.gguf -cb -np 1 -c 8096 --host 0.0.0.0

I see the <|im_end|> stop token in the response:

(Screenshot: chat completion response with the assistant message ending in <|im_end|>)

I am confident that I never saw stop tokens included in chat completion responses from the OpenAI-compatible endpoint with older versions of llama.cpp.
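
Until this is fixed server-side, a temporary client-side workaround is to strip known stop strings from the returned content; a minimal sketch, assuming </s> and <|im_end|> are the only stop strings that can appear:

```python
# Workaround sketch -- removes a trailing stop string from the returned content.
# Assumption: </s> and <|im_end|> are the only stop strings in play.
STOP_STRINGS = ("</s>", "<|im_end|>")

def strip_stop_string(text: str) -> str:
    for stop in STOP_STRINGS:
        if text.endswith(stop):
            return text[: -len(stop)].rstrip()
    return text

# Usage: content = strip_stop_string(resp.choices[0].message.content)
```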
