Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Feature Description
The same set of parameters should be available when calling either the `completion` or the `v1/chat/completions` endpoint. Most notably, `min_p` and `grammar` are useful to have. For example, a call like this should be possible:
```sh
curl http://localhost:3077/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{
    "temperature": 1.0,
    "min_p": 0.01,
    "top_k": 0,
    "top_p": 1,
    "repeat_penalty": 1,
    "grammar": "root ::= (\"Hello!\" | \"Hi!\")",
    "messages": [
      {
        "role": "system",
        "content": "You are ChatGPT, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests."
      },
      {
        "role": "user",
        "content": "Hi"
      }
    ]
  }'
```
Motivation
To fully make use of the llama.cpp backend when replacing another LLM call that uses the OpenAI SDK, for example, it's useful to have access to the full set of parameters to tune the output for the task. It's possible to pass those parameters as a dictionary via the `extra_body` input parameter when making a call with the Python `openai` library.
If the parameters aren't available when making the switch, the developer will have to consider changing the code to use the `completion` endpoint instead, or even maintain separate versions of the same code in order to compare different LLMs.
Possible Implementation
I'm guessing the `oaicompat_completion_params_parse` function in `examples/server/server.cpp` can be used to add more parameters.
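For illustration only, here is a minimal sketch of how the extra fields could be forwarded, assuming `oaicompat_completion_params_parse` keeps building an internal JSON object of llama.cpp parameters from the OpenAI-style request body; the helper name below is hypothetical and the exact integration point may differ:

```cpp
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Hypothetical helper (not in server.cpp): forward extra sampling fields only
// when the client actually sent them, so the server's own defaults still
// apply for requests that omit them.
static void forward_extra_params(const json & body, json & llama_params) {
    for (const char * key : {"min_p", "grammar", "repeat_penalty",
                             "top_k", "tfs_z", "typical_p"}) {
        if (body.contains(key)) {
            llama_params[key] = body.at(key);
        }
    }
}
```

With something like this called from `oaicompat_completion_params_parse`, the curl example above would work unchanged, and existing clients that don't send these fields would see no change in behavior.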