Server: format error to json #5961
Conversation
Small note for @phymbert: Since we're using the correct OpenAI format now, the Python test code will need to use the correct exception classes. No action needed from your side for now, just a reminder.
BOS token is only added when a set of conditions is satisfied. From the docs:
For example, as I showed here, an empty prompt specified as an array always returns an empty response:
I think it's counter-intuitive, and an error should be returned in this case instead.
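For illustration, a minimal sketch of the kind of up-front check I mean (the helper name and signature are assumptions, not code from this PR):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// hypothetical validation helper, not the PR's actual code:
// reject an empty tokenized prompt instead of silently
// generating an empty completion
bool validate_prompt(const std::vector<int32_t> & prompt_tokens, std::string & error) {
    if (prompt_tokens.empty()) {
        error = "the tokenized prompt is empty";
        return false;
    }
    return true;
}
```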
This PR breaks the web UI error handling, so I think that all the web UI changes from my PR should be applied here as well. Also, the README part can be taken from there too, though the formatting is a little confusing there (the "where" list blends with the "POST" section below), so maybe it needs to be done differently.
@z80maniac Thanks for the suggestions, I took the completions.js file (and hpp) from your PR. For the README, I added a dedicated section specifying that we use the same error format as OpenAI. Only llama.cpp-specific errors (for example: /slots, /metrics, grammar) will be documented.
examples/server/server.cpp (outdated):

```diff
@@ -1580,7 +1595,7 @@ struct server_context {
         queue_results.send(result);
     }

-    bool update_slots() {
+    void run_slots() {
```
IMHO `update_slots` was clearer for me; the slots are not run, but updated from the batch view.
We should not get rid of the return type. Returning false might interrupt the main loop, for example.
For now, the main loop only stops if `running` is set to false, so if the loop stops, we know exactly why. Adding the ability for `update_slots` to exit the main loop may make it harder to debug if there is an error inside `update_slots`.
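A minimal sketch of that control flow, with assumed names (`running`, `main_loop`), to show why loop termination stays easy to reason about:

```cpp
#include <atomic>

// assumed: advances every slot by one batch step; reports problems
// per-slot rather than returning a loop-breaking value
void update_slots();

std::atomic<bool> running{true};

void main_loop() {
    // the loop exits only when `running` is cleared; update_slots()
    // has no way to break out of it on its own
    while (running) {
        update_slots();
    }
}
```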
```diff
@@ -2035,20 +2053,20 @@ struct server_context {
                     n_batch /= 2;
                     i -= n_batch;

-                    continue;
+                    continue; // continue loop of n_batch
```
I suggest `// continue batch decoding loop`
In fact, the loop is there to retry llama_decode with a smaller n_batch until it succeeds.
For now I don't know how to rephrase the idea, so I just leave the comment here to distinguish it from the continue and break statements below. Since we're inside a nested loop here, I think having comments makes it easier to navigate.
Or maybe we don't need these comments at all; do you have any input on that, @ggerganov?
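For reference, a simplified sketch of the retry shape (with an assumed `decode_chunk` helper standing in for the llama_decode call on a batch view; this is not the actual server.cpp code):

```cpp
#include <algorithm>
#include <cstdint>

// assumed helper: decode n_tokens of the batch starting at offset i
bool decode_chunk(int32_t i, int32_t n_tokens);

void decode_batch(int32_t n_tokens_total, int32_t n_batch) {
    for (int32_t i = 0; i < n_tokens_total; i += n_batch) {
        const int32_t n_tokens = std::min(n_batch, n_tokens_total - i);

        if (!decode_chunk(i, n_tokens)) {
            if (n_batch == 1) {
                break; // cannot shrink further; report an error for the affected slots
            }
            // halve the batch and step back so the same chunk is retried:
            // the loop increment adds the new n_batch back, leaving i unchanged
            n_batch /= 2;
            i -= n_batch;
            continue; // continue loop of n_batch
        }
        // ... per-slot token handling for this chunk ...
    }
}
```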
I decided to keep the comment this way, because the goal is to reflect the variable name, which helps when navigating a nested loop (not to explain what the code does).
If you're OK with that, then I'll merge this PR @phymbert
```diff
             }

             for (auto & slot : slots) {
                 if (slot.state != SLOT_STATE_PROCESSING || slot.i_batch < (int) i || slot.i_batch >= (int) (i + n_tokens)) {
-                    continue;
+                    continue; // continue loop of slots
```
I suggest `// continue updating next token of next slots` or something similar.
Turns out, while the error format did not change, its delivery to the web UI did. Now the error response is not wrapped in
@z80maniac Thanks, it's updated in 0e72879
* server: format error to json
* server: do not crash on grammar error
* fix api key test case
* revert limit max n_predict
* small fix
* correct coding style
* update completion.js
* launch_slot_with_task
* update docs
* update_slots
* update webui
* update readme
This PR migrates some changes from #5776 and #5710.
Errors are now formatted to JSON as suggested in #5882 (comment).
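A minimal sketch of the resulting OpenAI-style error body, using nlohmann::json (the helper name and exact fields are assumptions; the actual code in server.cpp may differ):

```cpp
#include <string>
#include <nlohmann/json.hpp> // vendored by llama.cpp as json.hpp

using json = nlohmann::json;

// hypothetical helper; the error taxonomy in server.cpp may differ
static json format_error_response(const std::string & message, const std::string & type, int code) {
    return json {
        {"error", {
            {"message", message},
            {"type",    type},
            {"code",    code},
        }},
    };
}

// e.g. format_error_response("Invalid API Key", "authentication_error", 401) yields:
// {"error": {"code": 401, "message": "Invalid API Key", "type": "authentication_error"}}
```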
Examples:
* Unhandled exceptions (for example, invalid input JSON)
* 404 not found
* API key incorrect
* Endpoint is disabled
* Invalid grammar
* Missing both "input" and "content" for /embeddings
* Missing both "prompt" and "messages" for /completions
NOTE: `run_slots()`. This may be fixed in the future.