Tasks queue logic doesn't seem to be logical #5000
Comments
I'm not sure if I understand it correctly:
So the task does get deleted, but it is deleted before it is processed, not after, right? In any case, I agree that the current behavior (returning a "slot unavailable" message) breaks OpenAI compatibility in some cases. Ideally, if all slots are unavailable, we could simply delay the task until one becomes available. I already tried this using a proxy and it works, but it would be nice if the behavior could be implemented in server.cpp and controlled via an argument.
You got the point; I have many issues with this example.
I did a small patch here, but I'm not sure if it will get merged: #5018
Great! A queue is needed!
When will #5018 be merged?
This issue is stale because it has been open for 30 days with no activity. |
This issue was closed because it has been inactive for 14 days since being marked as stale. |
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Feature Description
https://github.com/ggerganov/llama.cpp/blob/2b3a665d3917edf393761a24c4835447894df74a/examples/server/server.cpp#L1558
Motivation
The task queue should only erase a task *after* it has been resolved. With the current implementation, the task queue does not let the user know how many tasks are being resolved at the same time, among other possible limitations.
Possible Implementation
Delete the task after it was resolved