server : defer tasks when "slot unavailable" (ggerganov#5018)
* server: defer task when no slot is available

* remove unnecessary log

---------

Co-authored-by: Xuan Son Nguyen <xuanson.nguyen@snowpack.eu>
2 people authored and jordankanter committed Feb 3, 2024
1 parent 935f623 commit c197344
Showing 1 changed file with 9 additions and 3 deletions.
12 changes: 9 additions & 3 deletions examples/server/server.cpp
@@ -1558,6 +1558,7 @@ struct llama_server_context
     void process_tasks()
     {
         std::unique_lock<std::mutex> lock(mutex_tasks);
+        std::vector<task_server> deferred_tasks;
         while (!queue_tasks.empty())
         {
             task_server task = queue_tasks.front();
@@ -1568,9 +1569,8 @@ struct llama_server_context
                     llama_client_slot *slot = get_slot(json_value(task.data, "slot_id", -1));
                     if (slot == nullptr)
                     {
-                        LOG_TEE("slot unavailable\n");
-                        // send error result
-                        send_error(task, "slot unavailable");
+                        // if no slot is available, we defer this task for processing later
+                        deferred_tasks.push_back(task);
                         break;
                     }

@@ -1616,6 +1616,12 @@ struct llama_server_context
             }
         }

+        // add all the deferred tasks back to the queue
+        for (task_server &task : deferred_tasks)
+        {
+            queue_tasks.push_back(task);
+        }
+
         // remove finished multitasks from the queue of multitasks, and add the corresponding result to the result queue
         std::vector<task_result> agg_results;
         auto queue_iterator = queue_multitasks.begin();
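
The deferral pattern in this change can be illustrated in isolation. The following is a minimal sketch, not the server's actual code: `Task`, `MiniServer`, `free_slots`, and `started` are hypothetical stand-ins, locking is omitted, and slot lookup is reduced to a counter. It assumes the `break` in the diff leaves an enclosing `switch` rather than the `while` loop, so the loop keeps draining the queue; that is modeled here with `continue`.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical, simplified stand-in for the server's task type.
struct Task {
    int id;
};

// Hypothetical stand-in for the server context: a task queue plus a
// counter of free slots (the real code looks up slot objects instead).
struct MiniServer {
    std::deque<Task> queue_tasks;
    int free_slots = 0;
    std::vector<int> started; // ids of tasks that were given a slot

    // Drains the queue once. A task that cannot get a slot is set aside
    // and re-queued afterwards instead of being rejected with an error.
    void process_tasks() {
        assert(free_slots >= 0);
        std::vector<Task> deferred_tasks;
        while (!queue_tasks.empty()) {
            Task task = queue_tasks.front();
            queue_tasks.pop_front();
            if (free_slots == 0) {
                // no slot available: defer this task for a later pass
                deferred_tasks.push_back(task);
                continue;
            }
            --free_slots;
            started.push_back(task.id);
        }
        // put the deferred tasks back, preserving their original order
        for (const Task &task : deferred_tasks) {
            queue_tasks.push_back(task);
        }
    }
};
```

Collecting deferred tasks in a separate vector, rather than pushing them straight back onto `queue_tasks`, keeps the `while` loop from spinning forever on tasks that cannot be served in the current pass.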
