Open
Description
Motivation.
Currently, the API server runs in a single process, utilizing a single CPU for its work. As GPUs continue to get faster, it is important that we scale the API server to ensure that it is able to process requests fast enough to keep GPU resources fully utilized.
Proposed Change.
From a high level, this proposal is to move from the API server being a single process to being a configurable pool of processes to ensure that a single CPU for the apiserver will not become a bottleneck in server utilization.
Design notes: https://docs.google.com/document/d/1Y2S011RKYkFKtrcz_MuEqEf3cRXORNGsVvMHCaqqc-k/edit?tab=t.0
Feedback Period.
No response
CC List.
Any Other Things.
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.