Description
When using the Uvicorn process manager, requests are assigned to Uvicorn workers randomly. This causes issues including unbalanced queue wait times and the max concurrency limit not working as expected (since each process applies max concurrency independently).
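As a rough illustration of the problem, the toy simulation below assigns requests to workers at random and applies a concurrency limit per worker. The function name `simulate` and its parameters are hypothetical and only model the behavior described above; they are not Uvicorn's actual scheduling code.

```python
import random


def simulate(num_requests, num_workers, per_worker_limit, seed=0):
    """Assign requests to workers at random, enforcing the limit per worker.

    Returns (in_flight_per_worker, rejected_count). All requests are assumed
    to still be in flight, i.e. none finish during the simulation.
    """
    rng = random.Random(seed)
    in_flight = [0] * num_workers
    rejected = 0
    for _ in range(num_requests):
        worker = rng.randrange(num_workers)  # random assignment, no balancing
        if in_flight[worker] >= per_worker_limit:
            # This worker is full, so the request is rejected (a 503 in
            # practice) even if other workers still have spare capacity.
            rejected += 1
        else:
            in_flight[worker] += 1
    return in_flight, rejected


if __name__ == "__main__":
    loads, rejected = simulate(num_requests=8, num_workers=4, per_worker_limit=2)
    print("in-flight per worker:", loads)
    print("rejected:", rejected)
```

With 4 workers and a per-worker limit of 2, the total capacity is 8, yet a request is rejected whenever it lands on a full worker while another worker sits idle. A single global limit would not have this failure mode.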
Possible solutions
NGINX
See #1298
Gunicorn + Uvicorn workers
Consider switching to Gunicorn + Uvicorn workers. Gunicorn is a more full-featured process manager than Uvicorn's built-in one, and may balance requests across processes better.
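A minimal sketch of what this setup could look like as a Gunicorn config file (Gunicorn config files are plain Python). The bind address, worker count, and backlog value are placeholder assumptions, and the worker class path should be checked against the installed Uvicorn version.

```python
# gunicorn.conf.py (sketch; all values below are illustrative assumptions)
import multiprocessing

# Address the Gunicorn master process listens on.
bind = "0.0.0.0:8000"

# Number of worker processes; one per CPU core is only a starting point.
workers = multiprocessing.cpu_count()

# Run each worker as a Uvicorn (ASGI) worker instead of Gunicorn's sync worker.
worker_class = "uvicorn.workers.UvicornWorker"

# Maximum number of pending connections on the listening socket.
backlog = 2048
```

This would be started with something like `gunicorn -c gunicorn.conf.py main:app`, where `main:app` is a hypothetical module and application name.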
Blockers for switching to Gunicorn
Currently there are two features of the Uvicorn process manager that are not supported by Gunicorn:
- --limit-concurrency is used to respond with 503s when the user-specified concurrency limit is reached. Here, Uvicorn currently says "Gunicorn provides a different set of configuration options to Uvicorn, so some options such as --limit-concurrency are not yet supported when running with Gunicorn."
- It is not currently possible to configure how many threads are used by the Uvicorn worker.
Here is the Uvicorn Changelog
Add request forwarder sidecar container
A sidecar container could receive all requests, count in-flight requests, manage max_replica_concurrency, and forward them to the application container. Within the application container, we'd use FastAPI on Gunicorn with Uvicorn workers (for worker queue fairness), and configure unlimited backlog and limit-concurrency. Knative does something similar to this.
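The core of the sidecar idea, a single in-flight counter applied to all requests before they reach any worker, can be sketched as follows. This is an assumption-laden illustration, not a proposed implementation: `InFlightLimiter`, `handle`, and `fake_app` are hypothetical names, and `fake_app` stands in for the HTTP forwarding a real sidecar would do.

```python
import asyncio


class InFlightLimiter:
    """Counts in-flight requests against one global limit."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def try_acquire(self):
        # No lock is needed: this runs on a single event loop and there is
        # no await between the check and the increment.
        if self.in_flight >= self.max_in_flight:
            return False
        self.in_flight += 1
        return True

    def release(self):
        self.in_flight -= 1


async def handle(limiter, forward, request):
    """Forward the request, or return 503 if the global limit is reached."""
    if not limiter.try_acquire():
        return 503
    try:
        return await forward(request)
    finally:
        limiter.release()


async def fake_app(request):
    """Hypothetical stand-in for forwarding to the application container."""
    await asyncio.sleep(0.01)
    return 200


async def demo(num_requests=5, limit=2):
    limiter = InFlightLimiter(limit)
    return await asyncio.gather(
        *(handle(limiter, fake_app, i) for i in range(num_requests))
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the counter lives in one place, the limit holds across all workers regardless of how the application container distributes requests internally.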