Closed
Description
Replace Flask with another web framework (FastAPI?)
Motivation
Improve API, concurrency, performance and efficiency.
Additional Context
- Currently, waitress serves requests with 4 threads (its default).
- Ensure the readiness probe/health check is answered in a timely fashion. Otherwise, pod status and load balancing may be affected (requests may start queuing in the load balancer if readiness probes are slow to respond).
- https://www.reddit.com/r/MachineLearning/comments/dy8hjh/p_cortex_deploy_models_from_any_framework_as
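The readiness-probe concern above can be illustrated with a minimal, stdlib-only sketch. It is not FastAPI or Cortex code; `predict`, `handle_predict`, `handle_health`, and `demo` are hypothetical names, and the 0.5 s sleep is an assumed stand-in for a slow model call. The idea it shows: if blocking predictions are pushed onto a thread pool (sized at 4 here to mirror the waitress default), the health handler never waits behind them.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def predict(payload: str) -> str:
    """Hypothetical stand-in for a slow, blocking model call."""
    time.sleep(0.5)
    return payload.upper()


async def handle_predict(pool: ThreadPoolExecutor, payload: str) -> str:
    # Offload the blocking call so the event loop stays free.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, predict, payload)


async def handle_health() -> dict:
    # Runs on the event loop and never touches the prediction pool.
    return {"status": "ok"}


async def demo():
    with ThreadPoolExecutor(max_workers=4) as pool:
        # 8 requests against 4 threads: the pool is saturated and 4 requests queue.
        tasks = [
            asyncio.create_task(handle_predict(pool, f"req-{i}"))
            for i in range(8)
        ]
        await asyncio.sleep(0.05)  # give the predictions time to start

        start = time.perf_counter()
        health = await handle_health()
        latency = time.perf_counter() - start

        results = await asyncio.gather(*tasks)
    return health, latency, results


if __name__ == "__main__":
    health, latency, results = asyncio.run(demo())
    print(health, f"health latency: {latency:.4f}s", results)
```

In a threaded server where the health check shares the same 4 threads as predictions, the probe would instead queue behind in-flight requests, which is the failure mode described in the list above.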
Note from previous ticket #525:
Something like FastAPI would support multithreading, which may improve throughput
API refactor checklist
- Revisit Python error wrapping
- Expose multiple workers for parallelism
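One possible way to expose the worker count, sketched under several assumptions: the server is started through the `uvicorn` command line (which does accept `--host`, `--port`, and `--workers`), the environment variable names `API_WORKERS` and `API_PORT` are made up for illustration, and `serve:app` is a hypothetical module path for the application object.

```python
import os
from typing import List, Mapping


def build_server_command(env: Mapping[str, str] = os.environ) -> List[str]:
    """Build a uvicorn command line from (hypothetical) environment variables."""
    workers = int(env.get("API_WORKERS", "1"))  # assumed variable name
    if workers < 1:
        raise ValueError("API_WORKERS must be >= 1")
    port = env.get("API_PORT", "8888")  # assumed variable name and default
    return [
        "uvicorn",
        "serve:app",  # hypothetical module:attribute of the ASGI app
        "--host", "0.0.0.0",
        "--port", port,
        "--workers", str(workers),
    ]


if __name__ == "__main__":
    print(" ".join(build_server_command({"API_WORKERS": "4"})))
```

Each worker is a separate process, so a model loaded at startup would be loaded once per worker; memory use is worth checking before raising the count.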