🚀 The feature
Currently, when using the REST API, the prediction endpoint returns an HTTP 503 when the number of concurrent requests exceeds the capacity of a worker's job queue. It would be great if it returned a 429 instead, so that clients can tell the service is unavailable because of high request load.
Motivation, pitch
We'd like to be able to distinguish errors caused by too many requests from other transient 503s (whether they come from the server or from the service mesh). Having the server return 429 would allow the client to handle retries differently under high load.
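To illustrate what "handle retries differently" could look like on the client side, here is a minimal sketch. It is not tied to any particular server or client library: the function name `retry_delay`, the backoff constants, and the use of a `Retry-After` header given in seconds are all assumptions for illustration (the request above does not say whether the server would send that header).

```python
import random
from typing import Mapping, Optional


def retry_delay(
    status: int,
    attempt: int,
    headers: Optional[Mapping[str, str]] = None,
) -> Optional[float]:
    """Return how many seconds to wait before retrying, or None if not retryable.

    Hypothetical helper: the policy and constants are illustrative assumptions.
    """
    headers = headers or {}
    if status == 429:
        # Overload: honor Retry-After when it is given as whole seconds
        # (assumed; the lookup here is case-sensitive for simplicity).
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            return float(retry_after)
        # Otherwise back off exponentially, capped at 60 seconds.
        return min(60.0, 2.0 ** attempt)
    if status == 503:
        # Other transient failure: retry soon, with a little jitter.
        return 0.5 + random.uniform(0.0, 0.5)
    # Anything else is not retried by this sketch.
    return None
```

With the current behavior, both cases arrive as 503 and the client cannot pick between the two branches; with a 429 for queue overflow, it can back off aggressively under load while still retrying quickly on unrelated transient failures.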
Alternatives
No response
Additional context
No response