Return 429 instead of 503 when worker job queue is full #2764

Open
@alazareva


🚀 The feature

Currently, when using the REST API, the prediction endpoint returns a 503 when the number of concurrent requests exceeds the capacity of a worker's job queue. It would be great if it returned a 429 instead, so that we know the request was rejected because of high request load.
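To make the requested behavior concrete, here is a minimal sketch of a bounded job queue that reports 429 when it is full. The `Worker` class, `submit` method, and `job_queue_size` parameter are hypothetical names for illustration only; this is not the server's actual implementation.

```python
import queue
from http import HTTPStatus


class Worker:
    """Hypothetical worker with a bounded job queue (illustration only)."""

    def __init__(self, job_queue_size: int) -> None:
        # maxsize must be > 0; queue.Queue treats 0 as "unbounded".
        self.jobs: queue.Queue = queue.Queue(maxsize=job_queue_size)

    def submit(self, job: object) -> HTTPStatus:
        """Enqueue a job and return the status code the endpoint would send."""
        try:
            # put_nowait raises queue.Full instead of blocking the caller.
            self.jobs.put_nowait(job)
        except queue.Full:
            # Requested behavior: signal overload with 429 rather than 503.
            return HTTPStatus.TOO_MANY_REQUESTS
        # Stand-in for whatever the server returns once the job has run.
        return HTTPStatus.OK
```

With `Worker(job_queue_size=2)`, the first two `submit` calls are accepted and the third returns `HTTPStatus.TOO_MANY_REQUESTS` until a job is taken off the queue.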

Motivation, pitch

We'd like to be able to distinguish errors caused by too many requests from other transient 503s (originating either on the server or in the service mesh). Having the server return 429 would allow the client to handle retries differently in the case of high load.
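As a rough illustration of what "handling retries differently" could look like on the client side, the sketch below picks a retry delay from the status code. The function name `retry_delay`, the specific delays, and the retry limits are all assumptions chosen for the example; honoring a `Retry-After` value also assumes the server would send one, which this issue does not state.

```python
from typing import Optional


def retry_delay(
    status: int, attempt: int, retry_after: Optional[float] = None
) -> Optional[float]:
    """Return seconds to wait before the next retry, or None to give up.

    `attempt` is the zero-based count of retries already made.
    """
    if status == 429:
        # Overload: back off exponentially so the job queue can drain.
        if retry_after is not None:
            # Assumed: the caller parsed a Retry-After header, if present.
            return retry_after
        return min(30.0, 0.5 * 2**attempt)
    if status == 503:
        # Other transient failure (server or mesh): retry quickly, few times.
        if attempt >= 2:
            return None
        return 0.1
    # Any other status is not retried in this sketch.
    return None
```

A caller would sleep for the returned number of seconds before resending the request, and stop retrying when the function returns `None`.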

Alternatives

No response

Additional context

No response


Status

Backlog
