[Serve] Can't autoscale deployment when target ongoing requests is 1 #24793

Closed
@spolcyn

Description

What happened + What you expected to happen

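Issue: In Ray Serve, if `target_num_ongoing_requests_per_replica` is 1 and `max_concurrent_queries` is also 1, autoscaling never occurs: with at most one in-flight query per replica, the measured number of ongoing requests per replica can never exceed the target of 1.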

Expected: I can set the target to 1 and autoscaling will occur as I send more queries to the deployment, but a single replica will never have more than 1 ongoing request.

See: https://discuss.ray.io/t/autoscaling-with-max-concurrent-queries-1/6121 and the source of autoscaling_policy.py

Current workaround: Set `max_concurrent_queries` to 2 and keep the target at 1. This increases request latency, though, because a replica may accept a second query while the first is still running, which hurts most for long-running queries (such as inference with large models).
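To illustrate why the cap matters, here is a minimal sketch of a ratio-based scaling rule. It is an assumption based on a reading of the policy described above, not the actual code in autoscaling_policy.py; the function names `desired_replicas` and `observed_load`, the default bounds, and the 50 pending requests are all hypothetical.

```python
import math


def desired_replicas(ongoing_per_replica, target,
                     min_replicas=1, max_replicas=10, smoothing_factor=1.0):
    """Hypothetical simplification of a ratio-based autoscaling rule."""
    current = len(ongoing_per_replica)
    # Average ongoing requests per replica, as reported by the replicas.
    avg = sum(ongoing_per_replica) / current
    # A ratio above 1 means the replicas are over target and should scale up.
    error_ratio = avg / target
    smoothed = 1 + (error_ratio - 1) * smoothing_factor
    desired = math.ceil(current * smoothed)
    return max(min_replicas, min(max_replicas, desired))


def observed_load(pending_requests, num_replicas, max_concurrent_queries):
    """Ongoing requests each replica reports, capped by max_concurrent_queries."""
    loads = []
    remaining = pending_requests
    for _ in range(num_replicas):
        take = min(max_concurrent_queries, remaining)
        loads.append(take)
        remaining -= take
    return loads


# 50 requests are waiting, but one replica capped at 1 query only reports 1.
stuck = desired_replicas(observed_load(50, 1, 1), target=1)

# With the cap raised to 2, the replica can report 2, which exceeds the target.
workaround = desired_replicas(observed_load(50, 1, 2), target=1)

print(stuck, workaround)
```

Under these assumptions the capped replica always reports exactly the target, so the ratio stays at 1 and the desired replica count equals the current count, no matter how many requests are queued upstream. Raising the cap to 2 lets the reported load exceed the target, which is why the workaround triggers scaling.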

Versions / Dependencies

Ray 1.12
Python/OS: N/A

Reproduction script

N/A

Issue Severity

Medium: It is a significant difficulty but I can work around it.

Labels

P2: Important issue, but not time-critical
bug: Something that is supposed to be working; but isn't
serve: Ray Serve Related Issue