Commit 7508c76

Remove support for max_concurrency and max_queue_length for Async APIs

1 parent 992e6d9
4 files changed: +51 −44 lines

docs/workloads/async/autoscaling.md

Lines changed: 4 additions & 12 deletions

```diff
@@ -4,14 +4,6 @@ Cortex auto-scales AsyncAPIs on a per-API basis based on your configuration.
 
 ## Autoscaling replicas
 
-### Relevant pod configuration
-
-In addition to the autoscaling configuration options (described below), there is one field in the pod configuration which are relevant to replica autoscaling:
-
-**`max_concurrency`** (default: 1): The maximum number of requests that will be concurrently sent into the container by Cortex. If your web server is designed to handle multiple concurrent requests, increasing `max_concurrency` will increase the throughput of a replica (and result in fewer total replicas for a given load).
-
-<br>
-
 ### Autoscaling configuration
 
 **`min_replicas`**: The lower bound on how many replicas can be running for an API.
@@ -22,13 +14,13 @@
 
 <br>
 
-**`target_in_flight`** (default: `max_concurrency` in the pod configuration): This is the desired number of in-flight requests per replica, and is the metric which the autoscaler uses to make scaling decisions. The number of in-flight requests is simply how many requests have been submitted and are not yet finished being processed. Therefore, this number includes requests which are actively being processed as well as requests which are waiting in the queue.
+**`target_in_flight`** (default: 1): This is the desired number of in-flight requests per replica, and is the metric which the autoscaler uses to make scaling decisions. The number of in-flight requests is simply how many requests have been submitted and are not yet finished being processed. Therefore, this number includes requests which are actively being processed as well as requests which are waiting in the queue.
 
 The autoscaler uses this formula to determine the number of desired replicas:
 
 `desired replicas = total in-flight requests / target_in_flight`
 
-For example, setting `target_in_flight` to `max_concurrency` (the default) causes the cluster to adjust the number of replicas so that on average, there are no requests waiting in the queue.
+For example, setting `target_in_flight` to 1 (the default) causes the cluster to adjust the number of replicas so that on average, there are no requests waiting in the queue.
 
 <br>
 
@@ -66,9 +58,9 @@ Cortex spins up and down instances based on the aggregate resource requests of a
 
 ## Overprovisioning
 
-The default value for `target_in_flight` is `max_concurrency`, which behaves well in many situations (see above for an explanation of how `target_in_flight` affects autoscaling). However, if your application is sensitive to spikes in traffic or if creating new replicas takes too long (see below), you may find it helpful to maintain extra capacity to handle the increased traffic while new replicas are being created. This can be accomplished by setting `target_in_flight` to a lower value relative to the expected replica's concurrency. The smaller `target_in_flight` is, the more unused capacity your API will have, and the more room it will have to handle sudden increased load. The increased request rate will still trigger the autoscaler, and your API will stabilize again (maintaining the overprovisioned capacity).
+The default value for `target_in_flight` is 1, which behaves well in many situations (see above for an explanation of how `target_in_flight` affects autoscaling). However, if your application is sensitive to spikes in traffic or if creating new replicas takes too long (see below), you may find it helpful to maintain extra capacity to handle the increased traffic while new replicas are being created. This can be accomplished by setting `target_in_flight` to a lower value. The smaller `target_in_flight` is, the more unused capacity your API will have, and the more room it will have to handle sudden increased load. The increased request rate will still trigger the autoscaler, and your API will stabilize again (maintaining the overprovisioned capacity).
 
-For example, if you've determined that each replica in your API can handle 2 concurrent requests, you would typically set `target_in_flight` to 2. In a scenario where your API is receiving 8 concurrent requests on average, the autoscaler would maintain 4 live replicas (8/2 = 4). If you wanted to overprovision by 25%, you could set `target_in_flight` to 1.6, causing the autoscaler maintain 5 live replicas (8/1.6 = 5).
+For example, if you wanted to overprovision by 25%, you could set `target_in_flight` to 0.8. If your API has an average of 4 concurrent requests, the autoscaler would maintain 5 live replicas (4/0.8 = 5).
 
 ## Autoscaling responsiveness
```
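The scaling arithmetic in the updated docs can be sketched in a few lines of Go. This is an illustration of the documented formula only, not Cortex's actual autoscaler code; the `desiredReplicas` helper and its round-up behavior are assumptions for the sketch (replica counts must be whole numbers).

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the formula from the docs:
//   desired replicas = total in-flight requests / target_in_flight
// rounded up to a whole replica count (rounding behavior assumed here).
func desiredReplicas(totalInFlight, targetInFlight float64) int {
	return int(math.Ceil(totalInFlight / targetInFlight))
}

func main() {
	// New AsyncAPI default: target_in_flight = 1, so on average each
	// in-flight request gets its own replica.
	fmt.Println(desiredReplicas(4, 1)) // 4

	// Overprovisioning by 25% as in the docs example: 4 / 0.8 = 5.
	fmt.Println(desiredReplicas(4, 0.8)) // 5
}
```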

docs/workloads/async/configuration.md

Lines changed: 1 addition & 2 deletions

```diff
@@ -5,7 +5,6 @@
   kind: AsyncAPI # must be "AsyncAPI" for async APIs (required)
   pod: # pod configuration (required)
     port: <int> # port to which requests will be sent (default: 8080; exported as $CORTEX_PORT)
-    max_concurrency: <int> # maximum number of requests that will be concurrently sent into the container (default: 1)
     containers: # configurations for the containers to run (at least one constainer must be provided)
       - name: <string> # name of the container (required)
         image: <string> # docker image to use for the container (required)
@@ -46,7 +45,7 @@
     min_replicas: <int> # minimum number of replicas (default: 1)
     max_replicas: <int> # maximum number of replicas (default: 100)
     init_replicas: <int> # initial number of replicas (default: <min_replicas>)
-    target_in_flight: <int> # desired number of in-flight requests per replica (including requests actively being processed as well as queued), which the autoscaler tries to maintain (default: <max_concurrency>)
+    target_in_flight: <int> # desired number of in-flight requests per replica (including requests actively being processed as well as queued), which the autoscaler tries to maintain (default: 1)
     window: <duration> # duration over which to average the API's in-flight requests per replica (default: 60s)
     downscale_stabilization_period: <duration> # the API will not scale below the highest recommendation made during this period (default: 5m)
     upscale_stabilization_period: <duration> # the API will not scale above the lowest recommendation made during this period (default: 1m)
```
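After this change, an AsyncAPI spec no longer accepts `max_concurrency` (or `max_queue_length`) under `pod`. A minimal sketch of a valid spec, with illustrative name, image, and values that are not taken from the commit:

```yaml
- name: my-async-api            # illustrative API name
  kind: AsyncAPI
  pod:
    port: 8080
    containers:
      - name: api
        image: example.com/my-image:latest   # illustrative image
  autoscaling:
    min_replicas: 1
    max_replicas: 100
    target_in_flight: 1          # now defaults to 1 (previously defaulted to max_concurrency)
```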

pkg/types/spec/validations.go

Lines changed: 39 additions & 25 deletions

```diff
@@ -141,7 +141,7 @@ func multiAPIsValidation() *cr.StructFieldValidation {
 }
 
 func podValidation(kind userconfig.Kind) *cr.StructFieldValidation {
-	return &cr.StructFieldValidation{
+	validation := &cr.StructFieldValidation{
 		StructField: "Pod",
 		StructValidation: &cr.StructValidation{
 			StructFieldValidations: []*cr.StructFieldValidation{
@@ -180,30 +180,37 @@ func podValidation(kind userconfig.Kind) *cr.StructFieldValidation {
 				},
 			},
-			{
-				StructField: "MaxQueueLength",
-				Int64Validation: &cr.Int64Validation{
-					Default:     consts.DefaultMaxQueueLength,
-					GreaterThan: pointer.Int64(0),
-					// the proxy can theoretically accept up to 32768 connections, but during testing,
-					// it has been observed that the number is just slightly lower, so it has been offset by 2678
-					LessThanOrEqualTo: pointer.Int64(30000),
-				},
-			},
-			{
-				StructField: "MaxConcurrency",
-				Int64Validation: &cr.Int64Validation{
-					Default:     consts.DefaultMaxConcurrency,
-					GreaterThan: pointer.Int64(0),
-					// the proxy can theoretically accept up to 32768 connections, but during testing,
-					// it has been observed that the number is just slightly lower, so it has been offset by 2678
-					LessThanOrEqualTo: pointer.Int64(30000),
-				},
-			},
 			containersValidation(kind),
 			},
 		},
 	}
+
+	if kind == userconfig.RealtimeAPIKind {
+		validation.StructValidation.StructFieldValidations = append(validation.StructValidation.StructFieldValidations,
+			&cr.StructFieldValidation{
+				StructField: "MaxQueueLength",
+				Int64Validation: &cr.Int64Validation{
+					Default:     consts.DefaultMaxQueueLength,
+					GreaterThan: pointer.Int64(0),
+					// the proxy can theoretically accept up to 32768 connections, but during testing,
+					// it has been observed that the number is just slightly lower, so it has been offset by 2678
+					LessThanOrEqualTo: pointer.Int64(30000),
+				},
+			},
+			&cr.StructFieldValidation{
+				StructField: "MaxConcurrency",
+				Int64Validation: &cr.Int64Validation{
+					Default:     consts.DefaultMaxConcurrency,
+					GreaterThan: pointer.Int64(0),
+					// the proxy can theoretically accept up to 32768 connections, but during testing,
+					// it has been observed that the number is just slightly lower, so it has been offset by 2678
+					LessThanOrEqualTo: pointer.Int64(30000),
+				},
+			},
+		)
+	}
+
+	return validation
 }
 
 func containersValidation(kind userconfig.Kind) *cr.StructFieldValidation {
@@ -807,12 +814,19 @@ func validateAutoscaling(api *userconfig.API) error {
 	autoscaling := api.Autoscaling
 	pod := api.Pod
 
-	if autoscaling.TargetInFlight == nil {
-		autoscaling.TargetInFlight = pointer.Float64(float64(pod.MaxConcurrency))
+	if api.Kind == userconfig.RealtimeAPIKind {
+		if autoscaling.TargetInFlight == nil {
+			autoscaling.TargetInFlight = pointer.Float64(float64(pod.MaxConcurrency))
+		}
+		if *autoscaling.TargetInFlight > float64(pod.MaxConcurrency)+float64(pod.MaxQueueLength) {
+			return ErrorTargetInFlightLimitReached(*autoscaling.TargetInFlight, pod.MaxConcurrency, pod.MaxQueueLength)
+		}
 	}
 
-	if *autoscaling.TargetInFlight > float64(pod.MaxConcurrency)+float64(pod.MaxQueueLength) {
-		return ErrorTargetInFlightLimitReached(*autoscaling.TargetInFlight, pod.MaxConcurrency, pod.MaxQueueLength)
+	if api.Kind == userconfig.AsyncAPIKind {
+		if autoscaling.TargetInFlight == nil {
+			autoscaling.TargetInFlight = pointer.Float64(1)
+		}
 	}
 
 	if autoscaling.MinReplicas > autoscaling.MaxReplicas {
```
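The defaulting behavior introduced in `validateAutoscaling` can be isolated in a small sketch. The types below are simplified stand-ins, not the real `userconfig`/`cr` packages: Realtime APIs default `target_in_flight` to the pod's `max_concurrency`, while Async APIs, which no longer have that field, default to 1.

```go
package main

import "fmt"

// Illustrative stand-ins for Cortex's userconfig types.
type Kind string

const (
	RealtimeAPIKind Kind = "RealtimeAPI"
	AsyncAPIKind    Kind = "AsyncAPI"
)

type Autoscaling struct {
	TargetInFlight *float64
}

// defaultTargetInFlight mirrors the commit's kind-dependent defaulting:
// only Realtime APIs consult max_concurrency; Async APIs get 1.
func defaultTargetInFlight(kind Kind, a *Autoscaling, maxConcurrency int64) {
	if a.TargetInFlight != nil {
		return // user set it explicitly; leave it alone
	}
	switch kind {
	case RealtimeAPIKind:
		v := float64(maxConcurrency)
		a.TargetInFlight = &v
	case AsyncAPIKind:
		v := 1.0
		a.TargetInFlight = &v
	}
}

func main() {
	async := &Autoscaling{}
	defaultTargetInFlight(AsyncAPIKind, async, 0)
	fmt.Println(*async.TargetInFlight) // 1

	realtime := &Autoscaling{}
	defaultTargetInFlight(RealtimeAPIKind, realtime, 4)
	fmt.Println(*realtime.TargetInFlight) // 4
}
```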

pkg/types/userconfig/api.go

Lines changed: 7 additions & 5 deletions

```diff
@@ -153,7 +153,7 @@ func IdentifyAPI(filePath string, name string, kind Kind, index int) string {
 func (api *API) ToK8sAnnotations() map[string]string {
 	annotations := map[string]string{}
 
-	if api.Pod != nil {
+	if api.Pod != nil && api.Kind == RealtimeAPIKind {
 		annotations[MaxConcurrencyAnnotationKey] = s.Int64(api.Pod.MaxConcurrency)
 		annotations[MaxQueueLengthAnnotationKey] = s.Int64(api.Pod.MaxQueueLength)
 	}
@@ -257,7 +257,7 @@ func (api *API) UserStr() string {
 
 	if api.Pod != nil {
 		sb.WriteString(fmt.Sprintf("%s:\n", PodKey))
-		sb.WriteString(s.Indent(api.Pod.UserStr(), "  "))
+		sb.WriteString(s.Indent(api.Pod.UserStr(api.Kind), "  "))
 	}
 
 	if api.Networking != nil {
@@ -286,7 +286,7 @@ func (trafficSplit *TrafficSplit) UserStr() string {
 	return sb.String()
 }
 
-func (pod *Pod) UserStr() string {
+func (pod *Pod) UserStr(kind Kind) string {
 	var sb strings.Builder
 
 	if pod.ShmSize != nil {
@@ -301,8 +301,10 @@ func (pod *Pod) UserStr() string {
 		sb.WriteString(fmt.Sprintf("%s: %d\n", PortKey, *pod.Port))
 	}
 
-	sb.WriteString(fmt.Sprintf("%s: %s\n", MaxConcurrencyKey, s.Int64(pod.MaxConcurrency)))
-	sb.WriteString(fmt.Sprintf("%s: %s\n", MaxQueueLengthKey, s.Int64(pod.MaxQueueLength)))
+	if kind == RealtimeAPIKind {
+		sb.WriteString(fmt.Sprintf("%s: %s\n", MaxConcurrencyKey, s.Int64(pod.MaxConcurrency)))
+		sb.WriteString(fmt.Sprintf("%s: %s\n", MaxQueueLengthKey, s.Int64(pod.MaxQueueLength)))
+	}
 
 	sb.WriteString(fmt.Sprintf("%s:\n", ContainersKey))
 	for _, container := range pod.Containers {
```
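The `ToK8sAnnotations` change follows the same pattern: the concurrency and queue-length annotations are only emitted for Realtime APIs. A minimal sketch, using simplified stand-in types and illustrative annotation keys (the real keys live in Cortex's `userconfig` package):

```go
package main

import "fmt"

// Illustrative stand-ins, not Cortex's real types.
type Kind string

const (
	RealtimeAPIKind Kind = "RealtimeAPI"
	AsyncAPIKind    Kind = "AsyncAPI"
)

type Pod struct {
	MaxConcurrency int64
	MaxQueueLength int64
}

type API struct {
	Kind Kind
	Pod  *Pod
}

// toK8sAnnotations mirrors the commit's kind gate: Async APIs no
// longer carry these fields, so they emit no such annotations.
func (api *API) toK8sAnnotations() map[string]string {
	annotations := map[string]string{}
	if api.Pod != nil && api.Kind == RealtimeAPIKind {
		// annotation keys below are illustrative placeholders
		annotations["cortex.dev/max-concurrency"] = fmt.Sprint(api.Pod.MaxConcurrency)
		annotations["cortex.dev/max-queue-length"] = fmt.Sprint(api.Pod.MaxQueueLength)
	}
	return annotations
}

func main() {
	realtime := &API{Kind: RealtimeAPIKind, Pod: &Pod{MaxConcurrency: 4, MaxQueueLength: 100}}
	async := &API{Kind: AsyncAPIKind, Pod: &Pod{}}

	fmt.Println(len(realtime.toK8sAnnotations())) // 2
	fmt.Println(len(async.toK8sAnnotations()))    // 0
}
```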

0 commit comments