Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Support Circuit Breakers in BackendTrafficPolicy #2284

Merged
merged 4 commits into from
Dec 19, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions api/v1alpha1/backendtrafficpolicy_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,12 @@ type BackendTrafficPolicySpec struct {
//
// +optional
TCPKeepalive *TCPKeepalive `json:"tcpKeepalive,omitempty"`

// Circuit Breaker settings for the upstream connections and requests.
// If not set, circuit breakers will be enabled with the highest supported thresholds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If not set, circuit breakers will be enabled with the highest supported thresholds
// If not set, circuit breakers will be enabled with default thresholds

//
// +optional
CircuitBreakers *CircuitBreakers `json:"circuitBreakers,omitempty"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why plural?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the term used by Envoy, the Backend Traffic Policy and other OSS projects like Emissary Ingress. A single thresholds struct defines multiple breakers (requests, connection, etc. ), so plural form does sound appropriate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we only need one Thresholds in the CircuitBreakers structure, the name should be singular.

}

// BackendTrafficPolicyStatus defines the state of BackendTrafficPolicy
Expand Down
43 changes: 43 additions & 0 deletions api/v1alpha1/circuitbreaker_types.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
// Copyright Envoy Gateway Authors
// SPDX-License-Identifier: Apache-2.0
// The full text of the Apache license is available in the LICENSE file at
// the root of the repo.

package v1alpha1

// CircuitBreakers defines the Circuit Breakers configuration.
type CircuitBreakers struct {
// List of Circuit Breaker Thresholds
// At most one Thresholds resource is supported.
//
// +kubebuilder:validation:MaxItems:=1
// +optional
Thresholds []Thresholds `json:"thresholds,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the description of this PR, only one Thresholds is needed here because Envoy Gateway doesn't support Routing Priorities

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept this as list for two reasons:

  • Future-proofing: if/when EG does support routing priorities, there would be no need to add another list. We can just add another optional priority field to Thresholds and cancel the length validation.
  • Compatibility with the original proposal by @AliceProxy in BTP

We can decide to keep things simpler for the common use case of tweaking default circuit breakers, and, in the future, allow a list for non-default priorities.

---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: example-policy-with-circuitbreakers
spec:
 [...]
  circuitBreakers: # optional
    maxConnections: uint32 # optional, default=1024
    maxPendingRequests: uint32 # optional, default=1024
    maxParallelRequests: uint32 # optional, default=1024
    maxRetries: uint32 # optional, default=3
    additionalThresholds: #optional [Future, not in this PR]
    - priority: [HIGH|...] 
      maxConnections: uint32 # optional, default=1024
      maxPendingRequests: uint32 # optional, default=1024
      maxParallelRequests: uint32 # optional, default=1024
      maxRetries: uint32 # optional, default=3
[...] 

If the maintainers support this I don't have any objection . WDYT?

Copy link
Member

@zhaohuabing zhaohuabing Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me. A one-member list just looks weird to me. But would love to hear @AliceProxy 's opinion on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote to keep it singular for now

circuitBreaker:
  maxConnections: 
  maxPendingRequests: 
  maxParallelRequests: 

I dont see routing priority being added in the upstream Gateway API, and if it does, we can deprecate circuitBreaker in favor of circuitBreakers

}

type Thresholds struct {
// The maximum number of connections that Envoy will make to the referenced backend (per xRoute).
// Default: 1024
//
// +optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better go with CEL for all these fields, for example:

	// +kubebuilder:validation:Minimum=xxx
	// +kubebuilder:validation:Maximum=xxx
	// +kubebuilder:default=xxx

MaxConnections *uint32 `json:"maxConnections,omitempty"`

// The maximum number of pending requests that Envoy will allow to the referenced backend (per xRoute).
// Default: 1024
//
// +optional
MaxPendingRequests *uint32 `json:"maxPendingRequests,omitempty"`

// The maximum number of parallel requests that Envoy will make to the referenced backend (per xRoute).
// Default: 1024
//
// +optional
MaxRequests *uint32 `json:"maxParallelRequests,omitempty"`

// The maximum number of parallel retries that Envoy will allow to the referenced backend (per xRoute).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vote to rm this for now, raise a issue to track max parallel retries, and once the retry API is complete, we can revisit this field and decide on the right home for this

// Default: 3
//
// +optional
MaxRetries *uint32 `json:"maxRetries,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MaxRetries is part of the retry and would be better included in the retry, the corresponding field is maxParallel #2168

Copy link
Member

@zhaohuabing zhaohuabing Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're talking about two different things here: the MaxConcurrentRetries of a cluster and the MaxRetries of an individual request.

MaxConcurrentRetries belongs to the Circuit Breaker configuration, and MaxRetries belongs to the Retry configuration.

Copy link
Contributor Author

@guydc guydc Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. However, note that #2168 deals with RetryBudget which is also a part of Envoy's Circuit Breaker settings.

In Envoy, the separation of route and cluster settings is pretty clear. Multiple routes can point to the same cluster, and so MaxRetries, RetryBudget will apply to all the traffic coming from routes that share an upstream cluster. The motivation is to protect the upstream system from a retry storm.

In Envoy Gateway, we have a cluster for each xRoute. So, it doesn't make much difference if these settings are managed under the Retries or the CircuitBreakers section. It is important that the users understand the implications of these settings - overflowing retries will be queued and later dropped.

If in the future Envoy Gateway does support a notion of shared backends (e.g. by translating services to clusters in some situations) and Envoy Gateway will support a SharedBackendTrafficPolicy, I expect that this policy will include CircuitBreakers but not Retries. So, for future reusability, it could be better to have these settings under the circuit breaker types.

Another aspect to consider is that these settings are scoped to a Routing Priority level. As long as only the Default level is supported, it doesn't really matter where we place these settings. However, if multiple priority levels are supported in the future, the RateLimitStrategy API will need to be extended to support multiple routing priorities, and the translation logic will need to carefully merge that with the other circuit breaker settings.

I'm willing to drop MaxRetries from this PR for now. We can continue the discussion in #2168 on the best location for these settings in the API and implement it as part of that PR. WDYT?

Copy link
Member

@zhaohuabing zhaohuabing Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Envoy Gateway, we have a cluster for each xRoute. So, it doesn't make much difference if these settings are managed under the Retries or the CircuitBreakers section. It is important that the users understand the implications of these settings - overflowing retries will be queued and later dropped.

IMO, the concurrent max retries setting belongs to Circuit Breaker logic because it enforces back pressure on the clients. Therefore, EG probably should not mix it with the request retries configuration.

Use Istio as an example: Istio puts them into two places: the concurrent max retries setting in the DestinationRule and request retries in the VirtualService.

Copy link
Contributor

@tmsnan tmsnan Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, we should distinguish between the circuit_breaker and retryStrategy functions for users, as they offer distinct features, not limited to the design of Envoy. Even though retry budget and concurrent max retries are implemented in the circuit_breaker in Envoy, for users, these encompass retry functions that provide richer options for retry operations.

Regarding the shared cluster, it's an aspect that requires careful consideration. However, I'm currently uncertain about its usage. It might be an implementation similar to the Istio DestinationRule resource. If that's the case, one could patch the BackendTrafficPolicy to the DestinationRule (DR).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, we should distinguish between the circuit_breaker and retryStrategy functions for users, as they offer distinct features, not limited to the design of Envoy. Even though retry budget and concurrent max retries are implemented in the circuit_breaker in Envoy, for users, these encompass retry functions that provide richer options for retry operations.

I vote -1 on this.

Even though both have "retries" in their name, they serve two different purposes. The concurrent max retries setting is inherently associated with the Circuit Breaker, which fails requests quickly when a lot of retries happen and apply back pressure on downstream. On the other hand, request retries are specifically designed to mitigate transient network issues. Would love more insights from @kflynn and other @envoyproxy/gateway-maintainers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since @arkodg #2284 (comment) and @tmsnan support removing MaxRetries from this API proposal, I'll go ahead and remove it. If we eventually decide that CircuitBreakers should contain these settings, we can add them later on.

}

Check failure on line 43 in api/v1alpha1/circuitbreaker_types.go

View workflow job for this annotation

GitHub Actions / lint

File is not `gofmt`-ed with `-s` (gofmt)
62 changes: 62 additions & 0 deletions api/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,44 @@ spec:
spec:
description: spec defines the desired state of BackendTrafficPolicy.
properties:
circuitBreakers:
description: Circuit Breaker settings for the upstream connections
and requests. If not set, circuit breakers will be enabled with
the highest supported thresholds
properties:
thresholds:
description: List of Circuit Breaker Thresholds At most one Thresholds
resource is supported.
items:
properties:
maxConnections:
description: 'The maximum number of connections that Envoy
will make to the referenced backend (per xRoute). Default:
1024'
format: int32
type: integer
maxParallelRequests:
description: 'The maximum number of parallel requests that
Envoy will make to the referenced backend (per xRoute).
Default: 1024'
format: int32
type: integer
maxPendingRequests:
description: 'The maximum number of pending requests that
Envoy will allow to the referenced backend (per xRoute).
Default: 1024'
format: int32
type: integer
maxRetries:
description: 'The maximum number of parallel retries that
Envoy will allow to the referenced backend (per xRoute).
Default: 3'
format: int32
type: integer
type: object
maxItems: 1
type: array
type: object
loadBalancer:
description: LoadBalancer policy to apply when routing traffic from
the gateway to the backend endpoints
Expand Down
32 changes: 32 additions & 0 deletions site/content/en/latest/api/extension_types.md
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,7 @@ _Appears in:_
| `loadBalancer` _[LoadBalancer](#loadbalancer)_ | LoadBalancer policy to apply when routing traffic from the gateway to the backend endpoints |
| `proxyProtocol` _[ProxyProtocol](#proxyprotocol)_ | ProxyProtocol enables the Proxy Protocol when communicating with the backend. |
| `tcpKeepalive` _[TCPKeepalive](#tcpkeepalive)_ | TcpKeepalive settings associated with the upstream client connection. Disabled by default. |
| `circuitBreakers` _[CircuitBreakers](#circuitbreakers)_ | Circuit Breaker settings for the upstream connections and requests. If not set, circuit breakers will be enabled with the highest supported thresholds |



Expand Down Expand Up @@ -125,6 +126,20 @@ _Appears in:_
| `maxAge` _[Duration](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.26/#duration-v1-meta)_ | MaxAge defines how long the results of a preflight request can be cached. |


#### CircuitBreakers



CircuitBreakers defines the Circuit Breakers configuration.

_Appears in:_
- [BackendTrafficPolicySpec](#backendtrafficpolicyspec)

| Field | Description |
| --- | --- |
| `thresholds` _[Thresholds](#thresholds) array_ | List of Circuit Breaker Thresholds At most one Thresholds resource is supported. |


#### ClaimToHeader


Expand Down Expand Up @@ -1758,6 +1773,23 @@ _Appears in:_
| `interval` _Duration_ | The duration between keep-alive probes. Defaults to `75s`. |


#### Thresholds





_Appears in:_
- [CircuitBreakers](#circuitbreakers)

| Field | Description |
| --- | --- |
| `maxConnections` _integer_ | The maximum number of connections that Envoy will make to the referenced backend (per xRoute). Default: 1024 |
| `maxPendingRequests` _integer_ | The maximum number of pending requests that Envoy will allow to the referenced backend (per xRoute). Default: 1024 |
| `maxParallelRequests` _integer_ | The maximum number of parallel requests that Envoy will make to the referenced backend (per xRoute). Default: 1024 |
| `maxRetries` _integer_ | The maximum number of parallel retries that Envoy will allow to the referenced backend (per xRoute). Default: 3 |


#### TracingProvider


Expand Down
34 changes: 34 additions & 0 deletions test/cel-validation/backendtrafficpolicy_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -306,6 +306,40 @@ func TestBackendTrafficPolicyTarget(t *testing.T) {
"spec.loadBalancer: Invalid value: \"object\": Currently SlowStart is only supported for RoundRobin and LeastRequest load balancers.",
},
},
{
desc: " consistenthash with SlowStart is set",
mutate: func(btp *egv1a1.BackendTrafficPolicy) {
val := uint32(1)
btp.Spec = egv1a1.BackendTrafficPolicySpec{
TargetRef: gwapiv1a2.PolicyTargetReferenceWithSectionName{
PolicyTargetReference: gwapiv1a2.PolicyTargetReference{
Group: gwapiv1a2.Group("gateway.networking.k8s.io"),
Kind: gwapiv1a2.Kind("Gateway"),
Name: gwapiv1a2.ObjectName("eg"),
},
},
CircuitBreakers: &egv1a1.CircuitBreakers{
Thresholds: []egv1a1.Thresholds{
{
MaxConnections: &val,
MaxPendingRequests: &val,
MaxRequests: &val,
MaxRetries: &val,
},
{
MaxConnections: &val,
MaxPendingRequests: &val,
MaxRequests: &val,
MaxRetries: &val,
},
},
},
}
},
wantErrors: []string{
"spec.circuitBreakers.thresholds: Too many: 2: must have at most 1 items",
},
},
}

for _, tc := range cases {
Expand Down
Loading