API: Support Circuit Breakers in BackendTrafficPolicy #2284

Merged
merged 4 commits into from
Dec 19, 2023
21 changes: 14 additions & 7 deletions api/v1alpha1/circuitbreaker_types.go
@@ -16,28 +16,35 @@ type CircuitBreakers struct {
}

type Thresholds struct {
-// The maximum number of connections that Envoy will make to the referenced backend (per xRoute).
-// Default: 1024
+// The maximum number of connections that Envoy will establish to the referenced backend (per xRoute).
Contributor

Suggested change
// The maximum number of connections that Envoy will establish to the referenced backend (per xRoute).
// The maximum number of connections that Envoy will establish to the referenced backend (per xRoute per rule).

Contributor

Or it can be rephrased to: ... to the referenced backend defined within an xRoute rule

//
+// +kubebuilder:validation:Minimum=0
+// +kubebuilder:validation:Maximum=4294967295
+// +kubebuilder:default=1024
// +optional
Contributor

better go with CEL for all these fields, for example:

	// +kubebuilder:validation:Minimum=xxx
	// +kubebuilder:validation:Maximum=xxx
	// +kubebuilder:default=xxx

MaxConnections *uint32 `json:"maxConnections,omitempty"`

-// The maximum number of pending requests that Envoy will allow to the referenced backend (per xRoute).
-// Default: 1024
+// The maximum number of pending requests that Envoy will queue to the referenced backend (per xRoute).
Contributor

Suggested change
// The maximum number of pending requests that Envoy will queue to the referenced backend (per xRoute).
// The maximum number of pending requests that Envoy will queue to the referenced backend (per xRoute per rule).

//
+// +kubebuilder:validation:Minimum=0
+// +kubebuilder:validation:Maximum=4294967295
Contributor

Why 4294967295?

Contributor

this is the maximum value of uint32, but i think this Maximum validation can be optional

Contributor

oh, my bad, think about 2147483647.

Contributor

can -1 pass if the type is uint32?

Contributor

no, it cannot

Contributor

Yes, that's what it means.
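
The exchange above can be checked on the Go side with a minimal sketch. It assumes a hypothetical `thresholds` struct that mirrors only the `maxConnections` field, and it covers only what `encoding/json` does when decoding into `*uint32`; it says nothing about how the Kubernetes API server validates the stored value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// thresholds is a hypothetical stand-in for the Thresholds type under review;
// only one field is mirrored here.
type thresholds struct {
	MaxConnections *uint32 `json:"maxConnections,omitempty"`
}

// decode unmarshals raw JSON into thresholds using the standard library.
func decode(raw string) (thresholds, error) {
	var out thresholds
	err := json.Unmarshal([]byte(raw), &out)
	return out, err
}

func main() {
	inputs := []string{
		`{"maxConnections": 1024}`,       // in range
		`{"maxConnections": -1}`,         // negative: rejected for an unsigned field
		`{"maxConnections": 4294967296}`, // MaxUint32 + 1: overflows uint32
	}
	for _, raw := range inputs {
		_, err := decode(raw)
		fmt.Printf("%s rejected=%v\n", raw, err != nil)
	}
}
```

With a `*uint32` field, the decoder itself rejects negative and overflowing numbers, so the schema bounds mainly matter for what the API server is willing to store.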

Contributor Author

Hi @zirain, @shawnh2. Do note that the OpenAPI spec (used by K8s CRDs) doesn't really support unsigned ints: https://swagger.io/specification/. The controller-gen tools actually produce a schema that refers to these fields as int32 in the generated CRD. The actual K8s API server behavior, from my limited check, is to treat these fields as int64. I think that the actual go type (*uint32) mostly impacts the unmarshalling done by client go. So, guaranteeing that the value stored is actually safe to cast to uint32 could be useful...

Contributor Author

@guydc guydc Dec 13, 2023

Another approach would be to use int64 explicitly in the go types layer and have uint32 as a representation in the IR layer and downwards. The value range validation can occur either using the schema or during the IR translation. WDYT?
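
A rough sketch of that second approach, where the range check happens during translation rather than in the schema. The helper name `toUint32` and its error format are hypothetical and not taken from the Envoy Gateway code base.

```go
package main

import (
	"fmt"
	"math"
)

// toUint32 converts an optional int64 API value into the uint32 used by the
// IR layer, rejecting anything that cannot be represented. The name and
// signature are illustrative assumptions.
func toUint32(field string, v *int64) (*uint32, error) {
	if v == nil {
		return nil, nil // unset stays unset; defaults are applied elsewhere
	}
	if *v < 0 || *v > math.MaxUint32 {
		return nil, fmt.Errorf("%s: %d is outside [0, %d]", field, *v, uint32(math.MaxUint32))
	}
	out := uint32(*v)
	return &out, nil
}

func main() {
	for _, in := range []int64{0, 1024, math.MaxUint32, -1, math.MaxUint32 + 1} {
		v := in
		out, err := toUint32("maxConnections", &v)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", *out)
	}
}
```

Checking in the translator keeps the API type simple, at the cost of surfacing out-of-range values later than schema validation would.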

Contributor

Like the Gateway API project, let's use *int32 with validation min and max?

Contributor Author

Wouldn't it make more sense to use *int64? MaxUInt32 > MaxInt32, so by using *int32 users would not be able to use the full value range provided by Envoy.
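
The range argument can be checked with a few lines of Go. The helper names `fitsInt32` and `fitsUint32` are made up for this illustration.

```go
package main

import (
	"fmt"
	"math"
)

// fitsInt32 reports whether v is representable as an int32.
func fitsInt32(v int64) bool { return v >= math.MinInt32 && v <= math.MaxInt32 }

// fitsUint32 reports whether v is representable as a uint32.
func fitsUint32(v int64) bool { return v >= 0 && v <= math.MaxUint32 }

func main() {
	// The largest threshold a uint32 can express, held in an int64.
	const largest int64 = math.MaxUint32
	fmt.Println(fitsInt32(largest), fitsUint32(largest)) // prints "false true"
}
```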

+// +kubebuilder:default=1024
// +optional
MaxPendingRequests *uint32 `json:"maxPendingRequests,omitempty"`

// The maximum number of parallel requests that Envoy will make to the referenced backend (per xRoute).
-// Default: 1024
//
+// +kubebuilder:validation:Minimum=0
+// +kubebuilder:validation:Maximum=4294967295
+// +kubebuilder:default=1024
// +optional
MaxRequests *uint32 `json:"maxParallelRequests,omitempty"`

// The maximum number of parallel retries that Envoy will allow to the referenced backend (per xRoute).
Contributor

Vote to remove this for now and raise an issue to track max parallel retries. Once the retry API is complete, we can revisit this field and decide on the right home for it.

-// Default: 3
//
+// +kubebuilder:validation:Minimum=0
+// +kubebuilder:validation:Maximum=4294967295
+// +kubebuilder:default=3
// +optional
MaxRetries *uint32 `json:"maxRetries,omitempty"`
Contributor

MaxRetries is part of the retry and would be better included in the retry, the corresponding field is maxParallel #2168

Member

@zhaohuabing zhaohuabing Dec 13, 2023

I think we're talking about two different things here: the MaxConcurrentRetries of a cluster and the MaxRetries of an individual request.

MaxConcurrentRetries belongs to the Circuit Breaker configuration, and MaxRetries belongs to the Retry configuration.
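
The distinction can be sketched as a toy model in Go. This is not Envoy's implementation: `retryBreaker`, `doWithRetries`, and both error values are hypothetical names, and rejecting an overflowing retry immediately is a simplification chosen for the sketch. The per-request budget is the `numRetries` argument, while the shared cap on in-flight retries lives in the breaker.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

var (
	errTransient     = errors.New("transient failure")
	errRetryOverflow = errors.New("retry circuit breaker open")
)

// retryBreaker caps how many retries may be in flight at once across ALL
// requests to one backend (the circuit-breaker concern).
type retryBreaker struct {
	maxConcurrent int64
	active        int64
}

// tryAcquire reserves a retry slot, or reports false when the cap is reached.
func (b *retryBreaker) tryAcquire() bool {
	if atomic.AddInt64(&b.active, 1) > b.maxConcurrent {
		atomic.AddInt64(&b.active, -1)
		return false
	}
	return true
}

// release returns a retry slot to the breaker.
func (b *retryBreaker) release() { atomic.AddInt64(&b.active, -1) }

// doWithRetries makes one initial attempt plus up to numRetries retries for a
// SINGLE request (the retry-policy concern). Each retry must first obtain a
// slot from the shared breaker.
func doWithRetries(b *retryBreaker, numRetries int, call func() error) error {
	err := call()
	for i := 0; i < numRetries && err != nil; i++ {
		if !b.tryAcquire() {
			return errRetryOverflow
		}
		err = call()
		b.release()
	}
	return err
}

func main() {
	b := &retryBreaker{maxConcurrent: 1}
	attempts := 0
	err := doWithRetries(b, 2, func() error {
		attempts++
		if attempts < 3 {
			return errTransient
		}
		return nil
	})
	fmt.Println("attempts:", attempts, "err:", err)
}
```

In this model, raising `numRetries` changes how persistent one request is, while `maxConcurrent` bounds the extra load that all requests together can place on the backend.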

Contributor Author

@guydc guydc Dec 13, 2023

I agree. However, note that #2168 deals with RetryBudget which is also a part of Envoy's Circuit Breaker settings.

In Envoy, the separation of route and cluster settings is pretty clear. Multiple routes can point to the same cluster, and so MaxRetries, RetryBudget will apply to all the traffic coming from routes that share an upstream cluster. The motivation is to protect the upstream system from a retry storm.

In Envoy Gateway, we have a cluster for each xRoute. So, it doesn't make much difference if these settings are managed under the Retries or the CircuitBreakers section. It is important that the users understand the implications of these settings - overflowing retries will be queued and later dropped.

If in the future Envoy Gateway does support a notion of shared backends (e.g. by translating services to clusters in some situations) and Envoy Gateway will support a SharedBackendTrafficPolicy, I expect that this policy will include CircuitBreakers but not Retries. So, for future reusability, it could be better to have these settings under the circuit breaker types.

Another aspect to consider is that these settings are scoped to a Routing Priority level. As long as only the Default level is supported, it doesn't really matter where we place these settings. However, if multiple priority levels are supported in the future, the RateLimitStrategy API will need to be extended to support multiple routing priorities, and the translation logic will need to carefully merge that with the other circuit breaker settings.

I'm willing to drop MaxRetries from this PR for now. We can continue the discussion in #2168 on the best location for these settings in the API and implement it as part of that PR. WDYT?

Member

@zhaohuabing zhaohuabing Dec 14, 2023

> In Envoy Gateway, we have a cluster for each xRoute. So, it doesn't make much difference if these settings are managed under the Retries or the CircuitBreakers section. It is important that the users understand the implications of these settings - overflowing retries will be queued and later dropped.

IMO, the concurrent max retries setting belongs to Circuit Breaker logic because it enforces back pressure on the clients. Therefore, EG probably should not mix it with the request retries configuration.

Use Istio as an example: Istio puts them into two places: the concurrent max retries setting in the DestinationRule and request retries in the VirtualService.

Contributor

@tmsnan tmsnan Dec 14, 2023

IMO, we should distinguish between the circuit_breaker and retryStrategy functions for users, as they offer distinct features, not limited to the design of Envoy. Even though retry budget and concurrent max retries are implemented in the circuit_breaker in Envoy, for users, these encompass retry functions that provide richer options for retry operations.

Regarding the shared cluster, it's an aspect that requires careful consideration. However, I'm currently uncertain about its usage. It might be an implementation similar to the Istio DestinationRule resource. If that's the case, one could patch the BackendTrafficPolicy to the DestinationRule (DR).

Member

> IMO, we should distinguish between the circuit_breaker and retryStrategy functions for users, as they offer distinct features, not limited to the design of Envoy. Even though retry budget and concurrent max retries are implemented in the circuit_breaker in Envoy, for users, these encompass retry functions that provide richer options for retry operations.

I vote -1 on this.

Even though both have "retries" in their name, they serve two different purposes. The concurrent max retries setting is inherently associated with the Circuit Breaker, which fails requests quickly when a lot of retries happen and applies back pressure on downstream. On the other hand, request retries are specifically designed to mitigate transient network issues. Would love more insights from @kflynn and other @envoyproxy/gateway-maintainers

Contributor Author

Since @arkodg #2284 (comment) and @tmsnan support removing MaxRetries from this API proposal, I'll go ahead and remove it. If we eventually decide that CircuitBreakers should contain these settings, we can add them later on.

}

@@ -55,28 +55,36 @@ spec:
items:
properties:
maxConnections:
-description: 'The maximum number of connections that Envoy
-will make to the referenced backend (per xRoute). Default:
-1024'
+default: 1024
+description: The maximum number of connections that Envoy
+will establish to the referenced backend (per xRoute).
format: int32
+maximum: 4294967295
+minimum: 0
type: integer
maxParallelRequests:
-description: 'The maximum number of parallel requests that
+default: 1024
+description: The maximum number of parallel requests that
Envoy will make to the referenced backend (per xRoute).
-Default: 1024'
format: int32
+maximum: 4294967295
+minimum: 0
type: integer
maxPendingRequests:
-description: 'The maximum number of pending requests that
-Envoy will allow to the referenced backend (per xRoute).
-Default: 1024'
+default: 1024
+description: The maximum number of pending requests that
+Envoy will queue to the referenced backend (per xRoute).
format: int32
+maximum: 4294967295
+minimum: 0
type: integer
maxRetries:
-description: 'The maximum number of parallel retries that
+default: 3
+description: The maximum number of parallel retries that
Envoy will allow to the referenced backend (per xRoute).
-Default: 3'
format: int32
+maximum: 4294967295
+minimum: 0
type: integer
type: object
maxItems: 1
8 changes: 4 additions & 4 deletions site/content/en/latest/api/extension_types.md
@@ -1784,10 +1784,10 @@ _Appears in:_

| Field | Description |
| --- | --- |
-| `maxConnections` _integer_ | The maximum number of connections that Envoy will make to the referenced backend (per xRoute). Default: 1024 |
-| `maxPendingRequests` _integer_ | The maximum number of pending requests that Envoy will allow to the referenced backend (per xRoute). Default: 1024 |
-| `maxParallelRequests` _integer_ | The maximum number of parallel requests that Envoy will make to the referenced backend (per xRoute). Default: 1024 |
-| `maxRetries` _integer_ | The maximum number of parallel retries that Envoy will allow to the referenced backend (per xRoute). Default: 3 |
+| `maxConnections` _integer_ | The maximum number of connections that Envoy will establish to the referenced backend (per xRoute). |
+| `maxPendingRequests` _integer_ | The maximum number of pending requests that Envoy will queue to the referenced backend (per xRoute). |
+| `maxParallelRequests` _integer_ | The maximum number of parallel requests that Envoy will make to the referenced backend (per xRoute). |
+| `maxRetries` _integer_ | The maximum number of parallel retries that Envoy will allow to the referenced backend (per xRoute). |


#### TracingProvider
48 changes: 32 additions & 16 deletions test/cel-validation/backendtrafficpolicy_test.go
@@ -11,12 +11,12 @@ package celvalidation
import (
"context"
"fmt"
+egv1a1 "github.com/envoyproxy/gateway/api/v1alpha1"
+"k8s.io/utils/pointer"
"strings"
"testing"
"time"

-egv1a1 "github.com/envoyproxy/gateway/api/v1alpha1"
-
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
gwapiv1a2 "sigs.k8s.io/gateway-api/apis/v1alpha2"
)
@@ -307,9 +307,8 @@ func TestBackendTrafficPolicyTarget(t *testing.T) {
},
},
{
-desc: " consistenthash with SlowStart is set",
+desc: " more than one circuit breakers threshold is set",
mutate: func(btp *egv1a1.BackendTrafficPolicy) {
-val := uint32(1)
btp.Spec = egv1a1.BackendTrafficPolicySpec{
TargetRef: gwapiv1a2.PolicyTargetReferenceWithSectionName{
PolicyTargetReference: gwapiv1a2.PolicyTargetReference{
@@ -320,18 +319,8 @@ func TestBackendTrafficPolicyTarget(t *testing.T) {
},
CircuitBreakers: &egv1a1.CircuitBreakers{
Thresholds: []egv1a1.Thresholds{
-{
-MaxConnections: &val,
-MaxPendingRequests: &val,
-MaxRequests: &val,
-MaxRetries: &val,
-},
-{
-MaxConnections: &val,
-MaxPendingRequests: &val,
-MaxRequests: &val,
-MaxRetries: &val,
-},
+{},
+{},
},
},
}
@@ -340,6 +329,33 @@ func TestBackendTrafficPolicyTarget(t *testing.T) {
"spec.circuitBreakers.thresholds: Too many: 2: must have at most 1 items",
},
},
+{
+desc: " valid config: min, max, nil",
+mutate: func(btp *egv1a1.BackendTrafficPolicy) {
+valMax := pointer.Uint32(4294967295)
+valMin := pointer.Uint32(0)
+btp.Spec = egv1a1.BackendTrafficPolicySpec{
+TargetRef: gwapiv1a2.PolicyTargetReferenceWithSectionName{
+PolicyTargetReference: gwapiv1a2.PolicyTargetReference{
+Group: gwapiv1a2.Group("gateway.networking.k8s.io"),
+Kind: gwapiv1a2.Kind("Gateway"),
+Name: gwapiv1a2.ObjectName("eg"),
+},
+},
+CircuitBreakers: &egv1a1.CircuitBreakers{
+Thresholds: []egv1a1.Thresholds{
+{
+MaxConnections: valMax,
+MaxPendingRequests: valMin,
+MaxRequests: nil,
+MaxRetries: nil,
+},
+},
+},
+}
+},
+wantErrors: []string{},
+},
}

for _, tc := range cases {