Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: admissionControl filter #2030

Merged
merged 2 commits into from
Jul 6, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
80 changes: 76 additions & 4 deletions docs/reference/filters.md
Original file line number Diff line number Diff line change
Expand Up @@ -372,8 +372,8 @@ no-compression, 1 means best-speed and 11 means best-compression. Example:
```

The filter also checks the incoming request, if it accepts the supported encodings,
explicitly stated in the Accept-Encoding header.
The filter currently supports by default `gzip`, `deflate` and `br` (can be overridden with flag `compress-encodings`).
explicitly stated in the Accept-Encoding header.
The filter currently supports by default `gzip`, `deflate` and `br` (can be overridden with flag `compress-encodings`).
It does not assume that the client accepts any encoding if the
Accept-Encoding header is not set. It ignores * in the Accept-Encoding header.

Expand Down Expand Up @@ -1848,6 +1848,78 @@ Path("/cheap") -> clusterLeakyBucketRatelimit("user-${request.cookie.Authori
Path("/expensive") -> clusterLeakyBucketRatelimit("user-${request.cookie.Authorization}", 1, "1s", 5, 2) -> ...
```

## shedder

The basic idea of load shedding is to reduce errors by early stopping
some of the ingress requests that create too much load and serving
the maximum throughput the system can process at a point in time.

There is a great talk by [Acacio Cruz from
Google](https://www.youtube.com/watch?v=XNEIkivvaV4)
that explains the basic principles.

### admissionControl

Implements an admission control filter, that rejects traffic by
observed error rate and probability.

The probability of rejection is calculated by the following equation:

$$ P_{reject} = ( { n_{total} - { n_{success} \over threshold } \over n_{total} + 1} )^{ exponent } $$

Examples:

admissionControl(metricSuffix, mode, d, windowSize, minRPS, successThreshold, maxRejectProbability, exponent)
admissionControl("myapp", "active", "1s", 5, 10, 0.95, 0.9, 0.5)

Parameters:

* metric suffix (string)
* mode (enum)
* d (time.Duration)
* window size (int)
* minRps (int)
* success threshold (float64)
* max reject probability (float64)
* exponent (float64)

Metric suffix is the chosen suffix key to expose reject counter,
should be unique by filter instance

Mode has 3 different possible values:

* "active" will reject traffic
* "inactive" will never reject traffic
* "logInactive" will not reject traffic, but log to debug filter settings

D the time duration of a single slot for required counters in our
circular buffer of window size.

Window size is the size of the circular buffer. It is used to snapshot
counters to calculate total requests and number of success. It is
within $[1, 100]$.

MinRps is the minimum requests per second that have to pass this filter
otherwise it will not reject traffic.

Success threshold sets the lowest request success rate at which the
filter will not reject requests. It is within $(0,1]$. A value of
0.95 means an error rate of lower than 5% will not trigger
rejects.

Max reject probability sets the upper bound of reject probability. It
is within (0,1]. A value of 0.95 means if backend errors with 100% it
will only reject up to 95%.

exponent is used to dictate the rejection probability. The
calculation is done by $p = p^{exponent}$
The exponent value is within $(0,\infty]$, to increase rejection
probability you have to use values lower than 1:

* 1: linear
* 1/2: quadratic
* 1/3: cubic

## lua

See [the scripts page](scripts.md)
Expand Down Expand Up @@ -2557,15 +2629,15 @@ fadeIn("3m", 1.5)

#### Warning on fadeIn and Rolling Restarts

Traffic fade-in has the potential to skew the traffic to your backend pods in case of a rolling restart
Traffic fade-in has the potential to skew the traffic to your backend pods in case of a rolling restart
(`kubectl rollout restart`), because it is very likely that the rolling restart is going faster than the
fade-in duration. The image below shows an example of a rolling restart for a four-pod deployment (A, B, C, D)
into (E, F, G, H), and the traffic share of each pod over time. While the ramp-up of the new pods is ongoing,
the remaining old pods will receive a largely increased traffic share (especially the last one, D in this
example), as well as an over-propotional traffic share for the first pod in the rollout (E).

To make rolling restarts safe, you need to slow them down by setting `spec.minReadySeconds` on the pod spec
of your deployment or stackset, according to your fadeIn duration.
of your deployment or stackset, according to your fadeIn duration.

![Rolling Restart and Fade-In](../img/fadein_traffic_skew.png)

Expand Down
4 changes: 3 additions & 1 deletion filters/filters.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,11 @@ package filters

import (
"errors"
log "github.com/sirupsen/logrus"
"net/http"
"time"

log "github.com/sirupsen/logrus"

"github.com/opentracing/opentracing-go"
)

Expand Down Expand Up @@ -264,6 +265,7 @@ const (
ConsecutiveBreakerName = "consecutiveBreaker"
RateBreakerName = "rateBreaker"
DisableBreakerName = "disableBreaker"
AdmissionControlName = "admissionControl"
ClientRatelimitName = "clientRatelimit"
RatelimitName = "ratelimit"
ClusterClientRatelimitName = "clusterClientRatelimit"
Expand Down
Loading