lifo filters, concept #1030

Merged
merged 33 commits into from
May 3, 2019
cd407e9
lifo filters, concept
aryszka Apr 11, 2019
87ba6d9
code formatting
aryszka Apr 11, 2019
bd50b2f
add some error handling and small refactorings
szuecs Apr 15, 2019
5c7389a
add doc from comment
szuecs Apr 15, 2019
584e379
fix build error
szuecs Apr 15, 2019
3277879
lifo -> scheduler
szuecs Apr 17, 2019
780cc39
tried to wire a global handling which I will remove in the next commit
szuecs Apr 17, 2019
8b81921
-
szuecs Apr 17, 2019
802935e
added benchmakr for enabled lifo queue:
szuecs Apr 18, 2019
5158238
add check and doc for min values as commented
szuecs Apr 18, 2019
14d2bc0
docs: package filters/scheduler
szuecs Apr 18, 2019
c82b302
fix scheduler PostProcessor used route Id to group the configuration
szuecs Apr 18, 2019
6a0c134
test lifo scheduler and remove code that is not grouped
szuecs Apr 24, 2019
d1e8c5d
change scheduler filter docs
szuecs Apr 24, 2019
3e1f156
add warning on compatability
szuecs Apr 24, 2019
f34b7ec
rename lifo to lifoGroup in order to create the lifo filter without n…
szuecs Apr 25, 2019
341f02b
add lifo filters that can be used in a per route style grouping also …
szuecs Apr 25, 2019
804892b
add some user docs
szuecs Apr 25, 2019
d301f5a
add operations manual
szuecs Apr 26, 2019
74d294d
refactor: drop global config store and use registry to store group co…
szuecs Apr 26, 2019
8bc5993
fix build errors, because of the Config(REgistry) change
szuecs Apr 26, 2019
aa22f35
fix staticcheck finding
szuecs Apr 26, 2019
785c9a0
added response code information and link the docs in case of known er…
szuecs Apr 26, 2019
1c37536
remove TODOs which were already done
szuecs Apr 26, 2019
3feaa72
add lifo() filter test
szuecs Apr 26, 2019
032c74e
refactor: rename jobstack to jobqueue
szuecs Apr 26, 2019
da95088
refactor: rename Ready() to Wait()
szuecs Apr 26, 2019
c523550
fix typos, remove duplicated docs and link instead
szuecs Apr 26, 2019
43d8a97
fix typo
szuecs May 2, 2019
cefd58d
fix comments
szuecs May 2, 2019
86d1eba
remove commented test stubs
szuecs May 2, 2019
5f8600c
copy items of a stack from old to new in case of a config change
szuecs May 2, 2019
94b9ba8
fix lifo() resize handling
szuecs May 3, 2019
105 changes: 95 additions & 10 deletions docs/operation/operation.md
Original file line number Diff line number Diff line change
Expand Up @@ -389,7 +389,7 @@ Kubernetes API, use the following option:

-source-poll-timeout int
polling timeout of the routing data sources, in milliseconds (default 3000)


# Routing table information

Expand Down Expand Up @@ -471,7 +471,34 @@ using the Redis ring based solution, adds 2 additional roundtrips to
redis per hit. Make sure you monitor redis closely, because skipper
will fallback to allow traffic if redis can not be reached.

## Default filters
## Slow Backends

Skipper has to keep track of all active connections and HTTP
requests. Slow backends cause connections to pile up, and each
in-flight request consumes a little memory. If you have high
traffic per instance and a backend times out, memory consumption can
start to increase. Make sure you monitor backend latency, request
rates and error rates.

# Default Filters

Default filters will be applied to all routes created or updated.

## Global Default Filters

Global default filters can be specified via two different command line
flags, `-default-filters-prepend` and
`-default-filters-append`. Filters passed to these command line flags
will be applied to all routes. The difference between `prepend` and
`append` is where in the filter chain these default filters are applied.

For example, a user specified the route `r: * -> setPath("/foo")`.
If you run skipper with `-default-filters-prepend=enableAccessLog(4,5) -> lifo(100,100,"10s")`,
the actual route will look like this: `r: * -> enableAccessLog(4,5) -> lifo(100,100,"10s") -> setPath("/foo")`.
If you run skipper with `-default-filters-append=enableAccessLog(4,5) -> lifo(100,100,"10s")`,
the actual route will look like this: `r: * -> setPath("/foo") -> enableAccessLog(4,5) -> lifo(100,100,"10s")`.
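
The ordering described above can be sketched in a few lines of Go. This is only an illustration of where prepended and appended default filters end up in the chain; the function `applyDefaults` and the plain string representation of filters are assumptions made for this example, not skipper's internal code.

```go
package main

import (
	"fmt"
	"strings"
)

// applyDefaults returns the filter chain of a route after default
// filters were added. Hypothetical helper: it only mirrors the
// ordering described in the docs.
func applyDefaults(prepend, route, appendFilters []string) []string {
	chain := make([]string, 0, len(prepend)+len(route)+len(appendFilters))
	chain = append(chain, prepend...)       // -default-filters-prepend
	chain = append(chain, route...)         // filters from the route itself
	chain = append(chain, appendFilters...) // -default-filters-append
	return chain
}

func main() {
	defaults := []string{`enableAccessLog(4,5)`, `lifo(100,100,"10s")`}
	route := []string{`setPath("/foo")`}

	fmt.Println(strings.Join(applyDefaults(defaults, route, nil), " -> "))
	fmt.Println(strings.Join(applyDefaults(nil, route, defaults), " -> "))
}
```

Filters are applied in chain order, so a prepended `lifo()` is reached before any route specific filter runs.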

## Kubernetes Default Filters

Kubernetes dataclient supports default filters. You can enable this feature by
specifying `default-filters-dir`. The defined directory must contain per-service
Expand All @@ -485,11 +512,69 @@ potentially contradicting filter configurations and race conditions, i.e.
you should specify a specific filter either on the Ingress resource or as
a default filter.

## Slow Backends

Skipper has to keep track of all active connections and http
Requests. Slow Backends can pile up in number of connections, that
will consume each a little memory per request. If you have high
traffic per instance and a backend times out it can start to increase
your memory consumption. Make sure you monitor backend latency,
request and error rates.
# Scheduler

HTTP request schedulers change the queuing behavior of in-flight
requests. A queue has two generic properties: a limit of requests and
a concurrency level. The request limit can be unlimited (unbounded
queue) or limited (bounded queue). The concurrency level is likewise
either limited or unlimited.

The default scheduler is an unbounded first in first out (FIFO) queue
that is provided by [Go's](https://golang.org/) standard library.

Skipper provides two last in first out (LIFO) filters to change the
scheduling behavior.

On failure conditions, Skipper will return one of these HTTP status codes:

- 503 if the queue is full, which is expected on a route with a failing backend
- 502 if queue access times out, because the request could not be scheduled fast enough
- 500 on unknown errors, please create [an issue](https://github.com/zalando/skipper/issues/new/choose)
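
As a minimal sketch, the mapping from a scheduling failure to a status code could look like the following. The sentinel errors `errQueueFull` and `errQueueTimeout` and the function `statusFor` are hypothetical names chosen for this example; the actual error values in the scheduler package may differ.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Hypothetical sentinel errors, not the names used by skipper.
var (
	errQueueFull    = errors.New("queue full")
	errQueueTimeout = errors.New("queue timeout")
)

// statusFor maps a scheduling error to the status codes listed above.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK // no scheduling failure
	case errors.Is(err, errQueueFull):
		return http.StatusServiceUnavailable // 503
	case errors.Is(err, errQueueTimeout):
		return http.StatusBadGateway // 502
	default:
		return http.StatusInternalServerError // 500
	}
}

func main() {
	wrapped := fmt.Errorf("route r1: %w", errQueueFull)
	fmt.Println(statusFor(wrapped), statusFor(errQueueTimeout), statusFor(errors.New("boom")))
}
```

Using `errors.Is` keeps the mapping working when an error is wrapped with additional context.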

## The problem

Why should you use boundaries to limit concurrency level and limit the
queue?

The short answer is resiliency. If you have one route that is timing
out, the request queue of skipper will pile up and consume much more
memory than before. This can lead to an out of memory kill, which will
affect all other routes. In [this
comment](https://github.bus.zalan.do/teapot/issues/issues/1792#issuecomment-1315569)
you can see the memory usage increase in the `bufio` package of
[Go's](https://golang.org/) standard library.

Why LIFO queue instead of FIFO queue?

In normal cases the queue should not contain many requests. Skipper is
able to process many requests concurrently without letting the queue
pile up. In overrun situations you might want to process at least
some fraction of requests instead of timing out all requests. LIFO
would not time out all requests within the queue, if the backend is
capable of responding to some requests fast enough.
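
The effect can be illustrated with a small discrete-time model. This is a simplified sketch, not a benchmark of skipper: it assumes a fixed number of arrivals and a fixed service capacity per tick, and that a request taken from the queue after its timeout still uses a service slot. The function `simulate` and all parameters are made up for this example.

```go
package main

import "fmt"

// simulate runs an overload model and returns how many requests were
// served within the timeout. Every tick `arrivals` requests arrive and
// up to `capacity` requests are taken from the queue. A request taken
// after waiting longer than `timeout` ticks still uses a slot, but
// counts as timed out.
func simulate(lifo bool, ticks, arrivals, capacity, timeout int) int {
	var queue []int // arrival tick of each waiting request, oldest first
	served := 0
	for now := 0; now < ticks; now++ {
		for i := 0; i < arrivals; i++ {
			queue = append(queue, now)
		}
		for i := 0; i < capacity && len(queue) > 0; i++ {
			var arrived int
			if lifo {
				arrived = queue[len(queue)-1] // newest request first
				queue = queue[:len(queue)-1]
			} else {
				arrived = queue[0] // oldest request first
				queue = queue[1:]
			}
			if now-arrived <= timeout {
				served++
			}
		}
	}
	return served
}

func main() {
	fifo := simulate(false, 1000, 10, 5, 10)
	lifo := simulate(true, 1000, 10, 5, 10)
	fmt.Printf("served in time: fifo=%d lifo=%d\n", fifo, lifo)
}
```

With twice as many arrivals as capacity, the FIFO variant falls further behind on every tick until every request it takes has already timed out, while the LIFO variant keeps serving fresh requests at full capacity and lets the oldest ones starve.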

## A solution

Skipper has two filters, [`lifo()`](../../reference/filters/#lifo) and
[`lifoGroup()`](../../reference/filters/#lifogroup), that can limit
the number of requests for a route. A [documented load
test](https://github.com/zalando/skipper/pull/1030#issuecomment-485714338)
shows the behavior with a `lifo(100,100,"10s")` filter that was
enabled by default for all routes. You can do this by passing
the following flag to skipper:
`-default-filters-prepend=lifo(100,100,"10s")`.

Both LIFO filters use a last in first out queue to handle most
requests fast. If skipper is in an overrun mode, it will serve some
requests fast and some will time out. The idea is based on the Dropbox
Bandaid proxy, which is not open source. [Dropbox](https://dropbox.com/)
shared their idea in a [public
blogpost](https://blogs.dropbox.com/tech/2018/03/meet-bandaid-the-dropbox-service-proxy/).

Skipper's scheduler implementation makes sure that one route will not
interfere with other routes, if these routes are not in the same
scheduler group. [`lifoGroup()`](../../reference/filters/#lifogroup) has
a user chosen scheduler group and
[`lifo()`](../../reference/filters/#lifo) will get a per route unique
scheduler group.
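
The grouping can be pictured as a registry that hands out one queue per scheduler group. The sketch below is an assumption about the general shape only: the types `registry` and `queue` and the helper `groupFor` are hypothetical and do not claim to match the implementation in this pull request.

```go
package main

import (
	"fmt"
	"sync"
)

// queue stands in for a bounded LIFO queue; only its identity matters here.
type queue struct{ name string }

// registry hands out exactly one queue per scheduler group (hypothetical type).
type registry struct {
	mu     sync.Mutex
	queues map[string]*queue
}

func newRegistry() *registry {
	return &registry{queues: make(map[string]*queue)}
}

// get returns the queue of a group and creates it on first use.
func (r *registry) get(group string) *queue {
	r.mu.Lock()
	defer r.mu.Unlock()
	q, ok := r.queues[group]
	if !ok {
		q = &queue{name: group}
		r.queues[group] = q
	}
	return q
}

// groupFor picks the scheduler group key: a lifoGroup-style filter uses
// the user chosen name, a lifo-style filter derives a key from the route ID.
func groupFor(routeID, userGroup string) string {
	if userGroup != "" {
		return "group:" + userGroup
	}
	return "route:" + routeID
}

func main() {
	r := newRegistry()
	a := r.get(groupFor("route-a", "mygroup"))
	b := r.get(groupFor("route-b", "mygroup"))
	c := r.get(groupFor("route-c", ""))
	fmt.Println(a == b, a == c)
}
```

Routes that share a group name share one queue and therefore compete for the same limits, while a per route key isolates a route from all others.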
84 changes: 68 additions & 16 deletions docs/reference/filters.md
Original file line number Diff line number Diff line change
Expand Up @@ -418,7 +418,7 @@ The filter accepts variable number of string arguments, which are used to
validate the incoming token from the `Authorization: Bearer <token>`
header. There are two rejection scenarios for this filter. If the token
is not successfully validated by the oauth server, then a 401 Unauthorised
response will be returned. However, if the token is successfully validated
response will be returned. However, if the token is successfully validated
but the required scope match isn't satisfied, then a 403 Forbidden response
will be returned. If any of the configured scopes from the filter is found
inside the tokeninfo result for the incoming token, it will allow the
Expand All @@ -439,7 +439,7 @@ The filter accepts variable number of string arguments, which are used to
validate the incoming token from the `Authorization: Bearer <token>`
header. There are two rejection scenarios for this filter. If the token
is not successfully validated by the oauth server, then a 401 Unauthorised
response will be returned. However, if the token is successfully validated
response will be returned. However, if the token is successfully validated
but the required scope match isn't satisfied, then a 403 Forbidden response
will be returned. If all of the configured scopes from the filter are found
inside the tokeninfo result for the incoming token, it will allow the
Expand All @@ -458,13 +458,13 @@ this filter.

The filter accepts an even number of variable arguments of type
string, which are used to validate the incoming token from the
`Authorization: Bearer <token>` header. There are two rejection scenarios
for this filter. If the token is not successfully validated by the oauth
server, then a 401 Unauthorised response will be returned. However,
if the token is successfully validated but the required scope match
`Authorization: Bearer <token>` header. There are two rejection scenarios
for this filter. If the token is not successfully validated by the oauth
server, then a 401 Unauthorised response will be returned. However,
if the token is successfully validated but the required scope match
isn't satisfied, then a 403 Forbidden response will be returned.
If any of the configured key value pairs from the filter is found
inside the tokeninfo result for the incoming token, it will allow
If any of the configured key value pairs from the filter is found
inside the tokeninfo result for the incoming token, it will allow
the request to pass.

Examples:
Expand All @@ -480,13 +480,13 @@ this filter.

The filter accepts an even number of variable arguments of type
string, which are used to validate the incoming token from the
`Authorization: Bearer <token>` header. There are two rejection
scenarios for this filter. If the token is not successfully validated
by the oauth server, then a 401 Unauthorised response will be
returned. However, if the token is successfully validated but
the required scope match isn't satisfied, then a 403 Forbidden response
will be returned. If all of the configured key value pairs from
the filter are found inside the tokeninfo result for the incoming
`Authorization: Bearer <token>` header. There are two rejection
scenarios for this filter. If the token is not successfully validated
by the oauth server, then a 401 Unauthorised response will be
returned. However, if the token is successfully validated but
the required scope match isn't satisfied, then a 403 Forbidden response
will be returned. If all of the configured key value pairs from
the filter are found inside the tokeninfo result for the incoming
token, it will allow the request to pass.

Examples:
Expand Down Expand Up @@ -1081,7 +1081,7 @@ token.

*N.B.* It is important to note that, if the content of the `X-Unverified-Audit` header does not match the following regex, then
a default value of `invalid-sub` will be populated in the header instead:
`^[a-zA-z0-9_/:?=&%@.#-]*$`
`^[a-zA-z0-9_/:?=&%@.#-]*$`

Examples:

Expand Down Expand Up @@ -1392,3 +1392,55 @@ E.g.:
```
apiUsageMonitoring.custom.my-app.{unknown}.GET.{no-match}.*.*.http_count
```

## lifo

This filter changes skipper to handle the route with a bounded last in
first out queue (LIFO), instead of an unbounded first in first out
queue (FIFO). The default skipper scheduler is based on Go's net/http
package, which provides unbounded FIFO request handling. If you
enable this filter, the request scheduling will change to LIFO. The
idea of a LIFO queue is based on the Dropbox Bandaid proxy, which is not
open source. Dropbox shared their idea in a
[public blogpost](https://blogs.dropbox.com/tech/2018/03/meet-bandaid-the-dropbox-service-proxy/).
All bounded scheduler filters will respond to requests with server error
status codes in case of overrun. All scheduler filters return HTTP status code:

- 502, if the specified timeout is reached, because a request could not be scheduled fast enough
- 503, if the queue is full

Parameters:

* MaxConcurrency specifies how many goroutines are allowed to work on this queue (int)
* MaxStackSize sets the queue size (int)
* Timeout sets the timeout to get a request scheduled (time)

Example:

```
lifo(100, 150, "10s")
```

The above configuration will set MaxConcurrency to 100, MaxStackSize
to 150 and Timeout to 10 seconds.
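
How the three arguments could be turned into a typed configuration is sketched below. It assumes that filter arguments arrive as a `[]interface{}` with numbers as `float64` and the timeout as a duration string understood by `time.ParseDuration`; `lifoConfig` and `parseLifoArgs` are hypothetical names, and any minimum value checks of the real filter are left out.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// lifoConfig holds the parsed filter arguments (hypothetical type).
type lifoConfig struct {
	MaxConcurrency int
	MaxStackSize   int
	Timeout        time.Duration
}

var errInvalidArgs = errors.New("invalid lifo arguments")

// parseLifoArgs converts (number, number, duration string) into a config.
func parseLifoArgs(args []interface{}) (lifoConfig, error) {
	var c lifoConfig
	if len(args) != 3 {
		return c, errInvalidArgs
	}
	concurrency, ok1 := args[0].(float64)
	stackSize, ok2 := args[1].(float64)
	timeout, ok3 := args[2].(string)
	if !ok1 || !ok2 || !ok3 {
		return c, errInvalidArgs
	}
	d, err := time.ParseDuration(timeout)
	if err != nil {
		return c, fmt.Errorf("%w: %v", errInvalidArgs, err)
	}
	c.MaxConcurrency = int(concurrency)
	c.MaxStackSize = int(stackSize)
	c.Timeout = d
	return c, nil
}

func main() {
	cfg, err := parseLifoArgs([]interface{}{100.0, 150.0, "10s"})
	fmt.Println(cfg.MaxConcurrency, cfg.MaxStackSize, cfg.Timeout, err)
}
```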

## lifoGroup

This filter is similar to the [lifo](#lifo) filter.

Parameters:

* GroupName to group one or many routes to the same queue, which have to have the same settings (string)
* MaxConcurrency specifies how many goroutines are allowed to work on this queue (int)
* MaxStackSize sets the queue size (int)
* Timeout sets the timeout to get a request scheduled (time)

Example:

```
lifoGroup("mygroup", 100, 150, "10s")
```

The above configuration will set MaxConcurrency to 100, MaxStackSize
to 150 and Timeout to 10 seconds for the lifoGroup "mygroup", which can
be shared between multiple routes.
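
How MaxConcurrency, MaxStackSize and Timeout interact can be sketched with a small bounded LIFO queue. This is a simplified illustration under stated assumptions, not the implementation from this pull request: the type `lifoQueue`, its methods `Wait` and `Done`, and the error values are hypothetical names for this example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var (
	errFull    = errors.New("stack full")   // would be answered with 503
	errTimeout = errors.New("wait timeout") // would be answered with 502
)

// lifoQueue limits concurrency and parks excess callers on a bounded stack.
type lifoQueue struct {
	mu             sync.Mutex
	maxConcurrency int
	maxStackSize   int
	timeout        time.Duration
	busy           int
	stack          []chan struct{} // waiting callers, newest last
}

// Wait blocks until the caller may proceed, the stack is full or the timeout hits.
func (q *lifoQueue) Wait() error {
	q.mu.Lock()
	if q.busy < q.maxConcurrency {
		q.busy++
		q.mu.Unlock()
		return nil
	}
	if len(q.stack) >= q.maxStackSize {
		q.mu.Unlock()
		return errFull
	}
	ready := make(chan struct{})
	q.stack = append(q.stack, ready)
	q.mu.Unlock()

	timer := time.NewTimer(q.timeout)
	defer timer.Stop()
	select {
	case <-ready:
		return nil
	case <-timer.C:
	}
	q.mu.Lock()
	defer q.mu.Unlock()
	for i, w := range q.stack {
		if w == ready {
			q.stack = append(q.stack[:i], q.stack[i+1:]...)
			return errTimeout
		}
	}
	return nil // Done handed over a slot just before the timeout fired
}

// Done releases a slot and hands it to the newest waiter, if there is one.
func (q *lifoQueue) Done() {
	q.mu.Lock()
	defer q.mu.Unlock()
	if n := len(q.stack); n > 0 {
		close(q.stack[n-1])
		q.stack = q.stack[:n-1]
		return
	}
	q.busy--
}

func main() {
	q := &lifoQueue{maxConcurrency: 1, maxStackSize: 1, timeout: 10 * time.Millisecond}
	fmt.Println(q.Wait()) // takes the only slot
	fmt.Println(q.Wait()) // waits on the stack, then times out
	q.Done()
	fmt.Println(q.Wait()) // slot is free again
}
```

A caller would invoke `Wait` before forwarding a request and `Done` after the response, so that the newest waiting request is the next one to be served.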
3 changes: 3 additions & 0 deletions filters/builtin/builtin.go
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ import (
"github.com/zalando/skipper/filters/flowid"
logfilter "github.com/zalando/skipper/filters/log"
"github.com/zalando/skipper/filters/ratelimit"
"github.com/zalando/skipper/filters/scheduler"
"github.com/zalando/skipper/filters/tee"
"github.com/zalando/skipper/filters/tracing"
"github.com/zalando/skipper/script"
Expand Down Expand Up @@ -130,6 +131,8 @@ func MakeRegistry() filters.Registry {
accesslog.NewDisableAccessLog(),
accesslog.NewEnableAccessLog(),
auth.NewForwardToken(),
scheduler.NewLIFO(),
scheduler.NewLIFOGroup(),
} {
r.Register(s)
}
Expand Down
1 change: 1 addition & 0 deletions filters/builtin/redirect.go
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@ type redirect struct {
// Name: "redirect".
//
// This filter is deprecated, use RedirectTo instead.
// This *DEPRECATED* filter can not be used with filters from the scheduler package.
func NewRedirect() filters.Spec { return &redirect{typ: redDeprecated} }

// NewRedirectTo returns a new filter Spec, whose instances create an HTTP redirect
Expand Down
37 changes: 37 additions & 0 deletions filters/scheduler/doc.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
// Package scheduler implements filter logic that changes the http
// request scheduling behavior of the proxy.
//
// By default, the proxy has an unbounded scheduler that does not limit
// inflight requests. Goroutines with parsed request data consume
// memory. The unbounded handler could spike in memory, if you have
// traffic on a backend that has very high response times. You can check
// the number of goroutines from skipper metrics, if you have this
// problem.
//
// The scheduler filter package has two implementations of a bounded
// queue, the lifo and lifoGroup filters. Both lifo filters use a
// last in first out queue to handle most requests fast. If skipper
// is in an overrun mode, it will serve some requests fast and some
// will time out. This scheduler implementation makes sure that one
// route will not interfere with other routes, if these routes are not
// in the same scheduler group. LifoGroup has a user specified
// scheduler group and lifo will get a per route unique scheduler
// group.
//
// Bounded schedulers were tested in Kubernetes with 3 proxy instances
// with 500m CPU and 500Mi memory resources. The load test was done
// with 500 requests per second to backends with 25 seconds latency
// and a second load test was done in parallel with 50 150 250 .. 1000
// requests per second to backends with no additional latency. For the
// workload without additional latency there was no additional latency
// measurable. The memory was at maximum 350Mi with the bounded
// scheduler. The unbounded scheduler spiked in memory to above 500Mi,
// which caused an out of memory (OOM) kill by the operating system.
//
// Bounded schedulers will respond to requests with server status error
// codes in case of overrun. The scheduler returns HTTP status code:
//
// - 502, if it can not get a request from the data structure fast enough
// - 503, if the data structure is full and reached its boundary
//
package scheduler