
load shedding #2004

Open
andrewhowdencom opened this issue May 13, 2022 · 3 comments

Comments


andrewhowdencom commented May 13, 2022

Is your feature request related to a problem? Please describe.

When I am operating a service (particularly with skipper-ingress), that service will sometimes have a sudden and substantial increase in traffic. This can happen in a few situations:

  1. An increase in traffic at the edge (such as abusive traffic)
  2. An upstream service timing out, and retrying aggressively

The traffic frequently increases by very large amounts, e.g. 100% → 200% or 300%. This increases the latency of the service (either by overloading that service directly, or by overloading its downstream services), after which the service eventually times out.

Describe the solution you would like

Skipper can parse the metadata associated with HTTP responses to determine the appropriate throughput for any given downstream system. By keeping a count of:

  • the HTTP status code (e.g. HTTP 503)
  • the HTTP header (e.g. Retry-After, or similar)

Skipper can effectively determine whether the service is presently overloaded. Skipper can, in parallel, limit the throughput to any given "route", in this case a pod.

Skipper can adjust the throughput toward a pod based on how overloaded the system is (as indicated by the system returning an HTTP 503).
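To make this concrete, here is a minimal, standalone Go sketch of that bookkeeping. It is not wired into skipper's actual filter API; the type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// overloadTracker counts, per backend endpoint, how many responses in the
// current window signalled overload (HTTP 503 with a Retry-After header)
// versus the total number of responses.
type overloadTracker struct {
	mu    sync.Mutex
	total int
	shed  int
}

// observe records one backend response.
func (t *overloadTracker) observe(resp *http.Response) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.total++
	if resp.StatusCode == http.StatusServiceUnavailable &&
		resp.Header.Get("Retry-After") != "" {
		t.shed++
	}
}

// overloadRatio returns the fraction of responses signalling overload; a
// control loop can use this to scale the allowed throughput for the
// endpoint up or down.
func (t *overloadTracker) overloadRatio() float64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.total == 0 {
		return 0
	}
	return float64(t.shed) / float64(t.total)
}

func main() {
	t := &overloadTracker{}
	t.observe(&http.Response{StatusCode: 200, Header: http.Header{}})
	t.observe(&http.Response{
		StatusCode: 503,
		Header:     http.Header{"Retry-After": []string{"1"}},
	})
	fmt.Printf("overload ratio: %.2f\n", t.overloadRatio()) // 0.50
}
```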

Related:

  1. Netflix, Adaptive Concurrency: https://netflixtechblog.medium.com/performance-under-load-3e6fa9a60581
  2. Envoy, Admission Control: https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/admission_control_filter
  3. Envoy, Adaptive Concurrency: https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/adaptive_concurrency_filter
  4. Netflix, Concurrency Limits (Java): https://github.com/Netflix/concurrency-limits

Describe alternatives you've considered (optional)

Alternative solutions include:

backendRateLimit() — This allows setting a limit per pod, but assumes that pods are stable and that requests are equivalent. Where the workload changes substantially (e.g. larger requests that are more computationally expensive), the effective capacity of any given pod goes down.

An adaptive solution would factor this in and automatically reject traffic that could not be handled. It would also work without any knowledge of the wider topology; that is, when a downstream's downstream was overloaded, the HTTP 503 status would be propagated back.

Additional context (optional)
Monitoring
Should this be deployed, it will make the state of the system harder to reason about as the throughput can vary either per route or per route group. Given this, it should be possible to either dump (via SIGQUIT or similar) or export as a metric the current calculated throughput per route.

This could be exported as a metric, but perhaps not by default as the cardinality is very high.
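As an illustration, here is a sketch of what such an opt-in gauge could look like with the Prometheus Go client. The metric name and the per-route label are hypothetical, and the one-series-per-route label is exactly where the cardinality concern comes from.

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// calculatedThroughput exposes the currently calculated allowed throughput
// per route. One series per route, hence the cardinality concern; this
// would be opt-in rather than enabled by default.
var calculatedThroughput = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "skipper_adaptive_calculated_throughput", // hypothetical name
		Help: "Currently calculated allowed throughput (requests/s) per route.",
	},
	[]string{"route"},
)

func main() {
	prometheus.MustRegister(calculatedThroughput)

	// The control loop would call something like this whenever it
	// recalculates the window for a route.
	calculatedThroughput.WithLabelValues("my_route_id").Set(42)

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9911", nil))
}
```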

Existing Systems
While HTTP 503 can be associated with overload, it is also associated with other, more binary conditions (such as a pod not yet being ready). Given this, the design should probably match responses by both an HTTP status code and a header. For example,

adaptiveRateLimit("503", "retry-after")

This allows downstreams to "opt in" to this adaptive work.
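A sketch of how such a filter could turn its two arguments into an opt-in matcher; this is standalone Go, not registered with skipper, and all names are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
)

// overloadMatcher is built from the two hypothetical filter arguments,
// e.g. adaptiveRateLimit("503", "retry-after").
type overloadMatcher struct {
	status int
	header string
}

func newOverloadMatcher(statusArg, headerArg string) (*overloadMatcher, error) {
	code, err := strconv.Atoi(statusArg)
	if err != nil {
		return nil, fmt.Errorf("invalid status code %q: %w", statusArg, err)
	}
	return &overloadMatcher{status: code, header: headerArg}, nil
}

// matches reports whether a backend response signals overload according to
// the configured status code and header. Only responses matching both are
// counted, so downstreams opt in explicitly by setting the header.
func (m *overloadMatcher) matches(resp *http.Response) bool {
	return resp.StatusCode == m.status && resp.Header.Get(m.header) != ""
}

func main() {
	m, err := newOverloadMatcher("503", "retry-after")
	if err != nil {
		panic(err)
	}
	resp := &http.Response{
		StatusCode: 503,
		Header:     http.Header{"Retry-After": []string{"1"}},
	}
	fmt.Println(m.matches(resp)) // true
}
```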

Overload
With this design, the default configuration should not entirely prevent overload, but rather aim for a "sustainable overload": enough traffic still reaches the systems that they can autoscale, and as capacity comes online the adaptive approach lets more traffic through rather than overloading them further.

New Risks
By design, this delays sudden increases in traffic until the "adaptive" filter has concluded that this traffic should proceed. Assuming the PID-like loop runs every 1s, the window can only adapt every 1s.

Ideally, users should be able to configure the window size adjustments so that, for certain services or at certain times, the window size can be much larger and the upscaling much quicker.
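A sketch of such an adjustment loop, using a simple additive-increase/multiplicative-decrease step rather than a full PID controller; the interval and step sizes are the hypothetical knobs that would allow quicker upscaling for selected services or time ranges.

```go
package main

import (
	"fmt"
	"time"
)

// windowController recalculates the allowed window on a fixed tick.
type windowController struct {
	window    float64 // currently allowed requests per interval
	minWindow float64
	maxWindow float64
	increase  float64 // additive step while the backend looks healthy
	decrease  float64 // multiplicative factor when overload is observed
}

// adjust applies one control step based on the observed overload ratio
// (the fraction of responses matching the configured status + header).
func (c *windowController) adjust(overloadRatio float64) {
	if overloadRatio > 0 {
		c.window *= c.decrease // back off quickly under overload
	} else {
		c.window += c.increase // probe upwards while healthy
	}
	if c.window < c.minWindow {
		c.window = c.minWindow
	}
	if c.window > c.maxWindow {
		c.window = c.maxWindow
	}
}

func main() {
	c := &windowController{window: 100, minWindow: 1, maxWindow: 10000, increase: 10, decrease: 0.5}
	ticker := time.NewTicker(1 * time.Second) // the 1s interval discussed above
	defer ticker.Stop()
	for i := 0; i < 3; i++ {
		<-ticker.C
		// In a real loop the ratio would come from the response tracker.
		c.adjust(0)
		fmt.Println("allowed window:", c.window)
	}
}
```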

Would you like to work on it?
Yes, but no time


szuecs commented May 16, 2022

@andrewhowdencom thanks for the well-written feature request and the references, which I will have to read to understand it better.
I already have two questions after a first read.

  1. There is a bit of a mix between per instance (pod) and per route (Kubernetes uses a load-balanced backend, so per route != per pod). I guess you really want per instance, so a snowflake pod would also create some "noise".
  2. I think what I really like better is load shedding. For load shedding we should monitor errors and latency, and based on these data it should react by letting requests pass or stopping them. As far as I understand, the Google way would be to send more and more traffic until an endpoint's latency (or error rate) increases too much. Do you think it makes sense to use observed latency percentiles going significantly up as an indicator?
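For the latency-based direction, here is a simplified sketch of the gradient idea behind the referenced Netflix concurrency-limits library (names and constants are illustrative, not the library's API): the limit shrinks when the sampled latency, e.g. a high percentile over the last window, rises well above the best known no-load latency.

```go
package main

import "fmt"

// gradientLimit derives a new concurrency limit from observed latency.
// When the sampled latency (for example a p90/p99 over the last window)
// rises well above the known no-load latency, the gradient drops below 1
// and the limit shrinks; under healthy latency the limit can grow by the
// queue headroom.
func gradientLimit(currentLimit, noLoadLatencyMs, sampledLatencyMs, queueHeadroom float64) float64 {
	gradient := noLoadLatencyMs / sampledLatencyMs
	if gradient > 1 {
		gradient = 1
	}
	newLimit := currentLimit*gradient + queueHeadroom
	if newLimit < 1 {
		newLimit = 1
	}
	return newLimit
}

func main() {
	// Healthy: sampled latency close to the baseline keeps the limit roughly stable.
	fmt.Println(gradientLimit(100, 10, 11, 4)) // ~94.9
	// Overloaded: latency tripled, so the limit is cut sharply.
	fmt.Println(gradientLimit(100, 10, 30, 4)) // ~37.3
}
```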

Side note: the problem with the current metrics storage is that its structure does not allow querying the data back (at least Prometheus does not allow the client to get it), so we would have some data duplication, which is not bad per se.


bocytko commented May 30, 2022

> I think what I really like better is load shedding. For load shedding we should monitor errors and latency, and based on these data it should react by letting requests pass or stopping them.

The two filter strategies essentially allow users to pick whether they consider success rate or latency (measured).

TL;DR:

  • Admission Control: "The admission control filter probabilistically rejects requests based on the success rate of previous requests in a configurable sliding time window". (with configurable definition of a successful request)
  • Adaptive Concurrency: "The adaptive concurrency filter dynamically adjusts the allowed number of requests that can be outstanding (concurrency) to all hosts in a given cluster at any time. Concurrency values are calculated using latency sampling of completed requests"
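For the first strategy, the core of the math is small. Below is a sketch of a success-rate based rejection probability in the spirit of the client-side throttling formula from the Google SRE book, which Envoy's admission control resembles; the aggression parameter and the numbers are illustrative.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// rejectionProbability returns the probability of proactively rejecting a
// new request, given total request and success counts over a sliding
// window. aggression > 1 tolerates some failures before shedding starts.
func rejectionProbability(requests, successes, aggression float64) float64 {
	return math.Max(0, (requests-aggression*successes)/(requests+1))
}

func main() {
	// 1000 requests, 950 successful: nothing is rejected.
	fmt.Println(rejectionProbability(1000, 950, 2)) // 0
	// 1000 requests, only 300 successful: shed roughly 40% of new requests.
	p := rejectionProbability(1000, 300, 2)
	fmt.Println(p)
	fmt.Println("reject this request:", rand.Float64() < p) // coin flip per request
}
```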

Implementing both strategies in skipper would be a good value-add and a step towards feature parity with other request proxies. A valid question is in which order to implement them. Looking at envoy's source, Admission Control seems simpler to implement than Adaptive Concurrency.

@andrewhowdencom A good approach would be to configure an envoy or nginx proxy in front of an API and verify which of the two strategies would be a better fit for the problems you observe with overload.


szuecs commented May 30, 2022

It's not really clear what should be done. So let me ask more questions, because the docs are not clear to me, nor the spec in the issue.

> The two filter strategies essentially allow users to pick whether they consider success rate or latency (measured).
>
> TL;DR:
>
> * **Admission Control**: "The admission control filter probabilistically rejects requests based on the **success rate** of previous requests in a configurable sliding time window". (with configurable definition of a successful request)

  1. Should this be measured per route, per backend application (set of pods), or per backend instance (pod)?

> * **Adaptive Concurrency**: "The adaptive concurrency filter dynamically adjusts the allowed number of requests that can be outstanding (concurrency) to all hosts in a given cluster at any time. Concurrency values are calculated using **latency sampling of completed requests**"

  1. Should this be measured per route, per backend application (set of pods), or per backend instance (pod)?

  2. Why does it make sense to dynamically adjust the allowed number of requests?

> Implementing both strategies in skipper would be a good value-add and a step towards feature parity with other request proxies. A valid question is in which order to implement them. Looking at envoy's source, Admission Control seems simpler to implement than Adaptive Concurrency.
>
> @andrewhowdencom A good approach would be to configure an envoy or nginx proxy in front of an API and verify which of the two strategies would be a better fit for the problems you observe with overload.

I agree that testing these strategies makes a lot of sense, more than implementing them for "feature parity". They have features we don't have and we have features they don't have.

@szuecs changed the title from "As a software operator, I want Skipper to shed traffic that I am unable to handle" to "load shedding" on Aug 9, 2023