load shedding #2004
Comments
@andrewhowdencom thanks for the well-written feature request with the references, which I still have to read to understand it better.
Side note: the problem with the current metrics storage is that its structure does not allow querying for data (at least Prometheus does not allow the client to get it back), so we would need data duplication, which is not bad per se.
The two filter strategies essentially allow users to pick whether they consider success rate or measured latency. TL;DR:
Implementing both strategies in skipper would be a good value-add and a step towards feature parity with other request proxies. A valid question is in which order to implement them. Looking at envoy's source, Admission Control seems simpler to implement than Adaptive Concurrency. @andrewhowdencom a good approach would be to configure an envoy or nginx proxy in front of an API and verify which of the two strategies is a better fit for the overload problems you observe.
It's not really clear what should be done, so let me ask more questions; neither the docs nor the spec in this issue are clear to me.
I agree that testing these strategies makes a lot of sense, more so than implementing them just for "feature parity". They have features we don't have, and we have features they don't have.
Is your feature request related to a problem? Please describe.
When I am operating a service (particularly with skipper-ingress), that service will sometimes have a sudden and substantial increase in traffic. This can happen in a few situations:
The traffic frequently increases by very large amounts, e.g. 100 → 200 or 300%. This increases the latency of the service (either by overloading that service directly, or by overloading its downstream services), after which the service eventually times out.
Describe the solution you would like
Skipper can parse the metadata associated with HTTP responses to determine the appropriate throughput of any given downstream system. By keeping a count of the requests it forwards and of the responses that signal overload (such as HTTP 503), Skipper can effectively understand whether the service is presently overloaded. Skipper can, in parallel, limit the throughput to any given "route", in this case a pod.
Skipper can adjust the throughput toward a pod based on how overloaded the system is (as indicated by the system returning an HTTP 503).
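A minimal sketch of how such counts could be turned into a shedding decision, assuming the client-side throttling formula described in the Google SRE book (which is also the basis of envoy's Admission Control filter); the names and types here are illustrative, not existing Skipper code:

```go
package main

import (
	"fmt"
	"math/rand"
)

// OverloadTracker counts forwarded requests and accepted (non-503) responses
// over a sliding window and derives a rejection probability from them.
type OverloadTracker struct {
	requests   float64 // total requests forwarded in the window
	accepts    float64 // responses in the window that were not HTTP 503
	multiplier float64 // K: how much overload is tolerated before shedding starts
}

// RejectProbability returns max(0, (requests - K*accepts) / (requests + 1)).
// While the backend accepts everything, the probability stays at 0; as the
// share of 503s grows, an increasing fraction of requests is shed at the proxy.
func (t *OverloadTracker) RejectProbability() float64 {
	p := (t.requests - t.multiplier*t.accepts) / (t.requests + 1)
	if p < 0 {
		return 0
	}
	return p
}

// ShouldReject rolls a die against the current rejection probability.
func (t *OverloadTracker) ShouldReject() bool {
	return rand.Float64() < t.RejectProbability()
}

func main() {
	// Example window: 1000 requests forwarded, 600 accepted, K = 1.2.
	t := &OverloadTracker{requests: 1000, accepts: 600, multiplier: 1.2}
	fmt.Printf("reject probability: %.2f\n", t.RejectProbability()) // ~0.28
	fmt.Println("shed this request:", t.ShouldReject())
}
```

Keeping one such tracker per route (i.e. per pod) would give the per-route throughput limiting described above.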
Related:
Describe alternatives you've considered (optional)
Alternative solutions include:
backendRateLimit()
— This allows setting a limit per pod, but assumes that pods are stable and that requests are equivalent. Where the workload changes substantially (e.g. larger requests that are more computationally expensive), the effective capacity of any given pod goes down. An adaptive solution would factor this in and automatically reject traffic that could not be served. It would also work blindly, that is, even when a downstream's downstream was overloaded (as the HTTP 503 status would be propagated back).
Additional context (optional)
Monitoring
Should this be deployed, it will make the state of the system harder to reason about, as the throughput can vary either per route or per route group. Given this, it should be possible to either dump (via SIGQUIT or similar) or export as a metric the currently calculated throughput per route.
This could be exported as a metric, but perhaps not by default as the cardinality is very high.
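A sketch of how that per-route value could be exposed with prometheus/client_golang; the metric name and label are assumptions for illustration, not existing Skipper metrics:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var allowedThroughput = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "skipper_adaptive_allowed_throughput",
		Help: "Currently calculated allowed requests per second, per route.",
	},
	[]string{"route"}, // one series per route: high cardinality, hence opt-in
)

func main() {
	prometheus.MustRegister(allowedThroughput)

	// Whatever recomputes the window would update the gauge, for example:
	allowedThroughput.WithLabelValues("my_route").Set(120)

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9911", nil))
}
```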
Existing Systems
While HTTP 503 can be associated with overload, it is also associated with other, more binary conditions (such as a pod not yet being ready). Given this, the design should probably match responses by both an HTTP status code and a header; for example, a backend could mark overload responses with a 503 status plus a dedicated header (see the sketch below).
This allows downstreams to "opt in" to this adaptive work.
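A minimal sketch of that opt-in check; the header name "X-Load-Shed" is hypothetical and only used for illustration:

```go
package loadshed

import "net/http"

// isOverloadSignal counts a response toward shedding only when the backend
// returns 503 AND explicitly sets the (hypothetical) opt-in header.
func isOverloadSignal(resp *http.Response) bool {
	return resp.StatusCode == http.StatusServiceUnavailable &&
		resp.Header.Get("X-Load-Shed") == "true"
}
```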
Overload
With this design, the default configuration should not entirely prevent overload but rather aim for a "sustainable overload": enough excess traffic still reaches the systems that they can autoscale, after which the adaptive approach will allow more traffic through again.
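As a worked example, assuming the rejection formula sketched earlier: with a multiplier of K = 2, shedding only begins once accepts fall below requests/2, i.e. the proxy keeps forwarding roughly twice as much traffic as the backend currently accepts, which leaves enough sustained pressure for autoscaling to react.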
New Risks
This by design delays any sudden increase in traffic until the "adaptive" filter has concluded that the traffic should proceed. Assuming the PID-like loop runs every 1s, the window can only adapt once per second. Ideally, users should be able to configure the window size adjustments so that, for certain services or at certain times, the window can be much larger and the upscaling much quicker.
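A minimal sketch of what such a configurable adjustment loop could look like, assuming an AIMD-style policy (additive increase, multiplicative decrease); none of these names or settings exist in Skipper today:

```go
package loadshed

import (
	"sync/atomic"
	"time"
)

type windowConfig struct {
	interval    time.Duration // how often the window may change (e.g. 1s)
	growStep    int64         // additive increase while the backend is healthy
	shrinkBy    float64       // multiplicative decrease on overload (e.g. 0.5)
	maxInFlight int64         // upper bound for the window
}

type adaptiveWindow struct {
	cfg   windowConfig
	limit atomic.Int64
	// overloaded would be fed by the 503-plus-header signal described above.
	overloaded func() bool
}

// run adjusts the window once per configured interval until stop is closed.
func (w *adaptiveWindow) run(stop <-chan struct{}) {
	t := time.NewTicker(w.cfg.interval)
	defer t.Stop()
	for {
		select {
		case <-stop:
			return
		case <-t.C:
			cur := w.limit.Load()
			if w.overloaded() {
				w.limit.Store(int64(float64(cur) * w.cfg.shrinkBy))
			} else if cur+w.cfg.growStep <= w.cfg.maxInFlight {
				w.limit.Store(cur + w.cfg.growStep)
			}
		}
	}
}
```

Making the interval and growth step configurable per route (or per time of day) is what would let latency-critical services open their window faster, as suggested above.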
Would you like to work on it?
Yes, but no time