-
Notifications
You must be signed in to change notification settings - Fork 148
Implementable RateLimitPolicy proposal #4346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4346 +/- ##
==========================================
+ Coverage 86.10% 86.12% +0.02%
==========================================
Files 132 132
Lines 14342 14343 +1
Branches 35 35
==========================================
+ Hits 12349 12353 +4
+ Misses 1790 1786 -4
- Partials 203 204 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
|
||
| ### Versioning and Installation | ||
|
|
||
| The version of the `RateLimitPolicy` API will be `v1alpha1`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we always aim to alpha1 and then duplicate for alpha2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite duplicate, but we start at alpha1 and if we add any changes that need a version change, we will change accordingly. See our dev docs on CRD versioning for a more in-depth explanation, its based on the Gateway API crd versioning docs. https://github.com/nginx/nginx-gateway-fabric/blob/main/docs/developer/crd-versioning.md
|
|
||
| Downsides: | ||
|
|
||
| - Harder to reason about capacity of fleet, especially when auto-scaling is enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at least now i know a bit about fleets and zones in nginx but actually from the text below i'm not sure i can understand the difference. I understoon at first that zone is the area of shared memory, but below i see ratelimit is set per zone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea its a little confusing, but when I talk about fleet I'm just referring to a group of NGINX instances. Zone is specifically related to the nginx module https://nginx.org/en/docs/http/ngx_http_limit_req_module.html, where you can see you set the rate limit on the zone.
Then in each location you reference the zone, that makes the rate limit apply to the location.
|
|
||
| - When there is a a Route with a `RateLimitPolicy` attached that sets a rate limit zone named `zone_one` with `rate = 3r/s` and `zoneSize = 5m`, and a Gateway that also has a `RateLimitPolicy` attached that sets a rate limit zone named `zone_one` with `rate = 5/rs` and `zoneSize = 100m`, the effective policy will choose the rate limit zone settings from the Gateway. | ||
| - When there is a Route with a `RateLimitPolicy` attached that sets a rate limit rule with `zoneName = default_zone_five` and `burst=5`, and a Gateway that also has a `RateLimitPolicy` attached that sets a rate limit rule with `zoneName = default_zone_three` and `burst = 2` and `noDelay = true`, the effective policy will choose the rate limit rule settings from the HTTPRoute. | ||
| - A Route without a policy attached will inherit all settings from the Gateway's policy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will be possible to set ignorance like: ratelimit: off for some route to ignore inherited as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, interesting thought. This is something I still need to think on a little more, as currently its difficult for a RL to be set at the Gateway level, and for an application developer to disable RL for their app. They would need to make a RL policy, and create a rule that is required to target a zoneName, so not the best experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we really want that flow. If a cluster operator defines a restriction on the Gateway, it's usually for a reason, and the app dev shouldn't necessarily have the power to disable that. Defeats the purpose of the hierarchy of roles. And we haven't done that with any other policy either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, currently with the API I made, if a policy is attached at the HTTPRoute level, if there are any rules defined, the rules defined at a policy attached to the Gateway level will be completely overwritten. As explained in my attachments and inheritance section. Thus, application developers are free to define their zone and attach their rules to it, basically disabling what the cluster operator defined but with a loop-around.
The inheritance section is weird since Gateway level zones are prioritized, while HTTPRoute level rules are prioritized. This is because I think the application developers would have more information on how much their applications can handle and thus the RL going to their apps. They wouldn't want to be forced to have a rule be passed down onto their upstream. @sjberman
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine to have different rules I think, but I don't think we want a toggle: off field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proxy buffering is enabled by nginx by default, so that is specifically to disable that field in nginx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, so the scenario where a Policy attached to the gateway sets some settings for proxy buffering, with a policy attached to the route disabling proxy buffering as a whole is different that this rate limiting scenario because RL is turned off be default? So if a cluster operator turned on RL for the gateway, it would defeat the hierarchy of roles for a route to be able to turn it off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it is a bit fuzzy here. Basically what I'm getting it is I don't think we need a field to disable the Gateway policy. We technically don't do it anywhere else, the disable field in the ProxySettingsPolicy is a direct mapping to an nginx directive, and it technically would disable the parent policy. But I don't think we need to design all policies like that because it isn't mapping to an nginx directive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because that disable field would be a no-op at the Gateway level, and is a bit confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, ok yea sure your point is valid. I'll continue to think on this but probably plan on not adding a disable field. Thanks for going through this.
I also thought about making zoneName optional, and if one isn't specified the rule doesn't get applied. So if an application developer did want to remove the inherited rules, they could attach a policy with a blank rule, but that has the cons of being confusing for developers who didn't read they need to specify zoneName.
Which I in the end think having zoneName be required would fit the needs for most users more than the edge case of wanting to remove a rule set by the gateway.
| // RateLimitCondition represents a condition to determine if the request should be rate limited. | ||
| type RateLimitCondition struct { | ||
| // JWT defines a JWT condition to determine if the request should be rate limited. | ||
| // | ||
| // +optional | ||
| JWT *RateLimitJWTCondition `json:"jwt,omitempty"` | ||
| // Variable defines a Variable condition to determine if the request should be rate limited. | ||
| // | ||
| // +optional | ||
| Variable *RateLimitVariableCondition `json:"variable,omitempty"` | ||
| // Default sets the rate limit in this policy to be the default if no conditions are met. In a group of policies with the same condition, | ||
| // only one policy can be the default. | ||
| // | ||
| // +optional | ||
| Default *bool `json:"default,omitempty"` | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will a user be allowed to set multiple Rate Limiting conditions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I haven't thought about it too much but that follows the precedent set in NIC too https://docs.nginx.com/nginx-ingress-controller/configuration/policy-resource/#ratelimitcondition.
| ### JWT Claim Condition | ||
|
|
||
| JWT Claim Condition on a RateLimitPolicy would define a condition for a rate limit by JWT claim. For example, a condition could be on the claim `user_details.level` and the match could be `premium`, meaning this RateLimitPolicy would only apply to requests with a JWT claim `user_details.level` with a value `premium`. The following JWT payload would match the condition: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Version 5.0.0 of NIC introduced Rate Limiting with JWT Claims.
The PR for that is here: nginx/kubernetes-ingress#7175
It's very much worth studying this PR. From what I remember there were quite a few edge cases that NIC encountered when implementing this. I don't yet know if we will hit the same problems. Mostly want to bring this to everyone's attention
|
|
||
| ### NJS Support | ||
|
|
||
| Adding support for Conditions on the RateLimitPolicy will not be possible through native NGINX OSS and Plus modules and will need to be done through a separate NJS module. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain this more?
Proposed changes
Problem: A design is needed for the RateLimitPolicy.
Solution: Create an implementable RateLimitPolicy enhancement proposal.
Closes #4059
Checklist
Before creating a PR, run through this checklist and mark each as complete.
Release notes
If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.