Skip to content

Conversation

@bjee19
Copy link
Contributor

@bjee19 bjee19 commented Nov 27, 2025

Proposed changes

Problem: A design is needed for the RateLimitPolicy.

Solution: Create an implementable RateLimitPolicy enhancement proposal.

Closes #4059

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

Release notes

If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.


@github-actions github-actions bot added the documentation Improvements or additions to documentation label Nov 27, 2025
@codecov
Copy link

codecov bot commented Nov 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.12%. Comparing base (775388b) to head (b3eecc3).
⚠️ Report is 26 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4346      +/-   ##
==========================================
+ Coverage   86.10%   86.12%   +0.02%     
==========================================
  Files         132      132              
  Lines       14342    14343       +1     
  Branches       35       35              
==========================================
+ Hits        12349    12353       +4     
+ Misses       1790     1786       -4     
- Partials      203      204       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.


### Versioning and Installation

The version of the `RateLimitPolicy` API will be `v1alpha1`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we always aim to alpha1 and then duplicate for alpha2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite duplicate, but we start at alpha1 and if we add any changes that need a version change, we will change accordingly. See our dev docs on CRD versioning for a more in-depth explanation, its based on the Gateway API crd versioning docs. https://github.com/nginx/nginx-gateway-fabric/blob/main/docs/developer/crd-versioning.md


Downsides:

- Harder to reason about capacity of fleet, especially when auto-scaling is enabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least now i know a bit about fleets and zones in nginx but actually from the text below i'm not sure i can understand the difference. I understoon at first that zone is the area of shared memory, but below i see ratelimit is set per zone

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea its a little confusing, but when I talk about fleet I'm just referring to a group of NGINX instances. Zone is specifically related to the nginx module https://nginx.org/en/docs/http/ngx_http_limit_req_module.html, where you can see you set the rate limit on the zone.

Then in each location you reference the zone, that makes the rate limit apply to the location.


- When there is a a Route with a `RateLimitPolicy` attached that sets a rate limit zone named `zone_one` with `rate = 3r/s` and `zoneSize = 5m`, and a Gateway that also has a `RateLimitPolicy` attached that sets a rate limit zone named `zone_one` with `rate = 5/rs` and `zoneSize = 100m`, the effective policy will choose the rate limit zone settings from the Gateway.
- When there is a Route with a `RateLimitPolicy` attached that sets a rate limit rule with `zoneName = default_zone_five` and `burst=5`, and a Gateway that also has a `RateLimitPolicy` attached that sets a rate limit rule with `zoneName = default_zone_three` and `burst = 2` and `noDelay = true`, the effective policy will choose the rate limit rule settings from the HTTPRoute.
- A Route without a policy attached will inherit all settings from the Gateway's policy.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be possible to set ignorance like: ratelimit: off for some route to ignore inherited as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, interesting thought. This is something I still need to think on a little more, as currently its difficult for a RL to be set at the Gateway level, and for an application developer to disable RL for their app. They would need to make a RL policy, and create a rule that is required to target a zoneName, so not the best experience.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we really want that flow. If a cluster operator defines a restriction on the Gateway, it's usually for a reason, and the app dev shouldn't necessarily have the power to disable that. Defeats the purpose of the hierarchy of roles. And we haven't done that with any other policy either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, currently with the API I made, if a policy is attached at the HTTPRoute level, if there are any rules defined, the rules defined at a policy attached to the Gateway level will be completely overwritten. As explained in my attachments and inheritance section. Thus, application developers are free to define their zone and attach their rules to it, basically disabling what the cluster operator defined but with a loop-around.

The inheritance section is weird since Gateway level zones are prioritized, while HTTPRoute level rules are prioritized. This is because I think the application developers would have more information on how much their applications can handle and thus the RL going to their apps. They wouldn't want to be forced to have a rule be passed down onto their upstream. @sjberman

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to have different rules I think, but I don't think we want a toggle: off field.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proxy buffering is enabled by nginx by default, so that is specifically to disable that field in nginx.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, so the scenario where a Policy attached to the gateway sets some settings for proxy buffering, with a policy attached to the route disabling proxy buffering as a whole is different that this rate limiting scenario because RL is turned off be default? So if a cluster operator turned on RL for the gateway, it would defeat the hierarchy of roles for a route to be able to turn it off.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is a bit fuzzy here. Basically what I'm getting it is I don't think we need a field to disable the Gateway policy. We technically don't do it anywhere else, the disable field in the ProxySettingsPolicy is a direct mapping to an nginx directive, and it technically would disable the parent policy. But I don't think we need to design all policies like that because it isn't mapping to an nginx directive.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that disable field would be a no-op at the Gateway level, and is a bit confusing.

Copy link
Contributor Author

@bjee19 bjee19 Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, ok yea sure your point is valid. I'll continue to think on this but probably plan on not adding a disable field. Thanks for going through this.

I also thought about making zoneName optional, and if one isn't specified the rule doesn't get applied. So if an application developer did want to remove the inherited rules, they could attach a policy with a blank rule, but that has the cons of being confusing for developers who didn't read they need to specify zoneName.

Which I in the end think having zoneName be required would fit the needs for most users more than the edge case of wanting to remove a rule set by the gateway.

Comment on lines +327 to +342
// RateLimitCondition represents a condition to determine if the request should be rate limited.
type RateLimitCondition struct {
// JWT defines a JWT condition to determine if the request should be rate limited.
//
// +optional
JWT *RateLimitJWTCondition `json:"jwt,omitempty"`
// Variable defines a Variable condition to determine if the request should be rate limited.
//
// +optional
Variable *RateLimitVariableCondition `json:"variable,omitempty"`
// Default sets the rate limit in this policy to be the default if no conditions are met. In a group of policies with the same condition,
// only one policy can be the default.
//
// +optional
Default *bool `json:"default,omitempty"`
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will a user be allowed to set multiple Rate Limiting conditions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I haven't thought about it too much but that follows the precedent set in NIC too https://docs.nginx.com/nginx-ingress-controller/configuration/policy-resource/#ratelimitcondition.

Comment on lines +135 to +137
### JWT Claim Condition

JWT Claim Condition on a RateLimitPolicy would define a condition for a rate limit by JWT claim. For example, a condition could be on the claim `user_details.level` and the match could be `premium`, meaning this RateLimitPolicy would only apply to requests with a JWT claim `user_details.level` with a value `premium`. The following JWT payload would match the condition:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version 5.0.0 of NIC introduced Rate Limiting with JWT Claims.
The PR for that is here: nginx/kubernetes-ingress#7175

It's very much worth studying this PR. From what I remember there were quite a few edge cases that NIC encountered when implementing this. I don't yet know if we will hit the same problems. Mostly want to bring this to everyone's attention


### NJS Support

Adding support for Conditions on the RateLimitPolicy will not be possible through native NGINX OSS and Plus modules and will need to be done through a separate NJS module.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this more?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

Status: 🆕 New

Development

Successfully merging this pull request may close these issues.

Design Rate Limiting

5 participants