-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feat: implement rate limiting retry mechanism for stat forwarder #16330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: implement rate limiting retry mechanism for stat forwarder #16330
Conversation
Replace fixed delay retry with Kubernetes workqueue rate limiter - Add TypedItemFastSlowRateLimiter for adaptive retry delays - Implement fast/slow retry strategy (100ms for first 5, 500ms after) - Add automatic state tracking and cleanup - Update maybeRetry() to use rate limiter instead of fixed sleep - Clean up rate limiter state on successful processing Resolves TODO: Use RateLimitingInterface and NewItemFastSlowRateLimiter
|
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: forrestIsRunning The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Welcome @forrestIsRunning! It looks like this is your first PR to knative/serving 🎉 |
|
Hi @forrestIsRunning. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16330 +/- ##
==========================================
+ Coverage 80.11% 80.24% +0.12%
==========================================
Files 215 216 +1
Lines 13391 13443 +52
==========================================
+ Hits 10728 10787 +59
+ Misses 2304 2293 -11
- Partials 359 363 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
dprotaso
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add rateLimiter field to Forwarder to track retry state per stat
The original retry mechanism has an individual delay per each stat.
Using the workqueue with the key being the revision means that all retried stats of the same revision will effect each other - specifically 5 delayed stats from the same revision will mean the 6th retied stat will be have to wait 500ms.
I think we can drop the workqueue as part of the implementation.
I wonder if it's just simpler to use the s.retry count and adjust the sleep based on that value (without the queue).
Let me know what you think and whether I interpreted your change incorrectly
|
Also note to contribute to the project you will need to sign the CLA (EasyCLA) which is a LinuxFoundation thing |
|
/hold |
Replace fixed delay retry with Kubernetes workqueue rate limiter
Resolves TODO: Use RateLimitingInterface and NewItemFastSlowRateLimiter
Fixes #
Proposed Changes
workqueue.TypedItemFastSlowRateLimiterfor adaptive retriesrateLimiterfield to Forwarder to track retry state per statmaybeRetry()to userateLimiter.When()andrateLimiter.NumRequeues()fastRetryDelay(100ms),slowRetryDelay(500ms),maxFastRetryAttempts(5)Why these changes?
Fixed delay doesn't adapt to failure scenarios. Fast retries help transient errors recover quickly, while slow retries reduce load for persistent issues. Uses Kubernetes standard workqueue rate limiting.
Release Note
elease-note
feat: implement rate limiting retry mechanism for stat forwarder
Replace fixed delay retry with Kubernetes workqueue rate limiter. Uses fast retries (100ms) for first 5 attempts and slower retries (500ms) after, improving error recovery while reducing system load.