fix: Mitigate the bug where items are re-added constantly to the workqueue. #1193 #1243
…queue. This will prevent argo from hanging for up to 16 minutes at a time while processing a rollout. Signed-off-by: Mark Robinson <mrobinson@plaid.com>
Codecov Report
@@            Coverage Diff             @@
##           master    #1243      +/-   ##
==========================================
+ Coverage   81.40%   81.42%   +0.02%
==========================================
  Files         106      106
  Lines        9527     9531       +4
==========================================
+ Hits         7755     7761       +6
+ Misses       1251     1250       -1
+ Partials      521      520       -1
This is a great find! Could you describe the scenario where this happens? I'm just surprised we have not come across this.
Also, can you fix linting errors? After that it looks good to merge.
I'm not entirely sure, since it can be hard to reproduce, but the main factors seem to be a high pod count (>20) and analysis runs that are frequent and don't terminate, so analysis checks run every 10s. It also correlates with long deployment times (>10m).
Kudos, SonarCloud Quality Gate passed!
…queue. argoproj#1193 (argoproj#1243) This will prevent argo from hanging for up to 16 minutes at a time while processing a rollout. Signed-off-by: Mark Robinson <mrobinson@plaid.com> Signed-off-by: caoyang001 <caoyang001@foxmail.com>
There is a deep-seated bug where items are constantly re-added to the rollouts workqueue. This is a problem because the queue applies an exponential back-off to each item, so every re-add doubles that item's back-off. The back-off maxes out at 16.6 minutes.
This change mitigates the bug by reducing the maximum back-off to 10 seconds, which prevents Argo Rollouts from hanging for up to 16 minutes at a time when this case happens.