Prevent infinite retries of autoscaling #9574
base: 4.19
Conversation
@weizhouapache I would like some advice on this issue. Do you think that if the number of all VMs, including those in Error and Stopped states, is >= the max size, we should stop scaling any further? Or do you think that if there are VMs in Error state we should retry for a few iterations?
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##              4.19     #9574       +/-   ##
============================================
- Coverage     15.08%     4.30%    -10.79%
============================================
  Files          5406       366      -5040
  Lines        472889     29514    -443375
  Branches      57738      5162     -52576
============================================
- Hits          71352      1270     -70082
+ Misses       393593     28100    -365493
+ Partials       7944       144      -7800

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
@@ -67,7 +67,7 @@ public int countAvailableVmsByGroup(long vmGroupId) {
     SearchCriteria<Integer> sc = CountBy.create();
     sc.setParameters("vmGroupId", vmGroupId);
     sc.setJoinParameters("vmSearch", "states",
-            State.Starting, State.Running, State.Stopping, State.Migrating);
+            State.Starting, State.Running, State.Stopping, State.Migrating, State.Error, State.Stopped);
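For context, the counter above presumably feeds a capacity check in the autoscale manager. A minimal sketch of that idea follows; only countAvailableVmsByGroup comes from the diff, and every other name is hypothetical rather than actual CloudStack code.

// Minimal sketch, not the actual CloudStack manager code: names other than
// countAvailableVmsByGroup are hypothetical. It illustrates why counting
// Error and Stopped VMs matters: once every slot in the group is occupied,
// healthy or not, scale-up stops instead of retrying deployments forever.
public class ScaleUpGuard {

    /** Hypothetical view of the DAO whose query is changed in the diff above. */
    public interface VmGroupVmMapCounter {
        int countAvailableVmsByGroup(long vmGroupId);
    }

    private final VmGroupVmMapCounter counter;

    public ScaleUpGuard(VmGroupVmMapCounter counter) {
        this.counter = counter;
    }

    /** Returns true when another VM may still be provisioned for the group. */
    public boolean canScaleUp(long vmGroupId, int maxMembers) {
        int counted = counter.countAvailableVmsByGroup(vmGroupId);
        // With Error and Stopped included in the count, failed or stopped VMs
        // still occupy a slot, so the group cannot keep re-deploying past its
        // configured maximum size.
        return counted < maxMembers;
    }
}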
This may make the autoscaler not retry deployment if a deployment intermittently goes into the Error state. Should we include the Error state only after some n retries?
Yeah, that was my worry too; I wanted some input on it. I'll do that. Thanks @shwstppr
@Pearl1594 my other advice would be
I added the Stopped state imagining a scenario where, for whatever reason, a host goes down and all VMs belonging to the autoscale group on that host enter the Stopped state. This would cause additional VMs to be redeployed, which I thought of as a problematic scenario. But maybe that's how it should behave; I am not sure of the scope of autoscaling. Maybe you could shed some light on this @weizhouapache
Yeah, I think that works. As a user, even if there is something wrong with a VM, I would still like it to be counted as part of the ASG. If not, I would not know I had a faulty VM, as I would only find out by going through my list of VMs outside the ASG.
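For the retry-bounded alternative raised earlier (counting Error-state VMs only after some number of failed deployments), one possible shape is sketched below; it is purely illustrative, with hypothetical names and a hypothetical threshold, and is not part of this PR.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Purely illustrative sketch of the retry-bounded idea: an Error-state VM only
// starts occupying a slot in the group once its replacement has already failed
// to deploy a few times. All names and the threshold are hypothetical.
public class ErrorRetryPolicy {

    private static final int MAX_DEPLOY_RETRIES = 3; // assumed threshold, not from the PR

    // In-memory per-group failure counter; real code would persist this state.
    private final Map<Long, Integer> failedDeployAttempts = new ConcurrentHashMap<>();

    public void recordFailedDeployment(long vmGroupId) {
        failedDeployAttempts.merge(vmGroupId, 1, Integer::sum);
    }

    public void recordSuccessfulDeployment(long vmGroupId) {
        failedDeployAttempts.remove(vmGroupId);
    }

    /** Whether Error-state VMs should count toward the group's maximum size. */
    public boolean countErrorVmsTowardMax(long vmGroupId) {
        return failedDeployAttempts.getOrDefault(vmGroupId, 0) >= MAX_DEPLOY_RETRIES;
    }
}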
Description
This PR fixes: #9318
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
How did you try to break this feature and the system with this change?