[cluster-autoscaler][AWS] Massive scale-out when using composed topologySpreadConstraints #4129
Comments
I can confirm it with a single topologySpreadConstraint:
After scaling a deployment up from just 2 to 30 replicas (they should have fit easily on a few nodes), CA started scaling up all node groups to their maximum within a few seconds. (CA 1.20.0, EKS 1.20, 1 ASG per AZ.) Might be related to #4099?
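For reference, a single zone-level topologySpreadConstraint of the kind described above sits under the Deployment's pod template spec; a minimal sketch, with an assumed app label (not taken from the reporter's setup), is:

```yaml
# Minimal sketch of a single zone-level constraint in a pod template spec.
# The "app: spread-demo" label is an assumed placeholder.
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone   # spread evenly across availability zones
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: spread-demo
```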
Observing the same behaviour after testing with v1.21.
Taking it a bit further, I tried the change suggested in #4099 (i.e. adding a predicateChecker.CheckPredicates call after adding a new node to the snapshot in binpacking_estimator.go, to check whether the pod can actually be scheduled on that new node): cluster-autoscaler-release-1.21...nshekhar221:cluster-autoscaler-1.21.0-with-fix. Testing with the above change produced the following output:
Results/Analysis:
Any feedback or suggestions on this would help, as we are observing the issue on a frequent basis.
@MaciekPytel Do the changes in cluster-autoscaler-release-1.21...nshekhar221:cluster-autoscaler-1.21.0-with-fix look like a viable solution for this issue? The initial testing logs (shared above) suggest that it helps with the massive scale-out when using failure-domain.beta.kubernetes.io/zone topologySpreadConstraints. Please also let us know if there are any concerns around it. Happy to raise a PR if the suggested changes look fine.
The changes make a lot of sense and I agree they could help with this issue. One comment: ExpansionOption also has a list of pods that will be helped by the scale-up. This fix changes the estimated node count, but it doesn't modify that list of pods. That means the expander (the heuristic that selects between available scale-up options) will act as if all those pending pods could be scheduled on a very small number of nodes. Also, for future reference: removing a node from the snapshot is an expensive operation, as it drops internal caches. I suspect that with a lot of pending pods using topology spreading one may run into scalability problems with binpacking (which is obviously still a major improvement on the current state).
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
I just tried topologySpreadConstraints again, and it still happens for me. I then found out that the latest CA version for EKS 1.22 is indeed 1.22.1 (which I used) from 2021, so the fix cannot be in there. What options are there if one is stuck with Kubernetes 1.22 or 1.23 (e.g. AWS EKS)? The docs clearly state that the versions should match up.
Just started evaluating EKS 1.24 and tried CA 1.25 with it. It seems to work just fine, and the fix for this issue is included, too. So no more scale-out explosions with topology spread constraints.
Which component are you using?:
cluster-autoscaler
What version of the component are you using?:
Component version: 1.20.0
What k8s version are you using (kubectl version)?:
What environment is this in?:
AWS
What did you expect to happen?:
If I request 50 pods, in the worst case I expect a maximum of 50 new nodes to be provisioned. A small delta/deviation is acceptable.
What happened instead?:
A deployment scaled from 3 pods to 50 pods, and the cluster-autoscaler provisioned 124 new nodes (roughly 2.5 times the expected maximum).
How to reproduce it (as minimally and precisely as possible):
Scale up a deployment that uses composed topologySpreadConstraints (a hypothetical example manifest is sketched below):
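A Deployment with composed constraints might look like the following sketch, spreading pods across both zones and individual nodes at once. Every name, label, image, and resource request here is an illustrative assumption, not taken from the original report:

```yaml
# Hypothetical reproduction manifest: a Deployment with two "composed"
# topologySpreadConstraints (zone-level and node-level).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spread-repro            # assumed name
spec:
  replicas: 50
  selector:
    matchLabels:
      app: spread-repro
  template:
    metadata:
      labels:
        app: spread-repro
    spec:
      containers:
      - name: app
        image: registry.k8s.io/pause:3.9   # placeholder image
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone   # spread across availability zones
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: spread-repro
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname        # spread across nodes
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: spread-repro
```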
Anything else we need to know?: