SelectorSpreadPriority does not spread pods across zones #71327
Comments
Re Problem 1: for per-node spreading we only need to compare the current (filtered) nodes, since they all share the same maxPods. For per-zone spreading we need to account for other nodes as well, e.g. nodes filtered out by resource predicates. Re Problem 2: sorry to hear about your case :(
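To illustrate (a simplified sketch, not the actual kube-scheduler source; the constants, signature, and numbers below are assumptions): the reduce step normalizes each node's matching-pod count against the per-node and per-zone maximums and blends the two, so an understated per-zone count directly inflates a node's score.

```go
// Simplified sketch of the blended node/zone scoring, assuming a fixed
// maxPriority of 10 and a zoneWeighting of 2/3. Not the real implementation.
package main

import "fmt"

const (
	maxPriority   = 10
	zoneWeighting = 2.0 / 3.0 // zone spreading dominates when zone labels exist
)

// score blends the per-node and per-zone "emptiness" fractions for one node.
func score(countOnNode, maxCountByNode, countInZone, maxCountByZone int) float64 {
	nodeScore := 0.0
	if maxCountByNode > 0 {
		nodeScore = maxPriority * float64(maxCountByNode-countOnNode) / float64(maxCountByNode)
	}
	if maxCountByZone == 0 {
		return nodeScore
	}
	zoneScore := maxPriority * float64(maxCountByZone-countInZone) / float64(maxCountByZone)
	return (1-zoneWeighting)*nodeScore + zoneWeighting*zoneScore
}

func main() {
	// Same node, but the zone count on the left ignores pods sitting on nodes
	// that predicates filtered out, so the zone looks emptier and scores higher.
	fmt.Println(score(1, 3, 2, 6)) // zone count from filtered nodes only
	fmt.Println(score(1, 3, 5, 6)) // zone count including pods on filtered-out nodes
}
```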
Problem 1:
Problem 2:
Here is an example list of selectors.
All pods matching any one selector (e.g. the service selector) are counted in CalculateSpreadPriorityMap.
So in our case the old pods are counted during scheduling, but they get killed during the deployment.
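A hypothetical pair of selectors of the kind described here (the labels and hash values are made up for illustration):

```go
// The Service selector matches every pod of the app, including pods from the
// previous ReplicaSet; the ReplicaSet selector matches only the new pods.
// Counting a pod that matches *any* selector therefore also counts the old,
// soon-to-be-deleted pods. (Hypothetical labels, not taken from a real cluster.)
package main

import "fmt"

// matches reports whether every key/value in selector is present in labels.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	serviceSelector := map[string]string{"app": "web"}
	replicaSetSelector := map[string]string{"app": "web", "pod-template-hash": "6d9f7c"}

	oldPod := map[string]string{"app": "web", "pod-template-hash": "1a2b3c"} // from the previous deploy

	fmt.Println(matches(serviceSelector, oldPod))    // true: still counted for spreading
	fmt.Println(matches(replicaSetSelector, oldPod)) // false: not part of the new ReplicaSet
}
```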
Gentle ping.
/assign bsalamat |
@Ramyak |
I will try to separate this into two PRs. Any other suggestions welcome. |
I am working on Problem 1. Looks like I have to open a new issue.
What happened:
/kind bug
What you expected to happen:
Problem 1: Predicates filter nodes. Existing pods on the filtered-out nodes are not counted when calculating the maximum pods per zone or per node, resulting in an imbalanced cluster.
E.g.: GeneralPredicates removes nodes that cannot fit this pod. Any pods already scheduled on such a node are then not considered when counting the maximum pods in CalculateSpreadPriorityReduce.
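A minimal illustration of Problem 1 (the node data and helper below are assumptions, not scheduler code): once a full node is dropped by a predicate, its existing pods vanish from the per-zone tally, so an already loaded zone looks no busier than an empty one.

```go
// Tally matching pods per zone, with and without the nodes that predicates
// filtered out. All numbers and zone names are hypothetical.
package main

import "fmt"

type nodeInfo struct {
	zone         string
	matchingPods int
	feasible     bool // false if a predicate such as GeneralPredicates removed the node
}

func podsPerZone(nodes []nodeInfo, includeFiltered bool) map[string]int {
	counts := map[string]int{}
	for _, n := range nodes {
		if n.feasible || includeFiltered {
			counts[n.zone] += n.matchingPods
		}
	}
	return counts
}

func main() {
	nodes := []nodeInfo{
		{"zone-a", 4, false}, // full node: filtered out, but its pods still load zone-a
		{"zone-a", 1, true},
		{"zone-b", 1, true},
	}
	fmt.Println(podsPerZone(nodes, false)) // map[zone-a:1 zone-b:1] -> zones look balanced
	fmt.Println(podsPerZone(nodes, true))  // map[zone-a:5 zone-b:1] -> zone-a is actually loaded
}
```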
Problem 2: When there are two selectors (service and replication controller), it is sufficient to match any one of them for spreading purposes, which creates imbalance [selector match code]. Pods from previous deploys match the service selector and are therefore counted when distributing pods across zones/nodes, even though they do not match the replicaset selector. These pods will be deleted, so after the deploy completes the cluster is imbalanced, by zone and/or by pods per node.
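A rough numeric sketch of the resulting imbalance (zone names and pod counts are made-up assumptions, and the greedy loop below only approximates the spreading behaviour): the old pods make their zones look full while the deploy is in flight, so the new pods pile into the remaining zone, and once the old pods are deleted the new pods are left unevenly spread.

```go
// Place 30 new pods while 20 old pods (which still match the Service selector)
// are counted, then look at the distribution that remains after the old pods
// are deleted. Hypothetical numbers.
package main

import "fmt"

func main() {
	oldPods := map[string]int{"zone-a": 10, "zone-b": 10, "zone-c": 0} // previous ReplicaSet
	newPods := map[string]int{}
	zones := []string{"zone-a", "zone-b", "zone-c"}

	// Greedily put each new pod into the zone with the lowest visible count
	// (old + new), which is roughly what spreading against both selectors does.
	for i := 0; i < 30; i++ {
		best := ""
		for _, z := range zones {
			if best == "" || oldPods[z]+newPods[z] < oldPods[best]+newPods[best] {
				best = z
			}
		}
		newPods[best]++
	}

	// After the deploy the old pods are gone; only the new pods remain.
	fmt.Println(newPods) // map[zone-a:7 zone-b:7 zone-c:16] -> skewed toward zone-c
}
```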
How to reproduce it (as minimally and precisely as possible):
Problem 1:
Get pods scheduled onto nodes with very high CPU utilization [existing CPU utilization + 1 new pod results in allocatable.MilliCPU almost equal to 0]. GeneralPredicates will then drop such a node after the first pod is scheduled onto it.
This leads to over-utilization of an already loaded zone.
If there are enough nodes like this, it creates a pile-on effect, with most pods ending up scheduled into an already loaded zone.
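For concreteness, hypothetical numbers for the bracketed condition above (all values are assumptions):

```go
// A node whose existing requests plus one new pod leave allocatable.MilliCPU
// at essentially zero: after that first pod lands, GeneralPredicates rejects
// the node, and its pods drop out of the spreading counts.
package main

import "fmt"

func main() {
	allocatableMilliCPU := int64(4000) // node allocatable CPU
	alreadyRequested := int64(3800)    // requests of pods already on the node
	newPodRequest := int64(200)        // the pod being scheduled now

	remaining := allocatableMilliCPU - alreadyRequested - newPodRequest
	fmt.Println(remaining)                  // 0: the node is exactly full
	fmt.Println(remaining >= newPodRequest) // false: the next identical pod no longer fits
}
```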
Problem 2:
Have 2 selectors and deploy 30 pods across 3 zones. Then deploy again; the pods from the first rollout skew the spreading of the second.
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`):

/sig scheduling
/kind bug