Addressed review feedback
ahg-g committed Feb 5, 2023
1 parent 3e476d0 commit a8da4e0
Showing 2 changed files with 36 additions and 29 deletions.
@@ -156,16 +156,17 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
## Summary

In [#3521](https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/3521-pod-scheduling-readiness)
we introduced PodSchedulingReadiness.
we introduced PodSchedulingReadiness. The main use case for PodSchedulingReadiness was to
enable building external resource controllers (like extended schedulers and dynamic quota managers)
that decide when a pod should become eligible for scheduling by kube-scheduler.

The main use case for this feature was to enable building external resource controllers (like extended schedulers,
dynamic quota managers) to make a decision on when the pod should be eligible for scheduling by kube-scheduler.
This enhancement proposes to make the pod scheduling directives node selector and node
affinity mutable, as long as the pod is gated and the update to the node selector/affinity
further limits the existing constraints.

This enhancement proposes to make pod scheduling directives (nodeSelector, affinity) mutable as long as the pod
is gated and the update only further constrains the scheduling of the pod, and doesn't expand it.

This allows external resource controllers to not only decide when the pod should be eligible for scheduler, but
also where it should land. This is similar to what we did for suspended Job in [#2926](https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/2926-job-mutable-scheduling-directives);
This allows external resource controllers to not only decide when the pod should be eligible
for scheduling, but also where it should land. This is similar to what we did for suspended
Job in [#2926](https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/2926-job-mutable-scheduling-directives).

<!--
This section is incredibly important for producing high-quality, user-focused
@@ -204,8 +205,8 @@ demonstrate the interest in a KEP within the wider Kubernetes community.

### Goals

- Allow adding node affinity or node selector of a pod that is blocked on a
scheduling readiness gate.
- Allow mutating a pod that is blocked on a scheduling readiness gate with
a more constrained node affinity/selector.

<!--
List the specific goals of the KEP. What is it trying to achieve? How will we
@@ -214,11 +215,11 @@ know that this has succeeded?

### Non-Goals

- Allow mutating scheduling directives of pods unconditionally.
- Allow mutating node affinity/selector of pods unconditionally.
- Restrict updating tolerations to pods with scheduling gates. We already allow adding tolerations
unconditionally, and so restricting that to gated pods is a backward incompatible change.
- Allow mutating pod affinity constraints. It is not clear how this can be done without
potentially violating a policy check that we previously done at pod CREATE.
unconditionally, and so restricting that to gated pods is not a backward compatible change.
- Allow mutating pod (anti)affinity constraints. It is not clear how this can be done without
potentially violating a policy check that was previously done at pod CREATE.

<!--
What is out of scope for this KEP? Listing non-goals helps to focus discussion
@@ -227,12 +228,13 @@ and make progress.

## Proposal

The proposal is to relax update validation of scheduling directives, specifically node affinity and selector, of Pods that
are blocked on a scheduling readiness gate (i.e., not yet considered for scheduling by kube-scheduler), specifically. Note
that adding tolerations and mutating labels and annotations are already allowed.
Node affinity and selector are currently immutable for Pods. The proposal is to relax this
validation specifically for Pods that are blocked on a scheduling readiness gate (i.e., not yet
considered for scheduling by kube-scheduler). Note that adding tolerations and mutating labels
and annotations are already allowed.

This has no impact on the applications controllers. The applications controller generally has no dependency
on those scheduling directives expressed in the application's pod template.
This has no impact on application controllers. Application controllers generally have no dependency
on the scheduling directives expressed in the application's pod template.
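
As an illustrative sketch (the pod name, gate name, image, and zone label value are invented for
this example), a gated Pod and the kind of update the relaxed validation would accept look roughly
like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample-pod                    # hypothetical name
spec:
  schedulingGates:
    - name: example.com/queue         # pod is not yet considered by kube-scheduler
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9
  # While the gate above is still present, an update that only narrows placement
  # would be accepted under this proposal, e.g. adding:
  #
  #   nodeSelector:
  #     topology.kubernetes.io/zone: us-central1-a
```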


<!--
@@ -256,13 +258,13 @@ bogged down.
#### Story 1

I want to build a controller that implements workload queueing and influences when and where a
workload should run. The workload controller creates the pods a workload instance, and to control
workload should run. The workload controller creates the pods of a workload instance. To control
when and where the workload can run, I have a webhook that injects a scheduling readiness gate,
and a controller that removes the gate when capacity becomes available.

At the time of creating the workload, we may not know on which part of the cluster (.e.g., a zone or
a VM type) a workload should run. To control where a workload should run, the controller can update
node affinity of the workload's pods; by doing so, the queue controller is able to influence the
At the time of creating the workload, we may not know on which subset of the nodes (e.g., a zone or
a VM type) a workload should run. To control where a workload should run, the controller updates
the node selector/affinity of the workload's pods; by doing so, the queue controller is able to influence the
scheduler and autoscaler decisions.
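
For example, once capacity is found in a particular zone, the queue controller could apply an
update like the following to the still-gated pods before lifting the gate (the zone value and
field layout are illustrative only):

```yaml
# Fragment of the pod spec updated by the queue controller while the
# scheduling gate is still in place (illustrative values).
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - us-east1-b
```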

### Notes/Constraints/Caveats (Optional)
@@ -276,7 +278,7 @@ This might be a good place to talk about core concepts and how they relate.

### Risks and Mitigations

- New calls from a queue controller to the API server to update affinity or tolerations.
- New calls from a queue controller to the API server to update node affinity/selector.
  The mitigation is for such controllers to make a single API call that both updates the
  affinity/tolerations and removes the scheduling gate, as sketched below.
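
A sketch of such a combined update (illustrative values): the placement constraint and the gate
removal land in the same write, so the controller issues only one extra request per pod.

```yaml
# Resulting pod spec after a single update by the queue controller:
# the node selector is added and the scheduling gate is removed together.
spec:
  schedulingGates: []                        # gate removed; pod becomes schedulable
  nodeSelector:
    topology.kubernetes.io/zone: us-east1-b  # placement chosen by the controller
```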

@@ -320,7 +322,10 @@ are ORed, while `nodeSelectorTerms[].matchExpressions` list and
`nodeSelectorTerms[].matchFields` are ANDed, and so for pods with existing
`nodeAffinity` terms, only additions of `NodeSelectorRequirements` to `matchExpressions`
or `matchFields` are allowed; for pods with no `nodeAffinity` terms,
adding anything is allowed.

No changes to existing terms will be allowed, since such changes would make it harder to reason
about the update and they are not necessary to implement the semantics of "further constraining
the node selection".
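
To make the rule concrete, here is a hedged sketch (keys and values invented for illustration) of
what would and would not be accepted for a gated pod that already carries a `nodeAffinity` term:

```yaml
# Existing required node affinity on the gated pod:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values: ["zone-a"]
# Allowed update: append another requirement to the existing matchExpressions
# list; it is ANDed with the requirement above, so the selection only narrows:
#
#           - key: node.kubernetes.io/instance-type
#             operator: In
#             values: ["type-x"]
#
# Rejected updates: modifying the existing requirement (e.g. changing its
# values) or adding a new entry to nodeSelectorTerms, since terms are ORed
# and either change could widen the node selection.
```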

<!--
This section should contain enough information that the specifics of your
@@ -381,8 +386,8 @@ extending the production code to implement this enhancement.
##### Integration tests

- Test that adding node affinity to a pod with no scheduling gates should be rejected.
- Test that adding affinity that expands the existing constraints should always be rejected, even for pods with scheduling gates.
- Test that adding affinity that constraints the node selection to a pod with scheduling gates should be allowed. Also, test that the
- Test that adding node affinity/selector that relaxes the existing constraints should always be rejected, even for pods with scheduling gates.
- Test that adding node affinity/selector that constrains the node selection of a pod with scheduling gates should be allowed. Also, test that the
scheduler observes the update and acts on it when the scheduling gate is lifted.
- Test feature enablement/disablement.

@@ -415,7 +420,7 @@ We will release the feature directly in Beta state. Because the feature is opt-in via
a new field, there is no benefit in having an alpha release.

#### Beta
- Feature implemented behind a feature flag
- Feature implemented behind the same feature flag used for the Pod Scheduling Readiness feature (`PodSchedulingReadiness`).
- Unit and integration tests passing

#### GA
@@ -13,6 +13,7 @@ approvers:

see-also:
- "/keps/sig-scheduling/2926-job-mutable-scheduling-directives"
- "/keps/sig-scheduling/3521-pod-scheduling-readiness"

# The target maturity stage in the current dev cycle for this KEP.
stage: beta
@@ -28,9 +29,10 @@ milestone:
  stable: "v1.28"

feature-gates:
  - name: PodMutableNodeSchedulingDirectives
  - name: PodSchedulingReadiness
    components:
      - kube-apiserver
      - kube-scheduler
disable-supported: true

# The following PRR answers are required at beta release