-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add NominatedNodeName field to PodStatus #58990
Conversation
4b77324
to
4d58cc1
Compare
@@ -3051,6 +3051,13 @@ type PodStatus struct { | |||
// More info: https://git.k8s.io/community/contributors/design-proposals/node/resource-qos.md | |||
// +optional | |||
QOSClass PodQOSClass `json:"qosClass,omitempty" protobuf:"bytes,9,rep,name=qosClass"` | |||
// NominatedNodeName is set when this pod preempts other pods on the node, but it cannot be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments are translated to user-facing documentation. Either don't mention the field name or use the json field name (nominatedNodeName).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Details to specify somewhere:
- Will this field always be set before nodeName is set?
- Will this field be unset when nodeName is set?
- Will this field be changed if nodeName is set to something different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the comment and moved the field up. Also added one more line to the comment to mention that NominatedNodeName and NodeName are not necessarily the same. More details about how this field is set is provided in preemption design doc.
Note: New fields shouldn't necessarily be put at the bottom. Please place the field near logically related fields, in order of importance / relevance. Will the annotation still be understood/respected in the next release? |
Also, there should be a release note with the introduction of any new API field. |
Release-note guidance: kubernetes/community#484 |
4d58cc1
to
fcc277b
Compare
ce24497
to
556a36b
Compare
556a36b
to
7470654
Compare
Please write the release note in a form that would match what a user would see in the API. |
// This field does not guarantee that the pod will be scheduled on this node. Scheduler may decide | ||
// to place the pod elsewhere if other nodes become available sooner. Scheduler may also decide to | ||
// give the resources on this node to a higher priority pod that is created after preemption. | ||
// As a result, this field may be different than PodSpec.nodeName when the pod is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it desirable to allow them to be different? Why not either set them to match or clear the nominatedNodeName once nodeName is set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a couple of reasons:
- It is useful to understand/debug behavior of the cluster. When a pod causes preemption, but it is scheduled elsewhere, checking nominatedNodeName reveals that this is the pod that caused preemption on the node. Users who don't have access to scheduler logs can still see this field.
- Performance: setting the nominatedNodeName to match the nodeName or clearing it requires another API call.
Done. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bgrant0607, bsalamat The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here. |
What this PR does / why we need it:
Today, Scheduler uses an annotation called "nominated-node-name" to mark a preemptor Pod. This annotation helps scheduler know about the Pods that are destined to run on the nodes so that the resources made available by preemption is not allocated to a different Pod. In a recent discussion with @bgrant0607, we learned that we should change the annotation to a field as this field can be used by multiple schedulers and other components that may make scheduling-related decisions (descheduler, auto-scaler, kube-arbitrator, ...).
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):ref #57471
Special notes for your reviewer:
Release note:
/sig scheduling