KEP-4671 Add docs for Workload API and Gang scheduling #53296
erictune
left a comment
Text looks good.
Should the new concepts page be linked from somewhere?
Thanks for the PR.
Because Pod is a stable API, you also need to update the Pod documentation. You need to do this work even though the new APIs are only alpha.
Explain that the behavior of Pod depends on whether the reader, a cluster administrator, has or has not enabled the relevant feature gates.
Watch out for putting new documentation in one page. It's tempting to do that because what you are documenting is part of one package of improvements; however, readers learn about different elements of Kubernetes in different pages, and these improvements touch on several of those (not just scheduling).
I would put most of the new content into the Workloads section of the docs, for example by adding a section about Pod groups, at one of:
• https://kubernetes.io/docs/concepts/workloads/pod-groups/
• https://kubernetes.io/docs/concepts/workloads/pods/groups/
(I prefer the former, personally; PodGroup is an API separate from Pod).
Gang scheduling, however, I would place at
• https://kubernetes.io/docs/concepts/scheduling-eviction/gang-scheduling/
You can also, either for alpha or beta, work with SIG Docs to add a new tutorial. If you do, various other pages can and should link there.
content/en/docs/concepts/scheduling-eviction/workload-aware-scheduling.md
content/en/docs/reference/command-line-tools-reference/feature-gates/GangScheduling.md
content/en/docs/reference/command-line-tools-reference/feature-gates/GenericWorkload.md
spec:
  # controllerRef provides a link to the object that manages this Workload,
  # such as a Kubernetes Job. This is for tooling and observability.
  controllerRef:
We may need to explain the difference between "the Job controller" (which is a controller) and "a Job" (an object that represents the desired and observed state that the Job controller operates on).
because no single node has enough capacity for them. The job cannot run,
but the scheduled Pods waste expensive resources that other applications could use.

Workload Aware Scheduling introduces a mechanism for the scheduler to identify and manage a group of Pods as a single, atomic workload.
Aim to write the documentation mostly as if the feature is already generally available, and then garnish it with caveats about it actually being alpha.
Good documentation is often timeless
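The scenario in the quoted hunk (part of a job is placed, the rest cannot fit) can be made concrete with a small sketch. This is an illustrative Python model, not scheduler code: the function names, the first-fit strategy, and the CPU figures are all assumptions made for the example.

```python
def place_pod_by_pod(free_cpus, pod_cpus):
    """Greedy first-fit placement, one Pod at a time (illustrative model only).

    Returns (placed, unplaced): lists of Pod indexes.
    """
    free = list(free_cpus)  # copy so the caller's list is untouched
    placed, unplaced = [], []
    for pod, need in enumerate(pod_cpus):
        for node, capacity in enumerate(free):
            if capacity >= need:
                free[node] -= need
                placed.append(pod)
                break
        else:
            # No node had room for this Pod.
            unplaced.append(pod)
    return placed, unplaced


def place_all_or_nothing(free_cpus, pod_cpus):
    """Treat the Pods as one atomic group: place all of them or none."""
    placed, unplaced = place_pod_by_pod(free_cpus, pod_cpus)
    if unplaced:
        return [], list(range(len(pod_cpus)))
    return placed, []


# Two nodes with 4 free CPUs each; a job of three Pods needing 3 CPUs each.
print(place_pod_by_pod([4, 4], [3, 3, 3]))      # ([0, 1], [2])
print(place_all_or_nothing([4, 4], [3, 3, 3]))  # ([], [0, 1, 2])
```

In the pod-by-pod case two Pods hold 6 CPUs while the job still cannot run; in the atomic case nothing is reserved until the whole group fits.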
I wouldn't add this file at all.
/sig scheduling node
## What is Workload Aware Scheduling?

The default Kubernetes scheduler makes decisions for one Pod at a time. This model works sufficiently good for stateless applications,
This isn't exactly true. The default scheduler's behavior, at the time this doc is live, depends on whether you have enabled the GangScheduling feature gate.
v1.35 K8s will, of course, support gang scheduling (as alpha), in-tree.
@lmktfy thank you for your valuable review. Just to be on the same page:
What Pod documentation are you referring to? Are you talking about mentioning WorkloadReference somewhere in the “https://kubernetes.io/docs/concepts/workloads/pods/” section, or somewhere else?
So I should split the documentation page into two parts: move the part about the PodGroups to https://kubernetes.io/docs/concepts/workloads/pods-groups/, and the part about Gang Scheduling to https://kubernetes.io/docs/concepts/scheduling-eviction/gang-scheduling/, right? Should I describe the part about (whole) Workload API in the PodGroups docs or somewhere else?
Good idea, let's do that for the beta.
Yes, when I talk about the documentation for the Pod API, I mean https://kubernetes.io/docs/concepts/workloads/pods/ and contents. There is also an API reference, but we generate that from the OpenAPI. You will need to update Pod to tell people that Pods can be put into groups.
Yes, that's the split, but I might (only might) document the Workload API in its own section / page, somewhere within https://kubernetes.io/docs/concepts/workloads/
Hello @macsko 👋! I'm reaching out from the Docs team. Just checking in as we approach Docs Freeze on 3rd December 2025, 12:00 UTC.
65f0e63 to 71fb0d6
@lmktfy I've updated the docs based on your comments. PTAL whether the current structure makes sense.
bb4f3f2 to bdb4b70
wojtek-t
left a comment
Just one minor comment - other than that it LGTM from technical POV.
bdb4b70 to fda060d
/lgtm LGTM from technical POV.
LGTM label has been added. Git tree hash: f7a8e4c3bbe59bde46e52c28e0686c51db66d358
dom4ha
left a comment
Maciek, very well written, so LGTM from me.
I have just a few minor suggestions.
---

Enables the support for [Workload API](/docs/concepts/workloads/workload-api/) to express scheduling requirements
at the workload level. Pods can now reference a specific Workload PodGroup using the spec.workloadRef field.
- at the workload level. Pods can now reference a specific Workload PodGroup using the spec.workloadRef field.
+ at the workload level. Pods can now reference a specific Workload PodGroup they belong to using the spec.workloadRef field.
The [Workload API](/docs/concepts/workloads/workload-api/) allows you to define a group of Pods
and apply advanced scheduling policies to them, such as [gang scheduling](/docs/concepts/scheduling-eviction/gang-scheduling/).
This is particularly useful for batch processing and machine learning workloads
where "all-or-nothing" placement is required.
- where "all-or-nothing" placement is required.
+ where "all-or-nothing" scheduling is required.
I think placement is OK, TBH.
### Gang policy
The `gang` policy enforces "all-or-nothing" scheduling. This is essential for tightly-coupled workloads
where partial startup results in deadlocks or wasted resources.
- where partial startup results in deadlocks or wasted resources.
+ need a group of Pods to be scheduled simultaneously to function correctly. Partial startup results in resource waste and may even lead to deadlocks.
We can still update the merged docs, even after docs freeze
The key thing about the deadline is that we must have docs that are at least good enough ahead of the upcoming release.
2. Once the quorum is met, the scheduler attempts to find placements for all Pods in the group.
   All assigned Pods wait at the `WaitOnPermit` gate during this process.
   Note that in the Alpha phase of this feature, finding a placement is based on pod-by-pod scheduling,
   rather than a single-cycle approach.
- rather than a single-cycle approach.
+ rather than a more sophisticated logic capable of scheduling all required pods at once.
We can still update the merged docs, even after docs freeze
The key thing about the deadline is that we must have docs that are at least good enough ahead of the upcoming release.
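The waiting behaviour described in step 2 of the quoted hunk can be sketched as a toy gate. This is only a Python model under stated assumptions: `PermitGate`, `min_count`, and `pod_placed` are hypothetical names, and the rule that Pods are released once `min_count` of them hold a placement is an assumption, since the excerpt does not show the step that follows.

```python
from dataclasses import dataclass, field


@dataclass
class PermitGate:
    """Toy model of Pods waiting at a permit gate until enough have placements."""

    min_count: int                               # hypothetical quorum size for the group
    waiting: list = field(default_factory=list)  # Pods that hold a placement but are not bound
    opened: bool = False                         # True once enough Pods have been placed

    def pod_placed(self, pod_name):
        """Record a Pod that found a node; return the Pods allowed to proceed now."""
        if self.opened:
            return [pod_name]  # assumption: later Pods pass straight through
        self.waiting.append(pod_name)
        if len(self.waiting) < self.min_count:
            return []          # keep waiting at the gate
        self.opened = True
        released, self.waiting = self.waiting, []
        return released


gate = PermitGate(min_count=3)
print(gate.pod_placed("worker-0"))  # []
print(gate.pod_placed("worker-1"))  # []
print(gate.pod_placed("worker-2"))  # ['worker-0', 'worker-1', 'worker-2']
```

Because Pods arrive one at a time, the sketch mirrors the pod-by-pod placement the hunk attributes to the Alpha phase rather than a single-cycle decision for the whole group.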
If a Pod references a Workload that does not exist, or a pod group that is not defined within that Workload,
the Pod will remain pending. It is not considered for placement until you create the missing Workload object
or recreate it to include the missing `PodGroup` definition.
For beta, try for this:
- or recreate it to include the missing `PodGroup` definition.
+ or recreate it to include the missing pod group definition.
We can still update the merged docs, even after docs freeze
The key thing about the deadline is that we must have docs that are at least good enough ahead of the upcoming release.
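The rule in the quoted hunk about missing references can be expressed as a short check. This is a hedged Python sketch rather than the scheduler's implementation: the function name, the `name` and `podGroup` keys, and the dictionary standing in for the set of existing Workload objects are all assumptions made for illustration.

```python
def considered_for_placement(workload_ref, workloads):
    """Return True if a Pod would be considered for placement in this toy model.

    workload_ref: None, or a dict with hypothetical keys "name" and "podGroup".
    workloads:    maps each existing Workload name to the set of pod group
                  names it defines (a stand-in for real Workload objects).
    """
    if workload_ref is None:
        return True  # the Pod does not reference any Workload
    pod_groups = workloads.get(workload_ref["name"])
    if pod_groups is None:
        return False  # the Workload does not exist: the Pod stays pending
    # The Workload exists, so the pod group must be defined within it.
    return workload_ref["podGroup"] in pod_groups


workloads = {"training": {"workers"}}
print(considered_for_placement({"name": "training", "podGroup": "workers"}, workloads))  # True
print(considered_for_placement({"name": "training", "podGroup": "drivers"}, workloads))  # False
print(considered_for_placement({"name": "missing", "podGroup": "workers"}, workloads))   # False
```

Creating the missing Workload, or recreating it with the missing pod group, corresponds to adding the entry to `workloads`, after which the same check returns True.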
content/en/docs/reference/command-line-tools-reference/feature-gates/GenericWorkload.md
nit: should update https://kubernetes.io/docs/concepts/policy/ to hyperlink here
lmktfy
left a comment
/lgtm
/approve
LGTM label has been added. Git tree hash: 739c4b88c894c8d06cfe33d52e02f5f5444fe469
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: erictune, lmktfy
Description
This PR adds feature gates docs and a new Workload Aware Scheduling tab to the scheduling docs based on KEP-4671.
Issue
KEP: kubernetes/enhancements#4671