KEP-4671: Gang Scheduling #5558
Conversation
Which one should we look at? Or this one?

This one - I already closed the original one with an appropriate comment. [This one contains the original and starts filling in the remaining sections.]

Sorry, I just saw the comment: #5545 (comment)
| - It is not a goal to take away the responsibility from controllers to create pods. | ||
| - It is not a goal to bring fairness or multiple workload queues into kube-scheduler. Kueue and Volcano.sh will continue to provide this. | ||
| - It is not a goal to be able to map all the declarative state and behaviors of all workloads into ths `Workload` object. It will focus on state that is relevant to kube-scheduler, and possibly to cluster autoscalers, reschedulers and closely related components. |
| - It is not a goal to be able to map all the declarative state and behaviors of all workloads into ths `Workload` object. It will focus on state that is relevant to kube-scheduler, and possibly to cluster autoscalers, reschedulers and closely related components. | |
| - It is not a goal to be able to map all the declarative state and behaviors of all workloads into the `Workload` object. It will focus on state that is relevant to kube-scheduler, and possibly to cluster autoscalers, reschedulers and closely related components. |
| - It is not a goal to take away the responsibility from controllers to create pods. | ||
| - It is not a goal to bring fairness or multiple workload queues into kube-scheduler. Kueue and Volcano.sh will continue to provide this. | ||
| - It is not a goal to be able to map all the declarative state and behaviors of all workloads into ths `Workload` object. It will focus on state that is relevant to kube-scheduler, and possibly to cluster autoscalers, reschedulers and closely related components. | ||
| - Introducing a resource reservation that can later hold pods. This feature seems desirable, and will be informed by experience gained from _Gang Scheduling woth using Workload Object_. |
Not sure what "Gang Scheduling woth using Workload Object" meant here. Typo most likely.
| - name: "pg1" | ||
| gangMode: Single | ||
| minCount: 100 | ||
| schedulingTimeoutSeconds: 60 |
Would it be better to use a `time.Duration` here?
`schedulingTimeoutSeconds` and `minCount` should be part of a `gangSchedulingPolicy`. The `GangSchedulingPolicy` type is not currently embedded in `PodGroup`.
It is part of the PodGroup, as far as I can see. Anyway, this is orthogonal to what I mentioned in my comment, no?
@ricardomaraschini - we have examples of both in the APIs; we will decide on that with the API approvers during API review. For now, I added a TODO in the type definition below.
@ingvagabund - updated to mention `gangSchedulingPolicy` in the specification - PTAL if that's what you meant.
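For context, the two conventions being weighed here might look roughly like this in Go; both field shapes are illustrative, not the final API:

```go
package api

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Illustrative only: two common Kubernetes API conventions for a timeout,
// shown side by side. Which one the Workload API adopts is a TODO for API review.
type GangSchedulingPolicyDraft struct {
	// Option A: plain integer seconds, as in the current example.
	SchedulingTimeoutSeconds *int32 `json:"schedulingTimeoutSeconds,omitempty"`

	// Option B: metav1.Duration, serialized as a string such as "60s".
	SchedulingTimeout *metav1.Duration `json:"schedulingTimeout,omitempty"`
}
```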
Yes. Thank you. There's also the current example that still has both fields at the podGroup level:

    spec:
      podGroups: # or gangGroups -- TBD
      - name: "pg1"
        gangMode: Single
        minCount: 100
        schedulingTimeoutSeconds: 60

which would become:

    spec:
      podGroups: # or gangGroups -- TBD
      - name: "pg1"
        gangMode: Single
        gangSchedulingPolicy:
          minCount: 100
          schedulingTimeoutSeconds: 60
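A rough Go shape of that restructuring, following the field names in the YAML above (none of these are final API names):

```go
package api

// PodGroup sketch with the gang-specific knobs grouped under GangSchedulingPolicy,
// mirroring the restructured YAML example above. Names are not final.
type PodGroup struct {
	// Name identifies the group within the Workload.
	Name string `json:"name"`

	// GangMode selects how the group is gang scheduled (e.g. Single).
	GangMode string `json:"gangMode,omitempty"`

	// GangSchedulingPolicy carries the gang-scheduling parameters.
	GangSchedulingPolicy *GangSchedulingPolicy `json:"gangSchedulingPolicy,omitempty"`
}

// GangSchedulingPolicy groups the parameters that previously sat directly on PodGroup.
type GangSchedulingPolicy struct {
	// MinCount is the number of pods that must be schedulable together.
	MinCount int32 `json:"minCount"`

	// SchedulingTimeoutSeconds bounds how long the scheduler waits for the whole gang.
	SchedulingTimeoutSeconds int32 `json:"schedulingTimeoutSeconds,omitempty"`
}
```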
| ## Summary | ||
| In this KEP, kube-scheduler is modified to support gang scheduling[^1]. To implement gang scheduling, kube-scheduler identifies pods that are in a group and waits until all pods reach the same stage of the scheduling/binding cycle before allowing any pods from the group to advance past that point. If not all pods can reach that point before a timeout expires, then the scheduler stops trying to schedule that group, and all pods release all their resources. This allows other workloads to try to allocate those resources. |
As a Workload may contain multiple groups, what is the expected kube-scheduler behavior when we have reached the desired number of pods for a single group but not for all of them? Do we keep the whole Workload from scheduling?
Each group has its own timeout, so I presume each group will eventually time out. So if one group times out, is there any point in keeping the others around? One could get into a case where one group times out while another one is getting retried, again and again.
"then the scheduler stops trying to schedule that group"
Will there be any retry when a group times out and the resources get released? Or, does it mean the pods enter a failed state?
The way we think about individual groups (gangs and/or gang-replicas) is that they are independent from each other. They all form a workload, but can operate independently from each other.
So if we can schedule one group but not the other - this is still fine, we should schedule and run it.
If the underlying intention is that you really need all groups or none of them, then they are no longer separate groups - they should form a single gang.
[The structure will become more important in future extensions, where e.g. we may want to schedule multiple gangs topologically close to each other. But that is not part of this KEP.]
I added a paragraph about it to the API section.
@wojtek-t consider expanding the summary section to help readers distinguish the gang and workload concepts.
"Will there be any retry when a group times out and the resources get released? Or, does it mean the pods enter a failed state?"
Currently a group will be continuously retried, and in this or the following KEPs the workload status will be set to "unschedulable". Individual pods may get the unschedulable status as well, but only those which failed to schedule in a given attempt.
Long term we don't expect workload pods to become unschedulable, as we plan to propose introducing a new workload-scheduling phase which will produce a proposed placement (or get a proposed placement from an external workload scheduler). If pods still fail to schedule, it means the proposed placement was wrong and needs to be modified.
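Taken together, the semantics discussed in this thread amount to a per-group barrier with a timeout: every pod in the gang waits at the same point, and if the gang is not complete in time the attempt is rejected so resources can be released and the group retried. A minimal, self-contained Go sketch of that behavior (a toy model, not the kube-scheduler implementation):

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// gangBarrier models the wait-or-release semantics: pods wait at the same
// point; if minCount arrivals are not reached before the timeout, the whole
// attempt is rejected so resources can be released and the group retried.
type gangBarrier struct {
	minCount int
	arrived  int
	mu       sync.Mutex
	ready    chan struct{} // closed once minCount pods have arrived
}

func newGangBarrier(minCount int) *gangBarrier {
	return &gangBarrier{minCount: minCount, ready: make(chan struct{})}
}

// arrive blocks until the gang is complete or the timeout expires.
// It returns true if the pod may proceed past this point (e.g. to binding).
func (b *gangBarrier) arrive(ctx context.Context, timeout time.Duration) bool {
	b.mu.Lock()
	b.arrived++
	if b.arrived == b.minCount {
		close(b.ready) // the last arriving pod releases everyone
	}
	b.mu.Unlock()

	select {
	case <-b.ready:
		return true
	case <-time.After(timeout):
		return false // gang incomplete: reject, release resources, retry later
	case <-ctx.Done():
		return false
	}
}

func main() {
	b := newGangBarrier(3)
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			ok := b.arrive(context.Background(), 60*time.Second)
			fmt.Printf("pod %d allowed to proceed: %v\n", i, ok)
		}(i)
	}
	wg.Wait()
}
```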
| // WorkloadSpec describes a workload in a portable way that scheduler and related | ||
| // tools can understand. | ||
| type WorkloadSpec struct { | ||
| // Optional but recommended to set. |
| // Optional but recommended to set. | |
| // ControllerRef points to the true workload, e.g. Deployment. It is optional | |
| // and it is intended to make this mapping easier for things like CLI tools. |
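In context, the suggested wording would sit on the field roughly like this (the `ControllerRef` type shape is an assumption, and other WorkloadSpec fields are elided):

```go
package api

// WorkloadSpec describes a workload in a portable way that scheduler and related
// tools can understand. Other fields are elided in this sketch.
type WorkloadSpec struct {
	// ControllerRef points to the true workload, e.g. Deployment. It is optional
	// and it is intended to make this mapping easier for things like CLI tools.
	// +optional
	ControllerRef *ControllerRef `json:"controllerRef,omitempty"`
}

// ControllerRef identifies the controller object that owns this workload.
// The exact field set here is an assumption for illustration.
type ControllerRef struct {
	APIGroup string `json:"apiGroup"`
	Kind     string `json:"kind"`
	Name     string `json:"name"`
}
```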
64c1d92 to 820c6ea (Compare)
/assign @sanposhiho @dom4ha
I think this is pretty much ready for Alpha review, PTAL
| * Within a Workload there is a list of groups of pods. Each group represents a top-level division of pods within a Workload. Each group can be independently gang scheduled (or not use gang scheduling). This group is named | ||
| * In a future , we expect that this group can optionally specify further subdivision into sub groups. Each sub-group can have an index. The indexes go from 0 to N, without repeats or gaps. These subgroups are called | ||
| * In subsequent KEPs, we expect that a sub-group can optionally specify further subdivision into pod equivalence classes. All pods in a pod equivalence class have the same values for all fields that affect scheduling feasibility. These pod equivalence classes are called |
You accidentally removed the names for these three
Oops - fixed now.
| * `Workload` is the resource Kind. | ||
| * `scheduling` is the ApiGroup. | ||
| * `spec.workload` is the name of the new field in pod. | ||
| * Within a Workload there is a list of groups of pods. Each group represents a top-level division of pods within a Workload. Each group can be independently gang scheduled (or not use gang scheduling). This group is named |
| * Within a Workload there is a list of groups of pods. Each group represents a top-level division of pods within a Workload. Each group can be independently gang scheduled (or not use gang scheduling). This group is named | |
| * Within a Workload there is a list of groups of pods. Each group represents a top-level division of pods within a Workload. Each group can be independently gang scheduled (or not use gang scheduling). This group is named `PodGroup`. |
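To make the naming above concrete, the pod side of the link (the new `spec.workload` field) could carry something like the following; the exact field set is an assumption for illustration:

```go
package api

// WorkloadReference sketches what a pod's spec.workload field could carry:
// which Workload object the pod belongs to and which PodGroup within it.
// Field names are illustrative, not the final API.
type WorkloadReference struct {
	// Name of the Workload object in the pod's namespace.
	Name string `json:"name"`

	// PodGroup is the name of the PodGroup within the Workload this pod belongs to.
	PodGroup string `json:"podGroup"`
}
```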
ahg-g left a comment:
This is great! I left a couple of clarifying comments.
| - Implement the first version of `Workload` API necessary for defining a Gang | ||
| - Ensuring that we can extend `Workload` API in backward compatible way toward north-star API | ||
| - Ensuring that `Workload` API will be usable for both built-in and third-party workload controllers and APIs | ||
| - Implement first version of gang-scheduling in kube-scheduler |
What scheduling constraints will or will not be supported? Pod affinity/anti-affinity, node affinity/selector, pod topology spread? If all of them, then I would mention that explicitly.
Yes - we want to support all existing features - though potentially not in an optimal way yet.
Added.
soltysh left a comment:
A few comments; the biggest one, where I'd like more clarity, is about the workload-inferring capabilities, which I'd like to get clarified before we move forward with this.
| valueFrom: | ||
| fieldRef: | ||
| fieldPath: | ||
| "metadata.annotations['batch.kubernetes.io/job-completion-index']" |
| Moreover, the visibility into issues (debuggability) will depend on [#5510], but we don't | ||
| treat it as a blocker. | ||
| [#5510]: https://github.com/kubernetes/enhancements/pull/5510 |
Nit: it's better to use the KEP tracking issue rather than a particular PR, so #5501
Fixed
| // PodGroupPolicy defines scheduling configuration of a PodGroup. | ||
| type PodGroupPolicy struct { | ||
| // Exactly one of the policies should be set. |
Nit: from an API perspective, you'll want to add a discriminator field expressing which policy is at play. That's what we've been doing with all the union-like types.
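The union-with-discriminator convention referred to here would look roughly like this (member and constant names are illustrative, not part of the KEP):

```go
package api

// PodGroupPolicyType is the discriminator naming which union member is set.
type PodGroupPolicyType string

const (
	// GangPolicyType selects the gang-scheduling member.
	GangPolicyType PodGroupPolicyType = "Gang"
)

// PodGroupPolicy is a union-like type: Type names the active member and exactly
// one matching pointer field is set, mirroring the discriminator pattern used
// elsewhere in Kubernetes APIs. Field names are illustrative.
type PodGroupPolicy struct {
	// Type is the discriminator; the member it names must be set.
	// +unionDiscriminator
	Type PodGroupPolicyType `json:"type"`

	// Gang holds the gang-scheduling parameters when Type is "Gang".
	// +optional
	Gang *GangSchedulingPolicy `json:"gang,omitempty"`
}

// GangSchedulingPolicy as sketched earlier in this discussion.
type GangSchedulingPolicy struct {
	MinCount                 int32 `json:"minCount"`
	SchedulingTimeoutSeconds int32 `json:"schedulingTimeoutSeconds,omitempty"`
}
```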
| will be updated to create an appropriate `Workload` objects themselves whenever they can appropriately infer | ||
| the intention from the desired state. | ||
| Note that given scheduling options are stored in the `Workload` object, pods linked to the `Workload` | ||
| object will not be scheduled until this `Workload` object is created and observed by the kube-scheduler. |
I'm still having a hard time with these ideas. If you control the whole Job object, why would you want to differentiate between those two fields (one at the job spec level, and the other at the pod template level), which, as the author of a job, you have full control of? What's the value of duplicating this information?
| will be updated to create an appropriate `Workload` objects themselves whenever they can appropriately infer | ||
| the intention from the desired state. | ||
| Note that given scheduling options are stored in the `Workload` object, pods linked to the `Workload` | ||
| object will not be scheduled until this `Workload` object is created and observed by the kube-scheduler. |
I'm inclined to push this such that, for alpha, you only leave the initial part talking about the user-managed Workload resource.
| 1. Ensure that pods being part of a gang are not bound if all pods belonging to it can't be scheduled. | ||
| 2. Provide the "optimal enough" placement by considering all pods from a gang together. | ||
| 3. Avoid deadlock scenario when multiple workloads are being scheduled at the same time by kube-scheduler. | ||
| 4. Avoid deadlock scenario when multiple workloads are being scheduled at the same time by different |
Any particular reason you have this element mentioned twice? Why can't you just write this as:
| 4. Avoid deadlock scenario when multiple workloads are being scheduled at the same time by different | |
| 4. Avoid deadlock scenario when multiple workloads are being scheduled at the same time by any scheduler (kube-scheduler, or third-party provided scheduler). |
Yes - the reason is that these problems have somewhat different solutions and will be resolved in different timeframes (as described below - addressing (3) is a requirement for Beta, but addressing (4) will require a lot of follow-up work (a reservations-like approach)).
Because of that I'm going to leave it as is, to make referencing those problems below easier.
| will be updated to create an appropriate `Workload` objects themselves whenever they can appropriately infer | ||
| the intention from the desired state. | ||
| Note that given scheduling options are stored in the `Workload` object, pods linked to the `Workload` | ||
| object will not be scheduled until this `Workload` object is created and observed by the kube-scheduler. |
Also, in the long run, does that mean we'll be extending all of the built-in controllers with similar changes? I'm having a hard time supporting this kind of addition across the board.
| ###### How can this feature be enabled / disabled in a live cluster? | ||
| - [X] Feature gate (also fill in values in `kep.yaml`) | ||
| - Feature gate name: Workload/GenericWorkload/NativeWorkload |
kep.yaml talks about Workload; it would be nice to pick one and use it consistently across all places 😉
Switched to GenericWorkload for now - although we may adjust the name during the coding.
| will rollout across nodes. | ||
| --> | ||
| ###### What specific metrics should inform a rollback? |
Although not required, have you considered what kind of metrics you could expose for this functionality and where? I'm thinking about exposing gang-related metrics in the scheduler, as the most appropriate place to track this.
I don't expect any particular metrics for the Workload API itself (we should rather rely on the standard kube-apiserver metrics for that).
But for the GangScheduling feature, we absolutely need to expose some metrics. The two primary ones I have in mind are:
- error rate - the number of times we need to reject a placement that was already computed because of not being able to satisfy the whole gang
- e2e latency for the whole gang
I just didn't add them from the beginning as we may want to tweak them somehow.
I was thinking primarily about the gang scheduling feature, since, like most of our scheduling code, it's hard to follow from the outside. I like both of the proposed metrics. Now I only need to remember that you wrote them here for beta 😅
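As a rough idea of how the two metrics proposed above could be exposed in the scheduler; the metric names, labels, and buckets here are placeholders, not something the KEP commits to:

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical gang-scheduling metrics along the lines discussed above.
var (
	// Counts placements that had to be rejected because the full gang could
	// not be satisfied (the "error rate" metric).
	gangPlacementRejections = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "scheduler_gang_placement_rejections_total",
			Help: "Number of computed placements rejected because the whole gang could not be satisfied.",
		},
		[]string{"reason"},
	)

	// Measures end-to-end scheduling latency for a whole gang.
	gangSchedulingDuration = prometheus.NewHistogram(
		prometheus.HistogramOpts{
			Name:    "scheduler_gang_e2e_scheduling_duration_seconds",
			Help:    "End-to-end scheduling latency for an entire gang.",
			Buckets: prometheus.ExponentialBuckets(0.01, 2, 16),
		},
	)
)

func init() {
	prometheus.MustRegister(gangPlacementRejections, gangSchedulingDuration)
}
```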
wojtek-t left a comment:
PTAL
| valueFrom: | ||
| fieldRef: | ||
| fieldPath: | ||
| "metadata.annotations['batch.kubernetes.io/job-completion-index']" |
@erictune - I think you're referring to a different thing. There are two things:
1. Having `ControllerRef` inside the Workload object (part of this KEP)
2. Adding `workloadRef` in the Job API (part of KEP-5547: Add workloadRef in the Job API #5548 - @helayoty KEP)
Your comment above shows why (1) makes sense. I think it's not critical, but I also see value in it. But that's not the thing that is being questioned in this comment thread.
What is being questioned here is (2), because, as pointed out by @soltysh (which I agreed with), introducing workloadRef in the Job API is a duplication of information: the exact same thing can already be expressed through the WorkloadReference in the PodTemplateSpec.
So we will already have a way to get the information without #5548:
(a) From Workload to Job - using the ControllerRef field that is part of WorkloadSpec
(b) From Job to Workload - using the job.spec.podTemplate.WorkloadReference field
love it now
/approve
/lgtm
Thanks for all the effort to kick off this initiative, I'm super excited to see it finally coming!
I left one comment, but that is on the beta discussion section. I know that is not supposed to be discussed deeply in this initial alpha KEP, and that's why I actually put the stamp.
| can process all gang pods together. The single scheduling cycle and blocking resources in beta | ||
| will address the requirement (3). | ||
| We will also introduce delayed preemption by moving it after `WaitOnPermit` phase. Together with |
What if a pod deletion takes long? The current scheduler mitigates it by just letting a preemptor pod go anywhere else (if there's an empty space, probably made by some other pod's termination) while the deletion isn't completed.
That's a great question - I believe we will need a similar mechanism here too. This becomes problematic in the case of cross-pod dependencies within a gang (e.g. collocation), so it requires putting more thought into it. Let's get back to this question after Alpha.
WaitOnPreemption would have its own timeout, after which waiting pods would be rejected, so the preemption process could start over after a possible workload rescheduling first (if the scheduler has any new information).
/hold
To block the merge until @soltysh approves for PRR.
| status: implementable | ||
| creation-date: 2025-09-17 | ||
| reviewers: | ||
| - TBD |
I guess we just put me only?
| owning-sig: sig-scheduling | ||
| participating-sigs: | ||
| - sig-apps |
Is there anyone from sig-apps here? Do we need a stamp from them? Or is that not mandatory?
I killed two birds with one stone by using @soltysh - he is wearing both PRR hat and the SIG apps hat :)
Indeed I wear many hats 😅
soltysh left a comment:
/approve
the PRR
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: sanposhiho, soltysh, wojtek-t
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Cancelling hold based on above approvals.
/hold cancel
Thanks everyone for your effort! I am super excited to see this moving forward 🚀
| // PodGroup defines the name of the PodGroup within a Workload this pod belongs to. | ||
| PodGroup string | ||
| // PodGroupReplicaIndex is the replica index of the PodGroup that this pod | ||
| // belong to when the workload is running ReplicatedGangMode. In this mode, |
s/ReplicatedGangMode/GangModeReplicated
First version of the Gang Scheduling KEP