Update KEP-5007 DRA Device Binding Conditions #5342

KobayashiD27 · 2025-05-26T10:04:37Z

One-line PR description: updating KEP docs

Issue link: DRA: Device Binding Conditions #5007

Other comments:

This PR updates KEP-5007 to clarify the scope, motivation, and design of the proposed BindingConditions mechanism.

Key updates include:

Rewriting the Motivation section to emphasize general applicability beyond CDI, while retaining CDI as a motivating example.
Refining the Goals section to focus on readiness-aware binding and condition-based scheduling logic, rather than specific device types.
Rewriting the prioritization logic in the Goals section to describe general scheduling behavior based on BindingConditions.

These changes aim to make the KEP easier to review and better aligned with the feedback received during discussion. Feedback is welcome!

Related to: kubernetes/kubernetes#130160

k8s-ci-robot · 2025-05-26T10:04:46Z

Hi @KobayashiD27. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

KobayashiD27 · 2025-05-26T10:08:07Z

cc @pohly @dom4ha
I've updated the KEP description based on the PR discussion and the current implementation. Could you please take a look when you have a moment?

KobayashiD27 · 2025-06-02T01:30:48Z

@pohly @johnbelamaric @dom4ha @macsko
Hi, could you take a look please?

pohly

I think this is getting closer to being ready for merging. Some suggestions and one API gap.

pohly · 2025-06-06T10:26:58Z

keps/sig-scheduling/5007-device-attach-before-pod-scheduled/README.md

-In scenarios where attachment occurs after scheduling, there is a risk that the resource cannot be attached at the time of attachment, causing the container to remain in the "Container Creating" state.
+The mechanism is not tied to any specific hardware model or infrastructure.
+It can support a wide range of use cases, including:
+- Fabric-attached GPUs that require dynamic attachment via PCIe or CXL switches


👍 for mentioning CXL. This has indeed come up in recent discussions, and this KEP is relevant for it.

pohly · 2025-06-06T10:33:04Z

keps/sig-scheduling/5007-device-attach-before-pod-scheduled/README.md

-By having the scheduler wait for the fabric device to be attached, we can reschedule the pod if the attachment fails.
-This approach is superior because it avoids unnecessary waiting and allows for immediate rescheduling.
+While the original motivation came from fabric-attached devices, the mechanism is designed to be broadly applicable.
+It can support other scenarios where resource readiness is asynchronous or failure-prone, such as remote accelerators or gang scheduling.


Also 👍 for gang scheduling.

This KEP is a good building block for exploring advanced use cases.

pohly · 2025-06-06T11:31:32Z

keps/sig-scheduling/5007-device-attach-before-pod-scheduled/README.md

+It allows the scheduler to make binding decisions based on up-to-date readiness information, improving reliability and avoiding premature binding.
+
+While this proposal supports fabric-attached devices, it is not limited to them.
+The mechanism is designed to be general and can support other use cases where resource readiness is asynchronous or failure-prone.


This has been said a few times. I think you can remove the entire paragraph here.

I removed this paragraph.

pohly · 2025-06-06T11:42:25Z

keps/sig-scheduling/5007-device-attach-before-pod-scheduled/README.md

+External controllers (e.g., composable DRA controllers) are responsible for monitoring the `ResourceClaim` and updating the condition statuses as device preparation progresses.
+This coordination allows the scheduler to make informed binding decisions without requiring tight coupling between the scheduler and device-specific logic.
+
+### Handling ResourceSlices Upon Failure of Attachment


This same section is present a second time below. I think this here is a better place for it.

I removed duplicate sections.
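
To make the coordination described in the quoted KEP text concrete, here is a minimal Go sketch of the kind of readiness check the scheduler could perform once an external controller has updated the condition statuses. The `condition` type, the `allConditionsMet` helper, and the `"Attached"` condition name are hypothetical and exist only for illustration; they are not the real DRA API types.

```go
package main

import "fmt"

// condition is a hypothetical, simplified stand-in for a status condition
// that an external controller reports for an allocated device.
type condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// allConditionsMet reports whether every required binding condition is
// present with status "True". A missing condition counts as not yet met.
func allConditionsMet(required []string, reported []condition) bool {
	status := make(map[string]string, len(reported))
	for _, c := range reported {
		status[c.Type] = c.Status
	}
	for _, want := range required {
		if status[want] != "True" {
			return false
		}
	}
	return true
}

func main() {
	required := []string{"Attached"} // assumed condition name
	// Nothing reported yet: the device is not ready for binding.
	fmt.Println(allConditionsMet(required, nil))
	// The controller has reported success: binding can proceed.
	fmt.Println(allConditionsMet(required, []condition{{Type: "Attached", Status: "True"}}))
}
```

The check only reads condition types and statuses, which is what keeps device-specific logic out of the scheduler in this sketch.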

pohly · 2025-06-06T11:48:12Z

keps/sig-scheduling/5007-device-attach-before-pod-scheduled/README.md

 ```

-#### AllocatedDeviceStatus Enhancements
+#### DeviceRequestAllocationResult Enhancements


Add a section above for adding AllocationTimestamp to AllocationResult, it's currently missing.

Yes, that's right, I'll update it to include a description about AllocationTimestamp.
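
The thread does not spell out how AllocationTimestamp is consumed. One plausible use, assumed here rather than taken from the KEP text, is measuring how long a claim has been waiting for its binding conditions. The `allocationResult` type and the `bindingTimedOut` helper below are hypothetical, trimmed-down illustrations, not the real API.

```go
package main

import (
	"fmt"
	"time"
)

// allocationResult is a hypothetical stand-in for the API type; only the
// timestamp discussed above is modeled.
type allocationResult struct {
	AllocationTimestamp time.Time
}

// bindingTimedOut reports whether more than timeout has elapsed between
// the recorded allocation time and now.
func bindingTimedOut(r allocationResult, timeout time.Duration, now time.Time) bool {
	return now.Sub(r.AllocationTimestamp) > timeout
}

func main() {
	allocated := time.Date(2025, 6, 6, 12, 0, 0, 0, time.UTC)
	r := allocationResult{AllocationTimestamp: allocated}
	timeout := 10 * time.Minute // assumed value, for illustration only

	fmt.Println(bindingTimedOut(r, timeout, allocated.Add(5*time.Minute)))
	fmt.Println(bindingTimedOut(r, timeout, allocated.Add(11*time.Minute)))
}
```

Passing `now` explicitly rather than calling `time.Now()` inside the helper keeps the check deterministic and easy to test.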

dom4ha · 2025-06-06T13:45:17Z

keps/sig-scheduling/5007-device-attach-before-pod-scheduled/README.md

-2. **Attribute Information for Fabric Devices**: 
-Add attribute information that clearly distinguishes fabric devices requiring attachment.
-This will help in accurately identifying and managing these devices within the Kubernetes environment.
+3. **Prioritize Devices Based on Readiness**:  


I'm not certain we can make this a general rule without going into the details of specific conditions. Do you mean that devices with a lower number of conditions should always be favored, or do you just assume that a lack of conditions means the device does not need attachment?

I think in the general case it's not that simple, and the number of conditions is rather an implementation detail. I think the scheduler should have more explicit information denoting a need for attachment (scale up), and possibly some information about how much time it should take.

I wonder whether the timeout would be a better indicator to take into account here?

The simple heuristic is that without binding conditions there's no delay, and thus such devices are "better" than devices with them. I agree that we can improve this further by sorting by timeout: devices without binding conditions have a zero timeout, devices without an explicit timeout have the default timeout, and others have the specified one.

Ack, having none vs. something seems a good enough heuristic for the beginning.

Thanks for the feedback.
Yes, the current logic prioritizes devices without BindingConditions. This is because devices without conditions are immediately usable and don't require waiting in the PreBind phase.
This prioritization is already implemented in the GatherPools() function in my PR.
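
The heuristic discussed in this thread can be sketched in Go as follows. This is not the GatherPools() implementation from the PR. It illustrates the timeout-based refinement suggested above, which the thread describes as a possible improvement rather than current behavior. The `candidate` type, the helper names, the device names, and the default timeout value are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// candidate is a hypothetical, simplified stand-in for a device considered
// during allocation; it is not the real DRA API type.
type candidate struct {
	Name              string
	BindingConditions []string       // empty means usable immediately
	BindingTimeout    *time.Duration // nil means "use the default timeout"
}

// defaultBindingTimeout is an assumed value chosen only for this sketch.
const defaultBindingTimeout = 10 * time.Minute

// effectiveTimeout returns zero for devices without binding conditions,
// the default when no explicit timeout is set, and the explicit timeout
// otherwise.
func effectiveTimeout(c candidate) time.Duration {
	if len(c.BindingConditions) == 0 {
		return 0
	}
	if c.BindingTimeout == nil {
		return defaultBindingTimeout
	}
	return *c.BindingTimeout
}

// sortByReadiness orders candidates by ascending effective timeout;
// candidates with equal timeouts keep their original relative order.
func sortByReadiness(cs []candidate) {
	sort.SliceStable(cs, func(i, j int) bool {
		return effectiveTimeout(cs[i]) < effectiveTimeout(cs[j])
	})
}

func main() {
	short := 2 * time.Minute
	cs := []candidate{
		{Name: "fabric-gpu-default", BindingConditions: []string{"Attached"}},
		{Name: "fabric-gpu-fast", BindingConditions: []string{"Attached"}, BindingTimeout: &short},
		{Name: "local-gpu"},
	}
	sortByReadiness(cs)
	for _, c := range cs {
		fmt.Println(c.Name, effectiveTimeout(c))
	}
}
```

With only the "none vs. something" rule agreed on for now, the same comparison reduces to checking whether `BindingConditions` is empty.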

mortent · 2025-06-09T21:48:17Z

/wg device-management

mortent · 2025-06-09T21:49:09Z

/ok-to-test

pohly

/lgtm

Tentative API review, overall design.

pohly · 2025-06-10T09:49:09Z

I suppose this doesn't need a PRR re-review: that section was already reviewed when merging for 1.33 (#5012 (comment)) and the changes since then were mostly around API aspects, not fundamental changes of the design.

@dom4ha: okay to approve and merge?

KobayashiD27 · 2025-06-11T02:53:03Z

In the previous discussion on the implementation PR for KEP-5007 (kubernetes/kubernetes#130160), we received feedback that the API description was insufficient, that the "fail and reschedule" usage pattern mentioned in the KEP was considered an anti-pattern, and that the overall description of the KEP needed clarification.

Based on this feedback, we created this KEP update PR to make the KEP more generic and clearer, with a focus on the motivation, goals, and prioritization of device selection. These updates aim to better reflect the intent of the API, including BindingConditions, and the scheduling behavior based on them.

Given these changes, would it be necessary to conduct an API review for this KEP update PR itself? We would appreciate it if reviewers could take a look.

cc @thockin

pohly · 2025-06-11T05:50:50Z

> Given these changes, would it be necessary to conduct an API review for this KEP update PR itself?

I think we can skip it for this PR. We can consider my review as a preliminary API review, to be ratified during the implementation review, and as you said, the API hasn't changed that much anyway since the previous PR.

dom4ha · 2025-06-11T14:32:19Z

keps/sig-scheduling/5007-device-attach-before-pod-scheduled/README.md

 1. **Set NodeSelector**: Before the `PreBind` phase, add the `NodeName` to the `ResourceClaim`'s `NodeSelector`.

 If Conditions are present, the scheduler DRA plugin will perform the following steps during the `PreBind` phase:



Set NodeSelector in api-server: Before the PreBind phase, add the NodeName to the ResourceClaim's NodeSelector.

I still have some small corrections.

macsko

Looks good, only nits

dom4ha · 2025-06-12T11:05:07Z

keps/sig-scheduling/5007-device-attach-before-pod-scheduled/README.md

 1. **Set NodeSelector**: Before the `PreBind` phase, add the `NodeName` to the `ResourceClaim`'s `NodeSelector`.

 If Conditions are present, the scheduler DRA plugin will perform the following steps during the `PreBind` phase:



I still have some small corrections.

kannon92 · 2025-06-12T13:07:20Z

Just to confirm: @johnbelamaric's approval is not needed here for PRR?

If so, I can mark this as no-need for this round since it was already approved for alpha. Has the design changed enough to warrant a new review?

pohly · 2025-06-12T14:34:00Z

I think we don't need another PRR review, the core design is still the same.

dom4ha · 2025-06-13T08:34:40Z

/approve for sig-scheduling

Thanks for updating this KEP!

k8s-ci-robot · 2025-06-13T08:34:50Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dom4ha, KobayashiD27, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~keps/sig-scheduling/OWNERS~~ [dom4ha]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

pohly · 2025-06-13T10:19:50Z

/lgtm

Please squash commits, I'll re-add LGTM if needed.

/hold

For confirmation that no new PRR review is needed (see also https://kubernetes.slack.com/archives/CPNHUMN74/p1749757599480619).

KobayashiD27 · 2025-06-13T11:11:09Z

Thanks all!
@pohly
I squashed some commits. Could you please re-add LGTM?

soltysh · 2025-06-13T11:21:59Z

> I think we don't need another PRR review, the core design is still the same.

This is not progressing between stages, so no PRR is required at this point in time.

pohly · 2025-06-13T12:06:23Z

/lgtm

/hold cancel
Because PRR is indeed not needed (thanks @soltysh!).

k8s-ci-robot added the cncf-cla: yes and needs-ok-to-test labels May 26, 2025

k8s-ci-robot added the kind/kep and sig/scheduling labels May 26, 2025

github-project-automation bot added this to SIG Scheduling May 26, 2025

github-project-automation bot moved this to Needs Triage in SIG Scheduling May 26, 2025

k8s-ci-robot requested review from dom4ha and macsko May 26, 2025 10:04

k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 26, 2025

KobayashiD27 mentioned this pull request May 26, 2025

Implement DRA Device Binding Conditions (KEP-5007) kubernetes/kubernetes#130160

Merged

pohly mentioned this pull request Jun 6, 2025

DRA: Device Binding Conditions #5007

Open

6 tasks

pohly requested changes Jun 6, 2025

github-project-automation bot moved this from Needs Triage to Needs Review in SIG Scheduling Jun 6, 2025

dom4ha reviewed Jun 6, 2025

k8s-ci-robot added the wg/device-management Categorizes an issue or PR as relevant to WG Device Management. label Jun 9, 2025

github-project-automation bot added this to Dynamic Resource Allocation Jun 9, 2025

github-project-automation bot moved this to 🆕 New in Dynamic Resource Allocation Jun 9, 2025

k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label Jun 9, 2025

pohly moved this from 🆕 New to 👀 In review in Dynamic Resource Allocation Jun 10, 2025

pohly approved these changes Jun 10, 2025

github-project-automation bot moved this from Needs Review to Needs Approval in SIG Scheduling Jun 10, 2025

k8s-ci-robot assigned pohly Jun 10, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 10, 2025

dom4ha reviewed Jun 11, 2025

dom4ha reviewed Jun 11, 2025

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 12, 2025

macsko reviewed Jun 12, 2025

dom4ha reviewed Jun 12, 2025

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 13, 2025

k8s-ci-robot added the do-not-merge/hold and lgtm labels Jun 13, 2025

KobayashiD27 force-pushed the dra-binding-conditions branch from 7e3a418 to 22d2875 Compare June 13, 2025 11:08

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 13, 2025

KobayashiD27 and others added 4 commits June 13, 2025 20:11

Update KEP-5007

9d18efa

Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>

Add explanation for AllocationTimestamp

abf9562

Co-authored-by: Dominik Marciński <gmidon@gmail.com>

Clarifiy where to set NodeSelector

fd1bd80

Update "Implementation History"

22d2875

Co-authored-by: Dominik Marciński <gmidon@gmail.com>

k8s-ci-robot added the lgtm label and removed the do-not-merge/hold label Jun 13, 2025

k8s-ci-robot merged commit bde597e into kubernetes:master Jun 13, 2025
3 of 4 checks passed

k8s-ci-robot added this to the v1.34 milestone Jun 13, 2025

github-project-automation bot moved this from Needs Approval to Done in SIG Scheduling Jun 13, 2025

pohly moved this from 👀 In review to ✅ Done in Dynamic Resource Allocation Jun 13, 2025
