Member

@ritazh ritazh commented May 21, 2025

  • One-line PR description: Update KEP to prepare for beta in 1.34

/wg device-management
/assign @liggitt for sig auth
/assign @pohly
/assign @soltysh for PRR

Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
@k8s-ci-robot k8s-ci-robot added the wg/device-management Categorizes an issue or PR as relevant to WG Device Management. label May 21, 2025
@k8s-ci-robot
Contributor

@ritazh: GitHub didn't allow me to assign the following users: for, sig, auth, PRR.

Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label May 21, 2025
@k8s-ci-robot k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label May 21, 2025
@k8s-ci-robot k8s-ci-robot requested a review from micahhausler May 21, 2025 15:49
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 21, 2025
Member

@liggitt liggitt left a comment

update lgtm, just had a couple questions

- Gather feedback
- Additional tests are in Testgrid and linked in KEP
- Implementations in the kubernetes-sigs/dra-example-driver
- Implementations in the kubernetes-sigs/dra-example-driver: https://github.com/kubernetes-sigs/dra-example-driver/issues/97 and the NVIDIA dra driver: https://github.com/NVIDIA/k8s-dra-driver-gpu/issues/337
Member

do those issues mean we will show those repos labeling namespaces as admin access and using devices as admin access before promoting the gate to beta?

Member Author

Yes, Implementations in the kubernetes-sigs/dra-example-driver was part of the original beta criteria. I think we should be able to add an example there. I'm less certain about the exact timeline of the Nvidia one. I could remove that one for now and add it back AFTER it's done. wdyt?

Member

let's not take dependencies on consumers we're not sure will be ready as a beta graduation criteria ... one example use seems sufficient


Will be considered for beta.
- kube-controller-manager: If the kube-controller-manager fails to create `ResourceClaim` objects from `ResourceClaimTemplate` due to misconfigurations or permission issues relating to `adminAccess`, then the associated Pods will remain in a pending state and won't be scheduled.
- kube-scheduler: Bugs in the scheduler might lead to Pods not being scheduled even when resources are available or, scheduling Pods that shouldn't be scheduled due to unmet `adminAccess` requirements. If the `DRAAdminAccess` feature gate isn't enabled or is misconfigured, the scheduler might not recognize ResourceClaim requirements, leading to scheduling failures.
Member

is this thinking of something more than generic scheduler backoff behavior when it encounters failed API requests?

Member Author

No, this should be part of the generic scheduler backoff behavior.

Member

ok, maybe clarify that... otherwise this line sounds scarier or more specific to this feature than it actually is

-->

Will be considered for beta.
".status.allocation.devices.results[*].adminaccess" will be set to true for a claim using adminAccess when needed by a pod.
Member

Suggested change
".status.allocation.devices.results[*].adminaccess" will be set to true for a claim using adminAccess when needed by a pod.
".status.allocation.devices.results[*].adminAccess" will be set to true for a claim using adminAccess when needed by a pod.
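
To make the casing fix concrete, here is a minimal Python sketch of how a client might read that status field from a ResourceClaim that has already been decoded into a dictionary. The helper name `admin_access_results` and the sample claim (driver, pool, and device names) are hypothetical and not taken from the KEP; only the field path comes from the line above.

```python
from typing import Any, Dict, List


def admin_access_results(claim: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the allocation results whose adminAccess field is true.

    Walks .status.allocation.devices.results[*] and tolerates missing
    or null intermediate fields (for example, an unallocated claim).
    """
    allocation = (claim.get("status") or {}).get("allocation") or {}
    results = (allocation.get("devices") or {}).get("results") or []
    # JSON keys are case-sensitive: "adminaccess" would never match here.
    return [r for r in results if r.get("adminAccess") is True]


# Hypothetical decoded ResourceClaim, for illustration only.
claim = {
    "kind": "ResourceClaim",
    "status": {
        "allocation": {
            "devices": {
                "results": [
                    {"request": "monitor", "driver": "gpu.example.com",
                     "pool": "node-1", "device": "gpu-0", "adminAccess": True},
                    {"request": "workload", "driver": "gpu.example.com",
                     "pool": "node-1", "device": "gpu-1"},
                ]
            }
        }
    },
}

for result in admin_access_results(claim):
    print(result["request"], result["device"])
```

Because the lookup is an exact key match, the lowercase spelling in the original line would not correspond to any field a client could actually read.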


Will be considered for beta.
- The DynamicResourceAllocation feature gate must be enabled to create ResourceClaim, ResourceClaimTemplate. More details at [KEP-4381 - DRA Structured Parameters](https://github.com/kubernetes/enhancements/issues/4381)
- A third-party DRA driver is required for how the driver should interpret the AdminAcess field to get acess to device specific resources without allocating them.
Member

Suggested change
- A third-party DRA driver is required for how the driver should interpret the AdminAcess field to get acess to device specific resources without allocating them.
- A third-party DRA driver is required for how the driver should interpret the AdminAcess field to get access to device specific resources without allocating them.

Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
@liggitt
Member

liggitt commented May 21, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 21, 2025
@enj enj added this to SIG Auth May 22, 2025
@enj enj moved this to Needs Triage in SIG Auth May 22, 2025
Contributor

@soltysh soltysh left a comment

Mostly the integration links are missing, otherwise it's good to go.

#### Beta
- Gather feedback
Contributor

Missing bits higher in the doc:

  1. make sure to check appropriate boxes in Release Signoff Checklist
  2. In Integration tests section, please make sure to link tests according to the template, especially the newly added that are called out there, since looking at the PRs submitted during alpha they did add new tests.

Contributor

These comments still hold.

Member

@liggitt liggitt May 22, 2025

while reviewing an unrelated DRA KEP, we realized this is using an inconsistent label key ...

labels / annotations are expected to use $group.kubernetes.io/... domain prefixes, so this should really be resource.kubernetes.io/admin-access

https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#label-selector-and-annotation-conventions

"kubernetes.io" is the preferred form for labels and annotations, "k8s.io" should not be used for new map keys.

Sorry I didn't catch that last release

/hold
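
As a rough illustration of the convention cited above, the following Python sketch flags label or annotation keys whose domain prefix uses `k8s.io` rather than `kubernetes.io`. The function name is hypothetical, and the old key `resource.k8s.io/admin-access` is an assumption inferred from this comment rather than quoted from the KEP.

```python
def uses_discouraged_k8s_io_prefix(key: str) -> bool:
    """Report whether a label/annotation key has a k8s.io domain prefix.

    A key has the form "<prefix>/<name>" or just "<name>". Only the
    prefix is inspected; unprefixed keys are never flagged.
    """
    prefix, sep, _name = key.partition("/")
    if not sep:
        return False  # no domain prefix at all
    return prefix == "k8s.io" or prefix.endswith(".k8s.io")


# Assumed old key (not spelled out in this thread) versus the proposed key.
for key in ("resource.k8s.io/admin-access",
            "resource.kubernetes.io/admin-access"):
    print(key, uses_discouraged_k8s_io_prefix(key))
```

This is only a lint-style check on the key's shape; it says nothing about whether a given key is actually recognized by the API server.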

Member Author

so do we need to update this label in 1.34? and if so, is that still ok to move to beta?

Member

> so do we need to update this label in 1.34?

I would say so, yes.

> and if so, is that still ok to move to beta?

I think so, for the following reasons:

  1. pre-1.34 clusters default the feature off, so the adminAccess field is forcibly cleared on creation, so there's no existing objects with this field set unless someone opted into an alpha feature
  2. ResourceClaim / ResourceClaimTemplate only check admin access on create, since their specs are immutable, so no 1.33 server would have to handle authorizing an update to an adminAccess: true object persisted by a 1.34 server
  3. If someone was using the feature in alpha, it's easy to double-label their namespace until they complete upgrading to 1.34 to let both 1.33 and 1.34 be happy.
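
The double-labeling idea in point 3 can be sketched as follows, assuming the old alpha key was `resource.k8s.io/admin-access`, the new key is `resource.kubernetes.io/admin-access`, and the expected value is the string `"true"`. The old key and the value are assumptions here, and both helper functions are hypothetical stand-ins for the real API server check.

```python
from typing import Dict

NEW_KEY = "resource.kubernetes.io/admin-access"  # key proposed in this review
OLD_KEY = "resource.k8s.io/admin-access"         # assumed alpha-era key


def double_label(labels: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the namespace labels carrying both keys.

    Intended only for the upgrade window, so that servers checking
    either key accept admin-access claims in the namespace.
    """
    merged = dict(labels)
    merged[OLD_KEY] = "true"
    merged[NEW_KEY] = "true"
    return merged


def admin_access_allowed(labels: Dict[str, str], key: str) -> bool:
    """Simplified stand-in for the create-time namespace label check."""
    return labels.get(key) == "true"


labels = double_label({"team": "infra"})
print(admin_access_allowed(labels, OLD_KEY))  # what a 1.33 server would check
print(admin_access_allowed(labels, NEW_KEY))  # what a 1.34 server would check
```

Once every API server runs 1.34, the old key can be dropped from the namespace, since only the new key would still be consulted.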

Member Author

Thanks for catching this now! and +1 on moving to beta.
I can add a validation and warning for using the old label, not sure if it's worth the effort to maintain the code though.

Member

> Thanks for catching this now! and +1 on moving to beta. I can add a validation and warning for using the old label, not sure if it's worth the effort to maintain the code though.

I think just a release note is fine for something changed before exiting alpha.

Member Author

FYI @pohly

Contributor

Looks good to me, thanks to both of you for taking care of this.

Now I just need to erase k8s.io from my own memory to avoid using it for future extensions...

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels May 22, 2025
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
@liggitt
Member

liggitt commented May 22, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 22, 2025
@liggitt
Member

liggitt commented May 22, 2025

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 22, 2025
@pohly pohly moved this from 🆕 New to 👀 In review in Dynamic Resource Allocation May 25, 2025
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 28, 2025
Contributor

@soltysh soltysh left a comment

/approve
the PRR

- 1 example of real-world usage
- Allowing time for feedback
- All issues and gaps identified as feedback during beta are resolved
**Note:** GA criteria must not include any functional, security, monitoring, or testing requirements. Those must be beta requirements.
Contributor

👍

@github-project-automation github-project-automation bot moved this from Needs Triage to In Review in SIG Auth May 28, 2025
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 28, 2025
@ritazh
Member Author

ritazh commented May 28, 2025

/assign @enj

@enj
Member

enj commented May 28, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 28, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enj, liggitt, ritazh, soltysh

@k8s-ci-robot k8s-ci-robot merged commit fc234c1 into kubernetes:master May 28, 2025
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone May 28, 2025
@github-project-automation github-project-automation bot moved this from In Review to Closed / Done in SIG Auth May 28, 2025
@ritazh ritazh deleted the kep-5018-beta branch May 29, 2025 01:21
@pohly pohly moved this from 👀 In review to ✅ Done in Dynamic Resource Allocation Jun 3, 2025
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/auth Categorizes an issue or PR as relevant to SIG Auth. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.
Projects
Status: Done
Archived in project
6 participants