KEP-5018: move to beta in 1.34 #5327
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
@ritazh: GitHub didn't allow me to assign the following users: for, sig, auth, PRR. Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
update lgtm, just had a couple questions
- Gather feedback
- Additional tests are in Testgrid and linked in KEP
- Implementations in the kubernetes-sigs/dra-example-driver
- Implementations in the kubernetes-sigs/dra-example-driver: https://github.com/kubernetes-sigs/dra-example-driver/issues/97 and the NVIDIA DRA driver: https://github.com/NVIDIA/k8s-dra-driver-gpu/issues/337
Do those issues mean we will show those repos labeling namespaces for admin access and using devices with admin access before promoting the gate to beta?
Yes, "Implementations in the kubernetes-sigs/dra-example-driver" was part of the original beta criteria. I think we should be able to add an example there. I'm less certain about the exact timeline of the NVIDIA one. I could remove that one for now and add it back AFTER it's done. wdyt?
Let's not make consumers we're not sure will be ready part of the beta graduation criteria; one example use seems sufficient.
Will be considered for beta.
- kube-controller-manager: If the kube-controller-manager fails to create `ResourceClaim` objects from `ResourceClaimTemplate` due to misconfigurations or permission issues relating to `adminAccess`, then the associated Pods will remain in a pending state and won't be scheduled.
- kube-scheduler: Bugs in the scheduler might lead to Pods not being scheduled even when resources are available, or to Pods being scheduled that shouldn't be because of unmet `adminAccess` requirements. If the `DRAAdminAccess` feature gate isn't enabled or is misconfigured, the scheduler might not recognize `ResourceClaim` requirements, leading to scheduling failures.
Is this thinking of something more than the generic scheduler backoff behavior when it encounters failed API requests?
No, this should be part of the generic scheduler backoff behavior.
OK, maybe clarify that; otherwise this line sounds scarier, or more specific to this feature, than it actually is.
Will be considered for beta.
".status.allocation.devices.results[*].adminaccess" will be set to true for a claim using adminAccess when needed by a pod.
Suggested change:
".status.allocation.devices.results[*].adminAccess" will be set to true for a claim using adminAccess when needed by a pod.
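The casing matters because key lookups along a JSON path are case-sensitive. As a rough illustration (a sketch only, not code from the KEP or from kube-apiserver), the hypothetical helper `claimUsesAdminAccess` below walks a ResourceClaim's JSON as a generic map and checks `.status.allocation.devices.results[*].adminAccess`. The sample claim is made up and trimmed to the fields the check needs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// claimUsesAdminAccess (hypothetical helper) reports whether any entry in
// .status.allocation.devices.results[*] has adminAccess set to true.
// It walks a generic JSON map, so key lookups are case-sensitive:
// a misspelled "adminaccess" key would not match.
func claimUsesAdminAccess(claimJSON []byte) (bool, error) {
	var obj map[string]any
	if err := json.Unmarshal(claimJSON, &obj); err != nil {
		return false, err
	}
	// Missing levels yield nil maps/slices, which are safe to read from.
	status, _ := obj["status"].(map[string]any)
	allocation, _ := status["allocation"].(map[string]any)
	devices, _ := allocation["devices"].(map[string]any)
	results, _ := devices["results"].([]any)
	for _, r := range results {
		result, _ := r.(map[string]any)
		if v, ok := result["adminAccess"].(bool); ok && v {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Illustrative claim status with a single allocation result.
	claim := []byte(`{"status":{"allocation":{"devices":{"results":[
		{"request":"gpu","adminAccess":true}]}}}}`)
	ok, err := claimUsesAdminAccess(claim)
	fmt.Println(ok, err) // true <nil>
}
```

Walking a `map[string]any` instead of decoding into structs is deliberate: Go's `encoding/json` matches struct field names case-insensitively, which would hide exactly the kind of casing mistake this suggestion fixes.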
Will be considered for beta.
- The DynamicResourceAllocation feature gate must be enabled to create ResourceClaim and ResourceClaimTemplate objects. More details at [KEP-4381 - DRA Structured Parameters](https://github.com/kubernetes/enhancements/issues/4381)
- A third-party DRA driver is required for how the driver should interpret the AdminAcess field to get acess to device specific resources without allocating them.
Suggested change:
- A third-party DRA driver is required for how the driver should interpret the AdminAccess field to get access to device-specific resources without allocating them.
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
/lgtm
Mostly the integration links are missing, otherwise it's good to go.
#### Beta
- Gather feedback
Missing bits higher in the doc:
- Make sure to check the appropriate boxes in the Release Signoff Checklist.
- In the Integration tests section, please make sure to link tests according to the template, especially the newly added ones called out there; looking at the PRs submitted during alpha, they did add new tests.
These comments still hold.
While reviewing an unrelated DRA KEP, we realized this is using an inconsistent label key. Labels and annotations are expected to use `$group.kubernetes.io/...` domain prefixes, so this should really be `resource.kubernetes.io/admin-access`.
"kubernetes.io" is the preferred form for labels and annotations; "k8s.io" should not be used for new map keys.
Sorry I didn't catch that last release.
/hold
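To make the convention concrete, here is a minimal sketch of a lint-style check one could run over new label or annotation keys owned by the Kubernetes project. `labelKeyDomain` and `usesPreferredDomain` are hypothetical helpers written for this comment; they are not an existing Kubernetes validation API.

```go
package main

import (
	"fmt"
	"strings"
)

// labelKeyDomain (hypothetical) returns the optional prefix of a label key,
// e.g. "resource.kubernetes.io" for "resource.kubernetes.io/admin-access",
// or "" when the key has no prefix.
func labelKeyDomain(key string) string {
	prefix, _, found := strings.Cut(key, "/")
	if !found {
		return ""
	}
	return prefix
}

// usesPreferredDomain (hypothetical) accepts keys under kubernetes.io
// ($group.kubernetes.io/... or kubernetes.io/...) and rejects everything else,
// including k8s.io. It is only meaningful for project-owned keys.
func usesPreferredDomain(key string) bool {
	d := labelKeyDomain(key)
	return d == "kubernetes.io" || strings.HasSuffix(d, ".kubernetes.io")
}

func main() {
	for _, key := range []string{
		"resource.kubernetes.io/admin-access", // preferred form
		"resource.k8s.io/admin-access",        // should not be used for new keys
	} {
		fmt.Println(key, usesPreferredDomain(key))
	}
}
```

Matching on the `.kubernetes.io` suffix, rather than on a bare `kubernetes.io` substring, keeps an unrelated domain such as `notkubernetes.io` from passing.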
So do we need to update this label in 1.34? And if so, is it still OK to move to beta?
> so do we need to update this label in 1.34?

I would say so, yes.

> and if so, is that still ok to move to beta?

I think so, for the following reasons:
- Pre-1.34 clusters default the feature to off, so the `adminAccess` field is forcibly cleared on creation; there are no existing objects with this field set unless someone opted into an alpha feature.
- `ResourceClaim` and `ResourceClaimTemplate` only check admin access on create, since their specs are immutable, so no 1.33 server would have to authorize an update to an `adminAccess: true` object persisted by a 1.34 server.
- If someone was using the feature in alpha, it's easy to double-label their namespace until they finish upgrading to 1.34, so that both 1.33 and 1.34 servers are satisfied.
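A rough sketch of why double-labeling works during a mixed-version upgrade. Assumptions: the alpha (1.33) key is taken to be `resource.k8s.io/admin-access`, inferred from the review above, and the value `"true"` is assumed for both keys. `namespaceAllowsAdminAccess` is a hypothetical, simplified stand-in for the create-time check, not the actual admission code.

```go
package main

import "fmt"

const (
	// Assumed alpha (1.33) key, inferred from this review thread.
	legacyAdminAccessLabel = "resource.k8s.io/admin-access"
	// Key proposed in this review for 1.34.
	adminAccessLabel = "resource.kubernetes.io/admin-access"
)

// namespaceAllowsAdminAccess (hypothetical) reports whether the namespace
// carries the label key a given server version looks for, set to "true".
// A missing key reads as "" and therefore returns false.
func namespaceAllowsAdminAccess(nsLabels map[string]string, key string) bool {
	return nsLabels[key] == "true"
}

func main() {
	// Namespace labeled with both keys while 1.33 and 1.34 servers coexist.
	ns := map[string]string{
		legacyAdminAccessLabel: "true",
		adminAccessLabel:       "true",
	}
	fmt.Println("1.33-style check:", namespaceAllowsAdminAccess(ns, legacyAdminAccessLabel))
	fmt.Println("1.34-style check:", namespaceAllowsAdminAccess(ns, adminAccessLabel))
}
```

Both checks pass on the double-labeled namespace. Once every API server is on 1.34, the legacy label is no longer consulted and can be removed.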
Thanks for catching this now! And +1 on moving to beta.
I can add a validation and a warning for using the old label, though I'm not sure it's worth the effort to maintain that code.
> Thanks for catching this now! and +1 on moving to beta. I can add a validation and warning for using the old label, not sure if it's worth the effort to maintain the code though.

I think just a release note is fine for something changed before exiting alpha.
FYI @pohly
Looks good to me; thanks to both of you for taking care of this.
Now I just need to erase `k8s.io` from my own memory to avoid using it for future extensions...
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
/lgtm
/hold cancel
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
/approve
the PRR
- 1 example of real-world usage
- Allowing time for feedback
- All issues and gaps identified as feedback during beta are resolved

**Note:** GA criteria must not include any functional, security, monitoring, or testing requirements. Those must be beta requirements.
👍
/assign @enj
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: enj, liggitt, ritazh, soltysh
/wg device-management
/assign @liggitt for sig auth
/assign @pohly
/assign @soltysh for PRR