Commit 677a7ed

5018-dra-adminaccess

Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>

1 parent 62039f1

File tree: 4 files changed (+859 −68 lines)


keps/sig-node/4381-dra-structured-parameters/README.md

Lines changed: 10 additions & 63 deletions
@@ -1455,29 +1455,8 @@ type DeviceRequest struct {
 }
 ```
 
-Admin access to devices is a privileged operation because it grants users
-access to devices that are in use by other users. Drivers might also remove
-other restrictions when preparing the device.
-
-In Kubernetes 1.31, an example validating admission policy [was
-provided](https://github.com/kubernetes/kubernetes/blob/4aeaf1e99e82da8334c0d6dddd848a194cd44b4f/test/e2e/dra/test-driver/deploy/example/admin-access-policy.yaml#L1-L11)
-which restricts access to this option. It is the responsibility of cluster
-admins to ensure that such a policy is installed if the cluster shouldn't allow
-unrestricted access.
-
-Long term, a Kubernetes cluster should disable usage of this field by default
-and only allow it for users with additional privileges. More time is needed to
-figure out how that should work, therefore the field is placed behind a
-separate `DRAAdminAccess` feature gate which remains in alpha. A separate
-KEP will be created to push this forward.
-
-The `DRAAdminAccess` feature gate controls whether users can set the field to
-true when requesting devices. That is checked in the apiserver. In addition,
-the scheduler refuses to allocate claims with admin access when the feature is
-turned off and somehow the field was set (for example, set in 1.31 when it
-was available unconditionally, or set while the feature gate was enabled).
-A similar check in the kube-controller-manager prevents creating a
-ResourceClaim when the ResourceClaimTemplate has admin access enabled.
+For more details about `AdminAccess`, please refer to
+[KEP #5018: DRA AdminAccess](https://kep.k8s.io/5018).
 
 ```yaml
 const (
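The removed paragraphs above describe feature-gate checks: when `DRAAdminAccess` is off, the apiserver prevents the field from being set, and scheduler/controller-manager refuse to act on claims that somehow carry it. A minimal Go sketch of that kind of check, assuming a trimmed-down `DeviceRequest` and a hypothetical helper name (this is not the actual kube-apiserver code):

```go
package main

import "fmt"

// DeviceRequest is a trimmed-down stand-in for the API type in this KEP;
// only the fields needed for the sketch are included.
type DeviceRequest struct {
	Name        string
	AdminAccess *bool
}

// dropDisabledAdminAccess is a hypothetical helper illustrating the check
// described in the removed text: when the DRAAdminAccess feature gate is
// off, the apiserver clears AdminAccess so the field cannot be set.
func dropDisabledAdminAccess(reqs []DeviceRequest, gateEnabled bool) {
	if gateEnabled {
		return // gate on: leave the field alone
	}
	for i := range reqs {
		reqs[i].AdminAccess = nil // gate off: drop the privileged request
	}
}

func main() {
	yes := true
	reqs := []DeviceRequest{{Name: "gpu", AdminAccess: &yes}}
	dropDisabledAdminAccess(reqs, false) // simulate: feature gate disabled
	fmt.Println(reqs[0].AdminAccess == nil) // prints true: field was cleared
}
```

The same pattern (check the gate, then clear or reject) is what the scheduler and kube-controller-manager checks mentioned in the removed text would apply on their side.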
@@ -1870,21 +1849,17 @@ type DeviceRequestAllocationResult struct {
 	// +required
 	Device string
 
-	// AdminAccess is a copy of the AdminAccess value in the
-	// request which caused this device to be allocated.
-	//
-	// New allocations are required to have this set when the DRAAdminAccess
-	// feature gate is enabled. Old allocations made
-	// by Kubernetes 1.31 do not have it yet. Clients which want to
-	// support Kubernetes 1.31 need to look up the request and retrieve
-	// the value from there if this field is not set.
+	// AdminAccess indicates that this device was allocated for
+	// administrative access. See the corresponding request field
+	// for a definition of mode.
 	//
 	// This is an alpha field and requires enabling the DRAAdminAccess
-	// feature gate.
+	// feature gate. Admin access is disabled if this field is unset or
+	// set to false, otherwise it is enabled.
 	//
-	// +required
+	// +optional
 	// +featureGate=DRAAdminAccess
-	AdminAccess *bool
+	AdminAccess *bool `json:"adminAccess" protobuf:"bytes,5,name=adminAccess"`
 }
 
 // DeviceAllocationConfiguration gets embedded in an AllocationResult.
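The updated field comment defines the semantics precisely: admin access is disabled if the pointer is unset or false, enabled only when explicitly true. A small sketch of how a client would read the field under those semantics (abbreviated type; the `HasAdminAccess` helper name is illustrative, not part of the API):

```go
package main

import "fmt"

// DeviceRequestAllocationResult is an abbreviated copy of the API type;
// only the fields relevant to this sketch are shown.
type DeviceRequestAllocationResult struct {
	Device      string
	AdminAccess *bool
}

// HasAdminAccess applies the semantics from the updated field comment:
// admin access is disabled if the field is unset (nil) or false,
// otherwise it is enabled.
func HasAdminAccess(r DeviceRequestAllocationResult) bool {
	return r.AdminAccess != nil && *r.AdminAccess
}

func main() {
	on := true
	fmt.Println(HasAdminAccess(DeviceRequestAllocationResult{Device: "gpu-0"}))                   // false (unset)
	fmt.Println(HasAdminAccess(DeviceRequestAllocationResult{Device: "gpu-0", AdminAccess: &on})) // true
}
```

Because the field is now `+optional`, a nil-safe check like this is what consumers need instead of assuming the pointer is always populated.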
@@ -2102,10 +2077,6 @@ per claim is limited to `AllocationResultsMaxSize = 32`. The quota mechanism
 uses that as the worst-case upper bound, so `allocationMode: all` is treated
 like `allocationMode: exactCount` with `count: 32`.
 
-Requests asking for "admin access" contribute to the quota. In practice,
-namespaces where such access is allowed will typically not have quotas
-configured.
-
 ### kube-controller-manager
 
 The code that creates a ResourceClaim from a ResourceClaimTemplate started
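The quota rule in the context lines above treats `allocationMode: all` as `allocationMode: exactCount` with `count: 32`. A sketch of that worst-case bound, using simplified stand-ins (a plain string mode and int64 count) rather than the real quota code:

```go
package main

import "fmt"

// AllocationResultsMaxSize is the per-claim device limit named in the text.
const AllocationResultsMaxSize = 32

// worstCaseDeviceCount sketches the quota bound described above:
// "all" is treated like exactCount with count 32, since the quota
// mechanism must assume the worst case before allocation happens.
func worstCaseDeviceCount(allocationMode string, count int64) int64 {
	if allocationMode == "all" {
		return AllocationResultsMaxSize
	}
	return count
}

func main() {
	fmt.Println(worstCaseDeviceCount("all", 0))        // prints 32
	fmt.Println(worstCaseDeviceCount("exactCount", 2)) // prints 2
}
```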
@@ -2784,8 +2755,7 @@ skew are less likely to occur.
 
 ### Feature Enablement and Rollback
 
-The initial answer in this section is for the core DRA. The second answer is
-marked with DRAAdminAccess and applies to that sub-feature.
+The answer in this section is for the core DRA.
 
 ###### How can this feature be enabled / disabled in a live cluster?
 
@@ -2796,42 +2766,22 @@ marked with DRAAdminAccess and applies to that sub-feature.
     - kubelet
     - kube-scheduler
     - kube-controller-manager
-- [X] Feature gate
-  - Feature gate name: DRAAdminAccess
-  - Components depending on the feature gate:
-    - kube-apiserver
-
-
 
 ###### Does enabling the feature change any default behavior?
 
 No.
 
-DRAAdminAccess: no.
-
 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
 
 Yes. Applications that were already deployed and are running will continue to
 work, but they will stop working when containers get restarted because those
 restarted containers won't have the additional resources.
 
-DRAAdminAccess: Workloads which were deployed with admin access will continue
-to run with it. They need to be deleted to remove usage of the feature.
-If they were not running, then the feature gate checks in kube-scheduler will prevent
-scheduling and in kube-controller-manager will prevent creating the ResourceClaim from
-a ResourceClaimTemplate. In both cases, usage of the feature is prevented.
-
 ###### What happens if we reenable the feature if it was previously rolled back?
 
 Pods might have been scheduled without handling resources. Those Pods must be
 deleted to ensure that the re-created Pods will get scheduled properly.
 
-DRAAdminAccess: Workloads which were deployed with admin access enabled are not
-affected by a rollback. If the pods were already running, they keep running. If
-the pods were kept as unschedulable because the scheduler refused to allocate
-claims, they might now get scheduled.
-
 ###### Are there any tests for feature enablement/disablement?
 
 <!--
@@ -2851,9 +2801,6 @@ Tests for apiserver will cover disabling the feature. This primarily matters
 for the extended PodSpec: the new fields must be preserved during updates even
 when the feature is disabled.
 
-DRAAdminAccess: Tests for apiserver will cover disabling the feature. A test
-that the DaemonSet controller tolerates keeping pods as pending is needed.
-
 ### Rollout, Upgrade and Rollback Planning
 
 ###### How can a rollout or rollback fail? Can it impact already running workloads?

keps/sig-node/4381-dra-structured-parameters/kep.yaml

Lines changed: 0 additions & 5 deletions
@@ -41,11 +41,6 @@ feature-gates:
       - kube-controller-manager
       - kube-scheduler
       - kubelet
-  - name: DRAAdminAccess
-    components:
-      - kube-apiserver
-      - kube-controller-manager
-      - kube-scheduler
 disable-supported: true
 
 # The following PRR answers are required at beta release
