From 14b80cabe51b885740d7a2c834950d139158b63d Mon Sep 17 00:00:00 2001
From: Humble Chirammal
Date: Wed, 21 Sep 2022 18:27:10 +0530
Subject: [PATCH] KEP-3107: target csiNodeExpandSecret beta in 1.27

Signed-off-by: Humble Chirammal
---
 keps/prod-readiness/sig-storage/3107.yaml | 2 +
 .../3107-csi-nodeexpandsecret/README.md | 162 ++++++++++++++++--
 .../3107-csi-nodeexpandsecret/kep.yaml | 8 +-
 3 files changed, 149 insertions(+), 23 deletions(-)

diff --git a/keps/prod-readiness/sig-storage/3107.yaml b/keps/prod-readiness/sig-storage/3107.yaml
index 7a4f151bf83..a8b3cf591e9 100644
--- a/keps/prod-readiness/sig-storage/3107.yaml
+++ b/keps/prod-readiness/sig-storage/3107.yaml
@@ -4,3 +4,5 @@ kep-number: 3107
 alpha:
   approver: "@deads2k"
+beta:
+  approver: "@deads2k"
diff --git a/keps/sig-storage/3107-csi-nodeexpandsecret/README.md b/keps/sig-storage/3107-csi-nodeexpandsecret/README.md
index 07ed9bd460d..4d1d166eee5 100644
--- a/keps/sig-storage/3107-csi-nodeexpandsecret/README.md
+++ b/keps/sig-storage/3107-csi-nodeexpandsecret/README.md
@@ -172,18 +172,22 @@ N/A
 [sidecar](https://github.com/kubernetes-csi/external-provisioner/). Once added this support to mentioned sidecar, the e2e tests will be added and validated using example CSI driver [tests](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/storage/drivers/csi-test/driver/driver.go).
+- E2E test PR is available [here](https://github.com/kubernetes/kubernetes/pull/115451)
+  and this test has been enabled in the [testgrid](https://k8s-testgrid.appspot.com/presubmits-kubernetes-nonblocking#pull-kubernetes-e2e-gce-cos-alpha-features).

 ### Graduation Criteria

 #### Alpha

 - Implemented the feature.
-- Wrote all the unit and E2E tests.
+- Implemented unit tests.

 #### Beta

 - Deployed the feature in production and went through at least minor k8s version.
+- Feedback from users.
+- Implementation of e2e tests.

 #### GA

@@ -191,8 +195,24 @@ N/A

 ### Upgrade / Downgrade Strategy

+1. Upgrading a Kubernetes cluster with this feature gate enabled:
+   - In the upgraded cluster, a CSI driver should receive secrets as part
+     of the NodeExpandVolume RPC call from the CO and should be able to use
+     them while expanding volumes on the node.
+
+2. Downgrading a Kubernetes cluster with this feature gate disabled:
+   - In the downgraded cluster, a CSI driver will not receive secrets as
+     part of the NodeExpandVolume RPC call from the CO.
+
 ### Version Skew Strategy

+The proposal requires changes to both kubelet and kube-apiserver, and the
+`CSINodeExpandSecret` feature gate must be enabled on both. If either
+component is not upgraded to a version that supports this feature, the
+feature will not work as expected. From an end user perspective, the
+existing behaviour continues, i.e., the CSI driver does not receive secrets
+as part of the node expansion RPC call from the CO.
+
 ## Production Readiness Review Questionnaire

 ### Feature Enablement and Rollback
@@ -220,53 +240,138 @@ N/A

 ### Rollout, Upgrade and Rollback Planning

-TBD
-
 ###### How can a rollout or rollback fail? Can it impact already running workloads?

-TBD
+A failed rollout or rollback does not impact running workloads. CSI drivers
+use the feature only when secrets are present in the NodeExpandVolume call,
+which is controlled by the Kubernetes feature gate.
+

 ###### What specific metrics should inform a rollback?

-TBD
+The `csi_kubelet_operations_seconds` metric, available
+[here](https://github.com/kubernetes/kubernetes/blob/6b55f097bb2140381a58312aeede37fc76a0762e/pkg/volume/util/metrics.go#L66),
+covers the CSI NodeExpandVolume operation and can be used for this purpose.
+

 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

-TBD
+Manual testing will be performed for upgrade and rollback.

 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

-TBD
+No.

 ### Monitoring Requirements

-TBD
-
 ###### How can an operator determine if the feature is in use by workloads?

-TBD
+An operator can check whether the `CSINodeExpandSecret` feature gate is
+enabled on kube-apiserver and kubelet. If it is, the feature is available,
+and workloads use it when their StorageClass sets the node-expand secret
+parameters listed below.
+
+- [ ] Events
+  - Event Reason:
+- [ ] API .status
+  - Condition name:
+  - Other field:
+- [x] Other (treat as last resort)
+  - Details: to use this feature in a cluster, a StorageClass must carry the
+    following entries in its parameters:
+
+  ```
+  csi.storage.k8s.io/node-expand-secret-name
+  csi.storage.k8s.io/node-expand-secret-namespace
+  ```
+
+  The corresponding CSI PV object should have the `nodeExpandSecretRef` field
+  populated with the details given in the StorageClass.

-###### How can someone using this feature know that it is working for their instance?
-
-TBD

 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

-TBD
+

 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

-TBD
+
+- [x] Metrics
+  - Metric name: `csiOperationsLatencyMetric`, which an operator can use to
+    determine the health of the service.

 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?

-TBD
+

 ### Dependencies

-TBD
+This feature depends on the cluster having CSI drivers and sidecars that use CSI
+spec v1.5.0 at minimum.

 ###### Does this feature depend on any specific services running in the cluster?

-TBD
+
+- [CSI drivers and sidecars]
+  - Usage description:
+  - Impact of its outage on the feature: inability to perform NodeExpandVolume
+    operations for CSI drivers that require credentials to complete them.
+  - Impact of its degraded performance or high-error rates on the feature:
+    increased latency of CSI storage operations (due to repeated retries).

 ### Scalability

@@ -279,7 +384,8 @@ TBD
   provider?** no.

 - **Will enabling / using this feature result in increasing size or count of
-  the existing API objects?** no.
+  the existing API objects?**
+  yes. This adds the `nodeExpandSecretRef` field to CSI PersistentVolume
+  objects, so PVs that set it are slightly larger.

 - **Will enabling / using this feature result in increasing time taken by any
   operations covered by [existing SLIs/SLOs]?** no.
@@ -287,8 +393,21 @@ TBD
 - **Will enabling / using this feature result in non-negligible increase of
   resource usage (CPU, RAM, disk, IO, ...) in any components?** no.

+- **Can enabling / using this feature result in resource exhaustion of some
+  node resources (PIDs, sockets, inodes, etc.)?** no.
+
 ### Troubleshooting

+If the CSI driver does not receive secrets as part of the NodeExpandVolume
+request, check the following in the cluster:
+
+- Make sure the StorageClass has the `csi.storage.k8s.io/node-expand-secret-name`
+  and `csi.storage.k8s.io/node-expand-secret-namespace` parameters set to the
+  name and namespace of the Secret that holds the credentials.
+
+- Make sure the `CSINodeExpandSecret` feature gate is enabled in both the
+  `kubelet` and `kube-apiserver` configuration in the cluster.
+
 ## Implementation History

 - 18/01/2022: Implementation started
@@ -303,4 +422,9 @@ however this is really a hacky way and not the CSI driver authors want.

 ## Infrastructure Needed (Optional)

+
 ---
diff --git a/keps/sig-storage/3107-csi-nodeexpandsecret/kep.yaml b/keps/sig-storage/3107-csi-nodeexpandsecret/kep.yaml
index c086ce95141..8cb3ce1c5d6 100644
--- a/keps/sig-storage/3107-csi-nodeexpandsecret/kep.yaml
+++ b/keps/sig-storage/3107-csi-nodeexpandsecret/kep.yaml
@@ -16,18 +16,18 @@ see-also:
   - TBD

 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta

 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.25"
+latest-milestone: "v1.27"

 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.25"
-  beta: "v1.26"
-  stable: "v1.27"
+  beta: "v1.27"
+  stable: "v1.28"

 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
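
The two troubleshooting checks in the README (StorageClass parameters and the `CSINodeExpandSecret` feature gate) can be sketched in Go as follows. This is a minimal, hypothetical illustration and not code from Kubernetes or external-provisioner. `checkNodeExpandParams` and `gateEnabled` are invented helper names, and the Secret name and namespace values are made up. The gate parser only understands the comma-separated `--feature-gates` flag form (kubelet can also read gates from its config file). It also takes the gate's default as a parameter, because that default depends on the Kubernetes release.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// StorageClass parameter keys quoted in the KEP text.
const (
	nodeExpandSecretNameKey      = "csi.storage.k8s.io/node-expand-secret-name"
	nodeExpandSecretNamespaceKey = "csi.storage.k8s.io/node-expand-secret-namespace"
)

// checkNodeExpandParams is a hypothetical helper. It returns true when both
// node-expand secret parameters are present and non-empty, false with no
// error when neither is set (feature not requested), and an error when the
// StorageClass is only half configured.
func checkNodeExpandParams(params map[string]string) (bool, error) {
	name, hasName := params[nodeExpandSecretNameKey]
	ns, hasNS := params[nodeExpandSecretNamespaceKey]
	switch {
	case !hasName && !hasNS:
		return false, nil
	case !hasName || !hasNS:
		return false, fmt.Errorf("both %q and %q must be set",
			nodeExpandSecretNameKey, nodeExpandSecretNamespaceKey)
	case strings.TrimSpace(name) == "" || strings.TrimSpace(ns) == "":
		return false, fmt.Errorf("node-expand secret name and namespace must be non-empty")
	}
	return true, nil
}

// gateEnabled is a hypothetical helper that reads a --feature-gates style
// value such as "CSINodeExpandSecret=true,OtherGate=false". If the gate is
// not listed, it returns defaultOn (the caller supplies the release default).
func gateEnabled(featureGates, gate string, defaultOn bool) (bool, error) {
	for _, entry := range strings.Split(featureGates, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		key, value, found := strings.Cut(entry, "=")
		if !found {
			return false, fmt.Errorf("malformed feature gate entry %q", entry)
		}
		if strings.TrimSpace(key) != gate {
			continue
		}
		return strconv.ParseBool(strings.TrimSpace(value))
	}
	return defaultOn, nil
}

func main() {
	// Hypothetical StorageClass parameters.
	params := map[string]string{
		nodeExpandSecretNameKey:      "expand-secret",
		nodeExpandSecretNamespaceKey: "storage-ns",
	}
	ok, err := checkNodeExpandParams(params)
	fmt.Println("StorageClass configured:", ok, err)

	// Hypothetical --feature-gates values; treating an unlisted gate as
	// disabled (false) is an assumption made only for this example.
	components := []struct{ name, gates string }{
		{"kube-apiserver", "CSINodeExpandSecret=true"},
		{"kubelet", "SomeOtherGate=true"},
	}
	for _, c := range components {
		on, err := gateEnabled(c.gates, "CSINodeExpandSecret", false)
		fmt.Printf("%s: CSINodeExpandSecret enabled=%v err=%v\n", c.name, on, err)
	}
}
```

The sketch checks both parameters together because a secret reference needs a name and a namespace. A StorageClass that sets only one of them is therefore flagged, rather than silently treated as not using the feature.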