Commit 6b5b1f8

Add security
1 parent e27569b commit 6b5b1f8

1 file changed

+80
-19
lines changed

contributors/design-proposals/storage/container-storage-interface-inline-volumes.md

Lines changed: 80 additions & 19 deletions
Original file line number | Diff line number | Diff line change
@@ -3,13 +3,14 @@
33
Author: @jsafrane
44

55
## Goal
6-
* Define API and high level design for in-line CSI volumes in Pod
6+
* Define API and high level design for in-line CSI volumes in Pod.
7+
* Make in-line CSI volumes secure enough to be used as ephemeral volumes (such as Secrets or ConfigMap).
78

89
## Motivation
9-
Currently, CSI can be used only though PersistentVolume object. All other persistent volume sources support in-line volumes in Pods, CSI should be no exception. There are two main drivers:
10+
Currently, CSI can be used only through a PersistentVolume object. All other persistent volume sources support in-line volumes in Pods; CSI should be no exception. There are three main drivers:
1011
* We want to move away from in-tree volume plugins to CSI, as designed in a separate proposal https://github.com/kubernetes/community/pull/2199/. In-line volumes should use CSI too.
11-
* CSI drivers can be used to provide Secrets-like volumes to pods, e.g. providing secrets from a remote vault. We don't want to force users to create PVs for each secret, we should allow to use them in-line in pods as regular Secrets or Secrets-like Flex volumes.
12-
* Get the same features as Flex and deprecate Flex. I.e. replace it with some CSI-Flex bridge, which is out of scope of this proposal.
12+
* CSI drivers can be used to provide ephemeral volumes that inject state, configuration, secrets, identity or similar information into pods, like the Secrets and ConfigMap in-tree volumes do today. We don't want to force users to create a PV for each such volume; we should allow them to be used in-line in pods, like regular Secrets or ephemeral Flex volumes.
13+
* Get the same features as Flex and deprecate Flex. (I.e. replace it with some CSI-Flex bridge. This bridge is out of scope of this proposal.)
1314

1415
## API
1516
`VolumeSource` needs to be extended with CSI volume source:
@@ -23,17 +24,24 @@ type VolumeSource struct {
2324
}
2425

2526

26-
// Represents storage that is managed by an external CSI volume driver (Beta feature)
27+
// Represents storage that is managed by an external CSI volume driver (Alpha feature)
2728
type CSIVolumeSource struct {
2829
// Driver is the name of the driver to use for this volume.
2930
// Required.
3031
Driver string
3132

32-
// VolumeHandle is the unique ID of the volume. It is the ID used in all CSI
33-
// calls.
33+
// VolumeHandle is the unique ID of the volume. It is the volume ID used in
34+
// all CSI calls, optionally with a prefix based on VolumeHandlePrefix
35+
// value.
3436
// Required
3537
VolumeHandle string
3638

39+
// VolumeHandlePrefix is type of prefix added to VolumeHandle before using
40+
// it as CSI volume ID. It ensures that volumes with the same VolumeHandle
41+
// in different pods or namespaces get unique CSI volume ID.
42+
// Required.
43+
VolumeHandlePrefix CSIVolumeHandlePrefix
44+
3745
// Optional: The value to pass to ControllerPublishVolumeRequest.
3846
// Defaults to false (read/write).
3947
// +optional
@@ -74,9 +82,27 @@ type CSIVolumeSource struct {
7482
// +optional
7583
NodePublishSecretRef *LocalObjectReference
7684
}
85+
86+
type CSIVolumeHandlePrefix string
87+
const (
88+
// VolumeHandle is prefixed by Pod UID.
89+
CSIVolumeHandlePrefixPod CSIVolumeHandlePrefix = "Pod"
90+
// VolumeHandle is prefixed by UID of the namespace where the pod is located.
91+
CSIVolumeHandlePrefixNamespace CSIVolumeHandlePrefix = "Namespace"
92+
// VolumeHandle is not modified.
93+
CSIVolumeHandlePrefixNone CSIVolumeHandlePrefix = "None"
94+
)
7795
```
7896

79-
The only difference between `CSIVolumeSource` (in-lined in a pod) and `CSIPersistentVolumeSource` (in PV) are secrets. All secret references in in-line volumes can refer only to secrets in the same namespace where the corresponding pod is running. This is common in all other volume sources that refer to secrets, incl. Flex.
97+
The differences between `CSIVolumeSource` (in-lined in a pod) and `CSIPersistentVolumeSource` (in PV) are:
98+
99+
* All secret references in in-line volumes can refer only to secrets in the same namespace where the corresponding pod is running. This is common in all other volume sources that refer to secrets, incl. Flex.
100+
* VolumeHandle in in-line volumes can have a prefix. This prefix (Pod UID, Namespace UID or nothing) is added to the VolumeHandle before each CSI call. It makes sure that each pod uses a different volume ID for its ephemeral volumes. The prefix must be explicitly set by the pod author; there is no default.
101+
* Users don't need to think about VolumeHandles used in other pods in their namespace, as each pod will get a unique prefix when `CSIVolumeHandlePrefixPod` is used. A CSI volume ID with this prefix cannot accidentally conflict with a volume ID in another pod.
102+
* Each pod created by ReplicaSet, StatefulSet or DaemonSet will get the same copy of a pod template. `CSIVolumeHandlePrefixPod` makes sure that each pod gets its own unique volume ID and thus can get its own volume instance.
103+
* Without the prefix, a user could guess the volume ID of a secret-like CSI volume of another user and craft a pod with an in-line volume referencing it. The CSI driver, obeying idempotency, must then give the same volume to this pod. If users can use only `CSIVolumeHandlePrefixNamespace` or `CSIVolumeHandlePrefixPod` in their in-line volumes, we can make sure that they can't steal each other's secrets.
104+
* `PodSecurityPolicy` will be extended to allow or deny the use of in-line volumes with no prefix.
105+
* Finally, `CSIVolumeHandlePrefixNone` allows selected users (based on PSP) to use persistent storage volumes in-line in pods.
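The prefixing rules above can be sketched roughly as follows. This is only an illustration: the helper name `csiVolumeID` and the `-` separator between the UID and the `VolumeHandle` are assumptions, since the proposal does not define how the prefix is joined to the handle.

```go
package main

import "fmt"

// CSIVolumeHandlePrefix mirrors the type proposed above.
type CSIVolumeHandlePrefix string

const (
	CSIVolumeHandlePrefixPod       CSIVolumeHandlePrefix = "Pod"
	CSIVolumeHandlePrefixNamespace CSIVolumeHandlePrefix = "Namespace"
	CSIVolumeHandlePrefixNone      CSIVolumeHandlePrefix = "None"
)

// csiVolumeID is a hypothetical helper that derives the volume ID sent in
// CSI calls from an in-line volume. The "-" separator is an assumption.
func csiVolumeID(prefix CSIVolumeHandlePrefix, podUID, namespaceUID, volumeHandle string) (string, error) {
	switch prefix {
	case CSIVolumeHandlePrefixPod:
		// Unique per pod, even for pods stamped out from one template.
		return podUID + "-" + volumeHandle, nil
	case CSIVolumeHandlePrefixNamespace:
		// Shared by pods in one namespace, isolated from other namespaces.
		return namespaceUID + "-" + volumeHandle, nil
	case CSIVolumeHandlePrefixNone:
		// Handle is passed through unmodified.
		return volumeHandle, nil
	default:
		// There is no default: an empty or unknown prefix is rejected.
		return "", fmt.Errorf("unknown or missing VolumeHandlePrefix %q", prefix)
	}
}

func main() {
	prefixes := []CSIVolumeHandlePrefix{
		CSIVolumeHandlePrefixPod,
		CSIVolumeHandlePrefixNamespace,
		CSIVolumeHandlePrefixNone,
	}
	for _, p := range prefixes {
		id, err := csiVolumeID(p, "pod-uid-1", "ns-uid-1", "vault-token")
		fmt.Println(p, id, err)
	}
}
```

With the `Pod` prefix, two replicas created from the same template get different volume IDs because their Pod UIDs differ, which is the property the list above relies on.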
80106

81107
## Implementation
82108
#### Provisioning/Deletion
@@ -102,29 +128,42 @@ type VolumeAttachmentSource struct {
102128
// +optional
103129
PersistentVolumeName *string
104130

105-
// VolumeSource represents the source location of a volume to attach.
106-
// Only CSIVolumeSource can be specified.
131+
// InlineVolumeSource represents the source location of an in-line volume in a pod to attach.
107132
// +optional
108-
VolumeSource *v1.VolumeSource
133+
InlineVolumeSource *InlineVolumeSource
134+
}
135+
136+
// InlineVolumeSource represents the source location of an in-line volume in a pod.
137+
type InlineVolumeSource struct {
138+
// VolumeSource is copied from the pod. It ensures that attacher has enough
139+
// information to detach a volume when the pod is deleted before detaching.
140+
// Only CSIVolumeSource can be set.
141+
// Required.
142+
VolumeSource v1.VolumeSource
143+
144+
// Namespace of the pod with in-line volume. It is used to resolve
145+
// references to Secrets in VolumeSource.
146+
// Required.
147+
Namespace string
109148
}
110149
```
111150

112151
* A/D controller **copies whole `VolumeSource`** from `Pod` into `VolumeAttachment`. This allows external CSI attacher to detach volumes for deleted pods without keeping any internal database of attached VolumeSources.
113152
* Using whole `VolumeSource` allows us to re-use `VolumeAttachment` for any other in-line volume in the future. We provide validation that this `VolumeSource` contains only `CSIVolumeSource` to clearly state that only CSI is supported now.
114-
* TBD: `CSIVolumeSource` would be enough...
115153
* External CSI attacher must be extended to process either `PersistentVolumeName` or `VolumeSource`.
116154
* Since in-line volume in a pod can refer to a secret in the same namespace as the pod, **external attacher may need permissions to read any Secrets in any namespace**.
117-
* CSI `ControllerUnpublishVolume` call (~ volume detach) requires the Secrets to be available at detach time. Current CSI attacher implementation simply expects that the Secrets are available at detach time. Secrets for PVs are "global", out of user's namespace, so this assumption is probably OK. For in-line volumes, **we can either expect that the Secrets are available too (and volume is not detached if user deletes them) or external attacher must cache them somewhere, probably directly in `VolumeAttachment` object itself.**
118-
* None of existing Kubernetes volume plugins needed credentials for `Detach`, however those that needed it for `TearDown` either required the Secret to be present (e.g. ScaleIO and StorageOS) or stored them in a json in `/var/lib/kubelet/plugins/<plugin name>/<volume name>/file.json` (e.g. iSCSI).
155+
* CSI `ControllerUnpublishVolume` call (~ volume detach) requires the Secrets to be available at detach time. Current CSI attacher implementation simply expects that the Secrets are available at detach time.
156+
* Secrets for PVs are "global", out of user's namespace, so this assumption is probably OK.
157+
* Secrets for in-line volumes must be in the same namespace as the pod that contains the volume. Users can delete them before the volume is detached. We deliberately choose to let the external attacher fail when such a Secret cannot be found at detach time and keep the volume attached, reporting errors about the missing Secrets to the user.
158+
* Since access to in-line volumes can be configured by `PodSecurityPolicy` (see below), we expect that cluster admin gives access to CSI drivers that require secrets at detach time only to educated users that know they should not delete Secrets used in volumes.
159+
* Number of CSI drivers that require Secrets on detach is probably very limited. No in-tree Kubernetes volume plugin requires them on detach.
160+
* We will provide clear documentation that using in-line volumes with drivers that require credentials on detach may leave orphaned attached volumes that Kubernetes is not able to detach. It's up to the cluster admin to decide if using such CSI driver is worth it.
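The detach behavior described above (fail and keep the volume attached when a Secret is missing) might look roughly like the sketch below. All names here are hypothetical: `InlineVolumeSource` is trimmed to the two fields the sketch needs, `SecretName` stands in for the secret reference carried inside the copied `VolumeSource`, and `secretGetter` and `unpublish` stand in for the API server lookup and the CSI `ControllerUnpublishVolume` call.

```go
package main

import (
	"errors"
	"fmt"
)

// InlineVolumeSource is a trimmed stand-in for the struct proposed above.
type InlineVolumeSource struct {
	// Namespace of the pod; secret references are resolved against it.
	Namespace string
	// SecretName is a hypothetical flattening of the secret reference
	// stored in VolumeSource.CSI. Empty means no secret is needed.
	SecretName string
}

// errSecretNotFound is a hypothetical sentinel error for this sketch.
var errSecretNotFound = errors.New("secret not found")

// secretGetter stands in for a Secret lookup against the API server.
type secretGetter func(namespace, name string) (map[string]string, error)

// detachInline resolves the secret in the pod's namespace and only then
// calls unpublish. If the secret is gone, it returns an error without
// calling unpublish, so the volume stays attached and the error can be
// reported to the user.
func detachInline(src InlineVolumeSource, getSecret secretGetter, unpublish func(map[string]string) error) error {
	var secrets map[string]string
	if src.SecretName != "" {
		s, err := getSecret(src.Namespace, src.SecretName)
		if err != nil {
			return fmt.Errorf("cannot detach, secret %s/%s: %w", src.Namespace, src.SecretName, err)
		}
		secrets = s
	}
	return unpublish(secrets)
}

func main() {
	missing := func(namespace, name string) (map[string]string, error) {
		return nil, errSecretNotFound
	}
	unpublish := func(map[string]string) error {
		fmt.Println("ControllerUnpublishVolume called")
		return nil
	}
	src := InlineVolumeSource{Namespace: "team-a", SecretName: "detach-creds"}
	fmt.Println(detachInline(src, missing, unpublish))
}
```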
119161

120162
### Kubelet (MountDevice/SetUp/TearDown/UnmountDevice)
121163
In-tree CSI volume plugin calls in kubelet get universal `volume.Spec`, which contains either `v1.VolumeSource` from Pod (for in-line volumes) or `v1.PersistentVolume`. We need to modify CSI volume plugin to check for presence of `VolumeSource` or `PersistentVolume` and read NodeStage/NodePublish secrets from appropriate source. Kubelet does not need any new permissions, it already can read secrets for pods that it handles. These secrets are needed only for `MountDevice/SetUp` calls and don't need to be cached until `TearDown`/`UnmountDevice`.
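A minimal sketch of the check described above, using local stand-in types rather than the real `volume.Spec` and `v1` API types. The `Spec` struct, its field names, and the `nodePublishSecret` helper are assumptions made for illustration; the sketch also assumes that the PV variant carries a namespaced secret reference while the in-line variant carries only a name.

```go
package main

import (
	"errors"
	"fmt"
)

// LocalObjectReference names an object in the pod's own namespace.
type LocalObjectReference struct{ Name string }

// SecretReference names a secret in an explicit namespace.
type SecretReference struct{ Namespace, Name string }

// Trimmed stand-ins for the two CSI sources discussed in this proposal.
type CSIVolumeSource struct {
	NodePublishSecretRef *LocalObjectReference
}
type CSIPersistentVolumeSource struct {
	NodePublishSecretRef *SecretReference
}

// Spec is a hypothetical stand-in for volume.Spec: exactly one of the
// two fields is expected to be set.
type Spec struct {
	InlineCSI     *CSIVolumeSource
	PersistentCSI *CSIPersistentVolumeSource
}

// nodePublishSecret returns the secret to read for NodePublish, or nil
// if the volume does not reference one. In-line references are resolved
// in the pod's namespace; PV references already carry their namespace.
func nodePublishSecret(spec Spec, podNamespace string) (*SecretReference, error) {
	switch {
	case spec.InlineCSI != nil:
		ref := spec.InlineCSI.NodePublishSecretRef
		if ref == nil {
			return nil, nil
		}
		return &SecretReference{Namespace: podNamespace, Name: ref.Name}, nil
	case spec.PersistentCSI != nil:
		return spec.PersistentCSI.NodePublishSecretRef, nil
	default:
		return nil, errors.New("spec contains neither an in-line CSI volume nor a CSI PersistentVolume")
	}
}

func main() {
	inline := Spec{InlineCSI: &CSIVolumeSource{
		NodePublishSecretRef: &LocalObjectReference{Name: "creds"},
	}}
	ref, err := nodePublishSecret(inline, "team-a")
	fmt.Println(ref.Namespace, ref.Name, err)
}
```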
122164

123-
124-
### Security considerations
125-
126-
* As written above, external attacher may requrie permissions to read Secrets in any namespace. It is up to CSI driver author to document if the driver needs such permission (i.e. access to Secrets at attach/detach time) and up to cluster admin to deploy the driver with these permissions or restrict external attacher to access secrets only in some namespaces.
127-
* PodSecurityPolicy must be enhanced to limit pods in using in-line CSI volumes. It will be modeled following existing Flex volume policy:
165+
### `PodSecurityPolicy`
166+
* `PodSecurityPolicy` must be enhanced to limit which in-line CSI volumes pods can use. It will be modeled on the existing Flex volume policy. There is no default; users can't use in-line CSI volumes unless some CSI drivers are explicitly allowed.
128167
```go
129168
type PodSecurityPolicySpec struct {
130169
// <snip>
@@ -148,3 +187,25 @@ In-tree CSI volume plugin calls in kubelet get universal `volume.Spec`, which co
148187
Driver string
149188
}
150189
```
190+
* `PodSecurityPolicy` must be extended to control which volume handle prefixes users may use in in-line volumes, in particular whether they may use in-line volumes with no prefix. This prevents users from stealing data from Secrets-like ephemeral volumes in-lined in pods by guessing someone else's volume ID. There is no default; users can't use in-line CSI volumes unless some prefixes are explicitly allowed.
191+
```go
192+
type PodSecurityPolicySpec struct {
193+
// <snip>
194+
// AllowedCSIVolumeHandlePrefixes is a whitelist of volume prefixes
195+
// allowed to be used in CSI volumes in-lined in pods.
196+
AllowedCSIVolumeHandlePrefixes []core.CSIVolumeHandlePrefix
197+
}
198+
```
199+
200+
* `PodSecurityPolicy` must be extended to control whether users may use attachable in-line CSI volumes. This prevents users from leaving orphaned attached volumes when they delete Secrets required to detach volumes. **Kubernetes currently does not know which CSI volumes are attachable or not. There are several options considered and it will be handled in a separate proposal.**
201+
```go
202+
type PodSecurityPolicySpec struct {
203+
// <snip>
204+
// AllowAttachableCSIVolumes allows users to use attachable CSI volumes
205+
// in-line in pod definitions.
206+
AllowAttachableCSIVolumes bool
207+
}
208+
```
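The prefix whitelist above could be enforced with a check along these lines. This is a sketch under assumptions: `prefixAllowed` is a hypothetical helper, the policy struct is trimmed to the one field shown above, and `core.CSIVolumeHandlePrefix` is replaced by a local type so the example is self-contained.

```go
package main

import "fmt"

// CSIVolumeHandlePrefix is a local stand-in for core.CSIVolumeHandlePrefix.
type CSIVolumeHandlePrefix string

const (
	CSIVolumeHandlePrefixPod       CSIVolumeHandlePrefix = "Pod"
	CSIVolumeHandlePrefixNamespace CSIVolumeHandlePrefix = "Namespace"
	CSIVolumeHandlePrefixNone      CSIVolumeHandlePrefix = "None"
)

// PodSecurityPolicySpec is trimmed to the field relevant here.
type PodSecurityPolicySpec struct {
	AllowedCSIVolumeHandlePrefixes []CSIVolumeHandlePrefix
}

// prefixAllowed reports whether an in-line CSI volume with the given
// prefix passes the policy. An empty whitelist allows nothing, which
// matches the "no default" rule stated above.
func prefixAllowed(psp PodSecurityPolicySpec, prefix CSIVolumeHandlePrefix) bool {
	for _, allowed := range psp.AllowedCSIVolumeHandlePrefixes {
		if allowed == prefix {
			return true
		}
	}
	return false
}

func main() {
	// A policy for regular users: ephemeral, per-pod or per-namespace
	// volumes only, no unprefixed handles.
	restricted := PodSecurityPolicySpec{
		AllowedCSIVolumeHandlePrefixes: []CSIVolumeHandlePrefix{
			CSIVolumeHandlePrefixPod,
			CSIVolumeHandlePrefixNamespace,
		},
	}
	fmt.Println(prefixAllowed(restricted, CSIVolumeHandlePrefixPod))
	fmt.Println(prefixAllowed(restricted, CSIVolumeHandlePrefixNone))
}
```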
209+
210+
### Security considerations
211+
As written above, the external attacher may require permissions to read Secrets in any namespace. It is up to the CSI driver author to document if the driver needs such permission (i.e. access to Secrets at attach/detach time) and up to the cluster admin to deploy the driver with these permissions or restrict the external attacher to access secrets only in some namespaces.
