The Container Storage Interface (CSI) introduced an alpha snapshot API in Kubernetes v1.12. It will reach beta support in Kubernetes v1.17, scheduled for release in December 2019. This proposal documents an approach for integrating support for this snapshot API within Velero, augmenting its existing capabilities.
- Enable Velero to backup and restore CSI-backed volumes using the Kubernetes CSI CustomResourceDefinition API
- Non-goal: replacing Velero's existing VolumeSnapshotter API
- Non-goal: replacing Velero's Restic support
Velero has had support for performing persistent volume snapshots since its inception. However, support has been limited to a handful of providers. The plugin API introduced in Velero v0.7 enabled the community to expand the number of supported providers. In the meantime, the Kubernetes sig-storage advanced the CSI spec to allow for a generic storage interface, opening up the possibility of moving storage code out of the core Kubernetes code base. The CSI working group has also developed a generic snapshotting API that any CSI driver developer may implement, giving users the ability to snapshot volumes from a standard interface.
By supporting the CSI snapshot API, Velero can extend its support to any CSI driver, without requiring a Velero-specific plugin be written, easing the development burden on providers while also reaching more end users.
In order to support CSI's snapshot API, Velero must interact with the `VolumeSnapshot` and `VolumeSnapshotContent` CRDs.
These act as requests to the CSI driver to perform a snapshot on the underlying provider's volume.
This can largely be accomplished with Velero `BackupItemAction` and `RestoreItemAction` plugins that operate on these CRDs.
Additionally, changes to the Velero server and client code are necessary to track `VolumeSnapshot`s that are associated with a given backup, similarly to how Velero tracks its own `volume.Snapshot` type.
Tracking these is important for allowing users to see what is in their backup, and provides parity with the existing `volume.Snapshot` and `PodVolumeBackup` types.
This is also done to retain the object store as Velero's source of truth, without having to query the Kubernetes API server for associated `VolumeSnapshot`s.
`velero backup describe --details` will use the stored `VolumeSnapshot`s to list the CSI snapshots included in the backup.
A set of prototype plugins was developed that informed this design.
The plugins will be as follows:
This plugin (`velero.io/csi-pvc`, a BackupItemAction) will act directly on PVCs, since an implementation of Velero's VolumeSnapshotter does not have enough information about the StorageClass to properly create the `VolumeSnapshot` objects.
The associated PV will be queried and checked for the presence of `PersistentVolume.Spec.PersistentVolumeSource.CSI` (see the "Snapshot Mechanism Selection" section below).
If this field is `nil`, then the plugin will return early without taking action.
If the `Backup.Spec.SnapshotVolumes` value is `false`, the plugin will return early without taking action.
Additionally, to prevent creating CSI snapshots for volumes backed up by restic, the plugin will query for all pods in the `PersistentVolumeClaim`'s namespace.
It will then select the pods that have the PVC mounted, and inspect each pod's `backup.velero.io/backup-volumes` annotation for the associated volume's name.
If the name is found in the list, then the plugin will return early without taking further action.
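The early-return checks described above can be sketched as follows. This is a minimal illustration, not Velero's implementation: the `Pod` type is a simplified, hypothetical stand-in for the Kubernetes type, `usesRestic` and `shouldCreateCSISnapshot` are invented names, and the `SnapshotVolumes` setting is modeled as a plain `bool`.

```go
package main

import (
	"fmt"
	"strings"
)

// Pod is a simplified, hypothetical stand-in for the Kubernetes Pod type.
type Pod struct {
	Name        string
	Annotations map[string]string
	// Volumes maps a pod volume name to the name of the PVC it mounts.
	Volumes map[string]string
}

const resticAnnotation = "backup.velero.io/backup-volumes"

// usesRestic reports whether any pod mounts pvcName through a volume that is
// listed in that pod's backup.velero.io/backup-volumes annotation.
func usesRestic(pods []Pod, pvcName string) bool {
	for _, pod := range pods {
		listed := map[string]bool{}
		for _, v := range strings.Split(pod.Annotations[resticAnnotation], ",") {
			if v = strings.TrimSpace(v); v != "" {
				listed[v] = true
			}
		}
		for volName, claim := range pod.Volumes {
			if claim == pvcName && listed[volName] {
				return true
			}
		}
	}
	return false
}

// shouldCreateCSISnapshot applies the three checks in order: the PV must be a
// CSI volume, snapshots must be enabled, and restic must not own the volume.
func shouldCreateCSISnapshot(pvIsCSI, snapshotVolumes bool, pods []Pod, pvcName string) bool {
	if !pvIsCSI || !snapshotVolumes {
		return false
	}
	return !usesRestic(pods, pvcName)
}

func main() {
	pods := []Pod{{
		Name:        "db-0",
		Annotations: map[string]string{resticAnnotation: "data"},
		Volumes:     map[string]string{"data": "db-data", "logs": "db-logs"},
	}}
	fmt.Println(shouldCreateCSISnapshot(true, true, pods, "db-data")) // restic-managed
	fmt.Println(shouldCreateCSISnapshot(true, true, pods, "db-logs")) // eligible for CSI
}
```

Note that the pod scan runs once per PVC, which is the cost discussed later in this proposal.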
Create a `VolumeSnapshot.snapshot.storage.k8s.io` object from the PVC.
Label the `VolumeSnapshot` object with the `velero.io/backup-name` label for ease of lookup later.
Also set an ownerRef on the `VolumeSnapshot` so that cascading deletion of the Velero `Backup` will delete associated `VolumeSnapshots`.
The CSI controllers will create a `VolumeSnapshotContent.snapshot.storage.k8s.io` object associated with the `VolumeSnapshot`.
Associated `VolumeSnapshotContent` objects will be retrieved and updated with the `velero.io/backup-name` label for ease of lookup later.
The `velero.io/volume-snapshot-name` label will be applied to the PVC so that the `VolumeSnapshot` can be found easily for restore.
`VolumeSnapshot`, `VolumeSnapshotContent`, and `VolumeSnapshotClass` objects will be returned as additional items to be backed up. GitHub issue 1566 represents this work.
The `VolumeSnapshotContent.Spec.VolumeSnapshotSource.SnapshotHandle` field is the link to the underlying platform's on-disk snapshot, and must be preserved for restoration.
The plugin will not wait for the `VolumeSnapshot.Status.readyToUse` field to be `true` before returning.
This field indicates that the snapshot is ready to use for restoration, and for some vendors it can also indicate that the snapshot has been made durable.
However, applications can proceed as soon as `VolumeSnapshot.Status.CreationTime` is set.
This also maintains current Velero behavior, which allows applications to quiesce and resume quickly, with minimal interruption.
Monitoring or waiting for durable snapshots, whether Velero-native or CSI, is not covered by this proposal.
K8s object relationships inside of the backup tarball
+-----------------------+               +-----------------------+
| PersistentVolumeClaim +-------------->+   PersistentVolume    |
+-----------+-----------+               +-----------+-----------+
            ^                                       ^
            |                                       |
            |                                       |
            |                                       |
+-----------+-----------+               +-----------+-----------+
|    VolumeSnapshot     +<------------->+ VolumeSnapshotContent |
+-----------------------+               +-----------------------+
On restore, `VolumeSnapshotContent` objects are cleaned so that they may be properly associated with IDs assigned by the target cluster.
Only `VolumeSnapshotContent` objects with the `velero.io/backup-name` label will be processed, using the plugin's `AppliesTo` function.
The metadata (excluding labels), `PersistentVolumeClaim.UUID`, and `VolumeSnapshotRef.UUID` fields will be cleared.
The reference fields are cleared because the associated objects will get new UUIDs in the cluster.
This also maps to the "import" case of the snapshot API.
This means the relationship between the `VolumeSnapshot` and `VolumeSnapshotContent` is one way until the CSI controllers rebind them.
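The cleaning step could look roughly like the sketch below. The struct types are simplified, hypothetical stand-ins for the real API types, and `cleanForRestore` is an invented name. The sketch also assumes the object name is kept alongside the labels so that the `VolumeSnapshot` can still refer to the content by name; the proposal itself only states that labels are preserved.

```go
package main

import "fmt"

// ObjectMeta, ObjectRef and VolumeSnapshotContent are simplified,
// hypothetical stand-ins for the real Kubernetes and CSI snapshot types.
type ObjectMeta struct {
	Name            string
	UID             string
	ResourceVersion string
	Labels          map[string]string
}

type ObjectRef struct {
	Name string
	UID  string
}

type VolumeSnapshotContent struct {
	Meta                  ObjectMeta
	PersistentVolumeClaim ObjectRef
	VolumeSnapshotRef     ObjectRef
	SnapshotHandle        string
}

// cleanForRestore returns a copy with cluster-specific metadata and reference
// UIDs cleared. Labels are kept (the map is shared with the input), the name
// is kept by assumption, and the snapshot handle is left untouched because it
// is the link to the provider's on-disk snapshot.
func cleanForRestore(vsc VolumeSnapshotContent) VolumeSnapshotContent {
	vsc.Meta = ObjectMeta{Name: vsc.Meta.Name, Labels: vsc.Meta.Labels}
	vsc.PersistentVolumeClaim.UID = ""
	vsc.VolumeSnapshotRef.UID = ""
	return vsc
}

func main() {
	original := VolumeSnapshotContent{
		Meta: ObjectMeta{
			Name:            "snapcontent-1",
			UID:             "1111-2222",
			ResourceVersion: "42",
			Labels:          map[string]string{"velero.io/backup-name": "nightly"},
		},
		PersistentVolumeClaim: ObjectRef{Name: "db-data", UID: "3333-4444"},
		VolumeSnapshotRef:     ObjectRef{Name: "snap-1", UID: "5555-6666"},
		SnapshotHandle:        "provider-snap-0001",
	}
	fmt.Printf("%+v\n", cleanForRestore(original))
}
```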
K8s objects after the velero.io/csi-vsc plugin has run
+-----------------------+               +-----------------------+
| PersistentVolumeClaim +-------------->+   PersistentVolume    |
+-----------------------+               +-----------------------+
+-----------------------+               +-----------------------+
|    VolumeSnapshot     +-------------->+ VolumeSnapshotContent |
+-----------------------+               +-----------------------+
`VolumeSnapshot` objects must be prepared for importing into the target cluster by removing IDs and metadata associated with their origin cluster.
Only `VolumeSnapshot` objects with the `velero.io/backup-name` label will be processed, using the plugin's `AppliesTo` function.
Metadata (excluding labels) and the `Source` field (that is, the pointer to the `PersistentVolumeClaim`) will be cleared.
The `VolumeSnapshot.Spec.SnapshotContentName` field is the link back to the `VolumeSnapshotContent` object, and thus the actual snapshot.
The `Source` field indicates that a new CSI snapshot operation should be performed, which isn't relevant on restore.
This follows the "import" case of the snapshot API.
The `Backup` associated with the `VolumeSnapshot` will be queried, and set as an ownerRef on the `VolumeSnapshot` so that deletion can cascade.
+-----------------------+               +-----------------------+
| PersistentVolumeClaim +-------------->+   PersistentVolume    |
+-----------------------+               +-----------------------+
+-----------------------+               +-----------------------+
|    VolumeSnapshot     +-------------->+ VolumeSnapshotContent |
+-----------------------+               +-----------------------+
On restore, `PersistentVolumeClaims` will need to be created from the snapshot, and thus will require editing before submission.
Only `PersistentVolumeClaim` objects with the `velero.io/volume-snapshot-name` label will be processed, using the plugin's `AppliesTo` function.
Metadata (excluding labels) will be cleared, and the `velero.io/volume-snapshot-name` label will be used to find the relevant `VolumeSnapshot`.
A reference to the `VolumeSnapshot` will be added to the `PersistentVolumeClaim.DataSource` field.
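As a rough illustration of the PVC edit, the sketch below reads the label and fills in a data source. The `DataSource` and `PersistentVolumeClaim` types are simplified, hypothetical stand-ins for the Kubernetes types, and `setSnapshotDataSource` is an invented helper name.

```go
package main

import "fmt"

const volumeSnapshotLabel = "velero.io/volume-snapshot-name"

// DataSource and PersistentVolumeClaim are simplified, hypothetical stand-ins
// for the corresponding Kubernetes types.
type DataSource struct {
	APIGroup string
	Kind     string
	Name     string
}

type PersistentVolumeClaim struct {
	Name       string
	Labels     map[string]string
	DataSource *DataSource
}

// setSnapshotDataSource points the PVC at the VolumeSnapshot named by the
// velero.io/volume-snapshot-name label. It returns false and leaves the PVC
// unchanged when the label is missing or empty.
func setSnapshotDataSource(pvc *PersistentVolumeClaim) bool {
	name := pvc.Labels[volumeSnapshotLabel]
	if name == "" {
		return false
	}
	pvc.DataSource = &DataSource{
		APIGroup: "snapshot.storage.k8s.io",
		Kind:     "VolumeSnapshot",
		Name:     name,
	}
	return true
}

func main() {
	pvc := &PersistentVolumeClaim{
		Name:   "db-data",
		Labels: map[string]string{volumeSnapshotLabel: "velero-db-data-abc12"},
	}
	if setSnapshotDataSource(pvc) {
		fmt.Printf("%s restores from %s/%s\n", pvc.Name, pvc.DataSource.Kind, pvc.DataSource.Name)
	}
}
```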
+-----------------------+
| PersistentVolumeClaim |
+-----------------------+
+-----------------------+               +-----------------------+
|    VolumeSnapshot     +-------------->+ VolumeSnapshotContent |
+-----------------------+               +-----------------------+
No special logic is required to restore `VolumeSnapshotClass` objects.
These plugins should be provided with Velero, as there will also be some changes to core Velero code to enable association of a `Backup` to the included `VolumeSnapshot`s.
Any non-plugin code changes must be behind an `EnableCSI` feature flag, and the behavior will be opt-in until it has exited beta status.
This will allow development to continue on the feature while it is in a pre-production state, while also reducing the need for long-lived feature branches.
`persistBackup` will be extended to query for all `VolumeSnapshot`s associated with the backup, and persist the list to JSON.
`BackupStore.PutBackup` will receive an additional argument, `volumeSnapshots io.Reader`, that contains the JSON representation of the `VolumeSnapshots`.
This will be written to a file named `csi-snapshots.json.gz`.
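One way to produce a gzip-compressed JSON payload behind an `io.Reader` is sketched below, using only the Go standard library. The `VolumeSnapshot` struct, its JSON field names, and the `encodeSnapshots`/`decodeSnapshots` helpers are assumptions made for illustration; the proposal does not specify whether compression happens before or inside `BackupStore.PutBackup`.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
)

// VolumeSnapshot is a simplified, hypothetical stand-in that carries only the
// fields needed for this example.
type VolumeSnapshot struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

// encodeSnapshots serializes the list to JSON and gzip-compresses it,
// returning a reader over the compressed bytes.
func encodeSnapshots(snaps []VolumeSnapshot) (io.Reader, error) {
	buf := new(bytes.Buffer)
	gz := gzip.NewWriter(buf)
	if err := json.NewEncoder(gz).Encode(snaps); err != nil {
		return nil, err
	}
	// Close flushes the remaining compressed data and writes the gzip footer.
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return buf, nil
}

// decodeSnapshots reverses encodeSnapshots, as a client might do after
// downloading the file.
func decodeSnapshots(r io.Reader) ([]VolumeSnapshot, error) {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return nil, err
	}
	defer gz.Close()
	var snaps []VolumeSnapshot
	if err := json.NewDecoder(gz).Decode(&snaps); err != nil {
		return nil, err
	}
	return snaps, nil
}

func main() {
	r, err := encodeSnapshots([]VolumeSnapshot{{Name: "snap-1", Namespace: "default"}})
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	snaps, err := decodeSnapshots(r)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(len(snaps), snaps[0].Namespace+"/"+snaps[0].Name)
}
```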
`defaultRestorePriorities` should be rewritten to the following to accommodate proper association between the CSI objects and PVCs. `CustomResourceDefinition`s are moved up because they're necessary for creating the CSI CRDs. The CSI CRDs are created before `PersistentVolume`s and `PersistentVolumeClaim`s so that they may be used as data sources.
GitHub issue 1565 represents this work.
var defaultRestorePriorities = []string{
	"namespaces",
	"storageclasses",
	// CRDs come first so that the CSI snapshot types can be created.
	"customresourcedefinitions",
	// CSI snapshot objects precede PVs and PVCs so they can serve as data sources.
	"volumesnapshotclass.snapshot.storage.k8s.io",
	"volumesnapshotcontents.snapshot.storage.k8s.io",
	"volumesnapshots.snapshot.storage.k8s.io",
	"persistentvolumes",
	"persistentvolumeclaims",
	"secrets",
	"configmaps",
	"serviceaccounts",
	"limitranges",
	"pods",
	"replicaset",
}
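To illustrate how such a priority list can drive restore ordering, here is a small sketch. It assumes that prioritized resources are restored first in list order and that everything else follows alphabetically; `orderResources` is a hypothetical helper written for this example, not a Velero function, and the priority list is shortened.

```go
package main

import (
	"fmt"
	"sort"
)

// orderResources returns the discovered resources with prioritized entries
// first, in priority order, followed by the remaining entries sorted
// alphabetically. Duplicates in the input are dropped.
func orderResources(priorities, discovered []string) []string {
	present := make(map[string]bool, len(discovered))
	for _, r := range discovered {
		present[r] = true
	}

	seen := make(map[string]bool, len(discovered))
	var ordered []string
	for _, p := range priorities {
		if present[p] && !seen[p] {
			ordered = append(ordered, p)
			seen[p] = true
		}
	}

	var rest []string
	for _, r := range discovered {
		if !seen[r] {
			rest = append(rest, r)
			seen[r] = true
		}
	}
	sort.Strings(rest)
	return append(ordered, rest...)
}

func main() {
	priorities := []string{
		"customresourcedefinitions",
		"volumesnapshotcontents.snapshot.storage.k8s.io",
		"volumesnapshots.snapshot.storage.k8s.io",
		"persistentvolumeclaims",
		"pods",
	}
	discovered := []string{
		"pods",
		"services",
		"persistentvolumeclaims",
		"volumesnapshots.snapshot.storage.k8s.io",
		"customresourcedefinitions",
	}
	for _, r := range orderResources(priorities, discovered) {
		fmt.Println(r)
	}
}
```

With this ordering, a `VolumeSnapshot` already exists in the cluster by the time a PVC that names it as a data source is created.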
Volumes found in a `Pod`'s `backup.velero.io/backup-volumes` list will use Velero's current Restic code path.
This also means Velero will continue to offer Restic as an option for CSI volumes.
The `velero.io/csi-pvc` BackupItemAction plugin will inspect pods in the namespace to ensure that it does not act on PVCs already being backed up by restic.
This is preferred to modifying the PVC, because Velero's current backup process backs up PVCs and PVs mounted to pods at the same time as the pod.
A drawback to this approach is that we're querying all pods in the namespace per PVC, which could be a large number.
In the future, the plugin interface could be improved to have some sort of context argument, so that additional data such as our existing `resticSnapshotTracker` could be passed to plugins and reduce work.
To ensure that all created resources are deleted when a backup expires or is deleted, `VolumeSnapshot`s will have an `ownerRef` defined pointing to the Velero backup that created them.
In order to fully delete these objects, each `VolumeSnapshotContent` object will need to be edited to ensure the associated provider snapshot is deleted.
This will be done by editing the object and setting `VolumeSnapshotContent.Spec.DeletionPolicy` to `Delete`, regardless of whether or not the default policy for the class is `Retain`.
See the Deletion Policies section below.
The edit will happen before making Kubernetes API deletion calls to ensure that the cascade works as expected.
Deleting a Velero `Backup` or any associated CSI object via `kubectl` is unsupported; data will be lost or orphaned if this is done.
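The ordering requirement (edit the deletion policy first, delete afterwards) can be sketched as below. The `Client` interface, the `recordingClient` fake, and `deleteBackupResources` are hypothetical names invented for this example, and `VolumeSnapshotContent` is a simplified stand-in for the real type.

```go
package main

import "fmt"

// VolumeSnapshotContent is a simplified, hypothetical stand-in for the CSI type.
type VolumeSnapshotContent struct {
	Name           string
	DeletionPolicy string // "Delete" or "Retain"
}

// Client is a hypothetical, minimal API client used only for this sketch.
type Client interface {
	UpdateContent(vsc *VolumeSnapshotContent) error
	DeleteBackup(name string) error
}

// deleteBackupResources first forces the Delete policy on every content
// object that does not already have it, and only then deletes the backup so
// that the cascade also removes the provider-level snapshots.
func deleteBackupResources(c Client, backupName string, contents []*VolumeSnapshotContent) error {
	for _, vsc := range contents {
		if vsc.DeletionPolicy == "Delete" {
			continue
		}
		vsc.DeletionPolicy = "Delete"
		if err := c.UpdateContent(vsc); err != nil {
			return fmt.Errorf("updating deletion policy on %s: %w", vsc.Name, err)
		}
	}
	return c.DeleteBackup(backupName)
}

// recordingClient records calls so that their order can be inspected.
type recordingClient struct{ calls []string }

func (r *recordingClient) UpdateContent(vsc *VolumeSnapshotContent) error {
	r.calls = append(r.calls, "update "+vsc.Name)
	return nil
}

func (r *recordingClient) DeleteBackup(name string) error {
	r.calls = append(r.calls, "delete backup "+name)
	return nil
}

func main() {
	c := &recordingClient{}
	contents := []*VolumeSnapshotContent{
		{Name: "snapcontent-1", DeletionPolicy: "Retain"},
		{Name: "snapcontent-2", DeletionPolicy: "Delete"},
	}
	if err := deleteBackupResources(c, "nightly", contents); err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, call := range c.calls {
		fmt.Println(call)
	}
}
```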
Since `VolumeSnapshot` and `VolumeSnapshotContent` objects are contained within a Velero backup tarball, it is possible that all of these objects and the on-disk provider snapshots have been deleted from the cluster, yet copies of the objects still exist within other Velero backup tarballs.
When a Velero backup that contains these objects is restored, the `VolumeSnapshot` and `VolumeSnapshotContent` objects are restored into the cluster and the CSI controllers will attempt to reconcile their state.
There are two possible states when the on-disk snapshot has been deleted:
1) If the driver _does not_ support the `ListSnapshots` gRPC method, then the CSI controllers have no way to look up the on-disk snapshot, and set the `VolumeSnapshot.Status.readyToUse` field to `true`.
2) If the driver _does_ support the `ListSnapshots` gRPC method, then the CSI controllers will query the state of the on-disk snapshot, see that it is missing, and set the `VolumeSnapshot.Status.readyToUse` and `VolumeSnapshotContent.Status.readyToUse` fields to `false`.
To use CSI features, the Velero client must use the `EnableCSI` feature flag.
`DescribeBackupStatus` will be extended to download the `csi-snapshots.json.gz` file for processing. GitHub issue 1568 captures this work.
A new `describeCSIVolumeSnapshots` function should be added to the output package that knows how to render the included `VolumeSnapshot` names referenced in the `csi-snapshots.json.gz` file.
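A rendering helper along these lines might look like the following sketch. The function name `renderCSIVolumeSnapshots`, its signature, and the exact output wording are assumptions for illustration only; the proposal does not define the output format.

```go
package main

import (
	"fmt"
	"strings"
)

// renderCSIVolumeSnapshots formats the snapshot names for display. Without
// details it prints only a count; with details it lists every name. The
// wording of the output is hypothetical.
func renderCSIVolumeSnapshots(names []string, details bool) string {
	var b strings.Builder
	switch {
	case len(names) == 0:
		b.WriteString("CSI Volume Snapshots: <none included>\n")
	case !details:
		fmt.Fprintf(&b, "CSI Volume Snapshots: %d included\n", len(names))
	default:
		b.WriteString("CSI Volume Snapshots:\n")
		for _, n := range names {
			fmt.Fprintf(&b, "  %s\n", n)
		}
	}
	return b.String()
}

func main() {
	names := []string{"default/snap-1", "prod/snap-2"}
	fmt.Print(renderCSIVolumeSnapshots(names, false))
	fmt.Print(renderCSIVolumeSnapshots(names, true))
}
```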
The most accurate, reliable way to detect if a PersistentVolume is a CSI volume is to check for a non-`nil` `PersistentVolume.Spec.PersistentVolumeSource.CSI` field.
Using the `volume.beta.kubernetes.io/storage-provisioner` annotation is not viable, since it is set on any PVC that should be dynamically provisioned, and is not limited to CSI implementations.
It was introduced with dynamic provisioning support in 2016, predating CSI.
In the `BackupItemAction` for PVCs, the associated PV will be queried and checked for the presence of `PersistentVolume.Spec.PersistentVolumeSource.CSI`.
Volumes with any other `PersistentVolumeSource` set will use Velero's current VolumeSnapshotter plugin code path.
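Taken together with the Restic rule described earlier, the selection logic amounts to a three-way decision, sketched below. The types are simplified, hypothetical stand-ins for the Kubernetes API types, and `selectMechanism` and the `Mechanism` constants are invented names.

```go
package main

import "fmt"

// CSIPersistentVolumeSource and PersistentVolume are simplified, hypothetical
// stand-ins for the Kubernetes types. CSI is nil for non-CSI volumes.
type CSIPersistentVolumeSource struct {
	Driver       string
	VolumeHandle string
}

type PersistentVolume struct {
	Name string
	CSI  *CSIPersistentVolumeSource
}

// Mechanism names the code path used to protect a volume.
type Mechanism string

const (
	MechanismRestic            Mechanism = "restic"
	MechanismCSI               Mechanism = "csi-snapshot"
	MechanismVolumeSnapshotter Mechanism = "volumesnapshotter-plugin"
)

// selectMechanism picks the backup path for a volume. Volumes listed in a
// pod's backup.velero.io/backup-volumes annotation always use Restic; other
// volumes use CSI snapshots when the PV has a non-nil CSI source, and the
// existing VolumeSnapshotter plugins otherwise.
func selectMechanism(pv PersistentVolume, listedForRestic bool) Mechanism {
	switch {
	case listedForRestic:
		return MechanismRestic
	case pv.CSI != nil:
		return MechanismCSI
	default:
		return MechanismVolumeSnapshotter
	}
}

func main() {
	csiPV := PersistentVolume{
		Name: "pv-csi",
		CSI:  &CSIPersistentVolumeSource{Driver: "example.csi.driver", VolumeHandle: "vol-1"},
	}
	legacyPV := PersistentVolume{Name: "pv-legacy"}

	fmt.Println(selectMechanism(csiPV, false))
	fmt.Println(selectMechanism(csiPV, true))
	fmt.Println(selectMechanism(legacyPV, false))
}
```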
Velero uses its own `VolumeSnapshotLocation` CRDs to specify configuration options for a given storage system.
In Velero, this often includes topology information such as regions or availability zones, as well as credential information.
CSI volume snapshotting has a `VolumeSnapshotClass` CRD which also contains configuration options for a given storage system, but these options are not the same as those that Velero would use.
Since CSI volume snapshotting is operating within the same storage system that manages the volumes already, it does not need the same topology or credential information that Velero does.
As such, when used with CSI volumes, Velero's `VolumeSnapshotLocation` CRDs are not relevant, and could be omitted.
This will create a separate path in our documentation for the time being, and should be called out explicitly.
- Implementing similar logic in a Velero VolumeSnapshotter plugin was considered. However, this is inappropriate given CSI's data model, which requires a PVC/PV's StorageClass. Given the arguments to the VolumeSnapshotter interface, the plugin would have to instantiate its own client and do queries against the Kubernetes API server to get the necessary information.
  This is unnecessary given the fact that the `BackupItemAction` and `RestoreItemAction` APIs can act directly on the appropriate objects.
  Additionally, the VolumeSnapshotter plugins and CSI volume snapshot drivers overlap: both produce a snapshot on backup and a PersistentVolume on restore. Thus, there's not a logical place to fit the creation of `VolumeSnapshot` objects in the VolumeSnapshotter interface.
- Implementing CSI logic directly in Velero core code. The plugins could be packaged separately, but that doesn't necessarily make sense with server and client changes being made to accommodate CSI snapshot lookup.
- Implementing the CSI logic entirely in external plugins. As mentioned above, the necessary plugins for `PersistentVolumeClaim`, `VolumeSnapshot`, and `VolumeSnapshotContent` could be hosted out-of-tree from Velero. In fact, much of the logic for creating the CSI objects will be driven entirely inside of the plugin implementation.
  However, Velero currently has no way for plugins to communicate that some arbitrary data should be stored in or retrieved from object storage, such as a list of all `VolumeSnapshot` objects associated with a given `Backup`.
  This is important because, to display snapshots included in a backup, whether as native snapshots or Restic backups, separate JSON-encoded lists are stored within the backup on object storage.
  Snapshots are not listed directly on the `Backup` to fit within the etcd size limitations.
  Additionally, there are no client-side Velero plugin mechanisms, which means that the `velero backup describe --details` command would have no way of displaying the objects to the user, even if they were stored.
In order for underlying, provider-level snapshots to be retained similarly to Velero's current functionality, the `VolumeSnapshotContent.Spec.DeletionPolicy` field must be set to `Retain`.
This is most easily accomplished by setting the `VolumeSnapshotClass.DeletionPolicy` field to `Retain`, which will be inherited by all `VolumeSnapshotContent` objects associated with the `VolumeSnapshotClass`.
The current default for dynamically provisioned `VolumeSnapshotContent` objects is `Delete`, which will delete the provider-level snapshot when the `VolumeSnapshotContent` object representing it is deleted.
Additionally, the `Delete` policy will cascade a deletion of a `VolumeSnapshot`, removing the associated `VolumeSnapshotContent` object.
It is not currently possible to define a deletion policy on a `VolumeSnapshot` that gets passed to a `VolumeSnapshotContent` object on an individual basis.
This proposal does not significantly change Velero's security implications within a cluster.
If a deployment is using solely CSI volumes, Velero will no longer need privileges to interact with volumes or snapshots, as these will be handled by the CSI driver. This reduces the provider permissions footprint of Velero.
Velero must still be able to access cluster-scoped resources in order to back up `VolumeSnapshotContent` objects.
Without these objects, the provider-level snapshots cannot be located in order to re-associate them with volumes in the event of a restore.