VolumeGroupSnapshots - how to rebuild/restore a VolumeGroupSnapshot? #969

Closed
tesshuflower opened this issue Nov 30, 2023 · 14 comments

@tesshuflower

What happened:

When creating a VolumeGroupSnapshot from multiple PVCs, it's a bit unclear how to do a restore.

It looks like you are required to individually restore each volumesnapshot in the group into a PVC - but is there an easy way to map each volumesnapshot back to the original PVC it was taken from?

The volumegroupsnapshot status lists the volumesnapshots, but (as far as I can tell) it doesn't provide any information linking them back to the original PVCs.

Example volumegroupsnapshot status:

status:
  boundVolumeGroupSnapshotContentName: groupsnapcontent-84656059-5c4b-4289-9d8f-464f4085b331
  creationTime: "2023-11-30T18:43:03Z"
  readyToUse: true
  volumeSnapshotRefList:
  - kind: VolumeSnapshots
    name: snapshot-6e37158c2ce07fe27cba4ef0bc84c58f2801d344bdffff539aa1b786ab57d1e4-2023-11-30-6.43.3
    namespace: source
    uid: 6d0e0c03-696a-4f0b-a136-3a2c026f360e
  - kind: VolumeSnapshots
    name: snapshot-d01f061e29fb3f492579b031b9bcd6723e7ad4860c647e4397a89aea375ff116-2023-11-30-6.43.4
    namespace: source
    uid: c1ae59ad-8a36-4630-90c8-b45d4b9d42e2

There is some information about the PVs in the volumegroupsnapshotcontent object, but again I'm not sure how to map this to the individual snapshots.

What you expected to happen:

In order to restore a volumegroupsnapshot I need to be able to restore each snapshot to the proper PVC.
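
To restore a single snapshot I would create a new PVC whose dataSource points at the volumesnapshot, roughly like the sketch below (the PVC name, storage class and size here are placeholders), which is why I need to know which snapshot maps back to which original PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-data                 # placeholder - should be the original PVC name
  namespace: source
spec:
  storageClassName: csi-hostpath-sc   # placeholder storage class
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: snapshot-6e37158c2ce07fe27cba4ef0bc84c58f2801d344bdffff539aa1b786ab57d1e4-2023-11-30-6.43.3
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi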

How to reproduce it:

Anything else we need to know?:

Environment:

  • Driver version:
  • Kubernetes version (use kubectl version):
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@tesshuflower
Author

Some more context - I was testing with the CSI HostPath driver and the information is also not present in the individual snapshots, as spec.source.persistentVolumeClaimName is not set.

Example volumesnapshot that was created by the volumegroupsnapshot:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  creationTimestamp: "2023-11-30T18:43:04Z"
  finalizers:
  - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
  - snapshot.storage.kubernetes.io/volumesnapshot-bound-protection
  generation: 1
  labels:
    volumeGroupSnapshotName: new-groupsnapshot-demo
  name: snapshot-6e37158c2ce07fe27cba4ef0bc84c58f2801d344bdffff539aa1b786ab57d1e4-2023-11-30-6.43.3
  namespace: source
  resourceVersion: "43592"
  uid: 6d0e0c03-696a-4f0b-a136-3a2c026f360e
spec:
  source:
    volumeSnapshotContentName: snapcontent-6e37158c2ce07fe27cba4ef0bc84c58f2801d344bdffff539aa1b786ab57d1e4-2023-11-30-6.43.3
status:
  boundVolumeSnapshotContentName: snapcontent-6e37158c2ce07fe27cba4ef0bc84c58f2801d344bdffff539aa1b786ab57d1e4-2023-11-30-6.43.3
  creationTime: "2023-11-30T18:43:03Z"
  readyToUse: true
  restoreSize: 1Gi
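
For comparison, a volumesnapshot that a user creates directly from a PVC records the source in its spec, roughly like this sketch (name and class are just examples):

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-snapshot                                 # example name
  namespace: source
spec:
  volumeSnapshotClassName: csi-hostpath-snapclass   # example class
  source:
    persistentVolumeClaimName: my-pvc               # the source PVC is recorded here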

@xing-yang
Collaborator

xing-yang commented Dec 1, 2023

Some more context - I was testing with the CSI HostPath driver and the information is also not present in the individual snapshots, as spec.source.persistentVolumeClaimName is not set.

Example volumesnapshot that was created by the volumegroupsnapshot:

This works as designed: we don't want each individual snapshot to be dynamically created.

@tesshuflower
Author

This works as designed as we don't want each individual snapshot to be dynamically created.

Makes sense, thanks - I think that does mean I currently don't have any reliable way of determining which snapshot goes with which original PVC when I'm trying to restore?

@xing-yang
Collaborator

Can you check VolumeGroupSnapshotContent?
Are you working on implementing this in your CSI driver? If so, here is a workaround. In CreateVolumeGroupSnapshotRequest there is a repeated string source_volume_ids, and in CreateVolumeGroupSnapshotResponse there is a repeated Snapshot snapshots. When constructing the response, make sure the snapshots are appended in the same order as their source volumes in source_volume_ids. This way you can find the mapping between each PV and its VolumeSnapshotContent in the VolumeGroupSnapshotContent.

apiVersion: groupsnapshot.storage.k8s.io/v1alpha1
kind: VolumeGroupSnapshotContent
metadata:
  creationTimestamp: "2023-12-05T21:56:48Z"
  finalizers:
  - groupsnapshot.storage.kubernetes.io/volumegroupsnapshotcontent-bound-protection
  generation: 1
  name: groupsnapcontent-dc63473c-b310-4ddc-8698-4e70442457dd
  resourceVersion: "96004"
  uid: b04e2744-5e34-4a40-9507-fff1cc7e3187
spec:
  deletionPolicy: Delete
  driver: hostpath.csi.k8s.io
  source:
    persistentVolumeNames:
    - pvc-e15ccefa-12a5-4eb1-965d-ed7f1b142f99
    - pvc-971d6c80-fe7f-405b-9fbd-ab7b80b9c4ed
  volumeGroupSnapshotClassName: csi-hostpath-groupsnapclass
  volumeGroupSnapshotRef:
    apiVersion: groupsnapshot.storage.k8s.io/v1alpha1
    kind: VolumeGroupSnapshot
    name: cluster-example-with-volume-snapshot-20231205215643
    namespace: default
    resourceVersion: "95980"
    uid: dc63473c-b310-4ddc-8698-4e70442457dd
status:
  creationTime: 1701813408401819395
  readyToUse: true
  volumeGroupSnapshotHandle: 3052a006-93b9-11ee-987f-5acef7aa0d0c
  volumeSnapshotContentRefList:
  - kind: VolumeSnapshotContent
    name: snapcontent-c91b95d44137bbecb906cbce13a2e2d1a19181a4db5b600644cbf9443c84000b-2023-12-05-9.56.49
  - kind: VolumeSnapshotContent
    name: snapcontent-650c86b012a3049f298e0e3a6e08d858687c0820cd55fbb8ba4e1db0ed3c2e5a-2023-12-05-9.56.49
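
In other words, with that ordering the two lists in the VolumeGroupSnapshotContent above line up by position, e.g. (annotated excerpt of the object above, for illustration only):

spec:
  source:
    persistentVolumeNames:
    - pvc-e15ccefa-12a5-4eb1-965d-ed7f1b142f99   # index 0
    - pvc-971d6c80-fe7f-405b-9fbd-ab7b80b9c4ed   # index 1
status:
  volumeSnapshotContentRefList:
  - kind: VolumeSnapshotContent                  # index 0 -> content of the snapshot of the first PV
    name: snapcontent-c91b95d44137bbecb906cbce13a2e2d1a19181a4db5b600644cbf9443c84000b-2023-12-05-9.56.49
  - kind: VolumeSnapshotContent                  # index 1 -> content of the snapshot of the second PV
    name: snapcontent-650c86b012a3049f298e0e3a6e08d858687c0820cd55fbb8ba4e1db0ed3c2e5a-2023-12-05-9.56.49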

cc @leonardoce

@leonardoce
Contributor

We're using the same workaround @xing-yang is referring to in cloudnative-pg/cloudnative-pg#3345, the PR that adds VolumeGroupSnapshot support to CloudNative-PG.
We develop a PostgreSQL operator and use VolumeGroupSnapshots to take consistent backups of the database.

We assume that .spec.source.persistentVolumeNames (a list of references to PVs) and .status.volumeSnapshotContentRefList (a list of references to VolumeSnapshotContents) are parallel, and we use this information to reconstruct the link between a VolumeSnapshot and the corresponding PVC.

@tesshuflower
Author

Thanks for this info @leonardoce and @xing-yang .

In my case I'm not developing a CSI driver, but looking at it from an end-user perspective.

I think this is workable, assuming every CSI driver implementation keeps persistentVolumeNames and volumeSnapshotContentRefList in the same order (hopefully we can assume this); however, I don't believe it's ideal.

A user currently would do this:

  1. Create their PVCs for their app, and label them
  2. Create a volumegroupsnapshot with the label selector
  3. Now when they want to restore, they need to look at the volumegroupsnapshot, then the volumegroupsnapshotcontents, then map the PV list to the volumesnapshotcontents list, and then find the volumesnapshots from the volumesnapshotcontents. This also assumes they're keeping track of which PVs were associated with the original PVCs from step 1 (note that in steps 1 & 2 they are dealing with PVCs only, not the underlying PVs). By the time they're trying to restore their data, the original PVCs may also have been deleted.
  4. Now they can start up their app that uses the PVCs - if at this point the wrong volumesnapshot has been restored to the wrong PVC name, this would be very problematic.

The end user has created the volumegroupsnapshot working only at the PVC level (as that is what they label). I think the information about which PVC is backed up to which volumesnapshot is needed somewhere in the volumegroupsnapshot spec, since there will be no way to restore the entire volumegroupsnapshot at once.
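
For reference, steps 1 and 2 above would look roughly like this (minimal sketch; names, labels and class names are examples only):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-a
  namespace: source
  labels:
    my-app-vgroup: my-data          # label that the group selector matches on
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: groupsnapshot.storage.k8s.io/v1alpha1
kind: VolumeGroupSnapshot
metadata:
  name: my-groupsnapshot
  namespace: source
spec:
  volumeGroupSnapshotClassName: csi-hostpath-groupsnapclass   # example class
  source:
    selector:
      matchLabels:
        my-app-vgroup: my-data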

tesshuflower added a commit to tesshuflower/volsync that referenced this issue Dec 14, 2023
- RS does all sorts of lookups to create PVS
from the vol group snapshot that will hopefully
not be necessary - see issue:
kubernetes-csi/external-snapshotter#969

Signed-off-by: Tesshu Flower <tflower@redhat.com>
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 5, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 4, 2024
@xing-yang
Collaborator

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 4, 2024
@Madhu-1
Contributor

Madhu-1 commented Apr 5, 2024

When constructing the response, make sure the snapshots are appended in the same order as their source volumes in source_volume_ids. This way you can find mappings between PV and VolumeSnapshotContent in VolumeGroupSnapshotContent.

@xing-yang we cannot assume the order unless it's enforced in the CSI spec, can we? The CSI driver is developed by one party and the backup software by another. Adding a PVC identifier to the volumesnapshot could also help maintain clarity and avoid any ambiguity. Should we add the PVC name as an annotation when creating the volumesnapshots? (To do that we would need to list all the PVs, check the volumeHandle, and add it - a heavy operation where we list the PVs and loop through each one.)
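
For example, something like this hypothetical annotation on each member volumesnapshot (the annotation key below is made up purely to illustrate the idea; it is not an existing API):

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: snapshot-6e37158c2ce07fe27cba4ef0bc84c58f2801d344bdffff539aa1b786ab57d1e4-2023-11-30-6.43.3
  namespace: source
  annotations:
    groupsnapshot.example.com/source-pvc-name: my-pvc   # hypothetical key and example value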

@xing-yang
Collaborator

That's a temporary workaround. I'm thinking about making a change in the VolumeGroupSnapshot APIs.
https://docs.google.com/document/d/1NdNwFD5Z64K2heQLYOnojt6Ogulg750BuYIJ6W9cEiM/edit?usp=sharing

@Madhu-1
Contributor

Madhu-1 commented Apr 8, 2024

That's a temporary workaround. I'm thinking about making a change in the VolumeGroupSnapshot APIs. https://docs.google.com/document/d/1NdNwFD5Z64K2heQLYOnojt6Ogulg750BuYIJ6W9cEiM/edit?usp=sharing

Thank you @xing-yang

leonardoce added a commit to leonardoce/external-snapshotter that referenced this issue Apr 17, 2024
This uses the update API to set `persistentVolumeClaimRef` in
`VolumeGroupSnapshot` and `persistentVolumeName` in
`VolumeGroupSnapshotContent` to the corresponding objects.

This makes restoring volumes from a VolumeGroupSnapshot easier.

Related: kubernetes-csi#969
@yati1998
Contributor

@xing-yang both of the PRs seem to have been merged - shall we close this issue if it's resolved?

@tesshuflower
Author

tesshuflower commented Jun 18, 2024

I think this issue is addressed now. Thanks @xing-yang and @leonardoce for your attention to this - it will make things much more usable!

Using v8.0.1 of the external-snapshotter (and the latest csi-driver-host-path, v1.13.0), the volumegroupsnapshot now has the information needed to rebuild PVCs from snapshots.

Here's the example volumegroupsnapshot I was able to create:

apiVersion: groupsnapshot.storage.k8s.io/v1alpha1
kind: VolumeGroupSnapshot
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"groupsnapshot.storage.k8s.io/v1alpha1","kind":"VolumeGroupSnapshot","metadata":{"annotations":{},"name":"testvgsnap","namespace":"testvg"},"spec":{"source":{"selector":{"matchLabels":{"my-app-vgroup":"my-data"}}}}}
  creationTimestamp: "2024-06-14T08:32:59Z"
  finalizers:
  - groupsnapshot.storage.kubernetes.io/volumegroupsnapshot-bound-protection
  generation: 2
  name: testvgsnap
  namespace: testvg
  resourceVersion: "1381"
  uid: 8acec076-9627-4550-b4b8-7140b5fee6dc
spec:
  source:
    selector:
      matchLabels:
        my-app-vgroup: my-data
  volumeGroupSnapshotClassName: csi-hostpath-groupsnapclass
status:
  boundVolumeGroupSnapshotContentName: groupsnapcontent-8acec076-9627-4550-b4b8-7140b5fee6dc
  creationTime: "2024-06-14T08:32:59Z"
  pvcVolumeSnapshotRefList:
  - persistentVolumeClaimRef:
      name: data-a
    volumeSnapshotRef:
      name: snapshot-e82e327f25a5a95f7b35250b4f8d4a4194685f86c3d0bd24adced6c185bc8cff-2024-06-14-8.32.59
  - persistentVolumeClaimRef:
      name: data-b
    volumeSnapshotRef:
      name: snapshot-d1f8716d59b7e978dbea5d1449c6759185f1433b4302734cd8433bb248c238b3-2024-06-14-8.32.59
  - persistentVolumeClaimRef:
      name: data-c
    volumeSnapshotRef:
      name: snapshot-5bbcce636eb199a58ff62dc798967c0f3a39eddbf1dd1a87eef2fe6bde2115b3-2024-06-14-8.32.59
  readyToUse: true
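
With that mapping, each member snapshot can be restored back to a PVC with its original name using a normal dataSource PVC, e.g. (sketch only; storage class and size are placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-a                        # original PVC name taken from pvcVolumeSnapshotRefList
  namespace: testvg
spec:
  storageClassName: csi-hostpath-sc   # placeholder storage class
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: snapshot-e82e327f25a5a95f7b35250b4f8d4a4194685f86c3d0bd24adced6c185bc8cff-2024-06-14-8.32.59
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi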

tesshuflower added a commit to tesshuflower/volsync that referenced this issue Aug 8, 2024
- RS does all sorts of lookups to create PVS
from the vol group snapshot that will hopefully
not be necessary - see issue:
kubernetes-csi/external-snapshotter#969

Signed-off-by: Tesshu Flower <tflower@redhat.com>