Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add design for Extending VolumePolicies to support more actions #6956

Merged

Conversation

shubham-pampattiwar
Copy link
Collaborator

@shubham-pampattiwar shubham-pampattiwar commented Oct 13, 2023

This PR adds design for #6640

Please indicate you've done the following:

  • Accepted the DCO. Commits without the DCO will delay acceptance.
  • Created a changelog file or added /kind changelog-not-required as a comment on this pull request.
  • Updated the corresponding documentation in site/content/docs/main.

@codecov
Copy link

codecov bot commented Oct 13, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.88%. Comparing base (e65ef28) to head (bec3425).
Report is 160 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6956      +/-   ##
==========================================
+ Coverage   61.76%   61.88%   +0.12%     
==========================================
  Files         262      266       +4     
  Lines       28440    29456    +1016     
==========================================
+ Hits        17567    18230     +663     
- Misses       9642     9934     +292     
- Partials     1231     1292      +61     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sseago
Copy link
Collaborator

sseago commented Oct 31, 2023

It sounds like we need to completely step back. @shubham-pampattiwar has been designing for one set of requirements, but the other reviews seem to have a completely different set of requirements in mind. So I think we have to step away from design for a bit and go back to requirements. I think we have a couple of options here:

  1. Provide an alternative to the existing per-pod-volume annotations to determine when to use fs-backup. This is what the current design is based on -- workflow is unchanged from current annotations, we're just aiming to provide a more user-friendly means of selection.

  2. Provide a completely different selection mechanism that relies 100% on VolumePolicies and doesn't interact with defaultVolumesToFsBackup or the annotations at all. This is what the other reviewers seem to want here.

  3. will be the least risky to implement with the least amount of effort. 2) Might result in a more flexible approach, but it's completely incompatible with the legacy methods, so we would need validation to ensure that only one approach is used.

To expand on the validation needs -- backup.Spec.DefaultVolumesToFsBackup is a bool pointer. If this is set to true or false, then we're using the legacy opt-in/opt-out approach, and any volume policies referencing fs backup or snapshots should result in a validation error -- if this is true or false, then existing workflows will be used. If this is nil (unspecified), then volume policies are used to determine what to snapshot and what to use fsbackup for. We'll need to modify install/server to make this value a bool pointer instead of a plain bool -- currently, not specifying this on install/server results in the backup getting a false rather than a nil value here.

@shubham-pampattiwar what do you think?

@shubham-pampattiwar
Copy link
Collaborator Author

@reasonerjt @qiuming-best @sseago I have re-done the design, please take another look, Thanks !

@reasonerjt reasonerjt requested review from Lyndon-Li and removed request for blackpiglet January 24, 2024 09:18
@weshayutin
Copy link
Contributor

Some notes from latest community call:
https://hackmd.io/Jq6F5zqZR7S80CeDWUklkA#Discussion-topics

My personal understanding from the call last night is that:

  1. VolumePolicies take precedence over legecy opt-in/opt-out labeling approaches
  2. If there is no matching VolumePolicy on a volume, the velero defers to the opt-in/opt-out approach
  3. Finally if there is no match for a volumePolicy and no match for opt-in/out method, the velero server install default (FSBackupOrSnapShot) will be authoritative.

@shubham-pampattiwar I'm off base I can edit or delete this comment.

@shubham-pampattiwar shubham-pampattiwar force-pushed the extend-vol-policy-design branch 2 times, most recently from 545f52c to 4a43020 Compare March 8, 2024 19:24
@shubham-pampattiwar
Copy link
Collaborator Author

@reasonerjt Updated design implementation to use a volumehelper package, PTAL, Thanks !

sseago
sseago previously approved these changes Mar 12, 2024
Copy link
Collaborator

@sseago sseago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good now. We might want to be explicit that none of this new design replaces the snaphotVolumes param, that functions exactly as before. If it is true, then any volume identified as a snapshot volume (either via new policies or legacy opt-in/opt-out) will be snapshotted via either CSI or native VolumeSnapshotter. If it is false, then volumes identified as snapshot volumes will not be snapshotted. In other words, snapshotVolumes is a param that simply says "Yes I want to include snapshots in this backup for the volumes where it's appropriate", or "No, I don't want to include any snapshots in this backup".

@Lyndon-Li
Copy link
Contributor

@shubham-pampattiwar
I see there are still some opening comments, could you help to double check them? For the ones that are done, please resolve them; for the open ones, let's discuss in tomorrow's community meeting. Thanks!

@shubham-pampattiwar
Copy link
Collaborator Author

@Lyndon-Li Done, updated the design and resolved the comments.
@reasonerjt Latest iteration of design updates the implementation parts and introduces a VolumeHelper interface and a VolumePolicyHelper struct which implements the VolumeHelper interface.

ShouldPerformSnapshot(obj runtime.Unstructured) (bool, error)
}
```
- The `VolumePolicyHelper` struct will implement the `VolumeHelper` interface and will consist of the functions that we will use through the backup workflow to accomodate volume policies for PVs and PVCs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shubham-pampattiwar
I was thinking the VolumeHelper should be the source of truth for the backup, the return value should combine the value of volume-policy, fields in the backup, and the opt-in/opt-out annotations on the pod.
In the case of fs-backup, the volumeHelper should be a replacement of pdvolumeutil.GetVolumesByPod
and tell the backupper what volumes should be handled via fs-backup.

Therefore, IMO the VolumePolicyHelper is a misuse b/c it only covers "volumePolicy".

Maybe we can clarify the volumeHelper's scope to "Help velero decide what to do with a volume during backup once the resource is selected by include/exclude filter and label selectors"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, Updated the PR.

- The implementation should be included in velero 1.14

- We will introduce a `VolumeHelper` interface. It will consist of two methods:
- `ShouldPerformFSBackupForPodVolume(pod *corev1api.Pod)`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have ProcessVolumePolicyFSBackup (btw I think it should rename to GetVolumesForFSBackup)
This func is probably not needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

- Making sure that `snapshot` action is skipped for PVCs that do not fit the volume policy criteria, for this we will again use the `vph.ShouldPerformSnapshot` from the `VolumePolicyHelper(vph)` receiver.
- We will pass the `VolumePolicyHelper(vph)` instance in `executeActions` method so that its available to use.
```go
func (v *VolumePolicyHelper) ShouldPerformSnapshot(obj runtime.Unstructured) (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should also take other flags, like "snapshotVolumes" in the backup CR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, updated the PR

@abh
Copy link

abh commented Mar 29, 2024

Thanks for working on this (and Velero). I greatly appreciate the backups Velero is managing for me. I came here via the issue referenced above.

For my use cases the fs-backup feature is fine for many use cases (and I appreciate that it's explicit and the configuration is close to the workload configuration).

I use snapshot backups for some workloads where I need a "as if the computer crashed" consistency of the file system and I use data mover to move the data off the infrastructure the cluster is running in. This PR doesn't seem to support that use case.

The feature I'm missing currently (and I can't figure out from the resource filtering documentation is doing volume snapshots (and data mover) on some PVCs in a namespace instead of all.

For example I have a namespace running clickhouse databases and it's a mix of 1TB data that gets replaced within a day and about 100GB data that changes much less (and is more valuable). I'd like to do snapshots and data mover on the 100GB data without also having to copy the 1TB other data. (I can't easily divide them between namespaces as clickhouse or the CH operator has features that helps me generate and update the small data set from the large data set within the cluster).

@Lyndon-Li
Copy link
Contributor

Lyndon-Li commented Mar 29, 2024

@abh
Do you mean you want to specify data mover as the backup type for some of the volumes?
I think this has been discussed --- only fs-backup vs. snapshot could be specified per volume; then if snapshot is specified, you need to use the existing per backup option --snapshot-move-data to make it for data mover.
For sure, this would miss one functionality --- in a backup, among the volumes going with snapshot, I want some volumes to go with data mover, some not. This may be covered by future extension of this design.

@shubham-pampattiwar Please correct me if my understanding is wrong.

@sseago
Copy link
Collaborator

sseago commented Mar 29, 2024

@asaf-erlich If your goal is to back up with a mixture of fs-backup and datamover, that is absolutely supported by this design. Policies will identify volumes for fs-backup and snapshot, and setting --snapshot-move-data=true tells velero to use datamover for all snapshotted volumes. The only use case that's not supported in this design is if you want to use CSI snapshots without data movement for some volumes (leaving VolumeSnapshotContents in the cluster) and CSI snapshots with data movement for others, but from the use case you described, that doesn't seem relevant here.

@shubham-pampattiwar shubham-pampattiwar force-pushed the extend-vol-policy-design branch from 48bfa56 to 69a9a17 Compare April 1, 2024 22:16
@shubham-pampattiwar
Copy link
Collaborator Author

@reasonerjt PTAL, Thanks !

@abh
Copy link

abh commented Apr 2, 2024

Do you mean you want to specify data mover as the backup type for some of the volumes?

For me the important part is to be able to choose which volumes are being backed up. How I understand the current settings I can only choose by namespace when I want snapshot+data-mover.

In my use case I want all the snapshots to be with data mover (and I don't think there are any scenarios where fs-backup is better if snapshot+data mover is possible?).

For my use cases having the label on the pod (or on the PVC) is useful in many scenarios because it's so explicit. If the configuration is more complicated I'd worry I mistype a glob match or a label selector and accidentally don't have backups.

reasonerjt
reasonerjt previously approved these changes Apr 2, 2024
Copy link
Contributor

@reasonerjt reasonerjt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

add changelog

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

fix codespell

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

update codeblocks for language syntax rendering

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

redo design

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

update volume policies design

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

add notes and modify design based on community feedback

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

add future scope

add bia csi snapshot action details

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

add volumehelper package in implementation section

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

fix codespell

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

introduce volumehelper interface and volumepolicyhelper struct

address feedback regarding volumehelper interface and its funcs

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>

fix codespell

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
@shubham-pampattiwar
Copy link
Collaborator Author

@reasonerjt Thanks for the review, need a re-ack, fixed CI errors related to codespell.
@sseago Please take another look, Thank you !

@sseago sseago merged commit f85f877 into vmware-tanzu:main Apr 3, 2024
66 checks passed
blackpiglet pushed a commit to blackpiglet/velero that referenced this pull request Apr 10, 2024
…re-tanzu#6956)

add changelog



fix codespell



update codeblocks for language syntax rendering



redo design



update volume policies design



add notes and modify design based on community feedback



add future scope

add bia csi snapshot action details



add volumehelper package in implementation section



fix codespell



introduce volumehelper interface and volumepolicyhelper struct

address feedback regarding volumehelper interface and its funcs



fix codespell

Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.