Backup Azure Managed Disks from multiple Resource Groups #3157

Closed
cholm321 opened this issue Sep 25, 2020 · 16 comments
Labels
Area/Cloud/Azure Enhancement/User End-User Enhancement to Velero Icebox We see the value, but it is not slated for the next couple releases. kind/requirement Reviewed Q2 2021 Volumes Relating to volume backup and restore

@cholm321

We have an AKS cluster with velero installed in an enterprise setup.

On the cluster we have
app1 with managed disk in ResourceGroup 1
app2 with managed disk in ResourceGroup 2
app3 with managed disk in ResourceGroup 3

All Resource Groups have the AKS SPN as Contributor.

We have been poking around a little, but it seems that the ResourceGroup must be specified at velero install time.

Have we missed something? Is it possible to do all configuration at the backup stage?
Or better, could the plugin look up which ResourceGroup each disk is located in by inspecting the pv objects?

@ashish-amarnath
Member

@cholm321 If I am following this correctly, the issue you are facing is an error during the disk information lookup, which uses the resource group from the environment variables (derived from the cloud-credentials secret) as the resource group for all disks.
https://github.com/vmware-tanzu/velero-plugin-for-microsoft-azure/blob/main/velero-plugin-for-microsoft-azure/volume_snapshotter.go#L163

This is currently an unsupported capability in Velero. We do plan to add support for multiple cloud credentials, and I expect this to be addressed as a side effect of that.

@ashish-amarnath
Member

This is the design for supporting multiple credentials.
#2403

@cholm321
Author

@ashish-amarnath I would think that when Azure Managed Disks are placed across several Azure Resource Groups, Velero should just look up the DiskURI in the k8s metadata, with no need to expose that to the velero cli.

When I do kubectl get pvc I can locate associated pv's by kubectl get pv.

Assume I have a pv named pv-something.
Then kubectl describe pv pv-something gives me

DiskURI:      /subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Compute/disks/kubernetes-dynamic-pv-something

Why doesn't Velero just do that lookup?
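
To make the idea concrete, here is a minimal sketch of how the resource group could be read out of a DiskURI like the one above. This is not how the Azure plugin works today; `diskRef` and `parseDiskURI` are hypothetical names, and the sketch assumes the URI always has the eight-segment shape shown by `kubectl describe pv`.

```go
package main

import (
	"fmt"
	"strings"
)

// diskRef holds the parts of an Azure managed disk resource ID.
// Hypothetical type, for illustration only.
type diskRef struct {
	SubscriptionID string
	ResourceGroup  string
	DiskName       string
}

// parseDiskURI splits a resource ID of the form
// /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/disks/<name>
// Fixed segment names are compared case-insensitively.
func parseDiskURI(uri string) (diskRef, error) {
	parts := strings.Split(strings.Trim(uri, "/"), "/")
	if len(parts) != 8 ||
		!strings.EqualFold(parts[0], "subscriptions") ||
		!strings.EqualFold(parts[2], "resourceGroups") ||
		!strings.EqualFold(parts[4], "providers") ||
		!strings.EqualFold(parts[5], "Microsoft.Compute") ||
		!strings.EqualFold(parts[6], "disks") ||
		parts[1] == "" || parts[3] == "" || parts[7] == "" {
		return diskRef{}, fmt.Errorf("not a managed disk resource ID: %q", uri)
	}
	return diskRef{
		SubscriptionID: parts[1],
		ResourceGroup:  parts[3],
		DiskName:       parts[7],
	}, nil
}

func main() {
	// Placeholder subscription and resource group values.
	ref, err := parseDiskURI("/subscriptions/sub-1/resourceGroups/rg-app2/providers/Microsoft.Compute/disks/kubernetes-dynamic-pv-something")
	if err != nil {
		panic(err)
	}
	fmt.Printf("resource group %s, disk %s\n", ref.ResourceGroup, ref.DiskName)
}
```

With something like this, the resource group would come from each pv rather than from a single install-time setting.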

@ashish-amarnath
Member

That metadata is specific to Azure.
The volumesnapshotter interface is designed to be generic and agnostic of Kubernetes volumes.

func (c *VolumeSnapshotterGRPCClient) CreateSnapshot(volumeID, volumeAZ string, tags map[string]string) (string, error) {

So the volumesnapshotter is given the necessary info to locate the volume and call the volume provider's snapshot API.

This, unfortunately, makes changing the interface a breaking change across all providers.
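
As a rough illustration of that constraint, the sketch below models a snapshotter that receives only an opaque volume ID and resolves it against a single resource group fixed at configuration time. `snapshotter` and `fakeAzureSnapshotter` are hypothetical stand-ins written for this example, not Velero or plugin types, and the in-memory `disks` map only simulates Azure.

```go
package main

import "fmt"

// snapshotter mirrors the shape of the call quoted above: the plugin gets
// an opaque volume ID, an availability zone, and tags, and nothing else.
type snapshotter interface {
	CreateSnapshot(volumeID, volumeAZ string, tags map[string]string) (string, error)
}

// fakeAzureSnapshotter is a hypothetical provider implementation.
type fakeAzureSnapshotter struct {
	resourceGroup string                     // fixed when the plugin is configured
	disks         map[string]map[string]bool // resource group -> disk names (simulated Azure)
}

// CreateSnapshot can only search the one configured resource group, because
// the volume ID carries no resource group information.
func (s *fakeAzureSnapshotter) CreateSnapshot(volumeID, volumeAZ string, tags map[string]string) (string, error) {
	if !s.disks[s.resourceGroup][volumeID] {
		return "", fmt.Errorf("disk %q not found in resource group %q", volumeID, s.resourceGroup)
	}
	return "snap-" + volumeID, nil
}

func main() {
	var s snapshotter = &fakeAzureSnapshotter{
		resourceGroup: "rg-1",
		disks: map[string]map[string]bool{
			"rg-1": {"disk-app1": true},
			"rg-2": {"disk-app2": true},
		},
	}
	for _, id := range []string{"disk-app1", "disk-app2"} {
		snap, err := s.CreateSnapshot(id, "", nil)
		fmt.Println(id, snap, err)
	}
}
```

In this model the disk in `rg-2` cannot be found unless the resource group is passed in some other way, which is what would require changing the interface.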

@nrb nrb transferred this issue from vmware-tanzu/velero-plugin-for-microsoft-azure Dec 8, 2020
@nrb nrb added Area/Cloud/Azure Volumes Relating to volume backup and restore Enhancement/User End-User Enhancement to Velero labels Dec 8, 2020
@eleanor-millman eleanor-millman added Reviewed Q2 2021 Icebox We see the value, but it is not slated for the next couple releases. labels May 12, 2021
@francois-travais

I have the same issue: my AKS cluster is in one resource group while all the nodes, and therefore the PVC disks, are in another. Velero is stuck searching for the disks in the AKS cluster's resource group instead of the nodes' resource group, even if I change the resource group of the VolumeSnapshotLocation.

@dploeger

dploeger commented Apr 1, 2022

Any news about this? It's still valid for Velero helm chart 2.29.4 / Velero image 1.8.1 and velero plugin for azure v1.4.1, and this makes velero unable to back up dynamic volumes.

@grodzik

grodzik commented Apr 13, 2022

Same here, with the latest versions. Any plans to fix this within a foreseeable timeframe?

@jkurek1

jkurek1 commented May 13, 2022

Hi guys,
Any update? I have a similar use case.

We have configured storage classes in different resource groups and the volumes are provisioned, but during the backup velero uses only the one resource group that is configured in the velero credentials.

@dploeger

@ashish-amarnath Reading this again, I believe it's not about multiple credentials.

With Azure, the thing is that the cluster is in resource group "dev" and AKS automatically creates a second resource group like "MC_dev_myk8scluster_europewest" where it puts all the k8s resources it creates for Azure (like the said managed disks for PVCs).

So maybe it's the fault of the azure plugin for velero trying to search for the disks in the wrong resource group.
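
For reference, the generated name follows the default AKS pattern `MC_<resource group>_<cluster name>_<location>`. The helper below is only a sketch of that default; `defaultNodeResourceGroup` is a made-up name, and clusters created with a custom node resource group will not match it, so the cluster's `nodeResourceGroup` property remains the authoritative value.

```go
package main

import "fmt"

// defaultNodeResourceGroup builds the default name AKS gives to the
// resource group that holds node-level resources such as managed disks.
// Assumption: the cluster was created without a custom node resource group.
func defaultNodeResourceGroup(resourceGroup, cluster, location string) string {
	return fmt.Sprintf("MC_%s_%s_%s", resourceGroup, cluster, location)
}

func main() {
	// Values taken from the example in this comment.
	fmt.Println(defaultNodeResourceGroup("dev", "myk8scluster", "europewest"))
}
```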

@dploeger

However, this worked in helm chart 2.15.0 with azure plugin 1.1.0.

@dploeger

dploeger commented May 23, 2022

Running azure plugin 1.1.0 and up with the most current helm chart version 2.29.6 yields no errors but instead skips the PVCs:

time="2022-05-23T10:45:14Z" level=info msg="Backing up item" backup=velero/test2 logSource="pkg/backup/item_backupper.go:122" name=pvc-8426a489-4cb1-441e-a144-33ae7e4bfde4 namespace= resource=persistentvolumes
time="2022-05-23T10:45:14Z" level=info msg="Executing takePVSnapshot" backup=velero/test2 logSource="pkg/backup/item_backupper.go:395" name=pvc-8426a489-4cb1-441e-a144-33ae7e4bfde4 namespace= resource=persistentvolumes
time="2022-05-23T10:45:14Z" level=info msg="label \"topology.kubernetes.io/zone\" is not present on PersistentVolume, checking deprecated label..." backup=velero/test2 logSource="pkg/backup/item_backupper.go:422" name=pvc-8426a489-4cb1-441e-a144-33ae7e4bfde4 namespace= persistentVolume=pvc-8426a489-4cb1-441e-a144-33ae7e4bfde4 resource=persistentvolumes
time="2022-05-23T10:45:14Z" level=info msg="label \"failure-domain.beta.kubernetes.io/zone\" is not present on PersistentVolume" backup=velero/test2 logSource="pkg/backup/item_backupper.go:426" name=pvc-8426a489-4cb1-441e-a144-33ae7e4bfde4 namespace= persistentVolume=pvc-8426a489-4cb1-441e-a144-33ae7e4bfde4 resource=persistentvolumes
time="2022-05-23T10:45:14Z" level=info msg="zone info not available in nodeAffinity requirements" backup=velero/test2 logSource="pkg/backup/item_backupper.go:431" name=pvc-8426a489-4cb1-441e-a144-33ae7e4bfde4 namespace= persistentVolume=pvc-8426a489-4cb1-441e-a144-33ae7e4bfde4 resource=persistentvolumes
time="2022-05-23T10:45:14Z" level=info msg="No volume ID returned by volume snapshotter for persistent volume" backup=velero/test2 logSource="pkg/backup/item_backupper.go:455" name=pvc-8426a489-4cb1-441e-a144-33ae7e4bfde4 namespace= persistentVolume=pvc-8426a489-4cb1-441e-a144-33ae7e4bfde4 resource=persistentvolumes volumeSnapshotLocation=default
time="2022-05-23T10:45:14Z" level=info msg="Persistent volume is not a supported volume type for snapshots, skipping." backup=velero/test2 logSource="pkg/backup/item_backupper.go:466" name=pvc-8426a489-4cb1-441e-a144-33ae7e4bfde4 namespace= persistentVolume=pvc-8426a489-4cb1-441e-a144-33ae7e4bfde4 resource=persistentvolumes

@dploeger

Ah, sorry. I just overlooked the part of the documentation stating that you should use the generated resource group instead of the resource group your cluster is in.

@eleanor-millman eleanor-millman added the 1.10-candidate The label used for 1.10 planning discussion. label May 25, 2022
@snowerem

snowerem commented Jun 2, 2022

Hi guys,
Any update? We have a similar use case: storage classes in different resource groups, with volumes provisioned in each. Fixing that would be a game changer for Azure backups.

@ywk253100
Contributor

To address this issue, we may need to introduce a new version of the plugin interface, which in turn requires support for plugin versioning.

Another option is to use the CSI plugin instead, which is going to be GA in v1.9.

@reasonerjt
Contributor

@ywk253100 I think in the v1.10 timeframe we can verify whether the CSI plugin solves the problem. If it does, we may consider closing this issue.

@ywk253100 ywk253100 removed the 1.10-candidate The label used for 1.10 planning discussion. label Aug 17, 2022
@ywk253100 ywk253100 added this to the 1.10 milestone Aug 17, 2022
@ywk253100
Contributor

I have verified that this case can be resolved by taking a snapshot with the CSI plugin. For more information about how to use CSI, please refer to the doc.

Closing this issue.
