
Backup restore is very slow #426

Open
jesulo opened this issue Oct 21, 2024 · 4 comments

@jesulo

jesulo commented Oct 21, 2024

I'm doing a backup restore of a CT that is 500 GB in size but only has 80 GB occupied. When the backup is on a local disk the restore takes 3.5 hours, but when it is on PBS it takes 7 hours. In both cases it takes a long time. Is there a way to reduce the time? The containers are on ZFS with LINSTOR. Regards

@ghernadi
Contributor

I am not sure what you are actually looking for?

What is "ct", what is "pbs"?

What do you mean by "When the backup is on the local disk it takes 3 and a half hours"? When you already have the backup locally available, restoring the backup (or rather the snapshot) into a new LINSTOR resource should only take a few seconds, not 3.5 hours.

What is the download speed of the satellite that downloads the backup? How long would you expect 80GB to take to download, and why?
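For reference, a rough sketch of that local snapshot-restore case with the LINSTOR client; the resource and snapshot names here are placeholders:

```
# Restore a locally available snapshot into a new resource. On thinly
# provisioned backends (ZFS, LVM-thin) this is essentially a clone,
# so it should complete in seconds rather than hours.
linstor snapshot resource restore \
    --from-resource old_res --from-snapshot snap1 new_res
```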

@jesulo
Author

jesulo commented Oct 21, 2024

I mean an LXC container or a Proxmox VM.
PBS is the Proxmox Backup Server.
Yes, restoring the container backup takes 3.5 hours from a local disk, and it takes even longer from PBS. What settings should I change so that it doesn't take so long?
How do I check the download speed? The restore log says the restore speed was 5 MB/s. Maybe it's because I'm using ZFS? Or because of HA replication?

@ghernadi
Contributor

If you are restoring from Proxmox Backup Server, I assume the data is getting copied and possibly sent to the other peers via DRBD.

This is more of a performance-tuning question than an actual bug, so I would suggest that you do some testing, i.e. try restoring into a resource that has only 1 replica. The idea is that, regardless of whether you have DRBD configured or not, if there are no other diskful DRBD peers, the restore operation will not depend on your network speed. If this test is much faster than what you have right now, you will want to investigate network optimizations and DRBD tuning (for example https://kb.linbit.com/tuning-drbds-resync-controller, but feel free to search further).
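A rough sketch of such a single-replica test with the LINSTOR client; the storage pool and resource names are placeholders:

```
# Spawn a test resource with exactly one replica, so no diskful DRBD
# peers are involved and the restore path excludes the network.
linstor resource-group create --storage-pool my_pool --place-count 1 rg_single
linstor volume-group create rg_single
linstor resource-group spawn-resources rg_single restore_test 500G
```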
If the results are somewhat similar to what you have right now, the network is not the problem. I would doubt that DRBD is an issue with local writes, so my next guess is to check your storage speed by restoring into a storage-only resource. If that is also slow, where to continue the investigation depends on your setup. If you are using VMs, check how the disk I/O is mapped from the virtual machine to the physical hardware and see if you can optimize things there.

From what you have said so far, this does not look like an issue with LINSTOR at all, since LINSTOR is not even in the I/O path in these use cases. My guess is that the bottleneck is either your network's or your storage's speed (check both the reading and the writing storage).
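As a baseline for those checks, standard tools such as iperf3 (network) and fio (storage) can be used; the peer hostname and test file path below are placeholders:

```
# Network bandwidth between two nodes (run "iperf3 -s" on the peer first).
iperf3 -c peer-node

# Sequential write throughput on the target pool. Writing to a scratch
# file avoids clobbering a real device; delete /tank/fio.test afterwards.
fio --name=seqwrite --filename=/tank/fio.test --rw=write --bs=1M \
    --size=4G --end_fsync=1
```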

@jesulo
Author

jesulo commented Oct 22, 2024

I changed rs-discard-granularity to 1M, but the restore is still slow. I've noticed that I/O is very high; during a restore it even impacts other virtual machines on the same disk. Could you tell me what configuration I could apply so that replication to the other node doesn't affect I/O so much? Can replication be made asynchronous, or its priority lowered? What properties do you recommend I modify? Thanks.
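For background on the asynchronous question: DRBD's default replication protocol C acknowledges a write only once it reaches the peer, while protocol A acknowledges it once it hits the local disk, trading peer consistency for lower write latency. A hedged sketch, assuming the LINSTOR client exposes this DRBD net option through drbd-options (check `linstor resource-definition drbd-options --help`); the resource name is a placeholder:

```
# Illustration only, not a recommendation: switch a resource to
# asynchronous replication (DRBD protocol A).
linstor resource-definition drbd-options vm-100-disk-1 --protocol A
```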
