
Backing up resources in parallel #2888

Open
phuongatemc opened this issue Sep 1, 2020 · 22 comments
Labels: consistency, Needs Product (Blocked needing input or feedback from Product), Performance

Comments

@phuongatemc
Contributor

Currently Velero Backup processes resources serially. In some scenarios we would like to back up resources in parallel, not only to increase performance but also to reduce the time gap between the backup times of the items. For example, consider backing up a Cassandra cluster of 20 pods (each with 1 PVC). The backup of such a cluster would take snapshots of the PVCs belonging to these pods, and to help application consistency these PVCs should be snapshotted as close to each other as possible (either in parallel or in a single volume group, depending on what the storage back end supports).

So the enhancement request is to allow users to specify the resource types (Kinds) to be backed up in parallel. For example, we could add an option, say "ConcurrentResources", and users could specify ConcurrentResources: "pods". Then during backup we would create goroutines to back up all Pods in parallel.

This feature may conflict with the "OrderedResources" feature, in which we back up resources of a specific Kind in a specific order. So "OrderedResources" and "ConcurrentResources" cannot specify the same Kind.
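
A hedged sketch (in Go, since Velero is written in Go) of how the proposed option and the non-overlap rule could look; `ConcurrencySpec` and `ConcurrentResources` are illustrative names, not the actual Velero API:

```go
package backup

import "fmt"

// ConcurrencySpec is a hypothetical sketch only -- not the actual Velero API.
// It shows the proposed ConcurrentResources option and the rule that a Kind
// cannot be both ordered and concurrent.
type ConcurrencySpec struct {
	// ConcurrentResources lists resource Kinds whose items may be backed up
	// in parallel, e.g. ["pods"].
	ConcurrentResources []string `json:"concurrentResources,omitempty"`
}

// ValidateNoOverlap rejects a spec where a Kind appears in both settings.
// orderedKinds would come from the existing OrderedResources configuration.
func (c ConcurrencySpec) ValidateNoOverlap(orderedKinds map[string]bool) error {
	for _, kind := range c.ConcurrentResources {
		if orderedKinds[kind] {
			return fmt.Errorf("kind %q cannot be both ordered and concurrent", kind)
		}
	}
	return nil
}
```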

Another aspect to consider here is the level of concurrency allowed. For example, if the back-end system only allows up to 10 PVC snapshots to be taken in parallel, or the backup storage device only allows 10 parallel write streams, then Backup cannot create more backup goroutines than that limit. This also raises the issue of multiple Backups running in parallel; we need to factor in the limitation mentioned above when creating goroutines.
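
A minimal sketch of the bounded-concurrency idea, assuming a hypothetical backupItem helper for one pod and its PVC/PV; the names are illustrative, not Velero's implementation:

```go
package backup

import "sync"

// backupItem stands in for whatever per-item backup routine Velero would run
// for one pod and its PVC/PV; it is hypothetical and only anchors the pattern.
func backupItem(item string) error {
	// ... snapshot the PVC, write item metadata, etc.
	return nil
}

// backupConcurrently backs up items in parallel but never runs more than
// maxConcurrent goroutines at once, matching a back-end limit such as
// "at most 10 snapshots in flight".
func backupConcurrently(items []string, maxConcurrent int) []error {
	sem := make(chan struct{}, maxConcurrent) // counting semaphore
	var wg sync.WaitGroup
	errs := make([]error, len(items))

	for i, item := range items {
		wg.Add(1)
		sem <- struct{}{} // blocks once maxConcurrent backups are running
		go func(i int, item string) {
			defer wg.Done()
			defer func() { <-sem }() // free a slot for the next item
			errs[i] = backupItem(item)
		}(i, item)
	}
	wg.Wait()
	return errs
}
```

A buffered channel works as a counting semaphore here, so the same channel could in principle be shared across multiple concurrent Backups to enforce a single global cap.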

An alternative solution would be the VolumeGroup currently proposed in the Kubernetes Data Protection Working Group. A VolumeGroup allows grouping related PVs together (so they can be snapshotted together, etc.).

@ashish-amarnath added the Needs Product (Blocked needing input or feedback from Product) label Sep 25, 2020
@phuongatemc
Contributor Author

We ultimately want all the PVCs belonging to the pods of the same application (in the same namespace) to be snapshotted in parallel. However, in Velero's current implementation, the backup of an item backs up a pod and its PVC and PV together before moving to the next pod. We can make it parallel at the pod level, which should be good enough because each pod usually has one PVC and PV.

@stale

stale bot commented Jul 10, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the staled label Jul 10, 2021
@zubron
Contributor

zubron commented Jul 15, 2021

This is still needed.

@stale stale bot removed the staled label Jul 15, 2021
@stale

stale bot commented Sep 13, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the staled label Sep 13, 2021
@stale

stale bot commented Sep 27, 2021

Closing the stale issue.

@stale stale bot closed this as completed Sep 27, 2021
@zubron zubron reopened this Sep 27, 2021
@stale stale bot removed the staled label Sep 27, 2021
@jglick

jglick commented Nov 2, 2021

if the back-end system only allows up to 10 PVC snapshots to be taken in parallel, or the backup storage device only allows 10 parallel write streams, then Backup cannot create more backup goroutines than that limit

Not necessarily; this can be handled simply by having the goroutine wait & retry in a provider-specific manner. jglick/velero-plugin-for-aws@b5d7c52 seems to work in the case of EBS snapshots. Needed when the number of PVs in the backup gets into the range of dozens.
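
For reference, a generic sketch of that wait-and-retry idea (not the linked commit; retryOnThrottle and isThrottle are hypothetical names, and the provider-specific throttling check is left to the caller):

```go
package provider

import (
	"fmt"
	"time"
)

// retryOnThrottle keeps retrying op while the provider reports a throttling
// error, sleeping with exponential backoff, up to maxWait in total.
// isThrottle holds the provider-specific check (e.g. an EBS "rate exceeded"
// error code); neither name is a Velero or AWS SDK API.
func retryOnThrottle(op func() error, isThrottle func(error) bool, maxWait time.Duration) error {
	delay := time.Second
	deadline := time.Now().Add(maxWait)
	for {
		err := op()
		if err == nil || !isThrottle(err) {
			return err
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("gave up waiting out provider rate limit: %w", err)
		}
		time.Sleep(delay)
		if delay < 30*time.Second {
			delay *= 2 // back off: 1s, 2s, 4s, ... capped at 30s
		}
	}
}
```

With something like this in place, per-PVC goroutines can wrap their snapshot call in the retry helper instead of being capped up front.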

@stale

stale bot commented Jan 1, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the staled label Jan 1, 2022
@stale

stale bot commented Jan 16, 2022

Closing the stale issue.

@stale stale bot closed this as completed Jan 16, 2022
@fabiorauber

fabiorauber commented Jan 16, 2022

This issue is still relevant.

@onewithname

This is still very relevant IMO. Is there any progress on this topic/issue?

@Lyndon-Li
Contributor

Lyndon-Li commented Oct 24, 2023

With 1.12, the time-consuming actions (data-related actions) in backup/restore run in parallel, i.e., CSI snapshot creation for PVCs and data movement for PVCs.
There is still one legacy area we haven't touched: volumes from different pods are still processed sequentially for fs backup. We may improve this in the future.

Resource backups/restores are not planned to run in parallel; since the resources are small, we don't foresee much performance benefit from making them concurrent.

@onewithname

onewithname commented Oct 24, 2023

With 1.12, the time-consuming actions (data-related actions) in backup/restore run in parallel, i.e., CSI snapshot creation for PVCs and data movement for PVCs. There is still one legacy area we haven't touched: volumes from different pods are still processed sequentially for fs backup. We may improve this in the future.

Resource backups/restores are not planned to run in parallel; since the resources are small, we don't foresee much performance benefit from making them concurrent.

Thanks for the update!

Some information to preface my next question.
Also, apologies if this is not directly related or is off-topic; please let me know if I should open it as a separate issue.

In the environment I am managing, we are running PowerProtect Data Manager (Dell's backup tool for Kubernetes/OpenShift). We are struggling with backup performance, and the main point of congestion I see (as also reported by Dell support) is the metadata backup by Velero (CRDs, secrets, etc.), which takes ages.

Example:
We have 50 namespaces, each with 60 "backup items" and no PVCs to be backed up.
In Velero, each namespace is processed at a rate of 1 resource/second, which takes 50 minutes for everything to complete since everything is sequential.
The issue gets even worse when working with more namespaces or higher resource counts.

Isn't this something where backing up resources in parallel would greatly improve performance?

@weshayutin
Contributor

With 1.12, the time-consuming actions (data-related actions) in backup/restore run in parallel, i.e., CSI snapshot creation for PVCs and data movement for PVCs. There is still one legacy area we haven't touched: volumes from different pods are still processed sequentially for fs backup. We may improve this in the future.

Resource backups/restores are not planned to run in parallel; since the resources are small, we don't foresee much performance benefit from making them concurrent.

I would argue that concurrent backups have been a topic in recent community calls. I can say that engineers from Red Hat are certainly interested in the topic and we're working on potential proposals. Perhaps @onewithname can join a few community calls to highlight their perspective, use case and requirements.

@sseago
Collaborator

sseago commented Oct 24, 2023

@onewithname One thing you could do to help gauge how much of a speed-up you might be able to see with parallel resource backups: If you install two separate velero instances (in separate namespaces) and run two 25-namespace backups at the same time (one in each velero), how long does it take before both are complete? If velero threading is the bottleneck, then I'd expect completion in closer to 25 minutes than in 50, but if the APIServer is the bottleneck, then you may not see much improvement. That would help us to determine the potential value of this feature.

@Lyndon-Li
Contributor

It is worth trying what @sseago mentioned to find the bottleneck first. 1 resource/second doesn't look like normal performance.

@sseago
Collaborator

sseago commented Oct 25, 2023

For backups, there's an APIServer List call per-resource-type, per namespace. In the test case where you have 50 namespaces and 60 items per namespace, there will be quite a few apiserver calls -- of those 60 items in a namespace how many resource types are represented? It may be that you're making 1 call for every 5 items or so, on average. 1 second per item is still pretty slow, though. I've done some test backups/restores with a large number of items in a single namespace -- 30k secrets. That's 10x as many backup items as you have (50x60, so you have only 3k items), but at least on backup there's a small fraction of the apiserver calls. Backup takes about 2 minutes. On restore, where there are 2 apiserver calls per item (Create and Patch), it takes about 50 minutes, which is about 10x faster per item than you're seeing on backup.
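
As a rough worked version of that estimate (the ~12 distinct resource types per namespace is an assumption, chosen so the numbers line up with the "1 call per 5 items" figure):

```math
\underbrace{50}_{\text{namespaces}} \times \underbrace{\approx 12}_{\text{types per ns (assumed)}} \approx 600 \text{ List calls}
\quad\Longrightarrow\quad
\frac{3000 \text{ items}}{600 \text{ calls}} = 5 \text{ items per call}
```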

Does restore take as long as backup for you?

@sseago
Collaborator

sseago commented Oct 25, 2023

That being said, if running 2 velero instances increases your backup performance, then that suggests that backing up multiple resources at a time in a single backup would significantly improve performance for your use case. At the same time, there may be some cluster performance issues in your environment that should be sorted out, or maybe your velero pod needs more memory or CPU resources. It could be that your velero pod is CPU-limited or something similar.

@onewithname

onewithname commented Oct 27, 2023

I would argue that concurrent backups have been a topic in recent community calls. I can say that engineers from Red Hat are certainly interested in the topic and we're working on potential proposals. Perhaps @onewithname can join a few community calls to highlight their perspective, use case and requirements.

I would be happy to assist if needed!

@onewithname One thing you could do to help gauge how much of a speed-up you might be able to see with parallel resource backups: If you install two separate velero instances (in separate namespaces) and run two 25-namespace backups at the same time (one in each velero), how long does it take before both are complete? If velero threading is the bottleneck, then I'd expect completion in closer to 25 minutes than in 50, but if the APIServer is the bottleneck, then you may not see much improvement. That would help us to determine the potential value of this feature.

As I mentioned before, in this environment I am using the Dell PPDM backup solution, which is "orchestrating" and managing everything. So I don't have the flexibility of running multiple instances of Velero, as it is also managed by the tool. However, I will look into whether it would be possible to arrange the test you describe as a standalone case.

As for the API server bottleneck: I am no OpenShift expert, so I'm not really sure how to gauge that (we are running on-premises OpenShift, if that matters).

In the Velero logs I see this type of message:

I1027 03:57:07.180380 1 request.go:601] Waited for 1.045330997s due to client-side throttling, not priority and fairness, request: GET:https://172.24.0.1:443/apis/rbac.istio.io/v1alpha1?timeout=32s
I1027 03:57:17.230221 1 request.go:601] Waited for 3.844616025s due to client-side throttling, not priority and fairness, request: GET:https://172.24.0.1:443/apis/operators.coreos.com/v2?timeout=32s

but out of 77k log lines there are only 44 entries of "due to client-side throttling", and they appear when switching from one namespace to another. So I don't think that would be very impactful; I might be wrong, though.

I've done some test backups/restores with a large number of items in a single namespace -- 30k secrets. That's 10x as many backup items as you have (50x60, so you have only 3k items), but at least on backup there's a small fraction of the apiserver calls. Backup takes about 2 minutes.

I have observed similar performance in my environment as well, where Velero backs up 3,000 resources in 5 seconds, then takes 3 minutes to go from 3,000 to 3,100.

On restore, where there are 2 apiserver calls per item (Create and Patch), it takes about 50 minutes, which is about 10x faster per item than you're seeing on backup.

Does restore take as long as backup for you?

The restores I have performed usually take about the same time as the backup when comparing the same namespace.

It could be that your velero pod is CPU-limited or something similar.

Whenever I checked metrics on the velero pods, they never even went up to 20-30% of the provisioned resources.
I was using the default values before; last night I increased them 4x, but did not observe any improvement.

@ihcsim

ihcsim commented Nov 6, 2023

@sseago Running multiple Velero replicas isn't an option atm, because we are using OADP, which, AIUI, hard-codes the Velero replica configuration.

From the Velero logs (at least those that I examined), it doesn't look like the LIST calls to the API server are the bottleneck. The latency seems to come from the backup action. We are seeing the backup log line of Backed up M items out of an estimated total of N at roughly 1-second intervals, for every item of each resource kind.

In some namespaces, for every item to be backed up, we are also seeing frequent occurrences of msg="[common-backup] Error in getting route: configmaps "oadp-registry-config" not found. Assuming this is outside of OADP context.". I assume that has something to do with us not backing up images, but I'm not sure why it would be relevant to resource kinds like pods, service accounts, etc.

@ihcsim

ihcsim commented Nov 6, 2023

@onewithname It would be interesting to see how your API server is performing (CPU, memory, throttling logs, etc.). The Velero logs you posted show only client-side throttling. The API server could also be doing even more server-side throttling.

@sseago
Collaborator

sseago commented Nov 6, 2023

@ihcsim I didn't mean multiple replicas (that doesn't work) -- I meant multiple velero installs (i.e. multiple OADP installs, in different namespaces). In any case, I was not proposing multiple installs as the solution, but as a way of getting data. If you had 2 velero/OADP installs, you could run 2 backups at once, and we could see whether parallel resource backup, in your particular environment with slow per-item backup rates, actually increases throughput.

As for that "Error in getting route" message, it looks like that's coming from the oadp-1.0 imagestream plugin -- so:

  1. it only runs on imagestream resources in the backup.
  2. That particular message doesn't exist in versions of OADP newer than 1.0

@kaovilai
Contributor

kaovilai commented Nov 4, 2024

This is being solved via #7474, #8334, and #7148.
