This repository has been archived by the owner on Jul 24, 2024. It is now read-only.
br: add retry for s3 storage write failure (#780) #851
cherry-pick #780 to release-4.0
You can switch your code base to this Pull Request by using git-extras:
# In br repo: git pr https://github.com/pingcap/br/pull/851
After applying modifications, you can push your changes to this PR via:
What problem does this PR solve?
A fix to #774.
What is changed and how it works?
In the past, if we met an unknown error from gRPC, one pushDown task would return an `err`, and the `errgroup` in `(*Client).BackupRanges` would cancel the context for all pushDown tasks. However, after a retry was added at the pushDown task level, a cancelled context is treated as 'retryable' (Why? Anyway, I leave it untouched.). Since the context has already been cancelled, any retry is meaningless, and br would be stuck until the long journey of retries ends.
I added a check to `ResetBackupClient`: if the context is cancelled, it returns immediately.
What's more, this adds a temporary solution for S3 disconnection. It is based on the error message returned from TiKV, which is unstructured, so it is unsafe. We should add an error variant for this in the future.
Check List
Tests
Release Note
`ErrKVDownloadFailed` is increased for external storage outage.