This repository has been archived by the owner on Jul 24, 2024. It is now read-only.

br: add retry for s3 storage write failure (#780) #851

Merged
merged 2 commits into from
Mar 12, 2021

Conversation

ti-srebot
Contributor

@ti-srebot ti-srebot commented Mar 11, 2021

cherry-pick #780 to release-4.0
You can switch your code base to this Pull Request by using git-extras:

# In br repo:
git pr https://github.com/pingcap/br/pull/851

After applying your modifications, you can push your changes to this PR via:

git push git@github.com:ti-srebot/br.git pr/851:release-4.0-66ae5446b1d6

What problem does this PR solve?

A fix to #774.

What is changed and how it works?

Previously, if a pushDown task met an unknown error from gRPC, it would return an error, and the errgroup in (*Client).BackupRanges would cancel the context for all pushDown tasks. However, after retry was added at the pushDown task level, a cancelled context is treated as 'retryable' (Why? Anyway, I left it untouched.). Since the context has already been cancelled, any retry is meaningless, and br would be stuck until the long series of retries ends.

I added a check to ResetBackupClient: if the context is cancelled, it returns immediately.

In addition, this PR adds a temporary solution for S3 disconnection. It is based on the error message returned from TiKV, which is unstructured, so this approach is unsafe. We should add an error variant for this in the future.
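To show why matching on message text is fragile, here is a rough sketch of that kind of check. The function name and the substrings are assumptions made up for illustration; the actual messages returned by TiKV and the matching logic in this PR may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// retryableFragments are hypothetical substrings, not the real TiKV messages.
var retryableFragments = []string{
	"failed to put object",
	"connection reset",
}

// isRetryableStorageMessage reports whether an unstructured error message
// looks like a transient external storage failure. Any rewording of the
// message on the TiKV side would silently break this check, which is why
// a dedicated error variant would be safer.
func isRetryableStorageMessage(msg string) bool {
	lower := strings.ToLower(msg)
	for _, frag := range retryableFragments {
		if strings.Contains(lower, frag) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isRetryableStorageMessage("S3: Failed to put object: timeout"))
	fmt.Println(isRetryableStorageMessage("region not found"))
}
```

A structured error variant would let the caller branch on a type or code instead of on wording, which removes the dependency on message text.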

Check List

Tests

  • Integration test

Release Note

  • Fix a bug that caused BR to get stuck for a period of time when it met unknown backup errors.
  • Increase the backoff time of ErrKVDownloadFailed to tolerate external storage outages.
  • Add retry for S3 errors during backup.

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@kennytm
Collaborator

kennytm commented Mar 11, 2021

/lgtm

@ti-srebot ti-srebot added the status/LGT1 LGTM1 label Mar 11, 2021
@glorv
Collaborator

glorv commented Mar 12, 2021

/lgtm

@ti-srebot ti-srebot added status/LGT2 LGTM2 and removed status/LGT1 LGTM1 labels Mar 12, 2021
@kennytm
Collaborator

kennytm commented Mar 12, 2021

/merge

@ti-srebot
Contributor Author

/run-all-tests

@ti-srebot ti-srebot merged commit 2ee2ff3 into pingcap:release-4.0 Mar 12, 2021
4 participants