Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

store/tikv: refine the retry mechanism of region errors #25165

Merged

Conversation

youjiali1995
Copy link
Contributor

Signed-off-by: youjiali1995 zlwgx1023@gmail.com

What problem does this PR solve?

Problem Summary:

TiKV adds more region errors, e.g., ReadIndexNotReady/ProposalInMergingMode/DataIsNotReady/RegionNotInitialized, TiDB should handle them properly.

What is changed and how it works?

What's Changed:

  • Add the regionScheduling backoff configuration for backoff due to region scheduling which is separated from the regionMiss. The aim is to reduce backoff sleep time.
  • Handle more region errors.
  • Change the order of handling region errors.

Related changes

Check List

Tests

  • No code

Side effects

Release note

  • No release note.

@youjiali1995 youjiali1995 added the sig/transaction SIG:Transaction label Jun 4, 2021
@youjiali1995 youjiali1995 requested review from lysu, sticnarf and ekexium June 4, 2021 09:46
@ti-chi-bot ti-chi-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 4, 2021
Signed-off-by: youjiali1995 <zlwgx1023@gmail.com>
@youjiali1995 youjiali1995 force-pushed the improve-region-error-handling branch from cf384c8 to fdf5875 Compare June 4, 2021 10:29
@@ -89,6 +89,7 @@ var (
BoPDRPC = NewConfig("pdRPC", &metrics.BackoffHistogramPD, NewBackoffFnCfg(500, 3000, EqualJitter), tikverr.NewErrPDServerTimeout(""))
// change base time to 2ms, because it may recover soon.
BoRegionMiss = NewConfig("regionMiss", &metrics.BackoffHistogramRegionMiss, NewBackoffFnCfg(2, 500, NoJitter), tikverr.ErrRegionUnavailable)
BoRegionScheduling = NewConfig("regionScheduling", &metrics.BackoffHistogramRegionScheduling, NewBackoffFnCfg(2, 500, NoJitter), tikverr.ErrRegionUnavailable)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the regionScheduling backoff configuration for backoff due to region scheduling which is separated from the regionMiss. The aim is to reduce backoff sleep time.

BoRegionScheduling config is the same as BoRegionMiss?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It reduces the backoff sleep time by dividing the backoff times to different configs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see...

store/tikv/region_request.go Show resolved Hide resolved
@@ -939,6 +947,7 @@ func (s *RegionRequestSender) onRegionError(bo *Backoffer, ctx *RPCContext, req
}
return retry, errors.Trace(err)
}

if regionErr.GetServerIsBusy() != nil {
logutil.BgLogger().Warn("tikv reports `ServerIsBusy` retry later",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the log level be adjusted like others?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's qualified...

Signed-off-by: youjiali1995 <zlwgx1023@gmail.com>
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Jun 4, 2021
@ti-chi-bot
Copy link
Member

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • lysu
  • sticnarf

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jun 5, 2021
@sticnarf
Copy link
Contributor

sticnarf commented Jun 5, 2021

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: d89d38c

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Jun 5, 2021
@ti-chi-bot
Copy link
Member

@youjiali1995: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/transaction SIG:Transaction size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants