unsafe recovery: Introduce auto-detect mode for online recovery #5403

Merged
9 commits merged on Aug 11, 2022

Connor1996
Member

@Connor1996 Connor1996 commented Aug 5, 2022

Signed-off-by: Connor1996 <zbk602423539@gmail.com>

What problem does this PR solve?

Issue Number: Close #5415

What is changed and how does it work?

Support auto-detecting failed stores for online recovery

Check List

Tests

  • Unit test

Related changes

Release note

Support auto-detecting failed stores for online recovery

Signed-off-by: Connor1996 <zbk602423539@gmail.com>
@ti-chi-bot
Member

ti-chi-bot commented Aug 5, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • disksing
  • rleungx

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added the do-not-merge/needs-linked-issue, release-note, and do-not-merge/work-in-progress labels Aug 5, 2022
@ti-chi-bot ti-chi-bot requested review from nolouch and rleungx August 5, 2022 08:21
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
@codecov

codecov bot commented Aug 5, 2022

Codecov Report

Merging #5403 (5a6ca56) into master (0791fe8) will decrease coverage by 0.11%.
The diff coverage is 80.00%.

@@            Coverage Diff             @@
##           master    #5403      +/-   ##
==========================================
- Coverage   75.71%   75.60%   -0.12%     
==========================================
  Files         313      313              
  Lines       31101    31128      +27     
==========================================
- Hits        23549    23535      -14     
- Misses       5544     5588      +44     
+ Partials     2008     2005       -3     
Flag         Coverage Δ
unittests    75.60% <80.00%> (-0.12%) ⬇️

Impacted Files                                   Coverage Δ
tools/pd-ctl/pdctl/command/unsafe_command.go     66.07% <60.00%> (-10.12%) ⬇️
server/api/unsafe_operation.go                   88.00% <90.00%> (-3.31%) ⬇️
server/cluster/unsafe_recovery_controller.go     80.91% <93.33%> (+0.01%) ⬆️
pkg/tempurl/tempurl.go                           60.00% <0.00%> (-10.00%) ⬇️
pkg/etcdutil/etcdutil.go                         82.55% <0.00%> (-5.82%) ⬇️
server/tso/allocator_manager.go                  60.36% <0.00%> (-2.49%) ⬇️
server/election/leadership.go                    75.25% <0.00%> (-2.07%) ⬇️
server/schedule/hbstream/heartbeat_streams.go    72.72% <0.00%> (-2.03%) ⬇️
server/grpc_service.go                           48.22% <0.00%> (-1.53%) ⬇️
... and 17 more

@Connor1996
Member Author

PTAL @v01dstar

@Connor1996 Connor1996 marked this pull request as ready for review August 8, 2022 09:01
@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress label Aug 8, 2022
Contributor

@v01dstar v01dstar left a comment

LGTM

@ti-chi-bot
Member

@v01dstar: Thanks for your review. The bot only counts approvals from reviewers and higher roles in list, but you're still welcome to leave your comments.

In response to this:

LGTM

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

server/api/unsafe_operation.go (outdated, resolved)
server/api/unsafe_operation.go (outdated, resolved)
server/cluster/unsafe_recovery_controller.go (outdated, resolved)
server/cluster/unsafe_recovery_controller.go (outdated, resolved)
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
@Connor1996 Connor1996 requested a review from rleungx August 9, 2022 07:59
@Connor1996 Connor1996 changed the title from "unsafe recovery: Introduce force mode for online recovery" to "unsafe recovery: Introduce auto-detect mode for online recovery" Aug 9, 2022
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
@@ -615,13 +625,22 @@ func (u *unsafeRecoveryController) recordAffectedRegion(region *metapb.Region) {
	}
}

func (u *unsafeRecoveryController) isFailed(peer *metapb.Peer) bool {
	_, isFailed := u.failedStores[peer.StoreId]
	_, isLive := u.storeReports[peer.StoreId]
Member

Do we need to wait for at least one round of store heartbeats?

Member Author

@Connor1996 Connor1996 Aug 10, 2022

We can't avoid that in every corner case, so I've added a caveat to the flag's comment.
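
For readers following this thread, here is a minimal sketch of the kind of check being discussed. It is not the merged implementation: the `recoveryState` type and the `autoDetect` field are hypothetical names, and the rule for combining `isFailed` and `isLive` is an assumption, since the hunk above is cut off before the return statement. The sketch assumes a store counts as failed if it was listed explicitly, or if auto-detect is on and no report has been received from it.

```go
package main

import "fmt"

// recoveryState is a hypothetical, simplified stand-in for the controller
// state shown in the diff; it is not the real unsafeRecoveryController.
type recoveryState struct {
	autoDetect   bool                // assumed name for the auto-detect switch
	failedStores map[uint64]struct{} // stores the operator listed as failed
	storeReports map[uint64]struct{} // stores that have sent a report
}

// isFailed reports whether a store is treated as failed. The combination
// rule below is an assumption, not taken from the PR.
func (s *recoveryState) isFailed(storeID uint64) bool {
	_, isFailed := s.failedStores[storeID]
	_, isLive := s.storeReports[storeID]
	return isFailed || (s.autoDetect && !isLive)
}

func main() {
	s := &recoveryState{
		autoDetect:   true,
		failedStores: map[uint64]struct{}{1: {}},
		storeReports: map[uint64]struct{}{2: {}},
	}
	// Store 1 is listed, store 2 has reported, store 3 is neither.
	for _, id := range []uint64{1, 2, 3} {
		fmt.Printf("store %d failed: %v\n", id, s.isFailed(id))
	}
}
```

Under these assumptions, a healthy store whose report has not arrived yet (store 3 here) is indistinguishable from a failed one, which is the corner case the heartbeat question points at.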

Signed-off-by: Connor1996 <zbk602423539@gmail.com>
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
Member

@rleungx rleungx left a comment

The rest LGTM.

server/api/unsafe_operation_test.go (outdated, resolved)
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
@ti-chi-bot ti-chi-bot added the status/LGT1 label Aug 11, 2022
@ti-chi-bot ti-chi-bot added the status/LGT2 label and removed the status/LGT1 label Aug 11, 2022
@Connor1996
Member Author

/merge

@ti-chi-bot
Member

@Connor1996: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 5a6ca56

Labels
release-note, status/can-merge, status/LGT2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support auto detect failed stores for online recovery
5 participants