Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduler,processor(ticdc): update capture liveness by heartbeat #6613

Merged
merged 3 commits into from
Aug 4, 2022

Conversation

overvenus
Copy link
Member

What problem does this PR solve?

Issue Number: close #4757

What is changed and how it works?

Set liveness to stopping when it is drained by DrainCapture API.
It prevents unnecessary owner switch during rolling upgrade.

Check List

Tests

  • Unit test
  • Manual test (add detailed scripts or steps below)
master branch this PR
image image

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

None

Signed-off-by: Neil Shen <overvenus@gmail.com>
@overvenus overvenus added component/scheduler TiCDC inner scheduler component. area/ticdc Issues or PRs related to TiCDC. labels Aug 4, 2022
@ti-chi-bot
Copy link
Member

ti-chi-bot commented Aug 4, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • 3AceShowHand
  • liuzix

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 4, 2022
liuzix
liuzix previously approved these changes Aug 4, 2022
Comment on lines 193 to 203
func (a *agent) handleLivenessUpdate(liveness model.Liveness) {
currentLiveness := a.liveness.Load()
if currentLiveness != liveness {
if currentLiveness == model.LivenessCaptureStopping {
// No way to go back, once it becomes shutting down,
return
}
log.Info("schedulerv3: agent liveness changed",
zap.String("old", a.liveness.String()),
zap.String("new", liveness.String()),
zap.String("source", source))
a.liveness = liveness
log.Info("schedulerv3: agent updates liveness",
zap.String("old", currentLiveness.String()),
zap.String("new", liveness.String()))
a.liveness.Store(liveness)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could there be a data race? For example, a.liveness is changed between the Load() and Store() calls by another thread?

@@ -77,7 +77,7 @@ type processor struct {

lazyInit func(ctx cdcContext.Context) error
createTablePipeline func(ctx cdcContext.Context, tableID model.TableID, replicaInfo *model.TableReplicaInfo) (pipeline.TablePipeline, error)
newAgent func(ctx cdcContext.Context) (scheduler.Agent, error)
newAgent func(cdcContext.Context, *model.Liveness) (scheduler.Agent, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you consider making an AgentFactory interface? So that we can put whatever extra arguments we need into the implementation of an AgentFactory, so that further changes will not affect how the interface method NewAgent is called.

Just a suggestion. If you think this part of code is not subject to anymore significant changes, I think the current way is fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it will change anytime soon, let's add the factory when necessary.

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Aug 4, 2022
@liuzix liuzix dismissed their stale review August 4, 2022 08:12

Misclicked

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Aug 4, 2022
@codecov-commenter
Copy link

codecov-commenter commented Aug 4, 2022

Codecov Report

Merging #6613 (2441c06) into master (2ea67ba) will increase coverage by 0.0370%.
The diff coverage is 85.3260%.

Flag Coverage Δ
cdc 65.8718% <67.9245%> (+0.0037%) ⬆️
dm 52.0573% <ø> (+0.0156%) ⬆️
engine 62.7358% <92.3664%> (+0.1997%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

@@               Coverage Diff                @@
##             master      #6613        +/-   ##
================================================
+ Coverage   59.2138%   59.2508%   +0.0370%     
================================================
  Files           780        780                
  Lines         88405      88505       +100     
================================================
+ Hits          52348      52440        +92     
- Misses        31415      31422         +7     
- Partials       4642       4643         +1     

@overvenus
Copy link
Member Author

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: 2441c06

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Aug 4, 2022
@ti-chi-bot ti-chi-bot merged commit 6ff0ac6 into pingcap:master Aug 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ticdc Issues or PRs related to TiCDC. component/scheduler TiCDC inner scheduler component. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tracking issue for reducing high latency
5 participants