Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validator(dm): support start from time & fix & integration test #5534

Merged
merged 17 commits into from
May 27, 2022

Conversation

D3Hunter
Copy link
Contributor

@D3Hunter D3Hunter commented May 23, 2022

What problem does this PR solve?

Issue Number: close #5449

What is changed and how it works?

this pr is large most due to change to .proto and test

  • support validation start --start-time
  • refactor and add more check to validation start, validation start can be used in 2 ways:
    • enable validator on the fly(like create+start in task operation), in this case all validator of target subtask should not have been enabled before;
    • start stopped validator, all validator of target subtask should have been enabled, and cannot add explicit mode and start-time flag
  • add mode to validation status
  • fix bug that we set an not-completely-initialized subtaskConfig into subtask on validator start. now we only update needed config.
  • fix bug that we initialized validator when expect stage is stopped(happens on failover), which may cause resource leak.
  • extract real binlog from relay log binlog pos
  • reset FirstValidateTS when loading pending row changes to avoid failed row changes be marked as error row immediately if validator stopped for a long time.
  • add more integration test

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

`None`.

@ti-chi-bot
Copy link
Member

ti-chi-bot commented May 23, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • buchuitoudegou
  • lance6716

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 23, 2022
@D3Hunter
Copy link
Contributor Author

/run-all-tests

@D3Hunter
Copy link
Contributor Author

Copy link
Contributor

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will review later

dm/pkg/binlog/filename.go Outdated Show resolved Hide resolved
dm/pkg/binlog/filename.go Show resolved Hide resolved
dm/dm/ctl/master/start_stop_validation.go Outdated Show resolved Hide resolved
// OperateValidationTask operate validator of subtask.
// tasks: tasks need to operate
// validatorStages: stage info of subtask validators
// changedSubtaskCfgs: length should be 0 or equal to validatorStages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems changedSubtaskCfgs can tolerate not the same length as validatorStages 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

caller should and has make sure about this

dm/dm/master/server.go Outdated Show resolved Hide resolved
Copy link
Contributor

@buchuitoudegou buchuitoudegou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will take a look at IT later.

return terror.Annotate(err, "error while parse start-time, expected in the format like '2006-01-02 15:04:05' or '2006-01-02T15:04:05'")
}
if _, err := utils.ParseStartTime(t.StartTime); err != nil {
return terror.Annotate(err, "error while parse start-time, expected in the format like '2006-01-02 15:04:05' or '2006-01-02T15:04:05'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return terror.Annotate(err, "error while parse start-time, expected in the format like '2006-01-02 15:04:05' or '2006-01-02T15:04:05'")
return terror.Annotate(err, "error while parsing start-time, expected formats are like '2006-01-02 15:04:05' or '2006-01-02T15:04:05'")

Copy link
Contributor Author

@D3Hunter D3Hunter May 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are 2 other message below using error while parse style which doesn't meets english grammar, will not change them in this pr. maybe during refactor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll note it in the issue. (this error message will be presented to the users, it would better be grammatical valid and make sense).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those string it not introduced in this pr, will move it into #5143

dm/dm/master/server.go Outdated Show resolved Hide resolved
// if user set mode or start-time explicitly, then we do start operation.
// none of the target validators should have enabled
if explicitModeOrStartTime && !noneEnabled {
msg := fmt.Sprintf("some of target validator(%s) has already enabled.", enabledValidator)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we list all the validators that have been enabled? Or maybe leave a workaround here (e.g. "please check validator's statuses by dmctl query-status"). Otherwise, users might have to try over and over again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we list all the validators that have been enabled

not sure how often user start validator only on some subtask, we can enhance it later.

@D3Hunter
Copy link
Contributor Author

/run-all-tests

@D3Hunter
Copy link
Contributor Author

ptal of fc1f258 too.

fix bug that we initialized validator when expect stage is stopped(happens on failover), which may cause resource leak.

@D3Hunter
Copy link
Contributor Author

/run-all-tests

@D3Hunter
Copy link
Contributor Author

/run-all-tests

@D3Hunter
Copy link
Contributor Author

/run-verify

@codecov-commenter
Copy link

codecov-commenter commented May 24, 2022

Codecov Report

Merging #5534 (01d3a3b) into master (fcea4d5) will increase coverage by 0.0993%.
The diff coverage is 59.9280%.

Flag Coverage Δ
cdc 61.4505% <62.1827%> (+0.3024%) ⬆️
dm 52.0594% <54.4329%> (+0.0167%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

@@               Coverage Diff                @@
##             master      #5534        +/-   ##
================================================
+ Coverage   56.0768%   56.1761%   +0.0993%     
================================================
  Files           535        528         -7     
  Lines         70143      69857       -286     
================================================
- Hits          39334      39243        -91     
+ Misses        27078      26883       -195     
  Partials       3731       3731                

@D3Hunter
Copy link
Contributor Author

ptal

@lance6716 lance6716 added the area/dm Issues or PRs related to DM. label May 26, 2022
Copy link
Contributor

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave for lunch

}

if op == StartValidationOp {
args.explicitMode = cmd.Flags().Changed("mode")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the default value of mode is "", we in fact don't need this check. Also for start-time.

I'm both OK with these two styles: if explicitMode {checkMode} or if mode != "" {checkMode}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will leave it.

allEnabled, noneEnabled := true, true
for taskName := range subTaskCfgs {
for sourceID := range subTaskCfgs[taskName] {
if s.scheduler.ValidatorEnabled(taskName, sourceID) {
Copy link
Contributor

@lance6716 lance6716 May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the check for all subtasks is not atomic, is it safe that other RPC change the status?

seems it will break "changedSubtaskCfgs: length should be 0 or equal to validatorStages"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dm/dm/master/server.go Outdated Show resolved Hide resolved
dm/dm/master/server.go Show resolved Hide resolved
@D3Hunter
Copy link
Contributor Author

/run-dm-integration-test

@D3Hunter
Copy link
Contributor Author

/run-verify

@D3Hunter
Copy link
Contributor Author

ptal @lance6716 @buchuitoudegou

Copy link
Contributor

@buchuitoudegou buchuitoudegou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest LGTM

dm/tests/dmctl_command/run.sh Show resolved Hide resolved

echo "--> (success) validation start: start validation without explicit mode for source 1"
run_dm_ctl $WORK_DIR "127.0.0.1:$MASTER_PORT" \
"validation start -s $SOURCE_ID1 test" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test case that specifies multiple sources? More than one source is valid too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return terror.Annotate(err, "error while parse start-time, expected in the format like '2006-01-02 15:04:05' or '2006-01-02T15:04:05'")
}
if _, err := utils.ParseStartTime(t.StartTime); err != nil {
return terror.Annotate(err, "error while parse start-time, expected in the format like '2006-01-02 15:04:05' or '2006-01-02T15:04:05'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll note it in the issue. (this error message will be presented to the users, it would better be grammatical valid and make sense).

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label May 27, 2022
"\"unit\": \"Sync\"" 2

echo "--> (fail) validation start: missing mode value"
run_dm_ctl $WORK_DIR "127.0.0.1:$MASTER_PORT" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe also check the help message of validation xxx is readable, like "pause-task \[-s source ...\] \[task-name | task-file\] \[flags\]"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no gains, cobra will just print cobra.Command.Use & cobra.Command.Short

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels May 27, 2022
@lance6716
Copy link
Contributor

/hold

please unhold after addresss comments

@ti-chi-bot ti-chi-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 27, 2022
@D3Hunter
Copy link
Contributor Author

/unhold

@ti-chi-bot ti-chi-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 27, 2022
@D3Hunter
Copy link
Contributor Author

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: fed53e2

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label May 27, 2022
@D3Hunter
Copy link
Contributor Author

/run-dm-integration-test

@ti-chi-bot ti-chi-bot merged commit d957fe8 into pingcap:master May 27, 2022
@D3Hunter D3Hunter deleted the start-from-time branch May 27, 2022 05:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/dm Issues or PRs related to DM. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

start validation from timestamp
5 participants