disttask: persist business error to subtask table #47334

Merged (2 commits) on Oct 9, 2023

Conversation

tangenta
Contributor

@tangenta tangenta commented Sep 27, 2023

What problem does this PR solve?

Issue Number: ref #46258

Problem Summary:

  • WaitGlobalTask may exit before cleaning up the global task, which can cause unexpected behavior.
  • The error returned by scheduler.Init() should be persisted to the subtask table instead of being retried indefinitely.

What is changed and how it works?

As the title says: persist business errors to the subtask table.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed do-not-merge/needs-tests-checked labels Sep 27, 2023

tiprow bot commented Sep 27, 2023

Hi @tangenta. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


codecov bot commented Sep 27, 2023

Codecov Report

Merging #47334 (9f1cdd9) into master (257278d) will increase coverage by 0.1624%.
Report is 50 commits behind head on master.
The diff coverage is 28.0000%.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #47334        +/-   ##
================================================
+ Coverage   72.4858%   72.6483%   +0.1624%     
================================================
  Files          1349       1374        +25     
  Lines        401073     410607      +9534     
================================================
+ Hits         290721     298299      +7578     
- Misses        91239      93486      +2247     
+ Partials      19113      18822       -291     
Flag          Coverage Δ
integration   39.8091% <0.0000%> (?)
unit          72.4472% <28.0000%> (-0.0386%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components    Coverage Δ
dumpling      53.9913% <ø> (ø)
parser        84.7544% <ø> (-0.1186%) ⬇️
br            47.9749% <ø> (-5.0984%) ⬇️

	logutil.Logger(m.logCtx).Error("task manager error", zap.Error(err))
}

func (m *Manager) logErrAndPersist(err error, taskID int64) {
	m.logErr(err)
Collaborator

Should this take the return value of m.logErr(err) and store the wrapped error in the subtask?

Contributor Author

It is designed NOT to store errors in subtasks. That's why I split this into two methods.

disttask/framework/scheduler/manager.go (outdated)

func (m *Manager) logErrAndPersist(err error, taskID int64) {
	m.logErr(err)
	err1 := m.taskTable.UpdateErrorToSubtask(m.id, taskID, err)
Contributor

What happens when the update to the subtask fails?
We could use a backoffer.

Contributor Author

It will act like logErr() and keep retrying.

Contributor

@ywqzzy ywqzzy left a comment

LGTM

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Sep 28, 2023
@tangenta
Contributor Author

/retest


tiprow bot commented Sep 28, 2023

@tangenta: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor Author

tangenta commented Oct 1, 2023

/retest

Contributor Author

tangenta commented Oct 1, 2023

/ok-to-test

@ti-chi-bot ti-chi-bot bot added the ok-to-test Indicates a PR is ready to be tested. label Oct 1, 2023
Contributor

@GMHDBJD GMHDBJD left a comment

LGTM


ti-chi-bot bot commented Oct 9, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: GMHDBJD, ywqzzy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added approved lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Oct 9, 2023

ti-chi-bot bot commented Oct 9, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-09-28 03:31:06.93078523 +0000 UTC m=+72664.517895359: ☑️ agreed by ywqzzy.
  • 2023-10-09 09:34:33.539051271 +0000 UTC m=+1044871.126161417: ☑️ agreed by GMHDBJD.

@ti-chi-bot ti-chi-bot bot merged commit 5538061 into pingcap:master Oct 9, 2023
11 of 16 checks passed
Labels
approved lgtm ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
4 participants