Lightning: add retry if transaction failed while fetching task metas #53041

Open
wants to merge 7 commits into
base: release-7.5


mittalrishabh
Contributor

@mittalrishabh mittalrishabh commented May 6, 2024

What problem does this PR solve?

Issue Number: close #53042
Problem Summary:
A transaction can fail with "Error 1205: Lock wait timeout exceeded; try restarting transaction", which causes the import job to fail. Whether the job can then resume from a checkpoint depends on the stage at which the failure occurred: if the transaction fails after the import/ingestion has completed, the job does not resume from the checkpoint. In that case the entire bulk load job must be restarted, which is costly and can violate our SLOs.

What changed and how does it work?

Add retry with backoff when a transaction fails. A failed transaction is retried up to 5 times, with a base backoff of 1 second and a maximum backoff of 30 seconds.

Check List

Tests

  • [x] Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Hao Wang and others added 7 commits April 24, 2024 17:49
…il confirm with PingCap. (pingcap#67)

This is to unblock release-7.5.1 PR ci.

Co-authored-by: Hao Wang <hwang4@airbnb.com>
…ingcap#64)

close pingcap#52049

Co-authored-by: 山岚 <36239017+YuJuncen@users.noreply.github.com>
…pingcap#51449) (pingcap#51) (pingcap#63)

close pingcap#51448

Co-authored-by: 山岚 <36239017+YuJuncen@users.noreply.github.com>
…) (pingcap#48) (pingcap#62)

close pingcap#51957

Co-authored-by: Naman Gupta <naman.gupta@airbnb.com>
Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
…#51371) (pingcap#46) (pingcap#61)

close pingcap#51370

Co-authored-by: 山岚 <36239017+YuJuncen@users.noreply.github.com>
Co-authored-by: artem_danilov <artem_danilov@airbnb.com>
Co-authored-by: rishabh_mittal <mittalrishabh@gmail.com>
@ti-chi-bot ti-chi-bot bot added the labels do-not-merge/needs-linked-issue, do-not-merge/needs-tests-checked, release-note-none (Denotes a PR that doesn't merit a release note.), and needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) May 6, 2024

ti-chi-bot bot commented May 6, 2024

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot ti-chi-bot bot added the needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. label May 6, 2024

ti-chi-bot bot commented May 6, 2024

Hi @mittalrishabh. Thanks for your PR.

I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


@ti-chi-bot ti-chi-bot bot added the labels size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.) and component/dumpling (This is related to Dumpling of TiDB.) May 6, 2024
@sre-bot
Contributor

sre-bot commented May 6, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ mittalrishabh
✅ Tema
❌ Hao Wang


Hao Wang seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@ti-chi-bot ti-chi-bot bot added the sig/planner SIG: Planner label May 6, 2024

ti-chi-bot bot commented May 6, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign bb7133, benjamin2037, d3hunter, hi-rustin, lance6716, leavrth, yudongusa, zanmato1984 for approval, ensuring that each of them provides their approval before proceeding. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mittalrishabh mittalrishabh changed the base branch from master to release-7.5 May 6, 2024 21:57

ti-chi-bot bot commented May 6, 2024

This cherry pick PR is for a release branch and has not yet been approved by triage owners.
Adding the do-not-merge/cherry-pick-not-approved label.

To merge this cherry pick:

  1. It must first be approved by the approvers.
  2. AFTER it has been approved by approvers, please wait for the cherry-pick merging approval from triage owners.


@ti-chi-bot ti-chi-bot bot added the labels do-not-merge/cherry-pick-not-approved and size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files.), and removed the label size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.) May 6, 2024
Labels
component/dumpling This is related to Dumpling of TiDB. do-not-merge/cherry-pick-not-approved do-not-merge/needs-linked-issue do-not-merge/needs-tests-checked needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

3 participants