Skip to content

session: route LOAD DATA LOCAL to the pessimistic txn provider#68317

Open
joechenrh wants to merge 3 commits into
pingcap:masterfrom
joechenrh:fix/load-data-disable-retry
Open

session: route LOAD DATA LOCAL to the pessimistic txn provider#68317
joechenrh wants to merge 3 commits into
pingcap:masterfrom
joechenrh:fix/load-data-disable-retry

Conversation

@joechenrh
Copy link
Copy Markdown
Contributor

@joechenrh joechenrh commented May 12, 2026

What problem does this PR solve?

Issue Number: close #68316

Problem Summary:

LOAD DATA LOCAL streams the file from the client as one-shot data. Once the client has sent the terminating empty packet, the MySQL protocol gives the server no way to request the file again.

decideTxnMode (pkg/session/session.go:4910) chooses the txn provider for an auto-committed statement. The fall-through path returns ast.Optimistic unless shouldUsePessimisticAutoCommit(stmt) is true, and that function ultimately delegates to isDMLStatement, which intentionally excludes LOAD DATA and IMPORT (session.go:4984):

// Only these DML statements should use pessimistic-auto-commit
// Note: LOAD DATA and IMPORT are intentionally excluded
switch actualStmt.(type) {
case *ast.InsertStmt, *ast.UpdateStmt, *ast.DeleteStmt:
    return true
default:
    return false
}

The combined effect:

TiDB build pessimistic-auto-commit default autocommit INSERT/UPDATE/DELETE autocommit LOAD DATA
Classic false optimistic optimistic
Nextgen true pessimistic optimistic (excluded by isDMLStatement)

Running on the optimistic provider exposes LOAD DATA to the session-level retry path in doCommitWithRetry -> s.retry. When commit hits a retryable error such as kv:8022 TxnLockNotFound, the retry path re-executes the statement and calls LoadDataReaderBuilder.Build a second time. The new Build asks the already-finished client to re-send the file; the read returns EOF and surfaces as:

[ERROR] [conn.go:2144] "drain connection failed in load data" [error=EOF]

masking the real commit error from the user.

What changed and how does it work?

Carve LOAD DATA LOCAL out of the optimistic-by-default rule in decideTxnMode, alongside the existing pessimistic-auto-commit carve-out:

if _, ok := stmt.(*ast.LoadDataStmt); ok {
    return ast.Pessimistic
}

Placing the carve-out in decideTxnMode rather than isDMLStatement is deliberate: it forces LOAD DATA onto the pessimistic provider regardless of pessimistic-auto-commit, so the fix covers both classic builds (where pessimistic-auto-commit=false would otherwise still route LOAD DATA to optimistic) and nextgen builds. The retry guard at doCommitWithRetry (session.go:873) short-circuits on pessimistic txns, so the original commit error reaches the client unchanged and the user can decide how to re-run the LOAD DATA.

This mirrors the design intent of the existing !s.sessionVars.BatchInsert guard at session.go:873, which was added to prevent the same class of "re-execute a non-replayable statement" hazard for the BatchInsert path.

A regression test (TestLoadDataLocalUsesPessimisticTxn) observes TxnCtx.IsPessimistic during the LOAD DATA file read. Verified fail-before / pass-after on stock master.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes

Release note

Fix LOAD DATA LOCAL INFILE silently entering the optimistic session-retry
path on retryable commit errors (e.g. kv:8022 TxnLockNotFound) and masking
the real error as "drain connection failed in load data". LOAD DATA LOCAL
now runs on the pessimistic provider, so the underlying commit error is
delivered to the client directly.

Summary by CodeRabbit

  • Bug Fixes

    • LOAD DATA LOCAL INFILE now consistently uses pessimistic transactions.
  • Tests

    • Added a test to verify LOAD DATA transaction behavior.
  • Chores

    • Updated test shard configuration to adjust parallel test shards.

Review Change Stack

LOAD DATA LOCAL streams the file from the client as one-shot data — once the
client has sent the terminating empty packet, the MySQL protocol gives the
server no way to request the file again. Today, default-configured TiDB
(tidb_txn_mode=PESSIMISTIC + autocommit=ON) routes auto-committed DML to the
optimistic provider via decideTxnMode(), which leaves LOAD DATA exposed to
the session-level optimistic retry path (doCommitWithRetry -> s.retry).

When commit hits a retryable error such as kv:8022 (TxnLockNotFound), the
retry path re-executes the statement and calls LoadDataReaderBuilder.Build a
second time. The new Build asks the client to re-send the file, but the
client has already finished sending and is waiting for the response packet.
The new read returns EOF and surfaces as

  [ERROR] [conn.go:2144] "drain connection failed in load data" [error=EOF]

masking the real commit error from the user.

Carve LOAD DATA LOCAL out of the optimistic-by-default rule in decideTxnMode
so it runs on the pessimistic provider, alongside the existing
pessimistic-auto-commit carve-out. The retry guard at doCommitWithRetry
short-circuits on pessimistic txns, so the original commit error reaches the
client unchanged and the user can decide how to re-run the LOAD DATA.

Includes a regression test that observes TxnCtx.IsPessimistic during the
LOAD DATA file read.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ti-chi-bot ti-chi-bot Bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/needs-triage-completed labels May 12, 2026
@pantheon-ai
Copy link
Copy Markdown

pantheon-ai Bot commented May 12, 2026

@joechenrh I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot Bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 12, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 12, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign overvenus for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7e9ae577-60a6-478b-bb0f-99e4e3c800cd

📥 Commits

Reviewing files that changed from the base of the PR and between 4c55820 and 878ab1f.

📒 Files selected for processing (2)
  • pkg/executor/test/loaddatatest/load_data_test.go
  • pkg/session/session.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/executor/test/loaddatatest/load_data_test.go
  • pkg/session/session.go

📝 Walkthrough

Walkthrough

This PR forces autocommitted LOAD DATA to use the pessimistic transaction provider in decideTxnMode, adds a reader probe and test that observe the chosen transaction mode during LOAD DATA LOCAL INFILE, and increments the loaddatatest_test shard count to include the new test.

Changes

LOAD Data Transaction Mode Enforcement

Layer / File(s) Summary
Explicit LOAD DATA routing in transaction mode decision
pkg/session/session.go
decideTxnMode adds an explicit *ast.LoadDataStmt check that returns ast.Pessimistic, preventing autocommitted LOAD DATA from falling through to the optimistic-by-default path.
Test validation of pessimistic transaction behavior
pkg/executor/test/loaddatatest/load_data_test.go
Introduces loadDataIsPessimisticProbe, a reader wrapper that captures TxnCtx.IsPessimistic at first read, and TestLoadDataLocalUsesPessimisticTxn test that verifies LOAD DATA LOCAL INFILE uses the pessimistic provider with default session configuration.
Test shard count adjustment
pkg/executor/test/loaddatatest/BUILD.bazel
Increases loaddatatest_test shard count from 14 to 15 to support the new test.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

contribution, ok-to-test

Suggested reviewers

  • lance6716
  • ekexium

Poem

🐰 I hop through code to mend the fray,
I watch the LOAD DATA find its way.
No masked EOF to cloud the view,
Pessimistic mode now sees it through.
Hooray — commit errors speak true.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: routing LOAD DATA LOCAL to the pessimistic transaction provider.
Description check ✅ Passed The description provides complete detail: problem statement, root cause analysis, implementation approach, testing, and a release note addressing the bugfix.
Linked Issues check ✅ Passed The PR addresses the issue requirements: it routes LOAD DATA to the pessimistic provider to prevent session-level retries, adds a regression test, and includes proper release notes documenting the bugfix.
Out of Scope Changes check ✅ Passed All changes are directly related to the issue: decideTxnMode modification for LOAD DATA routing, test shard count increase, and new regression test for pessimistic transaction verification.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@tiprow
Copy link
Copy Markdown

tiprow Bot commented May 12, 2026

Hi @joechenrh. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

joechenrh and others added 2 commits May 12, 2026 16:29
Address review feedback to keep the inline comment to the essential
invariant; the full rationale lives in the PR description and the linked
issue (pingcap#68316).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per CLAUDE.local.md, unit test functions (TestXxx) should not carry
docstring comments unless explicitly requested.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 12, 2026

@joechenrh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
idc-jenkins-ci-tidb/build 878ab1f link true /test build
pull-build-next-gen 878ab1f link true /test pull-build-next-gen
pull-unit-test-next-gen 878ab1f link true /test pull-unit-test-next-gen
idc-jenkins-ci-tidb/unit-test 878ab1f link true /test unit-test

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.0190%. Comparing base (8ad6083) to head (878ab1f).

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #68317        +/-   ##
================================================
- Coverage   77.7059%   77.0190%   -0.6870%     
================================================
  Files          1991       1973        -18     
  Lines        552094     552848       +754     
================================================
- Hits         429010     425798      -3212     
- Misses       122164     127047      +4883     
+ Partials        920          3       -917     
Flag Coverage Δ
integration 41.3290% <0.0000%> (+1.5272%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 60.4888% <ø> (ø)
parser ∅ <ø> (∅)
br 50.0573% <ø> (-13.0338%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/needs-triage-completed release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LOAD DATA LOCAL INFILE meets error like "drain connection failed in load data"

1 participant