
Conversation

@zyguan (Contributor) commented Jan 14, 2026

This PR tries to prevent issues like #1839 by pre-encoding requests.

The micro-bench results show that the overhead introduced by this PR is negligible. It also lets us reduce the memory allocations of AttachContext.

| workload | threads | pre-encode | baseline | diff |
| --- | --- | --- | --- | --- |
| pointget | 64 | 83120.35 | 83940.81 | -0.98% |
| batchget | 64 | 24469.9 | 24629.87 | -0.65% |
| tblscan | 64 | 41522.92 | 41565.02 | -0.10% |
| idxscan | 64 | 40363.22 | 40328.56 | 0.09% |
| idxlookup | 64 | 16482.58 | 16497.08 | -0.09% |
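To make the pre-encoding idea concrete, here is a minimal, self-contained sketch of the technique: marshal the command once, up front, into a pooled buffer, so that any encoding error surfaces before the request ever reaches the batch-send loop. The names (`message`, `bufPool`, `preEncode`, `rawCmd`) are illustrative stand-ins, not the actual client-go identifiers.

```go
package main

import (
	"fmt"
	"sync"
)

// message mirrors the gogo/protobuf marshalling surface (Size/MarshalTo)
// that batch commands expose; the real code works on tikvpb types.
type message interface {
	Size() int
	MarshalTo(data []byte) (int, error)
}

// bufPool hands out reusable byte slices; a sync.Pool stands in here for
// the shared gRPC buffer pool used in the PR.
var bufPool = sync.Pool{New: func() any { return make([]byte, 0, 256) }}

// preEncode marshals msg once, up front, so any encoding error is
// reported before the request enters the batch pipeline.
func preEncode(msg message) ([]byte, error) {
	size := msg.Size()
	data := bufPool.Get().([]byte)
	if cap(data) < size {
		data = make([]byte, size)
	}
	data = data[:size]
	n, err := msg.MarshalTo(data)
	if err != nil {
		bufPool.Put(data[:0])
		return nil, err
	}
	if n != size {
		bufPool.Put(data[:0])
		return nil, fmt.Errorf("marshalled %d bytes, expected %d", n, size)
	}
	return data, nil
}

// rawCmd is a toy message whose wire form is its own bytes.
type rawCmd []byte

func (c rawCmd) Size() int                          { return len(c) }
func (c rawCmd) MarshalTo(data []byte) (int, error) { return copy(data, c), nil }

func main() {
	data, err := preEncode(rawCmd("hello"))
	fmt.Println(string(data), err) // prints: hello <nil>
}
```

The size-mismatch check matters because the batch sender later trusts the pre-computed length when writing the encoded bytes onto the wire.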

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced error handling for batch request encoding with improved validation of request states.
  • Chores

    • Optimized batch request processing through object pooling to reduce memory allocations and improve efficiency.
    • Refined request lifecycle management with better cleanup and state tracking.
  • Tests

    • Added comprehensive test coverage for batch encoding, pooling reuse, and concurrent operations to ensure reliability.


Signed-off-by: zyguan <zhongyangguan@gmail.com>
@ti-chi-bot ti-chi-bot bot added dco-signoff: yes Indicates the PR's author has signed the dco. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 14, 2026
Signed-off-by: zyguan <zhongyangguan@gmail.com>
Signed-off-by: zyguan <zhongyangguan@gmail.com>
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 14, 2026
@zyguan (Contributor Author) commented Jan 14, 2026

/cc @ekexium @cfzjywxk @AndreMouche

@AndreMouche (Member) left a comment:

Rest LGTM

return nil
}
data := globalEncodedMsgDataPool.Get(req.Cmd.Size())
n, err := req.Cmd.MarshalTo(data)
A Member commented:

Is it possible that req.Cmd could be modified concurrently by other goroutines?
Maybe we should add a comment to this function, for example:
// encodeRequestCmd encodes req.Cmd. The caller must ensure that req is not
// being modified concurrently by other goroutines.

@zyguan (Contributor Author) replied:

No, encodeRequestCmd is used internally by the batch client. It is called just after tikvrpc.Request.ToBatchCommandsRequest() and before constructing the batchCommandsEntry. Note that the req here is the output of ToBatchCommandsRequest, a fresh tikvpb.BatchCommandsRequest_Request object, so req.Cmd can only be modified inside the batch client, by reuseRequestData after the req has been sent. There is no data race on req.Cmd via the public API (SendRequest and SendRequestAsync).
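The ownership argument above can be sketched in a few lines. The types below are hypothetical stand-ins for tikvrpc.Request and tikvpb.BatchCommandsRequest_Request, reduced to the one property that matters: each conversion yields a brand-new object, so the caller never holds a reference to the object the batch client later mutates.

```go
package main

import "fmt"

// Request stands in for tikvrpc.Request; payload stands in for its command.
type Request struct{ payload string }

// BatchCommandsRequest_Request stands in for the tikvpb wrapper whose Cmd
// field the batch client rewrites (and later nils out) after sending.
type BatchCommandsRequest_Request struct{ Cmd any }

// ToBatchCommandsRequest returns a brand-new wrapper on every call, so
// callers of SendRequest never alias the object the batch client owns.
func (r *Request) ToBatchCommandsRequest() *BatchCommandsRequest_Request {
	return &BatchCommandsRequest_Request{Cmd: r.payload}
}

func main() {
	orig := &Request{payload: "get"}
	a := orig.ToBatchCommandsRequest()
	b := orig.ToBatchCommandsRequest()
	// Mutating a.Cmd (as reuseRequestData does after send) cannot race
	// with b or with orig, because each wrapper is a distinct object.
	a.Cmd = nil
	fmt.Println(a.Cmd == nil, b.Cmd) // prints: true get
}
```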

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jan 14, 2026

if err := encodeRequestCmd(batchReq); err != nil {
cb.Invoke(nil, err)
return
A Contributor commented:

It can early-return via cb.Invoke before entry.cb.Inject(...) is installed at L105. Would it be a problem that regionRPC.End()/spanRPC.Finish() won't run on this path, or should we consider moving the encoding after the inject setup?

@zyguan (Contributor Author) replied:

I think it's OK: if it early-returns, no RPC actually happens. Besides, this function shouldn't return an error in most cases.

var globalEncodedMsgDataPool grpc.SharedBufferPool

func init() {
globalEncodedMsgDataPool = grpc.NewSharedBufferPool()
A Contributor commented:

Is this an experimental API?

@zyguan (Contributor Author) replied:

Yes, and it has been moved to the mem package in newer versions of grpc-go.

globalEncodedMsgDataPool = grpc.NewSharedBufferPool()
}

type encodedBatchCmd struct {
A Contributor commented:

How about implementing isBatchCommandsRequest_Request_Cmd directly, to avoid the embedding?

@zyguan (Contributor Author) replied:

Unfortunately, you cannot implement an unexported interface method from a different package in Go.
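The Go rule in question, and the embedding workaround, can be illustrated compactly. The interface and type names below are simplified approximations of the kvproto/tikvpb definitions (which live in a separate package); a single-file sketch cannot show the cross-package compile error itself, but the comments mark where it would occur.

```go
package main

import "fmt"

// In kvproto, the Cmd oneof is constrained by an interface with an
// unexported method, conceptually like this (in the real code it is
// declared in package tikvpb, not here):
type isCmd interface {
	isBatchCommandsRequest_Request_Cmd()
}

// A type that already satisfies it, playing the role of
// BatchCommandsRequest_Request_Empty.
type cmdEmpty struct{}

func (cmdEmpty) isBatchCommandsRequest_Request_Cmd() {}

// A type defined in another package cannot declare the unexported method
// itself (the compiler rejects the method name), but it can embed a
// satisfying type and inherit its method set. That is what the PR's
// encodedBatchCmd does with the Empty type.
type encodedCmd struct {
	cmdEmpty // embedded solely to satisfy isCmd
	data     []byte
}

func main() {
	var c isCmd = encodedCmd{data: []byte("payload")}
	fmt.Printf("%T\n", c) // prints: main.encodedCmd
}
```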

@zyguan (Contributor Author) added:

We may implement it in kvproto later.

Signed-off-by: zyguan <zhongyangguan@gmail.com>
@ti-chi-bot ti-chi-bot bot removed the approved label Jan 15, 2026
@coderabbitai bot commented Jan 15, 2026

📝 Walkthrough

This pull request introduces a batch request pre-encoding mechanism with object pooling. It adds an early encoding step to the async batch request flow that converts regular command objects to pre-encoded batch command objects, which are reused from a global pool to reduce per-request memory allocations and ensure validation prior to processing.

Changes

Changes by cohort / file(s):

  • Batch Encoding Implementation (internal/client/client_batch.go): Introduces globalEncodedMsgDataPool for pooled pre-encoded data, a new encodedBatchCmd type with MarshalTo and Size methods, an encodeRequestCmd function to encode commands in place with error checking, and reuseRequestData to return data to the pool. Integrates encoding into sendBatchRequest and adds error handling for encoding state violations (duplicates, pool exhaustion, size mismatches).
  • Async Client Integration (internal/client/client_async.go): Adds a pre-processing step that encodes batch requests via encodeRequestCmd immediately after obtaining the connection pool, with an early error return on encoding failure.
  • Test Coverage (internal/client/client_batch_test.go): New comprehensive test file covering encodedBatchCmd.Size and MarshalTo behavior, encoding workflows, pool reuse scenarios with proper state validation, error cases for re-encoding and size mismatches, mixed batch handling, and concurrent encoding/marshalling operations to verify thread safety.
  • Test Adjustments (internal/client/client_test.go): Refactors TestBatchClientRecoverAfterServerRestart and TestFastFailWhenNoAvailableConn to instantiate fresh BatchCommandsRequest_Request objects per iteration instead of reusing a single instance.

Sequence Diagram(s)

sequenceDiagram
    participant Async as Async Client
    participant Encoder as encodeRequestCmd
    participant Pool as Encoded Pool
    participant Batch as Batch Processor

    Async->>Encoder: Convert Cmd to encodedBatchCmd
    Encoder->>Pool: Acquire pre-encoded data buffer
    Pool-->>Encoder: Return buffer (or error if exhausted)
    Encoder->>Encoder: Validate & set encoding state
    Encoder-->>Async: Return encodedBatchCmd or error
    
    alt Encoding Success
        Async->>Batch: Send batch with pre-encoded command
        Batch->>Batch: Marshal and process
        Batch->>Pool: Return encoded data (reuseRequestData)
    else Encoding Error
        Async->>Async: Invoke callback with error & return
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hoppity-hops through pooled commands we go,
Pre-encoded batches, faster they flow!
No waste on allocations today,
Just reuse and relay—hip hip hooray! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning):

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 16.67%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed):

  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title accurately and clearly summarizes the main change: pre-encoding requests before sending them to the batch-send loop, which is the primary objective of the pull request.


📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01cdf70 and 4ea9cf3.

📒 Files selected for processing (4)
  • internal/client/client_async.go
  • internal/client/client_batch.go
  • internal/client/client_batch_test.go
  • internal/client/client_test.go
🧰 Additional context used
🧬 Code graph analysis (1)
internal/client/client_test.go (1)
tikvrpc/tikvrpc.go (1)
  • Request (242-271)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: integration-next-gen-tikv
  • GitHub Check: integration-raw-tikv (v2)
  • GitHub Check: integration-tikv
  • GitHub Check: integration-local
  • GitHub Check: integration-local-race
  • GitHub Check: race-test
  • GitHub Check: test
🔇 Additional comments (18)
internal/client/client_test.go (4)

686-689: LGTM - Fresh request per iteration aligns with new encoding lifecycle.

Creating a new BatchCommandsRequest_Request per iteration is correct. The pre-encoding flow modifies req.Cmd in place and returns the encoded data to the pool after sending, so reusing the same request object would cause data races or use-after-free issues.


697-698: Consistent with the encoding lifecycle changes.

Same pattern applied correctly in the server-down test scenario.


741-742: Consistent request instantiation after server restart.

Correctly creates fresh request objects for the post-restart verification loop.


1054-1055: Fresh request object for the "no available connections" test.

Creating a new request after marking clients for recreation ensures a clean state for this specific test scenario.

internal/client/client_batch.go (6)

70-92: Well-designed pre-encoded command wrapper.

The encodedBatchCmd type correctly implements the required interface by embedding BatchCommandsRequest_Request_Empty. The MarshalTo and Size methods handle the nil data case appropriately - returning an error on marshal and 0 for size respectively.
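The wrapper described here can be approximated in a few lines. The type, field, and error names below are illustrative, not the exact PR code: the idea is that the pre-marshalled bytes are carried along and replayed on MarshalTo, with nil data signalling that the buffer has already been released.

```go
package main

import (
	"errors"
	"fmt"
)

// releasedCmd approximates encodedBatchCmd: it carries pre-marshalled
// bytes and replays them when the gRPC layer calls MarshalTo.
type releasedCmd struct {
	data []byte // nil once the buffer has been returned to the pool
}

var errReleased = errors.New("encoded data already returned to pool")

func (c *releasedCmd) Size() int {
	if c.data == nil {
		return 0 // released: nothing left to marshal
	}
	return len(c.data)
}

func (c *releasedCmd) MarshalTo(buf []byte) (int, error) {
	if c.data == nil {
		return 0, errReleased // marshalling after release is an error
	}
	return copy(buf, c.data), nil
}

func main() {
	c := &releasedCmd{data: []byte("abc")}
	buf := make([]byte, c.Size())
	n, err := c.MarshalTo(buf)
	fmt.Println(n, err, string(buf)) // prints: 3 <nil> abc
	c.data = nil
	fmt.Println(c.Size()) // prints: 0
}
```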


94-124: Thorough encoding implementation with proper error handling.

The function correctly:

  • Documents safety guarantees regarding concurrent access
  • Handles nil Cmd for unit test compatibility
  • Detects double-encoding attempts with clear error messages
  • Returns pooled buffers on all error paths (lines 116-117, 119-120)
  • Validates marshal size matches expected size

126-138: Clean pool return logic with double-free protection.

Setting cmd.data = nil after Put prevents double-return to the pool. The return count for testing purposes is a nice touch for validation.
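The nil-after-Put pattern praised here is worth a standalone sketch. The pool and type below are simplified stand-ins (a sync.Pool instead of the shared gRPC buffer pool); the point is only the double-return guard.

```go
package main

import (
	"fmt"
	"sync"
)

// pool stands in for the shared buffer pool used by the batch client.
var pool = sync.Pool{New: func() any { return make([]byte, 0, 64) }}

// cmd stands in for an encoded command holding a pooled buffer.
type cmd struct{ data []byte }

// release returns the buffer to the pool at most once: after the first
// call c.data is nil, so any further call is a harmless no-op. The
// return count mirrors the PR's testing hook for validating releases.
func (c *cmd) release() int {
	if c.data == nil {
		return 0 // already released; prevents double-return to the pool
	}
	pool.Put(c.data[:0])
	c.data = nil
	return 1
}

func main() {
	c := &cmd{data: []byte("xyz")}
	fmt.Println(c.release(), c.release()) // prints: 1 0
}
```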


585-586: Correct placement of deferred cleanup.

Using defer reuseRequestData(req) at the start of send ensures the pre-encoded data is always returned to the pool, regardless of how the function exits (success, error, or panic recovery in calling code).


912-914: Early encoding in sendBatchRequest prevents issues like #1839.

Encoding the request command before constructing the batchCommandsEntry ensures validation happens before the request enters the batch pipeline. This is the core fix for the PR objective.


67-68: Pool initialization is appropriate for the use case.

The pool is created with grpc.NewSharedBufferPool() at package level. This is an experimental API from grpc-go, but there is no evidence of migration to a mem package in current versions. The current implementation (v1.64.0) continues to provide this API through the main grpc package. No migration action is needed at this time.

internal/client/client_async.go (1)

85-88: Early encoding in async path aligns with synchronous flow.

The encoding step is correctly placed after getConnPool succeeds and before constructing the batchCommandsEntry. Failing early on encoding errors avoids unnecessary entry setup. As discussed in past comments, the early return via cb.Invoke before entry.cb.Inject is acceptable since no actual RPC occurs on this path.

internal/client/client_batch_test.go (7)

27-40: Good unit test for encodedBatchCmd primitives.

Covers the essential Size() and MarshalTo() methods with clear assertions.


42-73: Comprehensive encoding validation test.

The test correctly verifies that:

  1. encodeRequestCmd converts Cmd to *encodedBatchCmd
  2. Encoded and non-encoded batches produce identical marshaled output
  3. Size calculations are consistent

Note: Using the same cmd pointer for both batches (lines 48-49) is intentional - it demonstrates that encoding produces equivalent output to the original.


75-99: Tests the encode-reuse lifecycle.

Validates that after returning data to the pool, subsequent encode operations work correctly with different request types.


101-122: Important edge case: re-encoding after pool return.

This test ensures the "already been encoded and sent" error is raised when attempting to re-encode a request whose data has been returned to the pool.


124-156: Good mixed-batch coverage.

Tests that reuseRequestData only affects encoded requests and leaves normal requests unchanged. Lines 154-155 correctly verify that marshalling a released encodedBatchCmd fails.


158-181: Validates double-return protection.

Confirms that cmd.data is set to nil after the first reuse, and subsequent reuse calls return 0 (no-op).


183-221: Thorough concurrency stress test.

50 goroutines × 100 iterations provides good coverage for race conditions. The use of atomic.AddInt32 for success counting is correct.


@zyguan (Contributor Author) commented Jan 15, 2026

/retest

@cfzjywxk (Contributor) left a comment:

LGTM

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jan 15, 2026
@ti-chi-bot bot commented Jan 15, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AndreMouche, cfzjywxk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Jan 15, 2026
@ti-chi-bot bot commented Jan 15, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-01-14 23:33:17.097641761 +0000 UTC m=+18424.711598627: ☑️ agreed by AndreMouche.
  • 2026-01-15 10:21:03.157852107 +0000 UTC m=+57290.771808963: ☑️ agreed by cfzjywxk.

@ti-chi-bot ti-chi-bot bot merged commit e5f6398 into tikv:master Jan 15, 2026
17 of 19 checks passed
@zyguan (Contributor Author) commented Jan 20, 2026

/cherry-pick tidb-8.5

ti-chi-bot pushed a commit to ti-chi-bot/client-go that referenced this pull request Jan 20, 2026
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot (Member) commented:

@zyguan: new pull request created to branch tidb-8.5: #1849.
But this PR has conflicts, please resolve them!

Details

In response to this:

/cherry-pick tidb-8.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@coderabbitai coderabbitai bot mentioned this pull request Jan 20, 2026
joechenrh pushed a commit to joechenrh/client-go that referenced this pull request Jan 21, 2026

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants