
Conversation

@zyguan (Contributor) commented Jan 14, 2026

This PR tries to prevent issues like #1839 by pre-encoding requests.

The micro-bench results show that the overhead introduced by this PR is negligible. It also lets us reduce the memory allocations of AttachContext.

| workload | threads | pre-encode | baseline | diff |
| --- | --- | --- | --- | --- |
| pointget | 64 | 83120.35 | 83940.81 | -0.98% |
| batchget | 64 | 24469.9 | 24629.87 | -0.65% |
| tblscan | 64 | 41522.92 | 41565.02 | -0.10% |
| idxscan | 64 | 40363.22 | 40328.56 | 0.09% |
| idxlookup | 64 | 16482.58 | 16497.08 | -0.09% |
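To make the pre-encoding idea concrete, here is a minimal, self-contained sketch of the technique: marshal the command once, up front, into a pooled buffer, so that any encoding error surfaces before the request ever reaches the batch-send loop. The names (`message`, `bufPool`, `preEncode`, `rawCmd`) are illustrative stand-ins, not the actual client-go identifiers.

```go
package main

import (
	"fmt"
	"sync"
)

// message mirrors the gogo/protobuf marshalling surface (Size/MarshalTo)
// that batch commands expose; the real code works on tikvpb types.
type message interface {
	Size() int
	MarshalTo(data []byte) (int, error)
}

// bufPool hands out reusable byte slices; a sync.Pool stands in here for
// the shared gRPC buffer pool used in the PR.
var bufPool = sync.Pool{New: func() any { return make([]byte, 0, 256) }}

// preEncode marshals msg once, up front, so any encoding error is
// reported before the request enters the batch pipeline.
func preEncode(msg message) ([]byte, error) {
	size := msg.Size()
	data := bufPool.Get().([]byte)
	if cap(data) < size {
		data = make([]byte, size)
	}
	data = data[:size]
	n, err := msg.MarshalTo(data)
	if err != nil {
		bufPool.Put(data[:0])
		return nil, err
	}
	if n != size {
		bufPool.Put(data[:0])
		return nil, fmt.Errorf("marshalled %d bytes, expected %d", n, size)
	}
	return data, nil
}

// rawCmd is a toy message whose wire form is its own bytes.
type rawCmd []byte

func (c rawCmd) Size() int                          { return len(c) }
func (c rawCmd) MarshalTo(data []byte) (int, error) { return copy(data, c), nil }

func main() {
	data, err := preEncode(rawCmd("hello"))
	fmt.Println(string(data), err) // prints: hello <nil>
}
```

The size-mismatch check matters because the batch sender later trusts the pre-computed length when writing the encoded bytes onto the wire.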

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced error handling for batch request encoding with improved validation of request states.
  • Chores

    • Optimized batch request processing through object pooling to reduce memory allocations and improve efficiency.
    • Refined request lifecycle management with better cleanup and state tracking.
  • Tests

    • Added comprehensive test coverage for batch encoding, pooling reuse, and concurrent operations to ensure reliability.


Signed-off-by: zyguan <zhongyangguan@gmail.com>
@ti-chi-bot ti-chi-bot bot added dco-signoff: yes Indicates the PR's author has signed the dco. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 14, 2026
Signed-off-by: zyguan <zhongyangguan@gmail.com>
Signed-off-by: zyguan <zhongyangguan@gmail.com>
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 14, 2026
@zyguan (Contributor Author) commented Jan 14, 2026

/cc @ekexium @cfzjywxk @AndreMouche

@AndreMouche (Member) left a comment:

Rest LGTM

return nil
}
data := globalEncodedMsgDataPool.Get(req.Cmd.Size())
n, err := req.Cmd.MarshalTo(data)
A Member commented:

Is it possible that req.Cmd could be modified concurrently by other goroutines?
Maybe we should add a comment to this function, for example:
// encodeRequestCmd encodes req.Cmd. The caller must ensure that req is not
// being modified concurrently by other goroutines.

@zyguan (Contributor Author) replied:

No, encodeRequestCmd is used internally by the batch client. It is called just after tikvrpc.Request.ToBatchCommandsRequest() and before constructing the batchCommandsEntry. Note that the req here is the output of ToBatchCommandsRequest, a fresh tikvpb.BatchCommandsRequest_Request object, so req.Cmd can only be modified inside the batch client, by reuseRequestData after the req has been sent. There is no data race on req.Cmd via the public API (SendRequest and SendRequestAsync).
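The ownership argument above can be sketched in a few lines. The types below are hypothetical stand-ins for tikvrpc.Request and tikvpb.BatchCommandsRequest_Request, reduced to the one property that matters: each conversion yields a brand-new object, so the caller never holds a reference to the object the batch client later mutates.

```go
package main

import "fmt"

// Request stands in for tikvrpc.Request; payload stands in for its command.
type Request struct{ payload string }

// BatchCommandsRequest_Request stands in for the tikvpb wrapper whose Cmd
// field the batch client rewrites (and later nils out) after sending.
type BatchCommandsRequest_Request struct{ Cmd any }

// ToBatchCommandsRequest returns a brand-new wrapper on every call, so
// callers of SendRequest never alias the object the batch client owns.
func (r *Request) ToBatchCommandsRequest() *BatchCommandsRequest_Request {
	return &BatchCommandsRequest_Request{Cmd: r.payload}
}

func main() {
	orig := &Request{payload: "get"}
	a := orig.ToBatchCommandsRequest()
	b := orig.ToBatchCommandsRequest()
	// Mutating a.Cmd (as reuseRequestData does after send) cannot race
	// with b or with orig, because each wrapper is a distinct object.
	a.Cmd = nil
	fmt.Println(a.Cmd == nil, b.Cmd) // prints: true get
}
```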

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jan 14, 2026

if err := encodeRequestCmd(batchReq); err != nil {
cb.Invoke(nil, err)
return
A Contributor commented:

It can early-return via cb.Invoke before entry.cb.Inject(...) is installed at L105. Would it be a problem that regionRPC.End()/spanRPC.Finish() won't run on this path, or should we consider moving the encoding after the inject setup?

@zyguan (Contributor Author) replied:

I think it's OK: if it early-returns, no RPC actually happens. Besides, this function shouldn't return an error in most cases.

var globalEncodedMsgDataPool grpc.SharedBufferPool

func init() {
globalEncodedMsgDataPool = grpc.NewSharedBufferPool()
A Contributor commented:

Is this an experimental API?

@zyguan (Contributor Author) replied:

Yes, and it has been moved to the mem package in newer versions of grpc-go.

globalEncodedMsgDataPool = grpc.NewSharedBufferPool()
}

type encodedBatchCmd struct {
A Contributor commented:

How about implementing isBatchCommandsRequest_Request_Cmd directly, to avoid the embedding?

@zyguan (Contributor Author) replied:

Unfortunately, you cannot implement an unexported interface method from a different package in Go.
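The Go rule in question, and the embedding workaround, can be illustrated compactly. The interface and type names below are simplified approximations of the kvproto/tikvpb definitions (which live in a separate package); a single-file sketch cannot show the cross-package compile error itself, but the comments mark where it would occur.

```go
package main

import "fmt"

// In kvproto, the Cmd oneof is constrained by an interface with an
// unexported method, conceptually like this (in the real code it is
// declared in package tikvpb, not here):
type isCmd interface {
	isBatchCommandsRequest_Request_Cmd()
}

// A type that already satisfies it, playing the role of
// BatchCommandsRequest_Request_Empty.
type cmdEmpty struct{}

func (cmdEmpty) isBatchCommandsRequest_Request_Cmd() {}

// A type defined in another package cannot declare the unexported method
// itself (the compiler rejects the method name), but it can embed a
// satisfying type and inherit its method set. That is what the PR's
// encodedBatchCmd does with the Empty type.
type encodedCmd struct {
	cmdEmpty // embedded solely to satisfy isCmd
	data     []byte
}

func main() {
	var c isCmd = encodedCmd{data: []byte("payload")}
	fmt.Printf("%T\n", c) // prints: main.encodedCmd
}
```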

@zyguan (Contributor Author) added:

We may implement it in kvproto later.

Signed-off-by: zyguan <zhongyangguan@gmail.com>
@ti-chi-bot ti-chi-bot bot removed the approved label Jan 15, 2026
@coderabbitai bot commented Jan 15, 2026

📝 Walkthrough

This pull request introduces a batch request pre-encoding mechanism with object pooling. It adds an early encoding step to the async batch request flow that converts regular command objects to pre-encoded batch command objects, which are reused from a global pool to reduce per-request memory allocations and ensure validation prior to processing.

Changes

Changes by cohort / file(s):

  • Batch Encoding Implementation (internal/client/client_batch.go): Introduces globalEncodedMsgDataPool for pooled pre-encoded data, a new encodedBatchCmd type with MarshalTo and Size methods, an encodeRequestCmd function to encode commands in place with error checking, and reuseRequestData to return data to the pool. Integrates encoding into sendBatchRequest and adds error handling for encoding state violations (duplicates, pool exhaustion, size mismatches).
  • Async Client Integration (internal/client/client_async.go): Adds a pre-processing step that encodes batch requests via encodeRequestCmd immediately after obtaining the connection pool, with an early error return on encoding failure.
  • Test Coverage (internal/client/client_batch_test.go): New comprehensive test file covering encodedBatchCmd.Size and MarshalTo behavior, encoding workflows, pool reuse scenarios with proper state validation, error cases for re-encoding and size mismatches, mixed batch handling, and concurrent encoding/marshalling operations to verify thread safety.
  • Test Adjustments (internal/client/client_test.go): Refactors TestBatchClientRecoverAfterServerRestart and TestFastFailWhenNoAvailableConn to instantiate fresh BatchCommandsRequest_Request objects per iteration instead of reusing a single instance.

Sequence Diagram(s)

sequenceDiagram
    participant Async as Async Client
    participant Encoder as encodeRequestCmd
    participant Pool as Encoded Pool
    participant Batch as Batch Processor

    Async->>Encoder: Convert Cmd to encodedBatchCmd
    Encoder->>Pool: Acquire pre-encoded data buffer
    Pool-->>Encoder: Return buffer (or error if exhausted)
    Encoder->>Encoder: Validate & set encoding state
    Encoder-->>Async: Return encodedBatchCmd or error
    
    alt Encoding Success
        Async->>Batch: Send batch with pre-encoded command
        Batch->>Batch: Marshal and process
        Batch->>Pool: Return encoded data (reuseRequestData)
    else Encoding Error
        Async->>Async: Invoke callback with error & return
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hoppity-hops through pooled commands we go,
Pre-encoded batches, faster they flow!
No waste on allocations today,
Just reuse and relay—hip hip hooray! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning):

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 16.67%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed):

  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title accurately and clearly summarizes the main change: pre-encoding requests before sending them to the batch-send loop, which is the primary objective of the pull request.


📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01cdf70 and 4ea9cf3.

📒 Files selected for processing (4)
  • internal/client/client_async.go
  • internal/client/client_batch.go
  • internal/client/client_batch_test.go
  • internal/client/client_test.go
🧰 Additional context used
🧬 Code graph analysis (1)
internal/client/client_test.go (1)
tikvrpc/tikvrpc.go (1)
  • Request (242-271)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: integration-next-gen-tikv
  • GitHub Check: integration-raw-tikv (v2)
  • GitHub Check: integration-tikv
  • GitHub Check: integration-local
  • GitHub Check: integration-local-race
  • GitHub Check: race-test
  • GitHub Check: test
🔇 Additional comments (18)
internal/client/client_test.go (4)

686-689: LGTM - Fresh request per iteration aligns with new encoding lifecycle.

Creating a new BatchCommandsRequest_Request per iteration is correct. The pre-encoding flow modifies req.Cmd in place and returns the encoded data to the pool after sending, so reusing the same request object would cause data races or use-after-free issues.


697-698: Consistent with the encoding lifecycle changes.

Same pattern applied correctly in the server-down test scenario.


741-742: Consistent request instantiation after server restart.

Correctly creates fresh request objects for the post-restart verification loop.


1054-1055: Fresh request object for the "no available connections" test.

Creating a new request after marking clients for recreation ensures a clean state for this specific test scenario.

internal/client/client_batch.go (6)

70-92: Well-designed pre-encoded command wrapper.

The encodedBatchCmd type correctly implements the required interface by embedding BatchCommandsRequest_Request_Empty. The MarshalTo and Size methods handle the nil data case appropriately - returning an error on marshal and 0 for size respectively.
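The wrapper described here can be approximated in a few lines. The type, field, and error names below are illustrative, not the exact PR code: the idea is that the pre-marshalled bytes are carried along and replayed on MarshalTo, with nil data signalling that the buffer has already been released.

```go
package main

import (
	"errors"
	"fmt"
)

// releasedCmd approximates encodedBatchCmd: it carries pre-marshalled
// bytes and replays them when the gRPC layer calls MarshalTo.
type releasedCmd struct {
	data []byte // nil once the buffer has been returned to the pool
}

var errReleased = errors.New("encoded data already returned to pool")

func (c *releasedCmd) Size() int {
	if c.data == nil {
		return 0 // released: nothing left to marshal
	}
	return len(c.data)
}

func (c *releasedCmd) MarshalTo(buf []byte) (int, error) {
	if c.data == nil {
		return 0, errReleased // marshalling after release is an error
	}
	return copy(buf, c.data), nil
}

func main() {
	c := &releasedCmd{data: []byte("abc")}
	buf := make([]byte, c.Size())
	n, err := c.MarshalTo(buf)
	fmt.Println(n, err, string(buf)) // prints: 3 <nil> abc
	c.data = nil
	fmt.Println(c.Size()) // prints: 0
}
```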


94-124: Thorough encoding implementation with proper error handling.

The function correctly:

  • Documents safety guarantees regarding concurrent access
  • Handles nil Cmd for unit test compatibility
  • Detects double-encoding attempts with clear error messages
  • Returns pooled buffers on all error paths (lines 116-117, 119-120)
  • Validates marshal size matches expected size

126-138: Clean pool return logic with double-free protection.

Setting cmd.data = nil after Put prevents double-return to the pool. The return count for testing purposes is a nice touch for validation.
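The nil-after-Put pattern praised here is worth a standalone sketch. The pool and type below are simplified stand-ins (a sync.Pool instead of the shared gRPC buffer pool); the point is only the double-return guard.

```go
package main

import (
	"fmt"
	"sync"
)

// pool stands in for the shared buffer pool used by the batch client.
var pool = sync.Pool{New: func() any { return make([]byte, 0, 64) }}

// cmd stands in for an encoded command holding a pooled buffer.
type cmd struct{ data []byte }

// release returns the buffer to the pool at most once: after the first
// call c.data is nil, so any further call is a harmless no-op. The
// return count mirrors the PR's testing hook for validating releases.
func (c *cmd) release() int {
	if c.data == nil {
		return 0 // already released; prevents double-return to the pool
	}
	pool.Put(c.data[:0])
	c.data = nil
	return 1
}

func main() {
	c := &cmd{data: []byte("xyz")}
	fmt.Println(c.release(), c.release()) // prints: 1 0
}
```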


585-586: Correct placement of deferred cleanup.

Using defer reuseRequestData(req) at the start of send ensures the pre-encoded data is always returned to the pool, regardless of how the function exits (success, error, or panic recovery in calling code).


912-914: Early encoding in sendBatchRequest prevents issues like #1839.

Encoding the request command before constructing the batchCommandsEntry ensures validation happens before the request enters the batch pipeline. This is the core fix for the PR objective.


67-68: Pool initialization is appropriate for the use case.

The pool is created with grpc.NewSharedBufferPool() at package level. This is an experimental API from grpc-go, but there is no evidence of migration to a mem package in current versions. The current implementation (v1.64.0) continues to provide this API through the main grpc package. No migration action is needed at this time.

internal/client/client_async.go (1)

85-88: Early encoding in async path aligns with synchronous flow.

The encoding step is correctly placed after getConnPool succeeds and before constructing the batchCommandsEntry. Failing early on encoding errors avoids unnecessary entry setup. As discussed in past comments, the early return via cb.Invoke before entry.cb.Inject is acceptable since no actual RPC occurs on this path.

internal/client/client_batch_test.go (7)

27-40: Good unit test for encodedBatchCmd primitives.

Covers the essential Size() and MarshalTo() methods with clear assertions.


42-73: Comprehensive encoding validation test.

The test correctly verifies that:

  1. encodeRequestCmd converts Cmd to *encodedBatchCmd
  2. Encoded and non-encoded batches produce identical marshaled output
  3. Size calculations are consistent

Note: Using the same cmd pointer for both batches (lines 48-49) is intentional - it demonstrates that encoding produces equivalent output to the original.


75-99: Tests the encode-reuse lifecycle.

Validates that after returning data to the pool, subsequent encode operations work correctly with different request types.


101-122: Important edge case: re-encoding after pool return.

This test ensures the "already been encoded and sent" error is raised when attempting to re-encode a request whose data has been returned to the pool.


124-156: Good mixed-batch coverage.

Tests that reuseRequestData only affects encoded requests and leaves normal requests unchanged. Lines 154-155 correctly verify that marshalling a released encodedBatchCmd fails.


158-181: Validates double-return protection.

Confirms that cmd.data is set to nil after the first reuse, and subsequent reuse calls return 0 (no-op).


183-221: Thorough concurrency stress test.

50 goroutines × 100 iterations provides good coverage for race conditions. The use of atomic.AddInt32 for success counting is correct.


@zyguan (Contributor Author) commented Jan 15, 2026

/retest

@cfzjywxk (Contributor) left a comment:

LGTM

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jan 15, 2026
@ti-chi-bot bot commented Jan 15, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AndreMouche, cfzjywxk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Jan 15, 2026
@ti-chi-bot bot commented Jan 15, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-01-14 23:33:17.097641761 +0000 UTC m=+18424.711598627: ☑️ agreed by AndreMouche.
  • 2026-01-15 10:21:03.157852107 +0000 UTC m=+57290.771808963: ☑️ agreed by cfzjywxk.

@ti-chi-bot ti-chi-bot bot merged commit e5f6398 into tikv:master Jan 15, 2026
17 of 19 checks passed
@zyguan (Contributor Author) commented Jan 20, 2026

/cherry-pick tidb-8.5

ti-chi-bot pushed a commit to ti-chi-bot/client-go that referenced this pull request Jan 20, 2026
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot (Member) commented:

@zyguan: new pull request created to branch tidb-8.5: #1849.
But this PR has conflicts, please resolve them!

Details

In response to this:

/cherry-pick tidb-8.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@coderabbitai coderabbitai bot mentioned this pull request Jan 20, 2026
joechenrh pushed a commit to joechenrh/client-go that referenced this pull request Jan 21, 2026

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants