Skip to content

Conversation

@joechenrh
Copy link

@joechenrh joechenrh commented Oct 30, 2025

Why upgrade grpc

Because we are using some expermential API that was already renamed. This will block those libraries that depend on this to use newer version of grpc. (And I found someone has already want to do this: #1672)

What has be changed

  • Change WithRecvBufferPool with WithBufferPool.
  • Use old codec in grpc 1.63 to make sharedBytes works.

sysbench test

# range scan
sysbench oltp_read_only \
  --db-driver=mysql --mysql-host=172.16.4.180 --mysql-port=32003 \
  --mysql-user=root --mysql-db=test \
  --tables=1 --table-size=100000000 \
  --threads=16 --time=1800 --report-interval=10 \
  --rand-type=uniform --range_size=4096 --range_selects=on \
  --point_selects=0 \
  --db-ps-mode=disable run

The QPS/latency are almost same for both version. The memory for 1.73 is slightly reduced (4.09G->3.98G), while the gc rate increases (0.25->0.3), which is an expected result by introducing sync.Pool.

image image
# point lookup
sysbench oltp_read_only \
  --db-driver=mysql --mysql-host=172.16.4.180 --mysql-port=32003 \
  --mysql-user=root --mysql-db=test \
  --tables=1 --table-size=100000000 \
  --threads=16 --time=1800 --report-interval=10 \
  --rand-type=uniform --range_size=4096 --range_selects=on \
  --db-ps-mode=disable run

For point lookup, both version show almost no difference in QPS/lantence/memory.

image image

Summary by CodeRabbit

  • Chores
    • Upgraded gRPC and Protocol Buffer dependencies for improved stability, performance, and compatibility.
    • Reworked internal buffer pooling to a tiered/no-op approach, reducing memory overhead and improving throughput.
    • Added a legacy Protocol Buffer codec to preserve compatibility with older gRPC/protobuf usage and ease migrations.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
@ti-chi-bot ti-chi-bot bot added dco-signoff: yes Indicates the PR's author has signed the dco. contribution This PR is from a community contributor. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. labels Oct 30, 2025
@ti-chi-bot
Copy link

ti-chi-bot bot commented Oct 30, 2025

Welcome @joechenrh!

It looks like this is your first PR to tikv/client-go 🎉.

I'm the bot to help you request reviewers, add labels and more, See available commands.

We want to make sure your contribution gets all the attention it needs!



Thank you, and welcome to tikv/client-go. 😃

@ti-chi-bot ti-chi-bot bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Oct 30, 2025
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
@joechenrh
Copy link
Author

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 30, 2025
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 31, 2025
@joechenrh joechenrh changed the title upgrade grpc to v1.66 upgrade grpc to v1.73 Nov 4, 2025
@joechenrh
Copy link
Author

/ok-to-test

@ti-chi-bot
Copy link

ti-chi-bot bot commented Nov 4, 2025

@joechenrh: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@joechenrh
Copy link
Author

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 4, 2025
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
@joechenrh
Copy link
Author

/run-all-tests

@joechenrh
Copy link
Author

joechenrh commented Nov 4, 2025

Previously, grpc was upgraded to 1.64 and reverted (by #1369) due to errors in TiFlash regression tests. So I keep using old API DialContext here.

I've ran a test internally with this branch. The plan ID is 7972306, and you can check the result.


rpc error: code = DeadlineExceeded desc = received context error while waiting for new LB policy update: context deadline exceeded
rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 10.200.86.26:3930: i/o timeout

(Although I suspect that error is not related to grpc API change because it seems like we couldn't connect to 10.200.86.26:3930. That is, the first mistake was the result, and the second mistake was the cause, as resolver didn't provide any address to LB.)

@joechenrh joechenrh mentioned this pull request Nov 4, 2025
13 tasks
@kennytm kennytm mentioned this pull request Nov 13, 2025
@dveeden
Copy link
Contributor

dveeden commented Nov 23, 2025

This would fix #1789, right?

@coderabbitai
Copy link

coderabbitai bot commented Jan 20, 2026

📝 Walkthrough

Walkthrough

Upgrades gRPC/protobuf dependencies, adds a legacy protobuf codec, and changes client buffer-pool and batch-encoding logic to use tiered or nop buffer pools and pointer-backed encoded buffers (no exported API changes).

Changes

Cohort / File(s) Summary
Dependency Upgrades
go.mod
Bumped google.golang.org/grpc v1.63.2 → v1.73.0; updated indirect google.golang.org/genproto/... and google.golang.org/protobuf versions.
Connection pool / Dial
internal/client/conn_pool.go
Added import google.golang.org/grpc/mem; added //nolint:SA1019 on grpc.DialContext; when shared buffer pool is disabled, inject legacyCodec{} via ForceCodec and use mem.NopBufferPool{} to bypass shared recv buffer pool.
Batch encoding & pooling
internal/client/client_batch.go
Replaced global shared buffer pool with mem.NewTieredBufferPool(...); encodeRequestCmd marshals into pointer-backed slice (*data), adjusted Put/Get to use pointer-held slices, updated length checks and assignment so req.Cmd references pointer-held encoded data.
Legacy codec
internal/client/legacy_codec.go
New unexported legacyCodec implementing Marshal, Unmarshal, Name, and messageV2Of to adapt proto v1→v2 messages for pre-1.66 grpc-go behavior.

Sequence Diagram(s)

(Skipped — changes are dependency and internal buffer/codec plumbing without a new multi-component runtime control flow needing visualization.)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

lgtm, size/XL

Suggested reviewers

  • ekexium
  • cfzjywxk

Poem

🐰 I hopped through modules, buffers in tow,
Legacy codec hummed where old bytes go,
Tiered pools stacked, nop ones stood by,
Pointer slices danced and marshaled high,
🥕 A little hop — the code runs so!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'upgrade grpc to v1.73' directly and specifically describes the main change in the pull request, which is updating the gRPC dependency from v1.63.2 to v1.73.0 as shown in go.mod.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@internal/resourcecontrol/resource_control.go`:
- Around line 184-190: The code reads r.Data for a *tikvrpc.CopStreamResponse
even when r.Response can be nil, which can panic; update the CopStreamResponse
handling so the readBytes assignment is guarded by the same nil-check that sets
detailsV2/details (i.e., move readBytes = uint64(len(...)) inside the if
r.Response != nil block or use r.Response.Data instead of r.Data), ensuring
detailsV2, details and readBytes are only accessed when r.Response is non-nil.

@ti-chi-bot ti-chi-bot bot added dco-signoff: no Indicates the PR's author has not signed dco. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed dco-signoff: yes Indicates the PR's author has signed the dco. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 21, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@internal/client/legacy_codec.go`:
- Around line 55-56: The lagecyCodec.Name() implementation currently returns an
empty string which violates gRPC codec requirements; update the lagecyCodec.Name
method to return a stable non-empty codec name (for protobuf use "proto") so
Content-Type and registration logic work correctly; locate the lagecyCodec type
and its Name() method and change the return value from "" to "proto".
🧹 Nitpick comments (3)
internal/client/legacy_codec.go (1)

23-24: Fix typo in codec naming (lagecylegacy).
The misspelling makes searchability and future maintenance harder. Consider renaming the type/comment and updating its usage in internal/client/conn_pool.go.

internal/client/conn_pool.go (2)

80-81: Add a brief rationale for the SA1019 suppression.
A short comment (e.g., “DialContext kept due to prior regression on upgrade”) will prevent accidental cleanup later.


125-152: Avoid mutating opts across loop iterations.
opts is re-assigned inside the loop and then appended again when GrpcSharedBufferPool is false, causing duplicate dial options to accumulate across connections. Use a per-iteration slice to keep options stable and predictable.

♻️ Proposed refactor
-		opts = append([]grpc.DialOption{
+		dialOpts := append([]grpc.DialOption{
 			opt,
 			grpc.WithInitialWindowSize(cfg.TiKVClient.GrpcInitialWindowSize),
 			grpc.WithInitialConnWindowSize(cfg.TiKVClient.GrpcInitialConnWindowSize),
 			grpc.WithUnaryInterceptor(unaryInterceptor),
 			grpc.WithStreamInterceptor(streamInterceptor),
 			grpc.WithDefaultCallOptions(callOptions...),
 			grpc.WithConnectParams(grpc.ConnectParams{
 				Backoff: backoff.Config{
 					BaseDelay:  100 * time.Millisecond, // Default was 1s.
 					Multiplier: 1.6,                    // Default
 					Jitter:     0.2,                    // Default
 					MaxDelay:   3 * time.Second,        // Default was 120s.
 				},
 				MinConnectTimeout: a.dialTimeout,
 			}),
 			grpc.WithKeepaliveParams(keepalive.ClientParameters{
 				Time:    time.Duration(keepAlive) * time.Second,
 				Timeout: cfg.TiKVClient.GetGrpcKeepAliveTimeout(),
 			}),
-		}, opts...)
+		}, opts...)
 		if !cfg.TiKVClient.GrpcSharedBufferPool {
-			opts = append(opts, experimental.WithBufferPool(mem.NopBufferPool{}))
+			dialOpts = append(dialOpts, experimental.WithBufferPool(mem.NopBufferPool{}))
 		}
 		conn, err := a.monitoredDial(
 			ctx,
 			fmt.Sprintf("%s-%d", a.target, i),
 			addr,
-			opts...,
+			dialOpts...,
 		)

@ti-chi-bot ti-chi-bot bot added dco-signoff: yes Indicates the PR's author has signed the dco. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed dco-signoff: no Indicates the PR's author has not signed dco. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 21, 2026
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
@ti-chi-bot
Copy link

ti-chi-bot bot commented Jan 21, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign ekexium for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 21, 2026
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>

// globalEncodedMsgDataPool is used to pool pre-encoded message data for batch commands.
var globalEncodedMsgDataPool = grpc.NewSharedBufferPool()
var globalEncodedMsgDataPool = mem.NewTieredBufferPool(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewSharedBufferPool is replaced with NewTieredBufferPool.

  • Previous size tier: 16B, 256B, 4KB, 64KB, 1MB
  • New size tier: 256B, 4KB, 16KB, 32KB, 1MB

}, opts...)
if cfg.TiKVClient.GrpcSharedBufferPool {
opts = append(opts, experimental.WithRecvBufferPool(grpc.NewSharedBufferPool()))
if !cfg.TiKVClient.GrpcSharedBufferPool {
Copy link
Author

@joechenrh joechenrh Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After grpc-go 1.66, the default pool was changed from noop pool to defaultBufferPool
https://github.com/grpc/grpc-go/blob/c7ec4d9ae3281bc57a8adce59b572e56965fb728/dialoptions.go#L705-L718
Here we just align with the old behavior: use noop pool by default.

Name: connName,
}
//nolint:SA1019
conn.ClientConn, err = grpc.DialContext(ctx, target, opts...)
Copy link
Author

@joechenrh joechenrh Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep using the old (deprecated) API to avoid potential TiFlash problems.
Ref: pingcap/tiflash#9159

go.uber.org/zap v1.26.0
golang.org/x/sync v0.12.0
google.golang.org/grpc v1.63.2
google.golang.org/grpc v1.73.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about other go.mod in examples?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's OK to keep them, but we can upgrade them as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keep same is better

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contribution This PR is from a community contributor. dco-signoff: yes Indicates the PR's author has signed the dco. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants