upgrade grpc to v1.73 #1779
Conversation
Welcome @joechenrh!
/hold

Force-pushed from 2bcb5a5 to 9bd9fb0

/ok-to-test
@joechenrh: Cannot trigger testing until a trusted user reviews the PR and leaves an `/ok-to-test` message. In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/unhold
/run-all-tests
Previously, grpc was upgraded to 1.64 and reverted (by #1369) due to errors in TiFlash regression tests, so I keep using the old API. I've run a test internally with this branch. The plan ID is 7972306, and you can check the result. (Although I suspect that error is not related to the grpc API change, because it seems like we couldn't connect to …)

This would fix #1789, right?
📝 Walkthrough

Upgrades gRPC/protobuf dependencies, adds a legacy protobuf codec, and changes client buffer-pool and batch-encoding logic to use tiered or nop buffer pools and pointer-backed encoded buffers (no exported API changes).
Sequence Diagram(s): Skipped — changes are dependency and internal buffer/codec plumbing without a new multi-component runtime control flow needing visualization.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: 2 passed, 1 failed (1 warning)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@internal/resourcecontrol/resource_control.go`:
- Around line 184-190: The code reads r.Data for a *tikvrpc.CopStreamResponse
even when r.Response can be nil, which can panic; update the CopStreamResponse
handling so the readBytes assignment is guarded by the same nil-check that sets
detailsV2/details (i.e., move readBytes = uint64(len(...)) inside the if
r.Response != nil block or use r.Response.Data instead of r.Data), ensuring
detailsV2, details and readBytes are only accessed when r.Response is non-nil.
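A minimal sketch of the suggested guard, assuming the names from the comment above (`r.Response`, its `Data` field, and the `GetExecDetails*` getters); the surrounding type switch is hypothetical:

```go
// Only read fields of r.Response (including Data) after the nil check,
// so a nil stream response cannot panic.
case *tikvrpc.CopStreamResponse:
	if r.Response != nil {
		detailsV2 = r.Response.GetExecDetailsV2()
		details = r.Response.GetExecDetails()
		readBytes = uint64(len(r.Response.Data))
	}
```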
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@internal/client/legacy_codec.go`:
- Around line 55-56: The lagecyCodec.Name() implementation currently returns an
empty string which violates gRPC codec requirements; update the lagecyCodec.Name
method to return a stable non-empty codec name (for protobuf use "proto") so
Content-Type and registration logic work correctly; locate the lagecyCodec type
and its Name() method and change the return value from "" to "proto".
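For reference, a minimal sketch of a codec that satisfies this, assuming grpc-go's v1 `encoding.Codec` interface (the `lagecyCodec` name comes from the PR; the marshal logic here is illustrative):

```go
package client

import (
	"fmt"

	"github.com/golang/protobuf/proto"
	"google.golang.org/grpc/encoding"
)

// lagecyCodec marshals messages with the legacy golang/protobuf API.
type lagecyCodec struct{}

func (lagecyCodec) Marshal(v any) ([]byte, error) {
	msg, ok := v.(proto.Message)
	if !ok {
		return nil, fmt.Errorf("failed to marshal: %v is not a proto.Message", v)
	}
	return proto.Marshal(msg)
}

func (lagecyCodec) Unmarshal(data []byte, v any) error {
	msg, ok := v.(proto.Message)
	if !ok {
		return fmt.Errorf("failed to unmarshal: %v is not a proto.Message", v)
	}
	return proto.Unmarshal(data, msg)
}

// Name returns a stable, non-empty codec name so Content-Type negotiation
// and registration work; "proto" matches the built-in protobuf codec.
func (lagecyCodec) Name() string { return "proto" }

func init() {
	encoding.RegisterCodec(lagecyCodec{})
}
```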
🧹 Nitpick comments (3)
internal/client/legacy_codec.go (1)
23-24: Fix typo in codec naming (lagecy→legacy).
The misspelling makes searchability and future maintenance harder. Consider renaming the type/comment and updating its usage in internal/client/conn_pool.go.

internal/client/conn_pool.go (2)
80-81: Add a brief rationale for the SA1019 suppression.
A short comment (e.g., “DialContext kept due to prior regression on upgrade”) will prevent accidental cleanup later.
125-152: Avoid mutating `opts` across loop iterations.
`opts` is re-assigned inside the loop and then appended to again when `GrpcSharedBufferPool` is false, causing duplicate dial options to accumulate across connections. Use a per-iteration slice to keep options stable and predictable.

♻️ Proposed refactor

```diff
-			opts = append([]grpc.DialOption{
+			dialOpts := append([]grpc.DialOption{
 				opt,
 				grpc.WithInitialWindowSize(cfg.TiKVClient.GrpcInitialWindowSize),
 				grpc.WithInitialConnWindowSize(cfg.TiKVClient.GrpcInitialConnWindowSize),
 				grpc.WithUnaryInterceptor(unaryInterceptor),
 				grpc.WithStreamInterceptor(streamInterceptor),
 				grpc.WithDefaultCallOptions(callOptions...),
 				grpc.WithConnectParams(grpc.ConnectParams{
 					Backoff: backoff.Config{
 						BaseDelay:  100 * time.Millisecond, // Default was 1s.
 						Multiplier: 1.6,                    // Default
 						Jitter:     0.2,                    // Default
 						MaxDelay:   3 * time.Second,        // Default was 120s.
 					},
 					MinConnectTimeout: a.dialTimeout,
 				}),
 				grpc.WithKeepaliveParams(keepalive.ClientParameters{
 					Time:    time.Duration(keepAlive) * time.Second,
 					Timeout: cfg.TiKVClient.GetGrpcKeepAliveTimeout(),
 				}),
 			}, opts...)
 			if !cfg.TiKVClient.GrpcSharedBufferPool {
-				opts = append(opts, experimental.WithBufferPool(mem.NopBufferPool{}))
+				dialOpts = append(dialOpts, experimental.WithBufferPool(mem.NopBufferPool{}))
 			}
 			conn, err := a.monitoredDial(
 				ctx,
 				fmt.Sprintf("%s-%d", a.target, i),
 				addr,
-				opts...,
+				dialOpts...,
 			)
```
Force-pushed from bf57094 to 874db33

Force-pushed from 874db33 to d98bf71
Force-pushed from d98bf71 to fcf6d0c
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
```diff
 // globalEncodedMsgDataPool is used to pool pre-encoded message data for batch commands.
-var globalEncodedMsgDataPool = grpc.NewSharedBufferPool()
+var globalEncodedMsgDataPool = mem.NewTieredBufferPool(
```
NewSharedBufferPool is replaced with NewTieredBufferPool.
- Previous size tiers: 16B, 256B, 4KB, 64KB, 1MB
- New size tiers: 256B, 4KB, 16KB, 32KB, 1MB
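A minimal sketch of constructing a pool with those tiers, assuming grpc-go's `mem.NewTieredBufferPool` (variadic bucket sizes; the package name here is illustrative):

```go
package client

import "google.golang.org/grpc/mem"

// globalEncodedMsgDataPool pools pre-encoded message data for batch commands,
// bucketed at the new tier sizes listed above.
var globalEncodedMsgDataPool = mem.NewTieredBufferPool(
	256, 4*1024, 16*1024, 32*1024, 1024*1024,
)
```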
```diff
 			}, opts...)
-			if cfg.TiKVClient.GrpcSharedBufferPool {
-				opts = append(opts, experimental.WithRecvBufferPool(grpc.NewSharedBufferPool()))
+			if !cfg.TiKVClient.GrpcSharedBufferPool {
```
After grpc-go 1.66, the default pool was changed from the noop pool to defaultBufferPool:
https://github.com/grpc/grpc-go/blob/c7ec4d9ae3281bc57a8adce59b572e56965fb728/dialoptions.go#L705-L718
Here we just align with the old behavior: use the noop pool by default.
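A minimal sketch of opting back into the old behavior, assuming the `experimental.WithBufferPool` option and `mem.NopBufferPool` from grpc-go 1.66+ (the helper and dial details are hypothetical):

```go
package client

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/experimental"
	"google.golang.org/grpc/mem"
)

// dialWithoutPooling disables the default shared receive buffer pool,
// restoring the pre-1.66 noop behavior.
func dialWithoutPooling(target string) (*grpc.ClientConn, error) {
	return grpc.NewClient(target,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		experimental.WithBufferPool(mem.NopBufferPool{}),
	)
}
```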
```go
		Name: connName,
	}
	//nolint:SA1019
	conn.ClientConn, err = grpc.DialContext(ctx, target, opts...)
```
Keep using the old (deprecated) API to avoid potential TiFlash problems.
Ref: pingcap/tiflash#9159
```diff
 	go.uber.org/zap v1.26.0
 	golang.org/x/sync v0.12.0
-	google.golang.org/grpc v1.63.2
+	google.golang.org/grpc v1.73.0
```
What about the other go.mod files in examples?
I think it's OK to keep them, but we can upgrade them as well.
I think keeping them the same is better.
Why upgrade grpc

Because we are using some experimental APIs that have already been renamed. This blocks libraries that depend on this project from using a newer version of grpc. (And I found someone already wants to do this: #1672.)

What has been changed
- Replaced `WithRecvBufferPool` with `WithBufferPool` (see the sketch after this list).
- Changed how `sharedBytes` works.
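For illustration, a hedged sketch of the option rename, assuming the grpc-go 1.66+ package layout (`bufferPoolOption` and its `shared` flag are hypothetical; the PR drives this from `cfg.TiKVClient.GrpcSharedBufferPool`):

```go
package client

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/experimental"
	"google.golang.org/grpc/mem"
)

// bufferPoolOption picks the dial option for the configured pooling mode.
// Before grpc-go 1.66 the shared branch was written as
// experimental.WithRecvBufferPool(grpc.NewSharedBufferPool()); both names
// were removed after the rename.
func bufferPoolOption(shared bool) grpc.DialOption {
	if shared {
		return experimental.WithBufferPool(mem.DefaultBufferPool())
	}
	return experimental.WithBufferPool(mem.NopBufferPool{})
}
```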
sysbench test

The QPS/latency are almost the same for both versions. The memory for 1.73 is slightly reduced (4.09G -> 3.98G), while the GC rate increases (0.25 -> 0.3), which is an expected result of introducing `sync.Pool`. For point lookup, both versions show almost no difference in QPS/latency/memory.