Cherrypick #8657 and #8667 to v1.77.x #8690
Merged
This change incorporates changes from golang/go#73560 to split reading HTTP/2 frame headers and payloads. If the frame is not a Data frame, it's read through the standard library framer as before. For Data frames, the payload is read directly into a buffer from the buffer pool to avoid copying it from the framer's buffer.

## Testing

For 1 MB payloads, this results in a ~4% improvement in throughput.

```sh
# test command
go run benchmark/benchmain/main.go -benchtime=60s -workloads=streaming \
  -compression=off -maxConcurrentCalls=120 -trace=off \
  -reqSizeBytes=1000000 -respSizeBytes=1000000 -networkMode=Local -resultFile="${RUN_NAME}"

# comparison
go run benchmark/benchresult/main.go streaming-before streaming-after
       Title           Before            After   Percentage
    TotalOps            87536            91120        4.09%
     SendOps                0                0         NaN%
     RecvOps                0                0         NaN%
    Bytes/op       4074102.92       4070489.30       -0.09%
   Allocs/op            83.60            76.55       -8.37%
     ReqT/op   11671466666.67   12149333333.33        4.09%
    RespT/op   11671466666.67   12149333333.33        4.09%
    50th-Lat      78.209875ms      75.159943ms       -3.90%
    90th-Lat     117.764228ms       107.8697ms       -8.40%
    99th-Lat     146.935704ms     139.069685ms       -5.35%
     Avg-Lat      82.310691ms      79.073282ms       -3.93%
   GoVersion         go1.24.7         go1.24.7
 GrpcVersion       1.77.0-dev       1.77.0-dev
```

For smaller payloads, the difference is minor.

```sh
go run benchmark/benchmain/main.go -benchtime=60s -workloads=streaming \
  -compression=off -maxConcurrentCalls=120 -trace=off \
  -reqSizeBytes=100 -respSizeBytes=100 -networkMode=Local -resultFile="${RUN_NAME}"

go run benchmark/benchresult/main.go streaming-before streaming-after
       Title           Before            After   Percentage
    TotalOps         21490752         21477822       -0.06%
     SendOps                0                0         NaN%
     RecvOps                0                0         NaN%
    Bytes/op          1902.92          1902.94        0.00%
   Allocs/op            29.21            29.21        0.00%
     ReqT/op     286543360.00     286370960.00       -0.06%
    RespT/op     286543360.00     286370960.00       -0.06%
    50th-Lat        352.505µs        352.247µs       -0.07%
    90th-Lat        433.446µs        434.907µs        0.34%
    99th-Lat        536.445µs        539.759µs        0.62%
     Avg-Lat        333.403µs        333.457µs        0.02%
   GoVersion         go1.24.7         go1.24.7
 GrpcVersion       1.77.0-dev       1.77.0-dev
```

RELEASE NOTES:

* transport: Avoid a buffer copy when reading data.
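As a rough illustration of the read-path split described above, the sketch below reads the 9-byte HTTP/2 frame header itself and, for Data frames, reads the payload directly into a pooled buffer. This is not the actual grpc-go code; `bufferPool`, `errNotData`, and `readDataPayload` are illustrative names introduced only for this example.

```go
package framesketch

import (
	"errors"
	"io"
	"sync"

	"golang.org/x/net/http2"
)

var (
	// bufferPool is a stand-in for gRPC's shared buffer pool.
	bufferPool = sync.Pool{New: func() any { b := make([]byte, 16*1024); return &b }}
	errNotData = errors.New("not a DATA frame")
)

// readDataPayload reads the fixed 9-byte HTTP/2 frame header and, for DATA
// frames, reads the payload straight into a pooled buffer so the bytes are
// not first staged in the framer's internal buffer and copied out again.
func readDataPayload(r io.Reader) (*[]byte, error) {
	var hdr [9]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err
	}
	length := int(hdr[0])<<16 | int(hdr[1])<<8 | int(hdr[2])
	if http2.FrameType(hdr[3]) != http2.FrameData {
		// Non-DATA frames would keep going through the standard framer,
		// as the PR description states.
		return nil, errNotData
	}
	bp := bufferPool.Get().(*[]byte)
	if cap(*bp) < length {
		*bp = make([]byte, length)
	}
	*bp = (*bp)[:length]
	if _, err := io.ReadFull(r, *bp); err != nil {
		bufferPool.Put(bp)
		return nil, err
	}
	return bp, nil // the caller returns bp to bufferPool once it is done
}
```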
…c#8667)

This PR removes two buffer copies while writing Data frames to the underlying net.Conn: one [within gRPC](https://github.com/grpc/grpc-go/blob/58d4b2b1492dbcfdf26daa7ed93830ebb871faf1/internal/transport/controlbuf.go#L1009-L1022) and the other [in the framer](https://cs.opensource.google/go/x/net/+/master:http2/frame.go;l=743;drc=6e243da531559f8c99439dabc7647dec07191f9b). Care is taken to avoid any extra heap allocations, which can affect performance for smaller payloads.

A [CL](https://go-review.git.corp.google.com/c/net/+/711620) is out for review that allows using the framer to write frame headers. This PR duplicates the header-writing code as a temporary workaround and will be merged only after the CL is merged.

## Results

### Small payloads

Performance for small payloads increases slightly due to the removal of a `defer` statement.

```
$ go run benchmark/benchmain/main.go -benchtime=60s -workloads=unary \
  -compression=off -maxConcurrentCalls=120 -trace=off \
  -reqSizeBytes=100 -respSizeBytes=100 -networkMode=Local -resultFile="${RUN_NAME}"

$ go run benchmark/benchresult/main.go unary-before unary-after
       Title           Before            After   Percentage
    TotalOps          7600878          7653522        0.69%
     SendOps                0                0         NaN%
     RecvOps                0                0         NaN%
    Bytes/op         10007.07         10000.89       -0.07%
   Allocs/op           146.93           146.91        0.00%
     ReqT/op     101345040.00     102046960.00        0.69%
    RespT/op     101345040.00     102046960.00        0.69%
    50th-Lat        833.724µs        830.041µs       -0.44%
    90th-Lat       1.281969ms       1.275336ms       -0.52%
    99th-Lat       2.403961ms       2.360606ms       -1.80%
     Avg-Lat        946.123µs        939.734µs       -0.68%
   GoVersion         go1.24.8         go1.24.8
 GrpcVersion       1.77.0-dev       1.77.0-dev
```

### Large payloads

Local benchmarks show a ~5-10% regression with 1 MB payloads on my dev machine. The profiles show increased time spent in the copy operation [inside the buffered writer](https://github.com/grpc/grpc-go/blob/58d4b2b1492dbcfdf26daa7ed93830ebb871faf1/internal/transport/http_util.go#L334). Counterintuitively, copying the gRPC header and message data into a larger buffer increased performance by 4% (compared to master). To validate this behaviour (an extra copy increasing performance), I ran [the k8s benchmark for 1 MB payloads](https://github.com/grpc/grpc/blob/65c9be86830b0e423dd970c066c69a06a9240298/tools/run_tests/performance/scenario_config.py#L291-L305) with 100 concurrent streams, which showed a ~5% increase in QPS without the copies across multiple runs. Adding a copy reduced the performance.

Load test config file: [loadtest.yaml](https://github.com/user-attachments/files/23055312/loadtest.yaml)

```
# 30 core client and server
Before
QPS: 498.284 (16.6095/server core)
Latencies (50/90/95/99/99.9%-ile): 233256/275972/281250/291803/298533 us
Server system time: 93.0164
Server user time: 142.533
Client system time: 97.2688
Client user time: 144.542

After
QPS: 526.776 (17.5592/server core)
Latencies (50/90/95/99/99.9%-ile): 211010/263189/270969/280656/288828 us
Server system time: 96.5959
Server user time: 147.668
Client system time: 101.973
Client user time: 150.234

# 8 core client and server
Before
QPS: 291.049 (36.3811/server core)
Latencies (50/90/95/99/99.9%-ile): 294552/685822/903554/1.48399e+06/1.50757e+06 us
Server system time: 49.0355
Server user time: 87.1783
Client system time: 60.1945
Client user time: 103.633

After
QPS: 334.119 (41.7649/server core)
Latencies (50/90/95/99/99.9%-ile): 279395/518849/706327/1.09273e+06/1.11629e+06 us
Server system time: 69.3136
Server user time: 102.549
Client system time: 80.9804
Client user time: 107.103
```

RELEASE NOTES:

* transport: Avoid two buffer copies when writing Data frames.
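To make the write-path change more concrete, here is a minimal sketch, assuming the transport writes into a `bufio.Writer`: the 9-byte DATA frame header is assembled and written directly, and the payload slices (for example the 5-byte gRPC message header and the message body) are handed to the writer as-is instead of first being copied into one contiguous buffer and then into the framer's write buffer. `writeDataFrame` is a hypothetical name, not the function added by the PR.

```go
package framesketch

import (
	"bufio"

	"golang.org/x/net/http2"
)

// writeDataFrame writes a DATA frame header followed by the payload slices,
// mirroring the byte layout the framer would produce (24-bit length, type,
// flags, 31-bit stream ID) without staging the payload in an extra buffer.
func writeDataFrame(w *bufio.Writer, streamID uint32, endStream bool, data ...[]byte) error {
	var length int
	for _, b := range data {
		length += len(b)
	}
	var flags http2.Flags
	if endStream {
		flags = http2.FlagDataEndStream
	}
	hdr := [9]byte{
		byte(length >> 16), byte(length >> 8), byte(length),
		byte(http2.FrameData), byte(flags),
		byte(streamID>>24) & 0x7f, byte(streamID >> 16), byte(streamID >> 8), byte(streamID),
	}
	if _, err := w.Write(hdr[:]); err != nil {
		return err
	}
	// Each payload slice (e.g. the gRPC header and the message body) goes to
	// the buffered writer directly, which is the copy elimination the PR
	// description refers to.
	for _, b := range data {
		if _, err := w.Write(b); err != nil {
			return err
		}
	}
	return nil
}
```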
Codecov Report

@@            Coverage Diff             @@
##           v1.77.x    #8690     +/-   ##
===========================================
+ Coverage    82.21%   83.23%   +1.01%
===========================================
  Files          417      417
  Lines        32198    32296      +98
===========================================
+ Hits         26472    26880     +408
- Misses        4021     4037      +16
+ Partials      1705     1379     -326
eshitachandwani approved these changes on Nov 3, 2025.
Labels
- Area: Transport - Includes HTTP/2 client/server and HTTP server handler transports and advanced transport features.
- Type: Performance - Performance improvements (CPU, network, memory, etc.)
Original PRs: #8657, #8667
RELEASE NOTES:

* transport: Avoid a buffer copy when reading data.
* transport: Avoid two buffer copies when writing Data frames.