transport: Remove buffer copies while writing HTTP/2 Data frames (#8667)
This PR removes two buffer copies made while writing DATA frames to the
underlying net.Conn: one [within
gRPC](https://github.com/grpc/grpc-go/blob/58d4b2b1492dbcfdf26daa7ed93830ebb871faf1/internal/transport/controlbuf.go#L1009-L1022)
and the other [in the
framer](https://cs.opensource.google/go/x/net/+/master:http2/frame.go;l=743;drc=6e243da531559f8c99439dabc7647dec07191f9b).
Care is taken to avoid extra heap allocations, which would hurt
performance for smaller payloads.
A [CL](https://go-review.git.corp.google.com/c/net/+/711620) that allows
using the framer to write frame headers is out for review. As a temporary
workaround, this PR duplicates the header-writing code; this PR will be
merged only after the CL lands.
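The duplicated logic amounts to serializing the fixed 9-byte HTTP/2 frame header (24-bit payload length, type, flags, 31-bit stream ID) directly into a buffer so the payload can follow without an intermediate copy. A minimal sketch of the idea, not the PR's actual code; `appendFrameHeader` is a hypothetical helper:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendFrameHeader appends the fixed 9-byte HTTP/2 frame header to dst:
// 24-bit payload length, 8-bit type, 8-bit flags, 31-bit stream ID.
// (Hypothetical helper for illustration; not the PR's actual code.)
func appendFrameHeader(dst []byte, length uint32, typ, flags byte, streamID uint32) []byte {
	dst = append(dst,
		byte(length>>16), byte(length>>8), byte(length), // payload length
		typ,   // 0x0 = DATA
		flags, // e.g. 0x1 = END_STREAM
	)
	var sid [4]byte
	binary.BigEndian.PutUint32(sid[:], streamID&0x7fffffff) // clear the reserved bit
	return append(dst, sid[:]...)
}

func main() {
	payload := []byte("hello")
	// Reusing one buffer for header + payload avoids both a separate header
	// write and the framer's internal copy of the payload.
	buf := appendFrameHeader(make([]byte, 0, 9+len(payload)), uint32(len(payload)), 0x0, 0x0, 1)
	buf = append(buf, payload...)
	fmt.Printf("%% x: % x\n", buf)
}
```

Pre-sizing the buffer with `make([]byte, 0, 9+len(payload))` keeps the appends from triggering a heap reallocation on the hot path.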
## Results
### Small payloads
Performance for small payloads improves slightly due to the removal of a
`defer` statement.
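The effect of removing a `defer` on a hot path can be observed with a standalone microbenchmark. A sketch unrelated to the PR's code, using a mutex pair as the deferred operation:

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

var mu sync.Mutex

// lockDefer releases the lock via defer, paying the per-call defer overhead.
func lockDefer() {
	mu.Lock()
	defer mu.Unlock()
}

// lockDirect releases the lock inline, as the PR does after removing the defer.
func lockDirect() {
	mu.Lock()
	mu.Unlock()
}

func main() {
	withDefer := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			lockDefer()
		}
	})
	without := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			lockDirect()
		}
	})
	fmt.Println("with defer:   ", withDefer.NsPerOp(), "ns/op")
	fmt.Println("without defer:", without.NsPerOp(), "ns/op")
}
```

The absolute difference is a few nanoseconds per call, which only becomes visible at the operation rates in the table below.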
```
$ go run benchmark/benchmain/main.go -benchtime=60s -workloads=unary \
-compression=off -maxConcurrentCalls=120 -trace=off \
-reqSizeBytes=100 -respSizeBytes=100 -networkMode=Local -resultFile="${RUN_NAME}"
$ go run benchmark/benchresult/main.go unary-before unary-after
Title         Before          After           Percentage
TotalOps      7600878         7653522         0.69%
SendOps       0               0               NaN%
RecvOps       0               0               NaN%
Bytes/op      10007.07        10000.89        -0.07%
Allocs/op     146.93          146.91          0.00%
ReqT/op       101345040.00    102046960.00    0.69%
RespT/op      101345040.00    102046960.00    0.69%
50th-Lat      833.724µs       830.041µs       -0.44%
90th-Lat      1.281969ms      1.275336ms      -0.52%
99th-Lat      2.403961ms      2.360606ms      -1.80%
Avg-Lat       946.123µs       939.734µs       -0.68%
GoVersion     go1.24.8        go1.24.8
GrpcVersion   1.77.0-dev      1.77.0-dev
```
### Large payloads
Local benchmarks on my dev machine show a ~5-10% regression with 1 MB
payloads. Profiles show increased time spent in the copy operation
[inside the buffered
writer](https://github.com/grpc/grpc-go/blob/58d4b2b1492dbcfdf26daa7ed93830ebb871faf1/internal/transport/http_util.go#L334).
Counterintuitively, copying the gRPC header and message data into a
larger buffer increased performance by 4% (compared to master).
To validate this behaviour (an extra copy improving performance), I ran
[the k8s benchmark for 1 MB
payloads](https://github.com/grpc/grpc/blob/65c9be86830b0e423dd970c066c69a06a9240298/tools/run_tests/performance/scenario_config.py#L291-L305)
with 100 concurrent streams; across multiple runs it showed a ~5%
increase in QPS without the copies, and adding a copy back reduced
performance.
Load test config file:
[loadtest.yaml](https://github.com/user-attachments/files/23055312/loadtest.yaml)
```
# 30 core client and server
Before
QPS: 498.284 (16.6095/server core)
Latencies (50/90/95/99/99.9%-ile): 233256/275972/281250/291803/298533 us
Server system time: 93.0164
Server user time: 142.533
Client system time: 97.2688
Client user time: 144.542
After
QPS: 526.776 (17.5592/server core)
Latencies (50/90/95/99/99.9%-ile): 211010/263189/270969/280656/288828 us
Server system time: 96.5959
Server user time: 147.668
Client system time: 101.973
Client user time: 150.234
# 8 core client and server
Before
QPS: 291.049 (36.3811/server core)
Latencies (50/90/95/99/99.9%-ile): 294552/685822/903554/1.48399e+06/1.50757e+06 us
Server system time: 49.0355
Server user time: 87.1783
Client system time: 60.1945
Client user time: 103.633
After
QPS: 334.119 (41.7649/server core)
Latencies (50/90/95/99/99.9%-ile): 279395/518849/706327/1.09273e+06/1.11629e+06 us
Server system time: 69.3136
Server user time: 102.549
Client system time: 80.9804
Client user time: 107.103
```
RELEASE NOTES:
* transport: Avoid two buffer copies when writing Data frames.