
Optimize recvAndDecompress when receiving compressed payload #7786


Closed
alanprot wants to merge 1 commit into master from opmize-recvAndDecompress

Conversation

alanprot

@alanprot alanprot commented Oct 26, 2024

The PR #7356 in grpc-go introduced a new Codec that uses mem.BufferSlice instead of []byte and also began pooling the underlying []byte slices by default.

With this update, the DecompressedSize method is no longer called for compressors that implement it. Previously, this method was used to determine the decompressed size of the content, allowing us to allocate the destination []byte with the correct size and avoid oversized slices or reallocation. The reasoning behind the change is that, since the slices are now pooled, the "decompress" method would use multiple smaller slices rather than a single large one, thereby reducing the overhead due to overallocation/reallocation. See this comment.

The problem is that io.Copy allocates a temporary 32 KB byte slice internally on each call. See

https://github.com/golang/go/blob/889abb17e125bb0f5d8de61bb80ef15fbe2a130d/src/io/io.go#L387-L389

And

https://github.com/golang/go/blob/889abb17e125bb0f5d8de61bb80ef15fbe2a130d/src/io/io.go#L417-L427

This PR uses the io.CopyBuffer method instead of io.Copy, allowing us to pass in the []byte used internally as a parameter. This buffer is now also pooled using the new mem package. The change also reintroduces the call to the DecompressedSize method when available, or assumes the same 32 KB when requesting the buffer from the pool.

From the doc

CopyBuffer is identical to Copy except that it stages through the provided buffer (if one is required) rather than allocating a temporary one. If buf is nil, one is allocated; otherwise if it has zero length, CopyBuffer panics.
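
In rough terms, the change looks like the sketch below (simplified; the function name, the interface assertion, and the error handling are illustrative rather than the exact diff in this PR):

```go
package sketch // illustrative only; the real change lives in rpc_util.go

import (
	"io"

	"google.golang.org/grpc/encoding"
	"google.golang.org/grpc/mem"
)

// decompressSketch borrows a scratch buffer from the pool and lets
// io.CopyBuffer reuse it, instead of io.Copy allocating a fresh 32 KB
// buffer on every call.
func decompressSketch(c encoding.Compressor, d mem.BufferSlice, pool mem.BufferPool) (mem.BufferSlice, error) {
	dcReader, err := c.Decompress(d.Reader())
	if err != nil {
		return nil, err
	}

	// Size the scratch buffer from DecompressedSize when the compressor
	// implements it; otherwise fall back to io.Copy's default of 32 KB.
	// Note: Materialize flattens the BufferSlice into a single []byte,
	// which has its own cost (see the discussion further down).
	size := 32 * 1024
	if sizer, ok := c.(interface{ DecompressedSize(compressedBytes []byte) int }); ok {
		if s := sizer.DecompressedSize(d.Materialize()); s > 0 {
			size = s
		}
	}

	scratch := pool.Get(size)
	defer pool.Put(scratch)

	// Decompress into pooled buffers, reusing the scratch buffer for the copy.
	var out mem.BufferSlice
	if _, err := io.CopyBuffer(mem.NewWriter(&out, pool), dcReader, *scratch); err != nil {
		out.Free()
		return nil, err
	}
	return out, nil
}
```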

I created a benchmark with gzip (which implements the DecompressedSize method) and a noop compressor that does not implement the DecompressedSize method.

goos: linux
goarch: amd64
pkg: google.golang.org/grpc
cpu: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
                                              │  /tmp/old   │              /tmp/new              │
                                              │   sec/op    │   sec/op     vs base               │
RPCCompressor/comp=gzip,payloadSize=1024-16     342.0µ ± 3%   295.0µ ± 2%  -13.75% (p=0.002 n=6)
RPCCompressor/comp=gzip,payloadSize=10240-16    440.1µ ± 2%   393.9µ ± 2%  -10.49% (p=0.002 n=6)
RPCCompressor/comp=gzip,payloadSize=512000-16   5.111m ± 1%   5.179m ± 2%   +1.33% (p=0.004 n=6)
RPCCompressor/comp=noop,payloadSize=1024-16     166.9µ ± 1%   135.3µ ± 1%  -18.93% (p=0.002 n=6)
RPCCompressor/comp=noop,payloadSize=10240-16    183.5µ ± 1%   152.5µ ± 3%  -16.92% (p=0.002 n=6)
RPCCompressor/comp=noop,payloadSize=512000-16   977.7µ ± 2%   941.9µ ± 1%   -3.66% (p=0.002 n=6)
geomean                                         533.4µ        476.4µ       -10.69%

                                              │    /tmp/old    │               /tmp/new               │
                                              │      B/op      │     B/op       vs base               │
RPCCompressor/comp=gzip,payloadSize=1024-16     164.95Ki ± 11%   32.22Ki ±  9%  -80.47% (p=0.002 n=6)
RPCCompressor/comp=gzip,payloadSize=10240-16    203.66Ki ±  6%   55.36Ki ± 17%  -72.82% (p=0.002 n=6)
RPCCompressor/comp=gzip,payloadSize=512000-16    1.193Mi ± 14%   1.075Mi ±  7%   -9.89% (p=0.002 n=6)
RPCCompressor/comp=noop,payloadSize=1024-16      78.94Ki ±  0%   14.91Ki ±  1%  -81.11% (p=0.002 n=6)
RPCCompressor/comp=noop,payloadSize=10240-16     89.97Ki ±  0%   25.14Ki ±  1%  -72.06% (p=0.002 n=6)
RPCCompressor/comp=noop,payloadSize=512000-16   1032.8Ki ±  5%   939.1Ki ±  4%   -9.07% (p=0.002 n=6)
geomean                                          258.9Ki         94.02Ki        -63.68%

                                              │  /tmp/old  │             /tmp/new             │
                                              │ allocs/op  │ allocs/op   vs base              │
RPCCompressor/comp=gzip,payloadSize=1024-16     257.0 ± 0%   258.0 ± 0%       ~ (p=0.061 n=6)
RPCCompressor/comp=gzip,payloadSize=10240-16    259.0 ± 0%   258.5 ± 0%       ~ (p=0.182 n=6)
RPCCompressor/comp=gzip,payloadSize=512000-16   305.0 ± 2%   308.0 ± 0%       ~ (p=0.063 n=6)
RPCCompressor/comp=noop,payloadSize=1024-16     234.0 ± 0%   231.0 ± 0%  -1.28% (p=0.002 n=6)
RPCCompressor/comp=noop,payloadSize=10240-16    235.0 ± 0%   231.0 ± 0%  -1.70% (p=0.002 n=6)
RPCCompressor/comp=noop,payloadSize=512000-16   308.0 ± 2%   308.5 ± 3%       ~ (p=0.671 n=6)
geomean                                         264.7        263.9       -0.28%

@alanprot alanprot force-pushed the opmize-recvAndDecompress branch from b115b69 to ee988fd on October 26, 2024 03:16

codecov bot commented Oct 26, 2024

Codecov Report

Attention: Patch coverage is 81.25000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 81.42%. Comparing base (cb32937) to head (8a8fe07).
Report is 37 commits behind head on master.

Files with missing lines Patch % Lines
rpc_util.go 81.25% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7786      +/-   ##
==========================================
+ Coverage   81.39%   81.42%   +0.03%     
==========================================
  Files         368      368              
  Lines       36799    36814      +15     
==========================================
+ Hits        29952    29977      +25     
+ Misses       5615     5606       -9     
+ Partials     1232     1231       -1     
Files with missing lines Coverage Δ
rpc_util.go 79.55% <81.25%> (+0.01%) ⬆️

... and 20 files with indirect coverage changes


@alanprot
Author

alanprot commented Oct 26, 2024

@PapaCharlie @dfawley When you get a chance, can you take a peek at this?

Maybe it indeed does not make sense to call DecompressedSize, since to do so I need to materialize the data into a single slice. Maybe I should just always use a 32 KB buffer from the pool with io.CopyBuffer, and that would already be good enough.

@alanprot alanprot force-pushed the opmize-recvAndDecompress branch from ee988fd to 5e127c4 on October 26, 2024 03:53
…g compression

Signed-off-by: alanprot <alanprot@gmail.com>
@alanprot alanprot force-pushed the opmize-recvAndDecompress branch from 5e127c4 to 8a8fe07 on October 26, 2024 04:00
@alanprot alanprot changed the title from "Optimize and fix regression on recvAndDecompress when receiving compressed payload" to "Optimize recvAndDecompress when receiving compressed payload" Oct 26, 2024
@aranjans aranjans added the Type: Performance, Area: Benchmarking, and Area: RPC Features labels and removed the Type: Performance and Area: Benchmarking labels Oct 26, 2024
@aranjans aranjans added this to the 1.69 Release milestone Oct 27, 2024
@easwars easwars self-assigned this Oct 29, 2024
@easwars
Contributor

easwars commented Oct 29, 2024

We have another ongoing PR which addresses this issue. Please see: #7653

@easwars
Contributor

easwars commented Oct 29, 2024

I think we can close this one in favor of #7653.

@alanprot Please let us know if you think otherwise.

@PapaCharlie
Contributor

@alanprot Can you try replacing io.ReadAll with mem.ReadAll (from #7653) in decompress and running your benchmark again? I'm interested in seeing the results. Also, your benchmark results are interesting:

                                              │  /tmp/old   │              /tmp/new              │
                                              │   sec/op    │   sec/op     vs base               │
RPCCompressor/comp=gzip,payloadSize=1024-16     342.0µ ± 3%   295.0µ ± 2%  -13.75% (p=0.002 n=6)
RPCCompressor/comp=gzip,payloadSize=10240-16    440.1µ ± 2%   393.9µ ± 2%  -10.49% (p=0.002 n=6)
RPCCompressor/comp=gzip,payloadSize=512000-16   5.111m ± 1%   5.179m ± 2%   +1.33% (p=0.004 n=6)
RPCCompressor/comp=noop,payloadSize=1024-16     166.9µ ± 1%   135.3µ ± 1%  -18.93% (p=0.002 n=6)
RPCCompressor/comp=noop,payloadSize=10240-16    183.5µ ± 1%   152.5µ ± 3%  -16.92% (p=0.002 n=6)
RPCCompressor/comp=noop,payloadSize=512000-16   977.7µ ± 2%   941.9µ ± 1%   -3.66% (p=0.002 n=6)
geomean                                         533.4µ        476.4µ       -10.69%

I would have expected to see no difference between the old and new noop tests, but I guess because you're using a buffer from the pool when calling io.CopyBuffer, instead of letting io.Copy create a new one, we're seeing some noticeable gains? Nice
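
For illustration, a rough sketch of the mem.ReadAll-based decompress suggested in the comment above, assuming the mem.ReadAll(r io.Reader, pool mem.BufferPool) (mem.BufferSlice, error) helper proposed in #7653 (the function name, parameters, and error handling here are illustrative, not the final code):

```go
package sketch // illustrative only

import (
	"io"

	"google.golang.org/grpc/encoding"
	"google.golang.org/grpc/mem"
)

// decompressWithReadAll reads the decompressed stream directly into pooled
// buffers, chunk by chunk, so there is no per-call 32 KB allocation and no
// need to call DecompressedSize (or materialize the compressed input).
func decompressWithReadAll(c encoding.Compressor, d mem.BufferSlice, maxRecvSize int, pool mem.BufferPool) (mem.BufferSlice, error) {
	dcReader, err := c.Decompress(d.Reader())
	if err != nil {
		return nil, err
	}
	// Limit to maxRecvSize+1 so the caller can detect an oversized message.
	out, err := mem.ReadAll(io.LimitReader(dcReader, int64(maxRecvSize)+1), pool)
	if err != nil {
		out.Free()
		return nil, err
	}
	return out, nil
}
```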


github-actions bot commented Nov 5, 2024

This PR is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@alanprot
Author

alanprot commented Nov 8, 2024

@PapaCharlie
These are the results with the mem.ReadAll from that PR. It seems that PR indeed fixes the issue as well.

@easwars Yeah, I think we can close this one in favor of that one. IDK if we want to use the benchmark tests from here, though.

goos: darwin
goarch: arm64
pkg: google.golang.org/grpc
cpu: Apple M1 Pro
                                              │   /tmp/old   │              /tmp/new              │
                                              │    sec/op    │   sec/op     vs base               │
RPCCompressor/comp=gzip,payloadSize=1024-10     156.5µ ±  2%   146.4µ ± 4%   -6.44% (p=0.002 n=6)
RPCCompressor/comp=gzip,payloadSize=10240-10    189.2µ ±  7%   172.2µ ± 3%   -8.99% (p=0.002 n=6)
RPCCompressor/comp=gzip,payloadSize=512000-10   1.575m ± 10%   1.540m ± 0%   -2.24% (p=0.002 n=6)
RPCCompressor/comp=noop,payloadSize=1024-10     91.15µ ±  4%   73.58µ ± 0%  -19.27% (p=0.002 n=6)
RPCCompressor/comp=noop,payloadSize=10240-10    97.73µ ±  3%   80.58µ ± 3%  -17.55% (p=0.002 n=6)
RPCCompressor/comp=noop,payloadSize=512000-10   457.6µ ±  2%   463.3µ ± 2%   +1.25% (p=0.041 n=6)
geomean                                         239.8µ         217.8µ        -9.18%

                                              │   /tmp/old    │               /tmp/new               │
                                              │     B/op      │     B/op       vs base               │
RPCCompressor/comp=gzip,payloadSize=1024-10     134.85Ki ± 7%   24.37Ki ±  9%  -81.93% (p=0.002 n=6)
RPCCompressor/comp=gzip,payloadSize=10240-10    150.12Ki ± 7%   37.74Ki ± 13%  -74.86% (p=0.002 n=6)
RPCCompressor/comp=gzip,payloadSize=512000-10   1144.9Ki ± 5%   987.6Ki ±  6%  -13.74% (p=0.002 n=6)
RPCCompressor/comp=noop,payloadSize=1024-10      78.28Ki ± 0%   14.02Ki ±  1%  -82.09% (p=0.002 n=6)
RPCCompressor/comp=noop,payloadSize=10240-10     88.82Ki ± 0%   23.63Ki ±  1%  -73.39% (p=0.002 n=6)
RPCCompressor/comp=noop,payloadSize=512000-10    975.1Ki ± 3%   857.0Ki ±  3%  -12.12% (p=0.002 n=6)
geomean                                          232.3Ki        79.78Ki        -65.65%

                                              │  /tmp/old  │             /tmp/new             │
                                              │ allocs/op  │ allocs/op   vs base              │
RPCCompressor/comp=gzip,payloadSize=1024-10     252.0 ± 0%   244.0 ± 0%  -3.17% (p=0.002 n=6)
RPCCompressor/comp=gzip,payloadSize=10240-10    252.0 ± 0%   244.0 ± 0%  -3.17% (p=0.002 n=6)
RPCCompressor/comp=gzip,payloadSize=512000-10   296.0 ± 1%   288.0 ± 1%  -2.70% (p=0.002 n=6)
RPCCompressor/comp=noop,payloadSize=1024-10     230.0 ± 0%   223.0 ± 0%  -3.04% (p=0.002 n=6)
RPCCompressor/comp=noop,payloadSize=10240-10    230.0 ± 0%   223.0 ± 0%  -3.04% (p=0.002 n=6)
RPCCompressor/comp=noop,payloadSize=512000-10   291.5 ± 2%   279.0 ± 2%  -4.29% (p=0.002 n=6)
geomean                                         257.3        248.9       -3.24%

@easwars
Contributor

easwars commented Nov 8, 2024

Thanks @alanprot

@PapaCharlie
Contributor

@alanprot just to be clear, are those benchmark results just with mem.ReadAll, without your changes? Or do they include your changes as well?

@PapaCharlie
Contributor

Never mind, you can't really use both mem.ReadAll and DecompressedSize anyway. If #7653 fixes the problem, then I think we can nuke the entire TODO block altogether.


This PR is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@github-actions github-actions bot added the stale label Nov 18, 2024
@easwars
Contributor

easwars commented Nov 18, 2024

Closing this as #7653 has been merged.

Thank you for your contribution though @alanprot

@easwars easwars closed this Nov 18, 2024
@alanprot
Author

alanprot commented Nov 28, 2024

@easwars Is there any timeline for releasing a new version with #7653?

@purnesh42H
Contributor

@easwars Is there any timeline for releasing a new version with #7653?

It should be released as part of 1.69

@alanprot
Author

Is there a date for the release of 1.69?

@easwars
Contributor

easwars commented Dec 2, 2024

12/10/2024 is the scheduled date. Give it +/- 3 days.

@alanprot
Author

alanprot commented Dec 2, 2024

Awesome!

Thank you!! :D

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 1, 2025
Labels
Area: RPC Features, stale, Status: Requires Reporter Clarification, Type: Performance
6 participants