transport: reuse memory on read path #1455
Specific areas where we would like to do this:

One possible idea is to use a leaky buffer.
|
Is someone actively working on this? I've been playing around with different buffer pool strategies, but don't want to look too much more if I'm just duplicating work.

I've been trying a bucketed []byte buffer pool where there are tiers of buffer sizes. For example, if you ask for a 700 byte buffer, it might pull a buffer from the 1024 byte pool, slice it down to 700 bytes, and return it.

I'm trying a *sync.Pool implementation (one *sync.Pool per buffer size bucket) and a leaky buffer implementation (one buffered chan []byte per buffer size bucket). They certainly help, but neither one has plucked all the low-hanging garbage fruit on the first try.
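[Editor's note: a minimal Go sketch of the leaky-buffer variant described above; the bucket sizes and per-bucket channel capacity are illustrative assumptions, not values from this thread.]

package leakybuf

// One buffered channel of []byte per size bucket. When a bucket's channel
// is full, returned buffers are dropped ("leak") and the GC reclaims them,
// which is what bounds the pool's memory.
var sizes = [...]int{256, 1024, 4096, 16384} // assumed bucket sizes

var buckets [len(sizes)]chan []byte

func init() {
	for i := range buckets {
		buckets[i] = make(chan []byte, 64) // arbitrary per-bucket capacity
	}
}

// Get returns a buffer of length size from the smallest bucket that fits,
// allocating a fresh one if the bucket is empty or size exceeds all buckets.
func Get(size int) []byte {
	for i, s := range sizes {
		if size <= s {
			select {
			case buf := <-buckets[i]:
				return buf[:size]
			default:
				return make([]byte, size, s)
			}
		}
	}
	return make([]byte, size)
}

// Put accepts only buffers whose capacity exactly matches a bucket size;
// anything else, or a return to a full bucket, is simply left for the GC.
func Put(buf []byte) {
	for i, s := range sizes {
		if cap(buf) == s {
			select {
			case buckets[i] <- buf[:s]:
			default:
			}
			return
		}
	}
}

The appeal of the leaky variant is that a full bucket simply drops returns, so pool memory stays bounded without any explicit eviction.
|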
I would prefer that the codec object itself is in a sync.Pool and is assumed to be unsafe to use concurrently. This way the codec itself can optimize its buffer use. The codec that I would like to use with grpc is the one that I have implemented in github.com/gogo/protobuf/codec
|
Here is the correct link:
https://github.com/gogo/protobuf/blob/master/codec/codec.go
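[Editor's note: to illustrate the pooled-codec idea in Go. Each codec instance owns a scratch buffer and is checked out of a sync.Pool per call, with a release callback standing in for the "return the buffer when done" hook discussed below. This sketch uses the modern google.golang.org/protobuf API rather than the gogo codec linked above, and all names are hypothetical.]

package codecpool

import (
	"sync"

	"google.golang.org/protobuf/proto"
)

// bufferedCodec is not safe for concurrent use: it reuses its scratch
// buffer across Marshal calls, so each goroutine checks one out of the pool.
type bufferedCodec struct {
	buf []byte
}

var codecs = sync.Pool{New: func() interface{} { return new(bufferedCodec) }}

// Marshal appends into the codec's scratch buffer, growing it as needed.
func (c *bufferedCodec) Marshal(m proto.Message) ([]byte, error) {
	var err error
	c.buf, err = proto.MarshalOptions{}.MarshalAppend(c.buf[:0], m)
	return c.buf, err
}

// MarshalPooled checks a codec out of the pool and returns the encoded
// bytes plus a release func; the bytes are only valid until release is
// called, which is exactly the lifetime question debated in this thread.
func MarshalPooled(m proto.Message) ([]byte, func(), error) {
	c := codecs.Get().(*bufferedCodec)
	b, err := c.Marshal(m)
	return b, func() { codecs.Put(c) }, err
}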
|
There is notable garbage produced outside of proto marshaling in the http2 transport, so I was hoping to address that too. I was looking at more generic []byte buffer reuse since it could be used by the existing codec, and used by the http2 transport. How is pooling the codec better than just pooling the []byte slice? The existing codec could pull its []byte from the []byte pool, and also implement similar gogoproto-optimized logic taking advantage of |
That's interesting. I didn't think the http2 transport could also benefit from buffer pooling. At the previous company I worked at, we had a similar buffer pooling scheme for a different purpose. The API asks for a buffer size and returns that size of byte buffer for you. It does this by looking through its slice of buffers for a big enough buffer and then returning it resized to your smaller size. When buffers are returned they are stretched back to their original capacity.
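[Editor's note: a rough Go sketch of that scheme, assuming a simple caller-managed free list; the function names are made up.]

// take hands out the first free buffer big enough for size, sliced down
// to the requested length; if none fits, it allocates a fresh one.
func take(free [][]byte, size int) ([]byte, [][]byte) {
	for i, b := range free {
		if cap(b) >= size {
			free[i] = free[len(free)-1] // swap-remove from the free list
			return b[:size], free[:len(free)-1]
		}
	}
	return make([]byte, size), free
}

// giveBack stretches the buffer to its original capacity before
// returning it to the free list, as described above.
func giveBack(free [][]byte, b []byte) [][]byte {
	return append(free, b[:cap(b)])
}

A real implementation would need a mutex around the free list, which is part of the overhead being weighed in this thread.
|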
My thinking for the codec was to add an optional method to it that, if implemented, would be called when the buffer it returns from Marshal is no longer needed by grpc, so the codec could reclaim it. The http2 transport uses byte buffers to hold data it reads from the network; this can and should be re-used with a pool as well. |
My thought was instead of returning the buffer to the codec, return the buffer to the generic buffer pool instead. Then anyone can pull a buffer from the pool, and consumers of the buffers can put them back in the pool when they are done. It is safe to never return a buffer, or even to return a buffer that you did not get from the pool originally. On the other hand it is very unsafe to return a buffer and then still use it, or a slice of it. We would need some approach to make ourselves confident that use-after-release doesn't happen. |
What are other languages doing to optimize this situation? Maybe we can learn something from C++ or even Java.
|
The lowest hanging fruit, IMHO, is to reuse the memory buffer allocated to copy bytes from the data frame, as @dfawley suggested (grpc-go/transport/http2_client.go, line 869 at cd563b8).
This buffer can be recycled once the application has read it and copied it into its preallocated chunk to be used for unmarshaling (grpc-go/transport/transport.go, line 143 at cd563b8).
We can have a few sync.Pools with sizes increasing in, say, powers of 4, capped at the max message size.

One problem with sync.Pool is that it drops all of its memory during every GC cycle; in follow-up efforts we can work with the Go team to resolve that. Using a leaky buffer comes with the onus of doing our own GC cycles for that memory, and I don't think that's a good idea.

Things that C++ and Java do don't necessarily apply to Go. Java, IIRC, allocates memory from the OS and keeps reusing it without ever freeing it again; that memory is not freed by the GC either, because it's outside the JVM's jurisdiction. I personally think that in gRPC-Go we shouldn't go about allocating a big chunk and doing our own memory management on it. We should at least give sync.Pool a chance and see if it can be optimized to fit our use case.
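[Editor's note: a minimal Go sketch of those powers-of-4 sync.Pool buckets. The 256-byte smallest bucket and the 4MB cap are assumptions for illustration; the thread leaves the exact bounds open.]

package pow4pool

import "sync"

const minSize, maxSize = 256, 4 << 20 // assumed bounds

var pools []*sync.Pool

func init() {
	for size := minSize; size <= maxSize; size *= 4 {
		size := size // capture for the closure below
		pools = append(pools, &sync.Pool{
			New: func() interface{} { return make([]byte, size) },
		})
	}
}

// Get returns a buffer of length size backed by the smallest bucket that
// fits; sizes above the cap are allocated directly and never pooled.
func Get(size int) []byte {
	for i, s := 0, minSize; s <= maxSize; i, s = i+1, s*4 {
		if size <= s {
			return pools[i].Get().([]byte)[:size]
		}
	}
	return make([]byte, size)
}

// Put only accepts buffers whose capacity is exactly a bucket size, so
// arbitrary slices can be passed in safely and are otherwise dropped.
func Put(buf []byte) {
	for i, s := 0, minSize; s <= maxSize; i, s = i+1, s*4 {
		if cap(buf) == s {
			pools[i].Put(buf[:s])
			return
		}
	}
}

(A production version would likely pool *[]byte to avoid the interface-boxing allocation that staticcheck flags when storing slices in a sync.Pool.)
|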
Fair enough. It would be great to compare sync.Pool with classic buffer reuse. |
sync.Pool is not enough, as golang/go#23199 describes: unbounded-capacity buffers cause worse GC behavior or memory overhead. Maybe a chunked-capacity pool/buffer would be better. |
@Zeymo Thanks for the suggestions. We are aware that sync.Pool does not return the best-fitting buffer; in the suggestion above I mention using buckets of sync.Pools with different sizes. |
I've got another idea to reuse the buffer for a stream:

type parser struct {
	r      io.Reader
	header [5]byte
	buf    []byte
}

Then, inside recvMsg, grow the buffer only when the incoming message exceeds its capacity, and reslice it otherwise:

if length > cap(p.buf) {
	p.buf = make([]byte, int(length))
}
msg = p.buf[:length]

What do you think? |
For your second code snippet, I think you'd actually want to compare against the buffer's capacity rather than its length; you can slice up to the capacity of a buffer, not just the length. The gob package does something very similar. That way, if you need buffers of size X, then X-1, and then X again, you aren't allocating twice.
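[Editor's note: the capacity-based growth pattern being described, as a tiny standalone helper; the function name is made up.]

// grow reslices buf to length n, allocating only when n exceeds the
// buffer's capacity, so a size sequence like X, X-1, X reuses one allocation.
func grow(buf []byte, n int) []byte {
	if n > cap(buf) {
		return make([]byte, n)
	}
	return buf[:n]
}
|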
@havoc-io |
@coocood This actually seems like a good idea to reuse the buffer on the gRPC layer (perhaps a variation of this might work on the transport layer as well). It's simple and doesn't require global shared pools. The only thing that worries me is the case where we have long-lived streams with intermittent reading/writing of large messages (say >=1MB). In such a case, the stream will carry a large buffer for its whole life, and if there were thousands of such streams we'd be in serious trouble. Perhaps this problem can be solved by freeing the buffer every once in a while from the stream monitor loop here (line 283 at 7f2472b).
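[Editor's note: a Go sketch of that periodic-release idea, hedged heavily: the stream type and its fields are stand-ins, not grpc-go internals, and the thresholds are arbitrary.]

package streamsketch

import (
	"sync"
	"time"
)

type stream struct {
	mu         sync.Mutex
	buf        []byte
	lastActive time.Time
}

// maybeRelease, called from the monitor loop, drops a stream's retained
// buffer if the stream has been idle for a while and the buffer is large,
// letting the GC reclaim it instead of holding it for the stream's lifetime.
func (s *stream) maybeRelease(idle time.Duration, maxRetained int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if time.Since(s.lastActive) > idle && cap(s.buf) > maxRetained {
		s.buf = nil
	}
}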
|
@MakMukhi |
@coocood I like that idea too. You're right that accessing the buffer from two different goroutines would require locks, which adds performance overhead. A CallOption seems like a good way to go about it. |
@MakMukhi |
Just a note for @coocood: in #1774 (comment) you wrote:
While I agree it's rare, this is exactly the case here (the API gateway the flamegraph is from is a "client" for a few hundred backends, and it keeps long-lived connections to them) |
@dfawley do you need me to look into anything else? I'm not 100% sure this issue is the best place to discuss the problem we're seeing since, although related, as you mentioned they're likely different issues. |
@CAFxX - could you file a separate issue for the problems you're encountering, please? In terms of additional information to gather, it would be interesting to know how often connections are being created in your setup. It seems like you have not only a large pool of connections, but new connections are also being created and destroyed frequently - is this the case? Also, I don't see any raw numbers in the flame graph - what is the total amount of allocations you are seeing (cumulative and per-minute/etc)? |
How about using sync.Pool to reuse the memory buffers created by the parser's recvMsg method when reading gRPC messages from the I/O layer? In my project this accounts for about 5% of total memory allocations. |
@MakMukhi are there any plans to continue with this work? |
@dfawley should be able to help with prioritization/resource allocation here. |
We don't have resources to focus on performance improvements at this time. This is unlikely to be prioritized for at least the next 3-6 months. If someone is interested in contributing, we would be willing to accept PRs if the author ensures that thorough performance benchmarking is done and shows clear wins. |
If this is the one I'm thinking of, it allowed one side to force potentially large allocations on the other side without actually sending a similar amount of data. I.e. the sender declares it will send 1GB of data on a stream; the receiver makes the allocation; then the sender doesn't send anything further on that stream. This could very quickly OOM the peer (across many streams). |
From reading this issue and related ones, it seems there is a tension between users who have mostly small message types and do not (or may not) use compression, and those who have much larger message types. It sounds like the former group does not need, and may actively be harmed by, adding any type of pooling or memory reuse, while the latter group would see a substantial benefit from it. If that is accurate, it doesn't sound like there is a one-size-fits-all solution here.

What are your thoughts on being able to add a type via an option? Perhaps something like:

type RecvBuffers interface {
	GetReadBuffer(size int) *bytes.Buffer
	ReturnReadBuffer(b *bytes.Buffer)
	GetDecompressionBuffer(size int) *bytes.Buffer
	ReturnDecompressionBuffer(b *bytes.Buffer)
}

Implementers could decide to return a pooled buffer based on the requested size (or not). The default implementation could behave as the recv flow does now, allocating the requested memory each time. Assuming this could be implemented in a way that doesn't harm the "out of the box" performance characteristics, this would allow users to control their own memory usage. Thoughts?
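[Editor's note: for concreteness, a hypothetical no-op implementation of that proposed interface, matching today's allocate-every-time behavior; none of this exists in grpc-go.]

package recvbuffers

import "bytes"

// defaultBuffers mirrors the current recv path: every Get allocates and
// every Return is a no-op, so nothing is ever retained between messages.
type defaultBuffers struct{}

func (defaultBuffers) GetReadBuffer(size int) *bytes.Buffer {
	return bytes.NewBuffer(make([]byte, 0, size))
}

func (defaultBuffers) ReturnReadBuffer(*bytes.Buffer) {}

func (defaultBuffers) GetDecompressionBuffer(size int) *bytes.Buffer {
	return bytes.NewBuffer(make([]byte, 0, size))
}

func (defaultBuffers) ReturnDecompressionBuffer(*bytes.Buffer) {}
|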
@dfawley Thoughts on the above? I'm happy to attempt this but I don't want to go off on the wrong path |
Sorry for the delay. This is a pretty involved issue, and it's been a year+ since I've really thought about it. I don't like the sound of injecting custom memory re-use logic into grpc. If it needs to be manually configured, there should be one or two variables that can be tweaked instead. But ideally we can find a way to make things work well out of the box for 99%+ of users. |
Hi, is this still in consideration? We just ran some tests on the cortex project and saw a significant reduction in CPU and memory utilization (~15%) on some of our high-TPS components. In our tests we just created […]. Is this an approach we could take here? |
Seems this is being worked on: #5862 for streamed messages. |