Provide a mechanism for encoded message buffer recycling #6619
Comments
Thanks for the detailed issue. We will get back to you shortly.
Gentle ping on this. We discussed this issue and the associated PR with easwars, dfawley, zasweq, joybestourous and s-matyukevich during our trip to the Google office. @dfawley expressed concerns about where buffers are released:
Anything you'd like to add?
Apologies for the delay. I've been swamped with other things and I haven't been able to get to this yet.
Unfortunately I could not get to this in time and I'm going to be away from work for a while. Someone else from the team will pick this up. Thanks for your patience.
Do you mean, "Stats handlers may keep a reference"? This is what I was worried about (I mentioned this in #6613). My initial thought was that doing so should not be allowed (if messages need to be preserved beyond the scope of an interceptor, they should be memcpy'd). There's no way to enforce that, though, and we'll likely make the ecosystem more brittle if we change the rule there. There's precedent for merging these kinds of PRs and automatically turning off the feature if any interceptors are added (see #5862). Could we take a similar approach here? (See lines 705 to 707 in e14d583.)
Some discussion on the same issue of stats handlers is in #6660.
We have to take a step back and look at the best way to implement these buffer recycling enhancements. However, our team needs more cycles to investigate and review the proposed changes. We will keep posting updates as our investigation progresses.
@arvindbr8 Do you have any update to share? Is there a way for me to help?
@HippoBaro -- Our team is tracking this item internally, which includes a wider analysis and review with other implementations of gRPC. However, this item could not be scheduled this quarter alongside the other high-priority projects in the pipeline. Not sure there is much to do until we pick this up. We will keep you updated.
Hey @arvindbr8, I'm following up on this since it's the end of the quarter. What are the odds this issue will be picked up this coming quarter? We'd see some pretty significant performance improvements on our end if this became a reality. Thanks in advance!
We might be doing some work on codecs next quarter depending on internal priorities. If so, the most likely change would be to use a scatter-gather API that references the transport's buffers directly and takes ownership of the buffers. It would probably also want a way to indicate that the buffers are no longer needed, as we already re-use these buffers. The two together would enable "zero"-copy in gRPC (excluding the Go std HTTP/2 framer, which doesn't support it), which should improve performance on data-heavy workloads for codecs that can take advantage of it (proto doesn't).
OK, makes sense. Is there a way for us to provide any help here? For our project, we'd really like to avoid running off of a fork, but we're seeing memory allocation rates on the order of multiple gigabytes per second... We even had to set up some rate limits around actually generating/sending responses to prevent the process from OOMing. We'd save a pretty significant amount of hardware with this.
Apologies - now that I've looked more closely at the proposal itself, I noticed that you're looking at the sending of data (the Marshal side). In both directions, the buffers being passed around might want a method on them to enable the side that produced them to re-use them. The API needs to support buffer ownership transfer, re-use, and scatter-gather, even if any given codec (including protobuf) doesn't support those features. Maybe something like this?

```go
type Codec interface {
	Marshal(msg any) (Buffer, error)
	Unmarshal(buf Buffer, msg any) error
}

type Buffer interface {
	Data() [][]byte
	// Free is called by the consumer of the buffer when it is done using the data buffers.
	// The producer of the buffer may then reuse it.
	Free()
}
```

It would be interesting to brainstorm other ideas and see which one is the most ergonomic. If you're able to do that and also implement the solution that we find will be best, then that would be extremely helpful.
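As one rough sketch of how the producing side could satisfy that Buffer interface (the pooling strategy and names here are illustrative, not part of the proposal):

```go
package codecsketch

import "sync"

// pooledBuffer is one possible implementation of the proposed Buffer
// interface: the codec allocates its chunks from a sync.Pool, and Free
// hands them back so the next Marshal can reuse them.
type pooledBuffer struct {
	chunks [][]byte
	pool   *sync.Pool // stores *[]byte
}

func (b *pooledBuffer) Data() [][]byte { return b.chunks }

func (b *pooledBuffer) Free() {
	for _, c := range b.chunks {
		chunk := c[:0]
		b.pool.Put(&chunk)
	}
	b.chunks = nil
}
```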
Funny you should mention that! I did do that: #6608. It's still rough, but it basically just extends the existing Codec interface. The author of this issue also provided his own thoughts/approach in #6613; I think the right answer probably lies somewhere in between. I've spent a decent amount of time looking into this issue and how it should be solved, so I'm very happy to help brainstorm. However, I haven't spent much time actually contributing to gRPC-go, so I don't want to presume my approach really matches the existing style/standards in the code. Maybe it'd be worth having an actual meeting to talk about this? I understand that your team has a lot on its plate and may not have time to address this, but this is a big enough problem for us at LinkedIn that my team would be happy to allocate engineering power to tackle it. All you need to do is point us in the right direction!
So, after my meeting with @dfawley, we arrived at a proposal. The new codec interface would look something like this:

```go
type Codec interface {
	Marshal(v any) (length int, seq iter.Seq2[[]byte, error])
	ReturnBuffer([]byte)
	GetBuffer(len int) []byte
	Unmarshal(v any, length int, seq iter.Seq2[[]byte, error]) error
}
```

(This could be done by having a new interface alongside the existing one.) The point of this new interface is to allow zero-copy codecs to exist, and to allow those codecs to (optionally) reuse buffers. In this case, upgrading the existing gRPC codec to this interface is trivial. We want to open the discussion on this before jumping in head first, so please let us know what you think!
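To make the shape concrete, here is a minimal sketch of a proto-backed codec satisfying that interface, assuming Go 1.23's iter package and a sync.Pool for buffer reuse (not the actual implementation):

```go
package codecsketch

import (
	"fmt"
	"iter"
	"sync"

	"google.golang.org/protobuf/proto"
)

type pooledProtoCodec struct {
	pool sync.Pool // stores *[]byte
}

func (c *pooledProtoCodec) GetBuffer(length int) []byte {
	if p, ok := c.pool.Get().(*[]byte); ok && cap(*p) >= length {
		return (*p)[:length]
	}
	return make([]byte, length)
}

func (c *pooledProtoCodec) ReturnBuffer(buf []byte) {
	buf = buf[:0]
	c.pool.Put(&buf)
}

// Marshal yields a single chunk here; a true zero-copy codec could yield
// many chunks without concatenating them first.
func (c *pooledProtoCodec) Marshal(v any) (int, iter.Seq2[[]byte, error]) {
	m, ok := v.(proto.Message)
	if !ok {
		return 0, func(yield func([]byte, error) bool) {
			yield(nil, fmt.Errorf("not a proto.Message: %T", v))
		}
	}
	data, err := proto.Marshal(m)
	return len(data), func(yield func([]byte, error) bool) {
		yield(data, err)
	}
}

func (c *pooledProtoCodec) Unmarshal(v any, length int, seq iter.Seq2[[]byte, error]) error {
	m, ok := v.(proto.Message)
	if !ok {
		return fmt.Errorf("not a proto.Message: %T", v)
	}
	buf := c.GetBuffer(length)[:0]
	for chunk, err := range seq {
		if err != nil {
			return err
		}
		buf = append(buf, chunk...)
	}
	err := proto.Unmarshal(buf, m)
	c.ReturnBuffer(buf)
	return err
}
```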
This works for us at Vitess. We're very interested in having a scatter-gather kind of API, particularly for serialization. Just thumbs up overall. Anything I can do to help with the implementation and/or early testing?
Hey @vmg, good to hear! Can you give me some more insight into how you'd use this? Just trying to get a better feel for what should be prioritized first. There are some interesting implications with what I've proposed. For example, some of the public APIs may have to be changed to accept the new interface.
We're also interested in this for SpiceDB. Most of our allocations are coming from gRPC messages, so anything to reuse those helps our overall scalability and latency. I'm not sure we have strong opinions on the hook to provide the data slice, just that the slices can be reused. A simple sync.Pool implementation would be heaps (pun intended) of help. EDIT: FWIW, SpiceDB also uses vtprotobuf (as @vmg mentions below).
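For what it's worth, a minimal version of the kind of sync.Pool-backed reuse being asked for here might look like the sketch below (illustrative only; grpc-go's experimental grpc.NewSharedBufferPool already provides a size-bucketed variant of this shape):

```go
package pool

import "sync"

// simplePool is a minimal, single-bucket buffer pool with the same shape as
// grpc.SharedBufferPool (Get(int) []byte / Put(*[]byte)). The real
// grpc.NewSharedBufferPool buckets buffers by size class.
type simplePool struct {
	p sync.Pool
}

func New() *simplePool {
	return &simplePool{p: sync.Pool{New: func() any {
		b := make([]byte, 0, 4096)
		return &b
	}}}
}

func (s *simplePool) Get(length int) []byte {
	buf := *(s.p.Get().(*[]byte))
	if cap(buf) < length {
		return make([]byte, length)
	}
	return buf[:length]
}

func (s *simplePool) Put(buf *[]byte) {
	*buf = (*buf)[:0]
	s.p.Put(buf)
}
```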
@PapaCharlie: for Vitess, we often happen to work with very large packets coming from the MySQL binary protocol, which we must make available through the distributed system via gRPC. Although the default Protobuf Go library does not support incremental/chunked serialization, I maintain an alternative implementation (https://github.com/planetscale/vtprotobuf/) that Vitess and many other OSS projects use. Being able to serialize as an
This new codec, as outlined in grpc#6619, will allow the reuse of buffers to their fullest extent. Note that this deliberately does not (yet) implement support for stats handlers; however, all the relevant APIs have been updated and both old and new Codec implementations are supported.
Hey everyone, circling back on this after actually implementing a lot of it. Here's the interface that we landed on:

```go
package encoding

type CodecV2 interface {
	Marshal(v any) (out mem.BufferSlice, err error)
	Unmarshal(data mem.BufferSlice, v any) error
	Name() string
}
```

```go
package mem

type BufferSlice []*Buffer

type Buffer struct {
	data []byte
	refs *atomic.Int32
	free func([]byte)
}
```
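To illustrate the intended semantics, here is a self-contained sketch of how such a reference-counted buffer could behave (this is not the actual mem package, whose final API differs in its details):

```go
package memsketch

import "sync/atomic"

// Buffer mirrors the shape sketched above: a byte slice plus a shared
// reference count and a free callback that returns the bytes to whatever
// pool produced them.
type Buffer struct {
	data []byte
	refs *atomic.Int32
	free func([]byte)
}

// NewBuffer wraps data with a refcount of 1. free is typically a pool's Put.
func NewBuffer(data []byte, free func([]byte)) *Buffer {
	refs := new(atomic.Int32)
	refs.Store(1)
	return &Buffer{data: data, refs: refs, free: free}
}

// Ref hands out another reference, e.g. so a stats handler can hold the
// message beyond the RPC's lifetime.
func (b *Buffer) Ref() *Buffer {
	b.refs.Add(1)
	return b
}

// Free drops one reference; the last holder triggers the free callback.
func (b *Buffer) Free() {
	if b.refs.Add(-1) == 0 && b.free != nil {
		b.free(b.data)
	}
}
```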
Hey @vmg, what are your thoughts on this? Got some thumbs up from the other folks on this issue, but since you maintain vtprotobuf, your code generator will likely be able to make the best use of this.
This seems sensible to me! To be fair, there may be a sharp corner hidden here which is not obvious by looking at the API. My homeboy @frouioui is going to try your CodecV2 branch next week to attempt a Vitess optimization which we've been planning for a while. I think he'll have more meaningful feedback very soon.
@vmg, are there any updates? We are also very interested in this (https://github.com/thanos-io/thanos).
I took a stab at implementing a vtprotobuf Codec V2 now that #7356 was merged. Does this look right? authzed/spicedb@main...jzelinskie:spicedb:grpc-codecv2
@jzelinskie: I'm not quite sure! Could you please open a PR on the upstream repository so we can review and discuss it?
Are there benchmarks for the buffer recycling available? I'm consistently getting better results with the original codec. I can't tell if my tests are subtly wrong, or if I'm using it incorrectly. Full code is in this gist: https://gist.github.com/coxley/3c3eab255f5a34cf43c33491f3c21703

With CodecV2:

With Old:
I spoke a little soon; this is expected. But something seems off. The benchmark is similarly inflated in new vs. old when using
Addressed in #7571
Use case(s) - what problem will this feature solve?
On their way to the wire, messages are encoded through user-provided encoding.Codec modules, which implement the encoding.Codec interface shown below. The implementation of this interface is responsible for allocating the resulting byte slice. Once gRPC is done writing the message to the underlying transport, the reference to the slice is dropped and garbage collection has to scavenge it. This means memory used for message encoding is never directly reused; it has to go through a GC cycle before it can be allocated again.
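For reference, the encoding.Codec interface referred to above looks like this in recent grpc-go versions (doc comments abbreviated):

```go
package encoding // google.golang.org/grpc/encoding

// Codec defines the interface gRPC uses to encode and decode messages.
type Codec interface {
	// Marshal returns the wire format of v.
	Marshal(v any) ([]byte, error)
	// Unmarshal parses the wire format into v.
	Unmarshal(data []byte, v any) error
	// Name returns the name of the Codec implementation.
	Name() string
}
```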
Proposed Solution
We propose adding an additional codec interface that complements the existing one:
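A rough sketch of what such a complementary interface could look like, using the existing grpc.SharedBufferPool type (the names BufferedMarshaler and MarshalWithPool are placeholders; see #6613 for the actual proposed change):

```go
package codecsketch

import "google.golang.org/grpc"

// BufferedMarshaler is an illustrative placeholder, not the exact interface
// from the proposal. A codec implementing it marshals into a buffer taken
// from the supplied pool; gRPC would return that buffer to the pool once the
// message has been written to the transport.
type BufferedMarshaler interface {
	MarshalWithPool(v any, pool grpc.SharedBufferPool) ([]byte, error)
}
```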
This API takes the existing grpc.SharedBufferPool type, which provides an allocator-like API. Codecs that can marshal into pre-existing buffers, such as protobuf, may implement it. This interface should be optional, because codecs need to estimate an upper bound for the size of the marshaled message in order to allocate a sufficiently large buffer. However, this estimate need not be accurate: Go will reallocate buffers transparently if needed, and the GC will safely collect the original slice.
This new interface allows the allocation side to reuse buffers, but gRPC needs to return them to be recycled. We propose that gRPC return these encoded message buffers to the buffer pool once messages have been fully written to the transport and are no longer read from.
This implies adding new APIs for clients and servers, such as a client-side dial option and a matching server-side option (a hypothetical sketch follows).
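To illustrate, the new options could mirror the existing experimental receive-side APIs; the constructors below are hypothetical and do not exist in grpc-go:

```go
package grpcsketch

import "google.golang.org/grpc"

// WithSendBufferPool is a hypothetical client-side dial option: encoded
// message buffers would be returned to pool once written to the transport.
func WithSendBufferPool(pool grpc.SharedBufferPool) grpc.DialOption {
	panic("illustrative sketch only")
}

// SendBufferPool is the hypothetical server-side counterpart.
func SendBufferPool(pool grpc.SharedBufferPool) grpc.ServerOption {
	panic("illustrative sketch only")
}
```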
Giving users the ability to provide their own buffer pool implementation is advantageous for those dealing with specific messages of known sizes, where a custom implementation can be optimal. It also leverages prior art (see grpc.WithRecvBufferPool, for example).

Alternatives Considered
Changes to the codec could be omitted entirely: simply return buffers to a user-provided pool and let users coordinate with their codec to pull from it. However, this would make it difficult for the default proto codec to support this feature.

Alternatively, this change could be made without any external API changes by internally creating and managing the buffer pool. This would prevent third-party codecs from benefitting from the improvement, however.
Additional Context
We are currently running a service that uses gRPC streaming to stream (potentially) large files, in chunks, back to gRPC clients over the network. We measured that the Go allocation volume per second is roughly equal to the network throughput of the host. This creates GC cycles that introduce latency spikes and prevent us from predictably saturating the network at a reasonable CPU cost. After investigation, we have isolated the source of most of these allocations to protobuf slice creation during message serialization.
This service demonstrated the benefit of this approach with a very significant reduction in allocation volume (>90% fewer bytes allocated per second and a 7x reduction in allocations) and much lower CPU usage (>30% less). We offer an implementation of this proposal for context and as a conversation starter: #6613
I should note that there was prior work on this issue in #2817 and #2816, but those issues didn't attract much discussion.
Thank you!