-
Notifications
You must be signed in to change notification settings - Fork 560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade to google.golang.org/grpc v1.66.2 / modify certain protobuf messages to retain their unmarshaling buffer #9401
Conversation
06c9166
to
c6b77ea
Compare
Some tests are breaking for this upgrade, have to fix them. |
7ed1789
to
b61cbbd
Compare
Edit: Memory unsafe optimizations are back again :) Solved by letting relevant protobuf messages keep an unmarshalling buffer reference. |
d79ea9f
to
f484e3a
Compare
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
ba307e7
to
35b8b14
Compare
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
35b8b14
to
7251bb0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you for addressing all my comments.
Follow up on #9401 While the write path has been carefully crafted to make sure that we don't keep references to the strings backed by arrays with data from grpc stream (because it would cause a memory leak), this doesn't seem to be the case in the query path, where some tests started failing after we started to reuse those backing arrays through the usage of memory buffers implemented in the new gRPC library. This change reverts the buffer freeing, means that it won't be recycled (and will be garbage collected) to ensure data correctness, while we investigate where the data references are kept. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Follow up on #9401 While the write path has been carefully crafted to make sure that we don't keep references to the strings backed by arrays with data from grpc stream (because it would cause a memory leak), this doesn't seem to be the case in the query path, where some tests started failing after we started to reuse those backing arrays through the usage of memory buffers implemented in the new gRPC library. This change reverts the buffer freeing, means that it won't be recycled (and will be garbage collected) to ensure data correctness, while we investigate where the data references are kept. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
…otobuf messages to retain their unmarshaling buffer (#9401) Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
* Revert "Don't free buffers after reading query stream (#9721)" This reverts commit f7b6017. * Revert: Upgrade to google.golang.org/grpc v1.66.2 / modify certain protobuf messages to retain their unmarshaling buffer (#9401) Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com> * Revert "Distributor.queryIngesterStream: Free gRPC buffers upon error (#9756)" This reverts commit eda1a4b. --------- Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
* [release-2.14] fix(deps): update github.com/thanos-io/objstore digest to f90c89a (main) (#9625) * Update github.com/thanos-io/objstore digest to f90c89a (#9534) Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> (cherry picked from commit 3c97a61) * update changelog Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com> --------- Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: Vladimir Varankin <vladimir.varankin@grafana.com> * 2.14.1 Prepare patch release (#9796) * bump patch version Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com> * rebuild assets Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com> --------- Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com> * MQE: Fix handling of string results (#9803) This would previously panic after the query was closed * chore(deps): update grafana/mimirtool docker tag to v2.14.1 (#9806) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> * Revert upgrade to google.golang.org/grpc v1.66.2 (#9811) * Revert "Don't free buffers after reading query stream (#9721)" This reverts commit f7b6017. * Revert: Upgrade to google.golang.org/grpc v1.66.2 / modify certain protobuf messages to retain their unmarshaling buffer (#9401) Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com> * Revert "Distributor.queryIngesterStream: Free gRPC buffers upon error (#9756)" This reverts commit eda1a4b. --------- Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com> * Improve lock contention affecting read and write latencies during TSDB head compaction (cherry-pick Prometheus PR 15242) (#9822) * Cherry-pick Prometheus PR 15242 Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added CHANGELOG entry Signed-off-by: Marco Pracucci <marco@pracucci.com> * Updated CHANGELOG Signed-off-by: Marco Pracucci <marco@pracucci.com> --------- Signed-off-by: Marco Pracucci <marco@pracucci.com> --------- Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com> Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com> Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Grot (@grafanabot) <43478413+grafanabot@users.noreply.github.com> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: Vladimir Varankin <vladimir.varankin@grafana.com> Co-authored-by: Joshua Hesketh <joshua.hesketh@grafana.com> Co-authored-by: Đurica Yuri Nikolić <durica.nikolic@grafana.com> Co-authored-by: Marco Pracucci <marco@pracucci.com>
What this PR does
Upgrade google.golang.org/grpc from v1.65.0 to v1.66.2. The performance regression referenced from dskit's reversal to v1.65.0 is fixed in v1.66.2, and the potential functional regression also referenced in said reversal is Loki specific.
What we found out is that v1.66 fixes a serious performance bottleneck in conjunction with compression; the
decompress
function is optimized through memory pooling, whereas v1.65.0 would useio.ReadAll
and cause a lot of allocations during decompression. As a result, we saw ingester CPU usage increase by ~40% when enabling gRPC compression. The increase is now much more modest, maybe ~7% based on observations.In order to keep allowing for unsafe references to unmarshalling buffers in e.g.
LabelAdapter
, I had to introduce a custom gRPC unmarshalling hook (mimirpb.CodecV2.Unmarshal
) plus a scheme for protobuf messages to keep a reference to their unmarshalling buffer (mimirpb.UnmarshalerV2
). The underlying reason is that from v1.66 on, gRPC immediately recycles each buffer after unmarshalling, unless a reference is kept. This causes data races with e.g.LabelAdapter
taking unsafe string references to the unmarshal buffer, unless, as mentioned, a buffer reference is kept with the root protobuf message.Ideally, one should call
FreeBuffer
on protobuf messages keeping an unmarshalling buffer reference so the buffer can be given back to the gRPC pool. If this isn't done though, there also shouldn't be a memory leak since the buffer should be garbage collected.Which issue(s) this PR fixes or relates to
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.