store: add initial symbol tables support #5906
base: main
Conversation
Nice! Is there any benchmark for this? |
I made a quick first pass, this looks pretty cool! I could not understand if the layered querier topology is covered as well. Is the table (indexing space) somehow recursively divided? |
This looks really amazing! Would be interesting to see benchmarks too, especially with the lookup table bimap! |
Force-pushed from a7553a1 to 526bc24
Force-pushed from 526bc24 to 6b85334
Add initial support for symbol tables; symbol tables are sent via hints and are used to deduplicate strings. The main benefit is reducing network traffic and the number of allocations needed for all of the different strings.

Strings are referenced by their number. In total, there could be math.MaxUint64 strings. To avoid blocking in the querier, symbol tables / references are automatically adjusted depending on the store's position in the stores slice. In other words, the whole math.MaxUint64 space is divided into an equal number of strings for each StoreAPI. If that limit is breached, raw strings are sent instead. This is taken care of by the LookupTable builder. We ensure that old versions are compatible with this one by passing the maximum number of allowed unique strings via StoreRequest; if that limit is zero, we disable building the lookup table.

This compression is beneficial in almost all cases. The worst case is when there are a lot of unique metrics with unique strings in each metric. However, I strongly believe that this will only happen 1% of the time due to the nature of monitoring: a set of metrics will always share some identical dimensions, i.e. the same labels with only one or two changing.

I have also attempted an alternative implementation where a `Label` could be a `oneof` of compressed labels or regular labels. However, that implementation was less elegant and more cumbersome; this is much cleaner. For now, only streaming Sidecar + Thanos Store is implemented. We could add support for other components as a follow-up.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
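(A minimal sketch of the LookupTable idea described above — hypothetical names, not the PR's actual code: strings are interned into uint64 references until a per-store limit is hit, after which callers fall back to sending raw strings.)

```go
package main

import "fmt"

// lookupTableBuilder interns strings into uint64 references until
// maxRefs is reached; past that point callers must send raw strings.
// Names are illustrative, not the actual Thanos API.
type lookupTableBuilder struct {
	refs    map[string]uint64
	next    uint64 // next reference this store may hand out
	maxRefs uint64 // size of this store's slice of the uint64 space
}

func newLookupTableBuilder(offset, maxRefs uint64) *lookupTableBuilder {
	return &lookupTableBuilder{
		refs:    make(map[string]uint64),
		next:    offset,
		maxRefs: maxRefs,
	}
}

// intern returns a reference for s, or ok=false when the table is
// full and the raw string must be sent instead.
func (b *lookupTableBuilder) intern(s string) (ref uint64, ok bool) {
	if r, seen := b.refs[s]; seen {
		return r, true
	}
	if uint64(len(b.refs)) >= b.maxRefs {
		return 0, false // limit breached: fall back to raw strings
	}
	r := b.next
	b.next++
	b.refs[s] = r
	return r, true
}

func main() {
	b := newLookupTableBuilder(0, 2)
	for _, s := range []string{"__name__", "job", "__name__", "instance"} {
		ref, ok := b.intern(s)
		fmt.Println(s, ref, ok) // "instance" overflows the 2-entry table
	}
}
```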
Force-pushed from 6b85334 to 18b1603
👋
Each upper layer divides the uint64 space into equal parts for each StoreAPI. If any StoreAPI exceeds that limit then it errors out. Given that string references are unique, I think the function that I'm using should always remap all IDs properly. I also thought about adding this behind a flag, but is it realistic to have thousands and thousands of StoreAPIs that each give millions of unique strings? 🤔 math.MaxUint64 is 18446744073709551615, so the space is huge. What benchmarks would you like to see for this? If this code looks good to my teammates then I'll try this out in prod and see the results 👁️ |
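For illustration, the division of the reference space per StoreAPI could look like this minimal Go sketch (a hypothetical helper, not the PR's code):

```go
package main

import (
	"fmt"
	"math"
)

// storeRange returns the slice of the uint64 reference space assigned
// to store i out of n stores. References a store hands out within its
// own sub-range are globally unique, so no collisions occur upstream.
func storeRange(i, n uint64) (offset, size uint64) {
	size = math.MaxUint64 / n
	return i * size, size
}

func main() {
	const stores = 4
	for i := uint64(0); i < stores; i++ {
		offset, size := storeRange(i, stores)
		fmt.Printf("store %d: refs [%d, %d)\n", i, offset, offset+size)
	}
}
```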
I agree that it's hard to have a benchmark since a lot of benefits will come from reducing network bandwidth. So anything we run on a single machine will not be representative. |
I was thinking benchmarking the compression/decompression might be useful so that if this is changed in the future, we can catch regressions! 🙂 |
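A self-contained sketch of what such a regression benchmark could look like (the compress/decompress stand-ins below are simplified placeholders, not the PR's actual API; this would live in a `_test.go` file):

```go
package store

import (
	"fmt"
	"testing"
)

// compress interns labels into a symbol table and reference slice.
func compress(labels []string) (table map[string]uint64, refs []uint64) {
	table = make(map[string]uint64)
	for _, l := range labels {
		if _, ok := table[l]; !ok {
			table[l] = uint64(len(table))
		}
		refs = append(refs, table[l])
	}
	return table, refs
}

// decompress resolves references back into strings.
func decompress(table []string, refs []uint64) []string {
	out := make([]string, len(refs))
	for i, r := range refs {
		out[i] = table[r]
	}
	return out
}

func BenchmarkDecompress(b *testing.B) {
	var labels []string
	for i := 0; i < 10000; i++ {
		labels = append(labels, fmt.Sprintf("label_%d", i%100))
	}
	m, refs := compress(labels)
	table := make([]string, len(m))
	for s, r := range m {
		table[r] = s
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		decompress(table, refs)
	}
}
```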
Was there any reduction in query latency? |
Doesn't look like it, maybe just a little bit :( but with much smaller network usage it's possible to stuff more recording/alerting rules into it |
cc @bboreham - want to check it? You might have some thoughts. (: |
Thanks for this; I have considered doing something like this in the remote-write protocol. Re some conversations we had at PromCon, you may like grafana/mimir-prometheus#353. |
Nice job, the implementation looks good to me and your prod test results seem persuasive 👀 , I have mostly small nits.
FWIW, a somewhat related idea to others in this conversation - reusing strings in receiver #5899 - I did not manage to finalize it yet though |
Thanks @bboreham sending me this way. @GiedriusS I have a similar implementation of a symbols table, based originally on the interner we already use in remote write. Let me know if you'd like to contribute yours upstream so that it could be reused in remote write. |
Been running this for months, haven't spotted any problems. PTAL. |
This looks good overall, I'll take another look next week to make sure we don't miss something. |
It takes more than 17s to decompress more than 80M elements. Doing that concurrently cuts the time down to 3-4 seconds. We have lots of spare CPU so let's do that. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
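A simplified sketch of the worker-pool shape this commit describes (hypothetical, not the actual implementation): split the reference slice into contiguous chunks and resolve each chunk in its own goroutine, writing into disjoint slices of the output so no locking is needed.

```go
package main

import (
	"fmt"
	"sync"
)

// decompressConcurrently resolves refs against table using `workers`
// goroutines, each handling a contiguous chunk of the output slice.
func decompressConcurrently(table []string, refs []uint64, workers int) []string {
	out := make([]string, len(refs))
	chunk := (len(refs) + workers - 1) / workers

	var wg sync.WaitGroup
	for start := 0; start < len(refs); start += chunk {
		end := start + chunk
		if end > len(refs) {
			end = len(refs)
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			for i := start; i < end; i++ {
				out[i] = table[refs[i]]
			}
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	table := []string{"__name__", "up", "job", "node"}
	refs := []uint64{0, 1, 2, 3, 2, 1}
	fmt.Println(decompressConcurrently(table, refs, 2))
}
```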
Updated PR - merged in main. I can't imagine running Thanos in prod without this anymore because Queriers would easily get overwhelmed - we have thousands of recording rules and labels completely dominate the bandwidth. |
LGTM!
Any recommendations on what value to set the number of workers to?
How much does this feature help, any dashboards/metrics to share? @GiedriusS 👀 Updated: NVM I see the improvements! |
⭐ ⭐ ⭐
One more nit regarding flag from me, but I'll leave it up to you, the implementation looks good 🙌
```
@@ -111,6 +111,9 @@ func registerQuery(app *extkingpin.App) {
	maxConcurrentSelects := cmd.Flag("query.max-concurrent-select", "Maximum number of select requests made concurrently per a query.").
		Default("4").Int()

	maxConcurrentDecompressWorkers := cmd.Flag("query.max-concurrent-decompress-workers", "Maximum number of workers spawned to decompress a set of compressed storepb.Series. Setting this to higher than zero enables label compression during querying - CPU usage will be slightly higher whilst network usage will be significantly lower.").
```
This flag is a tiny bit confusing to me - it sounds like it configures the number of decompress workers, but in fact this also turns on / off the compression itself. It either sounds like two distinct flags would be better or somehow these configs should be combined, if it can be, in a reasonable way so that the flag name reflects it (but I struggle to find one).
Also I wonder if we do not want to put this behind a hidden flag straight away; a bit more documentation would be nice (but can be done in a separate PR).
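For illustration only, the two-flag split suggested above might look something like this (hypothetical flag names, same kingpin style and `registerQuery` context as the diff hunk above):

```go
// Hypothetical alternative to query.max-concurrent-decompress-workers,
// separating the on/off switch from the worker-count tuning knob:
enableLabelCompression := cmd.Flag("query.enable-label-compression",
	"Enable symbol-table label compression between querier and StoreAPIs.").
	Default("false").Bool()

decompressWorkers := cmd.Flag("query.decompress-workers",
	"Number of workers spawned to decompress compressed storepb.Series.").
	Default("4").Int()
```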
@GiedriusS anything still blocking this? Or you just need to resolve conflicts? |
This will need to be reworked a bit after the latest changes because ordering has become important now. It means that we need to somehow merge symbol references from multiple StoreAPIs in such a way as to preserve order, and I don't know how to do that at the moment. The references would need to be encoded in a way that preserves order. |
Hello 👋 Looks like there was no activity on this amazing PR for the last 30 days. |
Closing for now as promised, let us know if you need this to be reopened! 🤗 |
Do we still want to continue this work? |
+1 for continuing the work |