store: add initial symbol tables support #5906

GiedriusS · 2022-11-18T14:57:11Z

Add initial support for symbol tables. Symbol tables are sent via hints and are used to deduplicate strings. The main benefits are reduced network traffic and fewer allocations for all of the different strings.
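To make the idea concrete, here is a minimal Go sketch of a symbol table that replaces label strings with numeric references and resolves them back. The `Label`, `CompressedLabel` and `SymbolTable` types are hypothetical stand-ins, not the types used in this PR (which works on `storepb` messages and ships the table via hints).

```go
package main

import "fmt"

// Label is a hypothetical stand-in for a Prometheus label pair.
type Label struct {
	Name, Value string
}

// CompressedLabel holds references into a symbol table instead of strings.
// Hypothetical type for illustration only.
type CompressedLabel struct {
	NameRef, ValueRef uint64
}

// SymbolTable assigns each distinct string a numeric reference.
type SymbolTable struct {
	refs    map[string]uint64
	symbols []string
}

func NewSymbolTable() *SymbolTable {
	return &SymbolTable{refs: map[string]uint64{}}
}

// Ref returns the reference for s, allocating a new one the first time s is seen.
func (t *SymbolTable) Ref(s string) uint64 {
	if r, ok := t.refs[s]; ok {
		return r
	}
	r := uint64(len(t.symbols))
	t.refs[s] = r
	t.symbols = append(t.symbols, s)
	return r
}

// Compress replaces every label string with its reference.
func (t *SymbolTable) Compress(lbls []Label) []CompressedLabel {
	out := make([]CompressedLabel, 0, len(lbls))
	for _, l := range lbls {
		out = append(out, CompressedLabel{NameRef: t.Ref(l.Name), ValueRef: t.Ref(l.Value)})
	}
	return out
}

// Decompress resolves references back to strings on the receiving side.
// Every series reuses the same string values instead of allocating its own.
func (t *SymbolTable) Decompress(cl []CompressedLabel) []Label {
	out := make([]Label, 0, len(cl))
	for _, c := range cl {
		out = append(out, Label{Name: t.symbols[c.NameRef], Value: t.symbols[c.ValueRef]})
	}
	return out
}

func main() {
	t := NewSymbolTable()
	a := t.Compress([]Label{{"__name__", "up"}, {"job", "node"}, {"instance", "a:9100"}})
	b := t.Compress([]Label{{"__name__", "up"}, {"job", "node"}, {"instance", "b:9100"}})
	// 12 string slots across the two series, only 7 distinct strings stored.
	fmt.Println(a, b, len(t.symbols))
}
```

The two series above differ only in `instance`, so 12 string slots need only 7 stored strings; that repetition is where the bandwidth and allocation savings come from.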

Strings are referenced by a number. In total, there can be up to math.MaxUint64 strings. To avoid blocking in the querier, symbol tables and references are automatically adjusted depending on the store's position in the stores slice. In other words, the whole math.MaxUint64 space is divided into equal ranges, one per StoreAPI. If a StoreAPI exceeds its range, raw strings are sent instead. This is handled by the LookupTable builder. Compatibility with older versions is ensured by passing the maximum number of allowed unique strings via StoreRequest; if that limit is zero, building the lookup table is disabled.
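A sketch of how the per-StoreAPI split and the fallback could look, assuming references are handed out sequentially inside each store's range. `refRange`, `lookupTableBuilder` and the error values are hypothetical names; whether the fallback to raw strings happens per string or per response is not specified here, so the builder only reports an error and leaves that decision to the caller.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

var (
	// errDisabled: the request carried a limit of zero, e.g. from an older querier.
	errDisabled = errors.New("lookup table disabled: max unique strings is 0")
	// errLimitReached: the store used up its share of the reference space.
	errLimitReached = errors.New("lookup table limit reached")
)

// refRange returns the first reference and the number of references owned by
// the store at position idx out of n stores. The uint64 space is split into n
// equal ranges; any remainder at the top is left unused.
func refRange(idx, n uint64) (offset, size uint64) {
	size = math.MaxUint64 / n
	return idx * size, size
}

// lookupTableBuilder hands out references inside [offset, offset+maxUnique).
// Hypothetical type, loosely modelled on the description of the LookupTable builder.
type lookupTableBuilder struct {
	offset, maxUnique uint64
	refs              map[string]uint64
}

func newLookupTableBuilder(offset, maxUnique uint64) *lookupTableBuilder {
	return &lookupTableBuilder{offset: offset, maxUnique: maxUnique, refs: map[string]uint64{}}
}

// Ref returns the reference for s. On error, the caller is assumed to fall
// back to sending raw strings.
func (b *lookupTableBuilder) Ref(s string) (uint64, error) {
	if b.maxUnique == 0 {
		return 0, errDisabled
	}
	if r, ok := b.refs[s]; ok {
		return r, nil
	}
	if uint64(len(b.refs)) >= b.maxUnique {
		return 0, errLimitReached
	}
	r := b.offset + uint64(len(b.refs))
	b.refs[s] = r
	return r, nil
}

func main() {
	// Store #1 of 4 owns the second quarter of the uint64 space.
	off, size := refRange(1, 4)
	b := newLookupTableBuilder(off, size)
	r, err := b.Ref("job")
	fmt.Println(r == off, err)
}
```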

This compression is beneficial in almost all cases. The worst case is a large number of metrics whose label strings are all unique. However, I strongly believe this will happen only about 1% of the time, due to the nature of monitoring: a set of metrics will always share some identical dimensions, i.e. the same labels with only one or two values changing.
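A rough way to see the best and worst cases is to count bytes. This Go sketch assumes varint-encoded references and ignores protobuf framing (tags, length prefixes), so the absolute numbers are only indicative; the direction of the comparison is what matters. `encodedSizes`, `sharedSeries` and `uniqueSeries` are hypothetical helpers.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

type Label struct{ Name, Value string }

// varintLen is the number of bytes a varint encoding of v needs.
func varintLen(v uint64) int {
	var buf [binary.MaxVarintLen64]byte
	return binary.PutUvarint(buf[:], v)
}

// encodedSizes estimates label payload bytes without and with a symbol table.
// Assumptions: each distinct string is shipped once, refs are assigned in
// first-seen order and encoded as varints, framing overhead is ignored.
func encodedSizes(series [][]Label) (raw, compressed int) {
	refs := map[string]uint64{}
	symbolBytes, refBytes := 0, 0
	refOf := func(s string) uint64 {
		r, ok := refs[s]
		if !ok {
			r = uint64(len(refs))
			refs[s] = r
			symbolBytes += len(s) // first occurrence: the string itself is sent
		}
		return r
	}
	for _, s := range series {
		for _, l := range s {
			raw += len(l.Name) + len(l.Value)
			refBytes += varintLen(refOf(l.Name)) + varintLen(refOf(l.Value))
		}
	}
	return raw, refBytes + symbolBytes
}

// sharedSeries: many series that differ only in the instance label.
func sharedSeries(n int) [][]Label {
	out := make([][]Label, n)
	for i := range out {
		out[i] = []Label{
			{"__name__", "http_requests_total"},
			{"job", "api-server"},
			{"instance", fmt.Sprintf("10.0.0.%d:8080", i)},
		}
	}
	return out
}

// uniqueSeries: the worst case, where no string ever repeats.
func uniqueSeries(n int) [][]Label {
	out := make([][]Label, n)
	for i := range out {
		out[i] = []Label{{fmt.Sprintf("name_%d", i), fmt.Sprintf("value_%d", i)}}
	}
	return out
}

func main() {
	for _, c := range []struct {
		name   string
		series [][]Label
	}{{"shared", sharedSeries(100)}, {"unique", uniqueSeries(100)}} {
		raw, compressed := encodedSizes(c.series)
		fmt.Printf("%s: raw=%d compressed=%d\n", c.name, raw, compressed)
	}
}
```

With all-unique strings the table can only add bytes, since every string is still sent once and references come on top; that is the worst case described above.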

I also attempted an alternative implementation where a `Label` could be a `oneof` of compressed labels or regular labels. However, that implementation was less elegant and more cumbersome. This one is much cleaner.

For now, only streaming Sidecar + Thanos Store is implemented. We could add support for this in other components as a follow-up.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

yeya24 · 2022-11-18T15:47:22Z

Nice! Is there any benchmark for this?

fpetkovski · 2022-11-19T18:14:39Z

I made a quick first pass, this looks pretty cool! I could not understand if the layered querier topology is covered as well. Is the table (indexing space) somehow recursively divided?

saswatamcode · 2022-11-21T06:39:18Z

This looks really amazing! Would be interesting to see benchmarks too, especially with the lookup table bimap!

GiedriusS · 2022-11-21T11:59:51Z

👋

Each upper layer divides the uint64 space into equal parts, one per StoreAPI. If any StoreAPI exceeds its limit, it errors out. Given that string references are unique, I think the function I'm using should always remap all IDs correctly.

I also thought about putting this behind a flag, but is it realistic to have thousands and thousands of StoreAPIs that each return millions of unique strings? 🤔 math.MaxUint64 is 18446744073709551615 😄 I don't think it will ever be reached. And even then, maybe we could bump it to uint128.
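For scale, a quick calculation with illustrative numbers (1,000 StoreAPIs per layer, not taken from any real deployment), showing the per-StoreAPI budget for one layer and for two layers:

```latex
\begin{aligned}
2^{64}-1 &= 18\,446\,744\,073\,709\,551\,615 \\
\left\lfloor \frac{2^{64}-1}{10^{3}} \right\rfloor &= 18\,446\,744\,073\,709\,551 \approx 1.8\times10^{16} \\
\left\lfloor \frac{2^{64}-1}{10^{6}} \right\rfloor &= 18\,446\,744\,073\,709 \approx 1.8\times10^{13}
\end{aligned}
```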

What benchmarks would you like to see for this? If my teammates are happy with this code, I'll try it out in prod and see the results 👁️

fpetkovski · 2022-11-21T12:52:07Z

I agree that it's hard to have a benchmark since a lot of benefits will come from reducing network bandwidth. So anything we run on a single machine will not be representative.

saswatamcode · 2022-11-22T08:08:04Z

I was thinking that benchmarking compression/decompression might be useful, so that we can catch regressions if this changes in the future! 🙂
But yes, it won't capture the network bandwidth benefits, so trying it in prod sounds better for that.
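A compression/decompression round-trip benchmark along these lines could guard against regressions. This is a hedged sketch: `compress`, `decompress` and `testSeries` are hypothetical stand-ins for the PR's encoder and decoder. In the repository, `BenchmarkRoundTrip` would live in a `_test.go` file and run with `go test -bench`; here `testing.Benchmark` runs it from `main` instead.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

type Label struct{ Name, Value string }

// compress builds a symbol table and encodes each series as name/value ref pairs.
func compress(series [][]Label) (symbols []string, refs [][]uint64) {
	index := map[string]uint64{}
	refOf := func(s string) uint64 {
		if r, ok := index[s]; ok {
			return r
		}
		r := uint64(len(symbols))
		index[s] = r
		symbols = append(symbols, s)
		return r
	}
	refs = make([][]uint64, len(series))
	for i, s := range series {
		rs := make([]uint64, 0, 2*len(s))
		for _, l := range s {
			rs = append(rs, refOf(l.Name), refOf(l.Value))
		}
		refs[i] = rs
	}
	return symbols, refs
}

// decompress resolves ref pairs back into labels.
func decompress(symbols []string, refs [][]uint64) [][]Label {
	out := make([][]Label, len(refs))
	for i, rs := range refs {
		lbls := make([]Label, 0, len(rs)/2)
		for j := 0; j+1 < len(rs); j += 2 {
			lbls = append(lbls, Label{Name: symbols[rs[j]], Value: symbols[rs[j+1]]})
		}
		out[i] = lbls
	}
	return out
}

// testSeries generates series with a mix of shared and unique label values.
func testSeries(n int) [][]Label {
	out := make([][]Label, n)
	for i := range out {
		out[i] = []Label{
			{"__name__", "http_requests_total"},
			{"job", "api"},
			{"instance", "host-" + strconv.Itoa(i%50)},
			{"path", "/v1/" + strconv.Itoa(i)},
		}
	}
	return out
}

func BenchmarkRoundTrip(b *testing.B) {
	series := testSeries(10_000)
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		symbols, refs := compress(series)
		_ = decompress(symbols, refs)
	}
}

func main() {
	fmt.Println(testing.Benchmark(BenchmarkRoundTrip))
}
```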

GiedriusS · 2022-11-22T08:15:54Z

Here's how network traffic dropped after deploying this:

It dropped by 3x. This is one of the servers that only evaluates rules via Thanos Ruler, so the effect is most visible there.

fpetkovski · 2022-11-22T08:28:28Z

Was there any reduction in query latency?

GiedriusS · 2022-11-22T09:34:40Z

It doesn't look like it, maybe just a little :( but with much lower network usage, it's possible to fit more recording/alerting rules onto it.

GiedriusS · 2022-11-22T11:14:07Z

RAM usage also dropped by around 20%, with no noticeable increase in CPU usage 🤔

bwplotka · 2022-11-22T13:09:33Z

cc @bboreham - want to check it? You might have some thoughts. (:

bboreham · 2022-11-22T15:28:49Z

Thanks for this; I have considered doing something like this in the remote-write protocol.
A question that occurs to me is "when is a symbol table discarded from memory?", but perhaps I didn't understand.

Re some conversations we had at PromCon, you may like grafana/mimir-prometheus#353.

matej-g

Nice job, the implementation looks good to me and your prod test results seem persuasive 👀. I have mostly small nits.

matej-g · 2022-11-22T15:40:55Z

FWIW, a somewhat related idea to others in this conversation: reusing strings in Receiver (#5899). I haven't managed to finalize it yet, though.

cstyan · 2022-11-24T21:45:43Z

Thanks @bboreham for sending me this way.

@GiedriusS I have a similar implementation of a symbols table, based originally on the interner we already use in remote write. Let me know if you'd like to contribute yours upstream so that it could be reused in remote write.

GiedriusS · 2023-01-09T09:34:55Z

Been running this for months, haven't spotted any problems. PTAL.

fpetkovski · 2023-01-13T16:32:20Z

This looks good overall, I'll take another look next week to make sure we don't miss something.

GiedriusS · 2023-02-01T13:46:40Z

Updated the PR: merged main and made changes according to the comments. Also, while running this in prod I noticed that decompression takes a long time for queries that retrieve millions of series, so I've added concurrent decompression, tunable via a new flag. This functionality is now disabled unless that flag is set to a value greater than zero.

I can't imagine running Thanos in prod without this anymore, because Queriers would easily get overwhelmed: we have thousands of recording rules, and labels account for the overwhelming majority of bandwidth.
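The concurrent decompression can be pictured as splitting the series into contiguous chunks, one goroutine per chunk, bounded by the configured worker count. `decompressConcurrently` is a hypothetical sketch; it assumes the symbol table is fully received before decoding starts, so it is only read and safe to share between goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

type Label struct{ Name, Value string }

// decompressConcurrently resolves ref pairs into labels using at most
// maxWorkers goroutines, each handling one contiguous chunk of series.
func decompressConcurrently(symbols []string, refs [][]uint64, maxWorkers int) [][]Label {
	out := make([][]Label, len(refs))
	if maxWorkers < 1 {
		maxWorkers = 1 // guard only; compression is assumed off when the flag is 0
	}
	chunk := (len(refs) + maxWorkers - 1) / maxWorkers // ceil division
	var wg sync.WaitGroup
	for start := 0; start < len(refs); start += chunk {
		end := start + chunk
		if end > len(refs) {
			end = len(refs)
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			for i := start; i < end; i++ {
				lbls := make([]Label, 0, len(refs[i])/2)
				for j := 0; j+1 < len(refs[i]); j += 2 {
					lbls = append(lbls, Label{symbols[refs[i][j]], symbols[refs[i][j+1]]})
				}
				out[i] = lbls // each goroutine writes only its own indices
			}
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	symbols := []string{"job", "api", "instance", "a", "b"}
	refs := [][]uint64{{0, 1, 2, 3}, {0, 1, 2, 4}}
	fmt.Println(decompressConcurrently(symbols, refs, 4))
}
```

Because every goroutine writes to disjoint indices of `out` and only reads the shared symbols, no locking is needed beyond the final `wg.Wait()`.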

fpetkovski

LGTM!

Any recommendations on what value to set the number of workers to?

yeya24 · 2023-02-02T18:01:52Z

How much does this feature help, any dashboards/metrics to share? @GiedriusS 👀

Updated: NVM I see the improvements!

matej-g

⭐ ⭐ ⭐

One more nit regarding the flag from me, but I'll leave it up to you; the implementation looks good 🙌

matej-g · 2023-02-07T09:55:27Z

cmd/thanos/query.go

@@ -111,6 +111,9 @@ func registerQuery(app *extkingpin.App) {
 	maxConcurrentSelects := cmd.Flag("query.max-concurrent-select", "Maximum number of select requests made concurrently per a query.").
 		Default("4").Int()

+	maxConcurrentDecompressWorkers := cmd.Flag("query.max-concurrent-decompress-workers", "Maximum number of workers spawned to decompress a set of compressed storepb.Series. Setting this to higher than zero enables label compression during querying - CPU usage will be slightly higher whilst network usage will be significantly lower.").


This flag is a tiny bit confusing to me: it sounds like it only configures the number of decompression workers, but it also turns the compression itself on or off. Either two distinct flags would be better, or the two settings should somehow be combined, if that can be done reasonably, so that the flag name reflects both (but I struggle to find one).

Also, I wonder whether we want to put this behind a hidden flag straight away; if not, a bit more documentation would be nice (but that can be done in a separate PR).
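One way to split the setting into two flags, as suggested, is sketched below with Go's standard `flag` package. Both flag names, the arbitrary default of 4 workers, and the validation rule are hypothetical and only illustrate the shape; Thanos itself registers flags through `extkingpin`.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// decompressConfig separates "is compression on" from "how many workers".
type decompressConfig struct {
	Enabled bool
	Workers int
}

// parseFlags parses two hypothetical flags and validates their combination.
func parseFlags(args []string) (decompressConfig, error) {
	fs := flag.NewFlagSet("query", flag.ContinueOnError)
	enabled := fs.Bool("query.label-compression", false,
		"Hypothetical: request symbol-table label compression from StoreAPIs.")
	workers := fs.Int("query.decompress-workers", 4,
		"Hypothetical: maximum goroutines used to decompress one Series response.")
	if err := fs.Parse(args); err != nil {
		return decompressConfig{}, err
	}
	if *enabled && *workers < 1 {
		return decompressConfig{}, errors.New("query.decompress-workers must be >= 1 when label compression is enabled")
	}
	return decompressConfig{Enabled: *enabled, Workers: *workers}, nil
}

func main() {
	cfg, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", cfg)
}
```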

matej-g · 2023-03-03T08:34:01Z

@GiedriusS anything still blocking this? Or you just need to resolve conflicts?

GiedriusS · 2023-03-03T09:03:45Z

This will need to be reworked a bit after the latest changes, because ordering now matters. This means we need to merge symbol references from multiple StoreAPIs in an order-preserving way. I don't know how to do that at the moment; the references will need to be encoded so that order is preserved.
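A small Go illustration of why ordering is a problem, under the assumption that the merged Series stream must be sorted by label strings: references are assigned per StoreAPI range and in first-seen order, so comparing them numerically does not match comparing the strings they stand for. `compareRefs`, `compareResolved` and `SymbolTable` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// SymbolTable maps references back to strings (hypothetical, merged view).
type SymbolTable map[uint64]string

// compareRefs orders two ref-encoded label sets by their numeric refs.
func compareRefs(a, b []uint64) int {
	for i := 0; i < len(a) && i < len(b); i++ {
		switch {
		case a[i] < b[i]:
			return -1
		case a[i] > b[i]:
			return 1
		}
	}
	return len(a) - len(b)
}

// compareResolved orders them by the strings the refs point to, which is
// the order a string-sorted Series stream needs.
func compareResolved(t SymbolTable, a, b []uint64) int {
	for i := 0; i < len(a) && i < len(b); i++ {
		if c := strings.Compare(t[a[i]], t[b[i]]); c != 0 {
			return c
		}
	}
	return len(a) - len(b)
}

func main() {
	// Store A owns refs [0, 100), store B owns [100, 200): hypothetical ranges.
	table := SymbolTable{0: "job", 1: "zeta", 100: "job", 101: "alpha"}
	a := []uint64{0, 1}     // {job="zeta"} from store A
	b := []uint64{100, 101} // {job="alpha"} from store B
	fmt.Println(compareRefs(a, b), compareResolved(table, a, b))
}
```

Resolving every reference before comparing, as `compareResolved` does, restores the right order but gives back part of the savings, so it is a naive fallback rather than an answer to the encoding question.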

stale · 2023-05-21T19:39:48Z

Hello 👋 Looks like there was no activity on this amazing PR for the last 30 days.
Do you mind updating us on the status? Is there anything we can help with? If you plan to still work on it, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next week, this issue will be closed (we can always reopen a PR if you get back to this!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale · 2023-06-18T09:35:24Z

Closing for now as promised, let us know if you need this to be reopened! 🤗

yeya24 · 2023-06-18T17:20:25Z

Do we still want to continue this work?

MitchellJThomas · 2023-09-27T17:39:54Z

+1 for continuing the work

pull-request-size bot added the size/L label Nov 18, 2022

GiedriusS added component: query component: store component: sidecar and removed size/L labels Nov 18, 2022

GiedriusS force-pushed the add_symbol_tables_support branch from a7553a1 to 526bc24 November 21, 2022 08:42

pull-request-size bot added the size/L label Nov 21, 2022

GiedriusS force-pushed the add_symbol_tables_support branch from 526bc24 to 6b85334 November 21, 2022 11:10

GiedriusS force-pushed the add_symbol_tables_support branch from 6b85334 to 18b1603 November 21, 2022 11:50

pull-request-size bot added size/XL and removed size/L labels Nov 21, 2022

GiedriusS marked this pull request as ready for review November 21, 2022 12:01

GiedriusS mentioned this pull request Nov 21, 2022

store: add initial symbol tables support vinted/thanos#33

Merged

matej-g reviewed Nov 22, 2022

store: fix bug with big numbers in lookup table

63b6656

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

stale bot added the stale label Jan 7, 2023

GiedriusS removed the stale label Jan 9, 2023

GiedriusS requested review from matej-g and fpetkovski January 9, 2023 09:34

fpetkovski reviewed Jan 16, 2023

pkg/query/querier.go (outdated)

pkg/store/storepb/rpc.proto

GiedriusS mentioned this pull request Jan 24, 2023

Consider v2 of /api/v1/read with improvements prometheus/prometheus#11882

Open

GiedriusS added 6 commits February 1, 2023 14:31

query: concurrently decompress data

33e2be0

It takes >17s to decompress >80M elements. Doing that concurrently cuts the time down to 3-4 seconds. We have lots of spare CPU, so let's do that. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

query: limit max decompression workers

d64dcd4

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

query: make decompression worker count tunable

a5fb2ac

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

query: improve message

90eb4dc

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

Merge remote-tracking branch 'origin/main' into add_symbol_tables_sup…

f226905

…port Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

*: fix build

5a0809b

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

fpetkovski approved these changes Feb 2, 2023

matej-g approved these changes Feb 7, 2023

GiedriusS mentioned this pull request Mar 8, 2023

e2e/store: fix flaky limits test #6193

Merged

stale bot added the stale label May 21, 2023

stale bot closed this Jun 18, 2023

yeya24 reopened this Jun 18, 2023

stale bot removed the stale label Sep 27, 2023

GiedriusS mentioned this pull request Jan 17, 2024

Version 2 of string interning #7067

Open
