-
Notifications
You must be signed in to change notification settings - Fork 820
Optimised Ring.ShuffleShard() and disabled subring cache in store-gateway, ruler and compactor #3601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimised Ring.ShuffleShard() and disabled subring cache in store-gateway, ruler and compactor #3601
Conversation
|
||
// Whether the shuffle-sharding subring cache is disabled. This option is set | ||
// internally and never exposed to the user. | ||
SubringCacheDisabled bool `yaml:"-"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The config option is "disabled bool" instead of "enabled bool" so that the default will be "enabled" (because the zero value of a bool is false).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me
be49fee
to
438fa45
Compare
@@ -34,6 +34,11 @@ func GenerateTokens(numTokens int, takenTokens []uint32) []uint32 { | |||
i++ | |||
} | |||
|
|||
// Ensure returned tokens are sorted. | |||
sort.Slice(tokens, func(i, j int) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the ring lifecycler it's already guaranteed all tokens are sorted. I've introduced sorting here as well to have same guarantee in tests too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM so far. (I thought it's still a draft :))
c0d0e97
to
9d2de1d
Compare
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
3e87a3d
to
49e56e8
Compare
Signed-off-by: Marco Pracucci <marco@pracucci.com>
…ortexproject#3815) Signed-off-by: Marco Pracucci <marco@pracucci.com>
What this PR does:
We've been reported an high memory utilisation in the store-gateway when shuffle-sharding is enabled and the cluster has a large number of tenants and replicas. This memory utilisation increase is caused by the in-memory subring cache used by the shuffle-sharding, which is used to reduce the CPU utilisation due to the expensive computation of the subring.
However, store-gateway, ruler and compactor use
Ring.ShuffleShard()
only to check whether a tenant/block/rulegroup belongs to their own shard and this operation is run, for each tenant, only at a regular interval (it doesn't depend on QPS).Disabling subring cache is a possible solution (proposed in this PR), but then CPU utilisation is expected to increase. For this reason, in this PR I've also worked to optimise
Ring.ShuffleShard()
paying attention to not increase CPU time forRing.Get()
(problem is that it's easy to optimiseShuffleShard()
but hard to do without impactingGet()
).BenchmarkRing_ShuffleShard_512Tokens (512 tokens per instance)
BenchmarkRing_Get
BenchmarkRing_ShuffleShard (128 tokens per instance)
Which issue(s) this PR fixes:
N/A
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]