-
Notifications
You must be signed in to change notification settings - Fork 626
ruler: ignore rulers in non-operation states when getting and syncing rules #11569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b12cbb4
to
c3e85f9
Compare
"Shuffle Shard Size 2 with 1 joining ruler": { | ||
shuffleShardSize: 2, | ||
tokensByRuler: map[string][]uint32{ | ||
"ruler1": append( | ||
[]uint32{userToken("user1", 0) + 1}, | ||
append( | ||
generateTokenForGroups(1, rules[0], rules[1]), | ||
generateTokenForGroups(3, rules[2])...)..., | ||
), | ||
"ruler2": append( // ruler2 is in JOINING state so will be ignored when distributing rules. | ||
[]uint32{userToken("user1", 1) + 1, userToken("user2", 0) + 1}, | ||
generateTokenForGroups(1, rules[2])..., | ||
), | ||
"ruler3": append( | ||
[]uint32{userToken("user2", 1) + 1, userToken("user3", 0) + 1}, | ||
generateTokenForGroups(1, rules[3], rules[4], rules[5], rules[6], rules[7], rules[8])..., | ||
), | ||
// While ruler4 registers some tokens for all users, it doesn't actually evaluate any rules. | ||
// ruler1 and ruler2 are selected for subring for user1 and user2, even though ruler2 is in the JOINING | ||
// state (this means all the rules are evaluated on ruler1). | ||
// This does form part of user3's subring but user3's single rule is evaluated on ruler3. | ||
"ruler4": append( | ||
[]uint32{userToken("user1", 2) + 1, userToken("user2", 2) + 1, userToken("user3", 1) + 1}, | ||
generateTokenForGroups(2, rules[2])..., | ||
), | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wanted to check if the result of this test case is expected - when getting the shuffle sharded subring for a user, non-ACTIVE rulers are not filtered out. Rules do end up being evaluated on the ACTIVE rulers in the subring though.
The issue I see is that it's possible during rollout or scaling for all the rulers in a shuffle sharded subring to be in a non-ACTIVE state, in which case all rules will not be evaluated for a user.
This was happening before my PR as well. This could be fixed (if it is an issue) by moving GetSubringForOperationStates(op)
before calling ShuffleShard()
on the ring, so the shuffle sharded ring is made from only the instances in the requested state. Would need to be careful to update the logic in all the places that gets the shuffled sharded subring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking exactly at this test and then I noticed your comment. I'm glad we both put our 👀 on this.
I agree this behaviour in related from your PR. Is it desired? It depends. Generally speaking, this is the expected behaviour of shuffle sharding in ingesters (where shuffle sharding has been introduced at the beginning): the goal is to not reshard series among different ingesters during a rolling update. I also think it's a desired property for the store-gateways shuffle sharding.
Is it desired for rulers? I don't know right now (I need to think about it more). Synching rules take time. Given rulers unregister from the ring during a restart, changing this logic would realistically just cover the time it takes for the initial sync (when the ruler is in JOINING state). Moving rule groups to the remaining ACTIVE ruler in the shard (if any) or to another ruler outside the original tenant's shard may not make much difference. The only real difference may be in case the shard size = 1 (if the shard is composed by the JOINING ruler, then no rules would be evaluated while JOINING). However, synching rules on other rules will take time too, so not sure would be a significant improvement changing this logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love 1 line logic changes. Thanks!
a889411
to
3739b62
Compare
What this PR does
Currently,
forEachRulerInTheRing()
fails if there are rulers in the ring in states other than the states in the operation argument. This is becauseforEachRulerInTheRing()
callsGetReplicationSetForOperation()
. Any ruler in a non-operation state is seen as unhealthy byGetReplicationSetForOperation()
. As RF = 1, the number of healthy instances must match the total number of instances in the ring. ThereforeGetReplicationSetForOperation()
errors and that's propagated toforEachRulerInTheRing()
.However, it's possible for all rules to be processed (evaluated or synced) correctly even when there are "unhealthy" instances. This is because rules are only processed on healthy instances and are moved around rulers depending on the rulers that are healthy. This causes some endpoints to return errors more often than necessary.
As a concrete example,
api/v1/rules
and/api/v1/alerts
will return errors if there are any JOINING or LEAVING rulers in the ring. Rulers can be in these state during rollouts, or scaleups and scaledowns. These endpoints useGetRules()
which callsforEachRulerInTheRing()
withRuleEvalRingOp
.RuleEvalRingOp
only considers ACTIVE rulers as healthy.Instead of erroring when there are non-ACTIVE rulers in the ring, it's safe to ignore non-ACTIVE rulers in this case:
The ring can be eventually consistent (e.g. if using memberlist). Therefore rulers could transition between states and redistribute rules during the process of getting rules (e.g. ruler is JOINING but transitions to ACTIVE and rulers are redistributed). So it's possible to get a partial set of results if rulers in other states are ignored. But there are already similar cases happening now, like if a ruler is not present in the ring when we get the list of rulers to get rules from, but joins the ring and becomes active while getting rules.
To implement this, I've introduced a new ring function,
GetSubringForOperationStates()
in grafana/dskit#699. This is used to filter out the instances in the non-operation states (for theGetRules()
case, this will filter out all the non-ACTIVE instances) beforeGetReplicationSetForOperation()
is called. The effect is that GetReplicationSetForOperation()` will only error if there are instances in the operation states that have timed out heartbeats.I also considered using
GetAllHealthy()
, rather than introducing a new function. However,GetAllHealthy()
ignores out rulers that are in the requested states but have timed out heartbeats rather than than erroring. While rulers being in JOINING and LEAVING states are expected from time to time due to rollouts and scaling, if everything is working as expected, active rulers should not end up with heartbeat timeouts.Another alternative was to add retries to
forEachRulerInTheRing()
- we don't expect the ring to always have rulers in the non-operation states. However, in the case ofGetRules()
, rulers can take a long time to transition from JOINING to ACTIVE if there are many rules to load, so retries would either be ineffective or have high latency.As well as
GetRules()
,forEachRulerInTheRing()
is used bynotifySyncRules()
, which notifies all the JOINING and ACTIVE rulers to sync rules. Previously this would fail if any rulers were in the LEAVING state before notifying any ruler to sync rules. It's safe to not notify LEAVING rulers as they do not pick up new rules. This new behaviour ends up being an improvement as other rulers will still be notified when there are LEAVING rulers.Which issue(s) this PR fixes or relates to
Fixes #Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.