add configurable final-sleep time for basic lifecycler #5517
Conversation
Force-pushed from a5bb4d0 to 9f603aa
pkg/storegateway/gateway_ring.go (outdated)
@@ -109,6 +111,8 @@ func (cfg *RingConfig) RegisterFlags(f *flag.FlagSet) {
 	f.DurationVar(&cfg.WaitStabilityMinDuration, ringFlagsPrefix+"wait-stability-min-duration", time.Minute, "Minimum time to wait for ring stability at startup. 0 to disable.")
 	f.DurationVar(&cfg.WaitStabilityMaxDuration, ringFlagsPrefix+"wait-stability-max-duration", 5*time.Minute, "Maximum time to wait for ring stability at startup. If the store-gateway ring keeps changing after this period of time, the store-gateway will start anyway.")
+
+	f.DurationVar(&cfg.FinalSleep, ringFlagsPrefix+"final-sleep", 0*time.Second, "The sleep seconds when store-gateway is shutting down.")
Why does this have a default value of 0s while other components have 30s?
Because the store-gateway has replication, it is okay even if we don't sleep and other pods still think this store-gateway is healthy: we will only get one error.
We cannot make that assumption. The store-gateway can still run with a replication factor of 1 if configured.
Okay, sure. I set the default to zero for the store-gateway, ruler, and alertmanager, so it is up to the user to make the right decision about the final delay.
If they don't run the ruler/alertmanager in HA and decide to use the DynamoDB KV store, the delay needs to be larger than the DynamoDB puller-sync-time.
If they do run the ruler/alertmanager in HA, this final sleep is not needed; they can afford one error.
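To make that rule concrete, here is a minimal sketch of a validation helper. This is not part of the PR; `validateFinalSleep`, `kvSyncInterval`, and `haEnabled` are illustrative names, and the 30s value simply mirrors the DynamoDB sync time discussed in this thread.

```go
package main

import (
	"fmt"
	"time"
)

// validateFinalSleep encodes the rule above: without HA and with a polled KV
// store such as DynamoDB, the final sleep must be at least the interval other
// instances use to refresh their view of the ring, or they may keep routing
// requests to an instance that has already exited.
func validateFinalSleep(finalSleep, kvSyncInterval time.Duration, haEnabled bool) error {
	if haEnabled {
		// With HA, a single failed request is tolerable; no sleep required.
		return nil
	}
	if finalSleep < kvSyncInterval {
		return fmt.Errorf("final sleep %s is shorter than the KV sync interval %s; "+
			"ring changes may not propagate before shutdown", finalSleep, kvSyncInterval)
	}
	return nil
}

func main() {
	// Example: a DynamoDB-backed ring polled every 30s, no HA.
	if err := validateFinalSleep(10*time.Second, 30*time.Second, false); err != nil {
		fmt.Println(err)
	}
}
```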
Please document this properly if we add this flag
Where should I put the explanation? It's kind of long. I tried to add some comments to the flag description, but I don't think that is a good place for a long explanation.
My recommendation:
"The sleep time to wait before pod shutdown, allowing ring changes to propagate."
@wenxu1024 we need to add this flag to the changelog; there you can explain it better. Maybe that is what @yeya24 was referring to.
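For illustration, the registration with that suggested wording might read as follows. This is a sketch only: `ExampleRingConfig` and the `ruler.ring.` prefix are placeholders, not the PR's actual code.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// ExampleRingConfig is a placeholder mirroring the RingConfig structs in the diffs above.
type ExampleRingConfig struct {
	FinalSleep time.Duration
}

func (cfg *ExampleRingConfig) RegisterFlags(f *flag.FlagSet, prefix string) {
	// Flag help text using the wording suggested in the review.
	f.DurationVar(&cfg.FinalSleep, prefix+"final-sleep", 0,
		"The sleep time to wait before pod shutdown, allowing ring changes to propagate.")
}

func main() {
	var cfg ExampleRingConfig
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	cfg.RegisterFlags(fs, "ruler.ring.")
	_ = fs.Parse([]string{"-ruler.ring.final-sleep=30s"})
	fmt.Println(cfg.FinalSleep) // 30s
}
```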
You can document it at https://cortexmetrics.io/docs/configuration/v1guarantees/
@@ -79,6 +81,7 @@ func (cfg *RingConfig) RegisterFlags(f *flag.FlagSet) {
 	cfg.KVStore.RegisterFlagsWithPrefix(rfprefix, "alertmanagers/", f)
 	f.DurationVar(&cfg.HeartbeatPeriod, rfprefix+"heartbeat-period", 15*time.Second, "Period at which to heartbeat to the ring. 0 = disabled.")
 	f.DurationVar(&cfg.HeartbeatTimeout, rfprefix+"heartbeat-timeout", time.Minute, "The heartbeat timeout after which alertmanagers are considered unhealthy within the ring. 0 = never (timeout disabled).")
+	f.DurationVar(&cfg.FinalSleep, rfprefix+"final-sleep", 30*time.Second, "The sleep seconds when alertmanager is shutting down.")
Is this flag useful mainly for migrating from memberlist to DynamoDB, and not in other cases?
If it is only useful for a specific scenario, I would prefer a default value of 0s, which doesn't change existing behavior.
For memberlist the effect is small, because the information reaches other nodes within about 2 seconds. But with DynamoDB it takes about 30s, so we need to sleep to give other pods time to learn this pod is going away.
@danielblando what do you think? A default of zero?
We do have other ring implementations too, not only DynamoDB and memberlist. The value should take all of them into consideration.
To avoid breaking existing setups, we should default to 0s and document the recommended value for each ring store.
Same as before: just set the default to 0, so the default behavior stays the same.
Force-pushed from a0d6547 to 0045d7a
Force-pushed from 0045d7a to 94bcd70
Merge on green. Thanks @wenxu1024
What this PR does:
Add a configurable final sleep to the ruler/alertmanager/store-gateway lifecycler. This overcomes the information delay in the ring when the DynamoDB KV store is used as the backend.
For example: Ruler-1 is stopping and sets itself to LEAVING. That information takes about 30s to reach the other rulers. Within those 30s, if another ruler handles a GetRules call, it iterates over all the rulers it believes are healthy and wrongly counts the LEAVING ruler among them. In reality, the LEAVING ruler has already shut down (because there is no delay), so GetRules fails (we don't have HA, so we cannot tolerate a single ruler failure).
The solution is to add a delay once the ruler has set itself to LEAVING. During this window, if another ruler sends a request, the LEAVING ruler can still handle it. Once the other rulers realize this ruler is LEAVING, they stop sending further requests to it, and we are fine from then on.
This is the same idea as the ingester lifecycler: while the ingester is shutting down, it can still perform its shutdown duties and then sleep for a final 30s so its metrics can be scraped.
Version 2:
It turned out that if we sleep right after the instance is set to LEAVING, then during that sleep other instances see this instance as LEAVING while it is still in the ring, and we hit another problem: https://github.com/cortexproject/cortex/blob/master/pkg/ring/ring.go#L556
So the error just changed from
"unable to retrieve rules from ruler"
to
"too many unhealthy instances in the ring"
This is not what we wanted.
So we propose to sleep after the instance is removed from the ring. The next heartbeat of the other instances then sees that this instance is gone, and we no longer get the "too many unhealthy instances in the ring" error.
This is a workaround for components that are not yet HA (e.g. the ruler).
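A minimal sketch of that "Version 2" shutdown ordering, under the simplifying assumption of a lifecycler with `changeState`/`unregisterInstance` hooks (illustrative names, not the exact cortex code):

```go
package main

import (
	"fmt"
	"time"
)

type lifecycler struct {
	finalSleep time.Duration
}

func (l *lifecycler) changeState(state string) { fmt.Println("state ->", state) }
func (l *lifecycler) unregisterInstance()      { fmt.Println("removed from ring") }

// stopping shows the "Version 2" ordering: remove the instance from the ring
// FIRST, then sleep, so other instances' next ring refresh sees it gone
// instead of counting it as an unhealthy LEAVING member.
func (l *lifecycler) stopping() {
	l.changeState("LEAVING") // stop taking on new work
	l.unregisterInstance()   // remove this instance from the ring KV store
	time.Sleep(l.finalSleep) // keep serving stragglers while the removal propagates
}

func main() {
	(&lifecycler{finalSleep: 2 * time.Second}).stopping()
}
```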
Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]