-
Notifications
You must be signed in to change notification settings - Fork 820
Add jitter to HA deduping heartbeats (#1543) #1748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can you elaborate which specific problem you're addressing in this PR? I've checked out the linked #1543 but I'm not sure I'm following it. Thanks! |
pkg/distributor/ha_tracker.go
Outdated
@@ -100,6 +102,10 @@ func (cfg *HATrackerConfig) RegisterFlags(f *flag.FlagSet) { | |||
"distributor.ha-tracker.update-timeout", | |||
15*time.Second, | |||
"Update the timestamp in the KV store for a given cluster/replica only after this amount of time has passed since the current stored timestamp.") | |||
f.DurationVar(&cfg.UpdateTimeoutJitter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want to enforce that FailoverTimeout > UpdateTimeout+UpdateTimeoutJitter
by at least time.Second
. Check in newClusterTracker
where we already enforce that Failover > Update
.
pkg/distributor/ha_tracker.go
Outdated
// Adjust update timeout slightly to spread heartbeats out over time with +/- UpdateTimeoutJitter | ||
var jitter time.Duration | ||
if c.cfg.UpdateTimeoutJitter != 0 { | ||
jitter = time.Duration(rand.Int63n(int64(2*c.cfg.UpdateTimeoutJitter))) - c.cfg.UpdateTimeoutJitter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we might be able to set the jitter for the distributor instead of per request. At least, I'd like to see what the write load reduction is doing it that way. @gouthamve thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there, final nits :)
Also please update the changelog with ENHANCEMENT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple of tiny comments I would be glad you address before merging it.
pkg/distributor/ha_tracker.go
Outdated
} | ||
|
||
if cfg.FailoverTimeout <= cfg.UpdateTimeout+cfg.UpdateTimeoutJitterMax+time.Second { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the right place to validate the config. Please add func (cfg *HATrackerConfig) Validate() error
and move the validation there (should return an error on validation failure, nil
otherwise), add the Validate()
function to distributor.Config
(this should call HATrackerConfig.Validate()
) and the call the distributor.Config.Validate()
method from the Validate()
function in cortex.go
(looks complex, but it's a very easy chain of Validate()
calls).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im okay with making this change - however wondering if it would make more sense to abstract this out as we add more validations. right now it is only validating a single field. but yea, lmk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would get stick to the PR's scope. Config validation support has been introduced very recently and will be adopted more over the time. My suggestion is to validate only the new config option introduced with this PR, while we'll improve the rest of config over the time.
pkg/distributor/ha_tracker.go
Outdated
} | ||
|
||
if cfg.FailoverTimeout <= cfg.UpdateTimeout+cfg.UpdateTimeoutJitterMax+time.Second { | ||
return nil, fmt.Errorf("HA Tracker failover timeout must be 1s greater than update timeout, %d is <= %d", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can make the message more clear this way:
fmt.Errorf("HA Tracker failover timeout (%v) must be at least 1s greater than update timeout (%v)", cfg.FailoverTimeout, cfg.UpdateTimeout+cfg.UpdateTimeoutJitterMax)
Moreover, given the error message, the check should be <
instead of <=
, so:
if cfg.FailoverTimeout < cfg.UpdateTimeout+cfg.UpdateTimeoutJitterMax+time.Second {
pkg/distributor/ha_tracker.go
Outdated
if cfg.FailoverTimeout <= cfg.UpdateTimeout { | ||
return nil, fmt.Errorf("HA Tracker failover timeout must be greater than update timeout, %d is <= %d", cfg.FailoverTimeout, cfg.UpdateTimeout) | ||
var jitter time.Duration | ||
if cfg.UpdateTimeoutJitterMax != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better > 0
for safety.
Signed-off-by: adnan <adnan.khan021@gmail.com>
addresses #1534, adds a new
distributor.ha-tracker.update-timeout-jitter
config to add a +/- jitter to the UpdateTimeout for the HA dedupe heartbeat.