Add jitter to HA deduping heartbeats (#1543) #1748


Merged: 1 commit, Nov 8, 2019

Conversation

@adnxn (Contributor) commented Oct 22, 2019

Addresses #1534. Adds a new distributor.ha-tracker.update-timeout-jitter config option that applies a +/- jitter to the UpdateTimeout for the HA dedupe heartbeat.

@pracucci (Contributor) commented

Can you elaborate on which specific problem you're addressing in this PR? I've checked out the linked #1543 but I'm not sure I'm following it.

Thanks!

@adnxn (Contributor, Author) commented Oct 23, 2019

@pracucci: oops, typo. meant #1534.

@@ -100,6 +102,10 @@ func (cfg *HATrackerConfig) RegisterFlags(f *flag.FlagSet) {
"distributor.ha-tracker.update-timeout",
15*time.Second,
"Update the timestamp in the KV store for a given cluster/replica only after this amount of time has passed since the current stored timestamp.")
f.DurationVar(&cfg.UpdateTimeoutJitter,
Contributor:

We probably want to enforce that FailoverTimeout > UpdateTimeout+UpdateTimeoutJitter by at least time.Second. Check in newClusterTracker where we already enforce that Failover > Update.

// Adjust update timeout slightly to spread heartbeats out over time with +/- UpdateTimeoutJitter
var jitter time.Duration
if c.cfg.UpdateTimeoutJitter != 0 {
	jitter = time.Duration(rand.Int63n(int64(2*c.cfg.UpdateTimeoutJitter))) - c.cfg.UpdateTimeoutJitter
}
Contributor:

I feel like we might be able to set the jitter for the distributor instead of per request. At least, I'd like to see what the write-load reduction is when doing it that way. @gouthamve thoughts?

@gouthamve (Contributor) commented

Hey @adnxn, so this is pretty good, and almost done!

But as @cstyan mentioned, best to instantiate the amount of jitter when we create the distributor itself and not call rand() per request.
With the above change and validating config (as suggested by Callum), I think this is good to go!

@gouthamve left a review comment

Almost there, final nits :)

Also, please update the changelog with an ENHANCEMENT entry.

@adnxn adnxn marked this pull request as ready for review November 5, 2019 16:33
@pracucci (Contributor) left a review comment

I left a couple of tiny comments that I'd be glad to see addressed before merging.

}

if cfg.FailoverTimeout <= cfg.UpdateTimeout+cfg.UpdateTimeoutJitterMax+time.Second {
Contributor:

This is not the right place to validate the config. Please add func (cfg *HATrackerConfig) Validate() error and move the validation there (it should return an error on validation failure, nil otherwise), add a Validate() function to distributor.Config (which should call HATrackerConfig.Validate()), and then call distributor.Config.Validate() from the Validate() function in cortex.go. It looks complex, but it's a very easy chain of Validate() calls.

Contributor Author:

I'm okay with making this change; however, I'm wondering if it would make more sense to abstract this out as we add more validations. Right now it only validates a single field. But yeah, let me know.

Contributor:

I would stick to the PR's scope. Config validation support was introduced very recently and will be adopted more over time. My suggestion is to validate only the new config option introduced with this PR, while we improve the rest of the config over time.

}

if cfg.FailoverTimeout <= cfg.UpdateTimeout+cfg.UpdateTimeoutJitterMax+time.Second {
return nil, fmt.Errorf("HA Tracker failover timeout must be 1s greater than update timeout, %d is <= %d",
Contributor:

You can make the message clearer this way:

fmt.Errorf("HA Tracker failover timeout (%v) must be at least 1s greater than update timeout (%v)", cfg.FailoverTimeout, cfg.UpdateTimeout+cfg.UpdateTimeoutJitterMax)

Moreover, given the error message, the check should be < instead of <=, so:

if cfg.FailoverTimeout < cfg.UpdateTimeout+cfg.UpdateTimeoutJitterMax+time.Second {

if cfg.FailoverTimeout <= cfg.UpdateTimeout {
return nil, fmt.Errorf("HA Tracker failover timeout must be greater than update timeout, %d is <= %d", cfg.FailoverTimeout, cfg.UpdateTimeout)
var jitter time.Duration
if cfg.UpdateTimeoutJitterMax != 0 {
Contributor:
Better > 0 for safety.

Signed-off-by: adnan <adnan.khan021@gmail.com>
@bboreham bboreham merged commit 492cc3b into cortexproject:master Nov 8, 2019