-
Notifications
You must be signed in to change notification settings - Fork 809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add JitterDelay option when creating workflows. #4886
Conversation
// Add a random jitter to start time, if requested. | ||
jitterStartSeconds := startRequest.GetJitterStartSeconds() | ||
if jitterStartSeconds > 0 { | ||
firstDecisionTaskBackoffSeconds += rand.Int31n(jitterStartSeconds + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rand has a seed value somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, not sure that it does. Be careful to call rand.Seed(time.Now().UnixNano())
someplace lest you get a surprisingly deterministic result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call - turns out we already seed rand in host/service.go for every service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perfect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries with this diff as it's self-evident what's going on, but I'd normally suggest pulling the IDL changes into a separate diff/PR so as to not obscure any changes
common/util.go
Outdated
@@ -530,6 +530,12 @@ func CreateHistoryStartWorkflowRequest( | |||
firstDecisionTaskBackoffSeconds += delayStartSeconds | |||
} | |||
|
|||
// Add a random jitter to start time, if requested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jitter should also apply to each occurrence of a workflow when there is a cron schedule. Otherwise, only the first run of a cron workflow will benefit from the jitter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is now working - see my latest commit.
Tested new cron changes via: (in cadence-samples) bin/helloworld -m worker And noting that the workflow doesn't run every 10 seconds, but rather runs between 10-15 seconds every 10 seconds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you check if we have a test for next cron start? Based on the code it looks like next cron schedule should consider jitter again (expected behavior) as opposed to delay_start not being considered (expected behavior).
If this is easy to test, we should test; it can simply be breakable with a code change later.
|
||
If you make changes in the idls submodule and want to test them locally, you can easily do that by using go mod to use the local idls directory instead of github.com/uber/cadence-idl. Temporarily add the following to the bottom of go.mod: | ||
|
||
```replace github.com/uber/cadence-idl => ./idls``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
return roundedInterval | ||
|
||
var jitter time.Duration | ||
if jitterStartSeconds > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be good to validate jitterStartSeconds somewhere to make sure its max value is lower than the time till next schedule
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, we should return an error when it's negative.
common/backoff/cron_test.go
Outdated
{"* * * * *", "2018-12-17T08:00:00+00:00", 10, time.Second * 60}, | ||
{"* * * * *", "2018-12-17T08:00:10+00:00", 30, time.Second * 50}, | ||
{"* * * * *", "2018-12-17T08:00:25+00:00", 5, time.Second * 35}, | ||
{"* * * * *", "2018-12-17T08:00:59+00:00", 45, time.Second * 1}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should try 60+ here and expect error
startTime time.Time, | ||
closeTime time.Time, | ||
jitterStartSeconds int32, | ||
) time.Duration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you are moving the jitter to this function, you might want to change the return type to (error, time.Duration)
@@ -2084,6 +2085,10 @@ func (wh *WorkflowHandler) StartWorkflowExecution( | |||
return nil, wh.error(errInvalidDelayStartSeconds, scope, tags...) | |||
} | |||
|
|||
if startRequest.GetJitterStartSeconds() < 0 { | |||
return nil, wh.error(errInvalidJitterStartSeconds, scope, tags...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
if startRequest.GetJitterStartSeconds() < 0 { | ||
return nil, wh.error(errInvalidJitterStartSeconds, scope, tags...) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add the jitter < cron_interval
check below
* Add JitterDelay option when creating workflows. * Update idls * getting jitter to work with cron * Update IDL Co-authored-by: David Porter <david.porter@uber.com>
What changed?
Added a JitterStart parameter for the customer to specify a # of seconds to jitter the start of their workflow.
Why?
When customers create a large # of workflows and specify they should all start at the same time, it can cause a thundering herd problem. This parameter allows the start time to be jittered so that the large # of workflows can be started at slightly different times.
How did you test it?
Unit test and tested manually via CLI:
./cadence --domain samples-domain workflow start --tl helloWorldGroup --wt main.helloWorldWorkflow --et 30 --input '"Jittered"' --jitter_start_seconds 5
Potential risks
Release notes
Documentation Changes