-
Notifications
You must be signed in to change notification settings - Fork 16
Make timer task non-blocking #161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make timer task non-blocking #161
Conversation
cgillum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the new implementation for createTimer, it appears to me that we are creating multiple timers at once, each with an increasing duration. For example, if the max timer duration is 3 days, and a user schedules a timer for 10 days, we'll create 4 timers all at once:
- Fires in 3 days
- Fires in 6 days
- Fires in 9 days
- Fires in 10 days
Am I understanding the implementation correctly, and is this the intended design?
|
Thank you for pointing that out @cgillum. I see how creating the timers in advance would not work since that would mean creating timers that are long. I will work on fixing this implementation. Thank you. |
|
@cgillum @davidmrdavid @kaibocai @shreyas-gopalakrishna Can you please help me review this PR whenever you get a chance? Thank you! |
cgillum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change looks good! I'd love to get another person to double-check for correctness.
I do wonder if we should update the existing longTimer test to help us confirm the timing of the timers? For example, the test currently uses an AtomicInteger counter to ensure that we get the right number of timer-fired events. Should we enhance this test to also keep track of the timing of each of these events to ensure that each one comes in increments of 3 seconds?
kaibocai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thank you!
|
Good point @cgillum! I just enhanced the |
davidmrdavid
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks good to me! I just have a question about why we're using System.nanoTime() in the orchestration instead of the "currentUtcDateTime` API.
| CompletableTask<Void> firstTimer = createTimerTask(finalFireAt); | ||
| CompletableFuture<Void> timerChain = createTimerChain(finalFireAt, firstTimer.future); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we add some comments here explaining how the timerChain behaves in both the short timer and long timer cases?
| AtomicReferenceArray<Long> timestamps = new AtomicReferenceArray<>(4); | ||
| DurableTaskGrpcWorker worker = this.createWorkerBuilder() | ||
| .addOrchestrator(orchestratorName, ctx -> { | ||
| timestamps.set(counter.get(), System.nanoTime()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused as to why System.nanoTime() works here. Shouldn't we be obtaining deterministic timestamps from the ctx object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nvm, I re-read the test and this makes sense. You're measuring the time between replays, not the deterministic time.
Issue describing the changes in this PR
Resolves #136
Cause: When timers are longer than 3 days, we break them up into smaller timers that are 3-day long. However, there was a bug in this logic where we were blocking execution for those smaller timers. This explains why when we have an
anyOfstatement between an external event and a timer longer than 3 days, the external event is missed - we were still just waiting for that long timer to complete.Solution: We have implemented the timer logic to be asynchronous.
Pull request checklist
CHANGELOG.md