Skip to content

Conversation

@kamperiadis
Copy link
Contributor

Issue describing the changes in this PR

Resolves #136

Cause: When timers are longer than 3 days, we break them up into smaller timers that are 3-day long. However, there was a bug in this logic where we were blocking execution for those smaller timers. This explains why when we have an anyOf statement between an external event and a timer longer than 3 days, the external event is missed - we were still just waiting for that long timer to complete.
Solution: We have implemented the timer logic to be asynchronous.

Pull request checklist

  • My changes do not require documentation changes
    • Otherwise: Documentation issue linked to PR
  • My changes are added to the CHANGELOG.md
  • I have added all required tests (Unit tests, E2E tests)

Copy link
Member

@cgillum cgillum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the new implementation for createTimer, it appears to me that we are creating multiple timers at once, each with an increasing duration. For example, if the max timer duration is 3 days, and a user schedules a timer for 10 days, we'll create 4 timers all at once:

  1. Fires in 3 days
  2. Fires in 6 days
  3. Fires in 9 days
  4. Fires in 10 days

Am I understanding the implementation correctly, and is this the intended design?

@kaibocai kaibocai requested a review from davidmrdavid August 21, 2023 16:07
@kamperiadis
Copy link
Contributor Author

Thank you for pointing that out @cgillum. I see how creating the timers in advance would not work since that would mean creating timers that are long. I will work on fixing this implementation. Thank you.

@kamperiadis kamperiadis changed the title Make timer task async Make timer task non-blocking Sep 5, 2023
@kamperiadis
Copy link
Contributor Author

@cgillum @davidmrdavid @kaibocai @shreyas-gopalakrishna Can you please help me review this PR whenever you get a chance? Thank you!

Copy link
Member

@cgillum cgillum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change looks good! I'd love to get another person to double-check for correctness.

I do wonder if we should update the existing longTimer test to help us confirm the timing of the timers? For example, the test currently uses an AtomicInteger counter to ensure that we get the right number of timer-fired events. Should we enhance this test to also keep track of the timing of each of these events to ensure that each one comes in increments of 3 seconds?

Copy link
Member

@kaibocai kaibocai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thank you!

@kamperiadis
Copy link
Contributor Author

Good point @cgillum! I just enhanced the longTimer test as per your suggestion. Thank you.

Copy link
Member

@davidmrdavid davidmrdavid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks good to me! I just have a question about why we're using System.nanoTime() in the orchestration instead of the "currentUtcDateTime` API.

Comment on lines +945 to +946
CompletableTask<Void> firstTimer = createTimerTask(finalFireAt);
CompletableFuture<Void> timerChain = createTimerChain(finalFireAt, firstTimer.future);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we add some comments here explaining how the timerChain behaves in both the short timer and long timer cases?

AtomicReferenceArray<Long> timestamps = new AtomicReferenceArray<>(4);
DurableTaskGrpcWorker worker = this.createWorkerBuilder()
.addOrchestrator(orchestratorName, ctx -> {
timestamps.set(counter.get(), System.nanoTime());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused as to why System.nanoTime() works here. Shouldn't we be obtaining deterministic timestamps from the ctx object?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nvm, I re-read the test and this makes sense. You're measuring the time between replays, not the deterministic time.

@kamperiadis kamperiadis merged commit b78297b into microsoft:main Sep 7, 2023
@kamperiadis kamperiadis deleted the kamperiadis/nonBlockingTimer branch September 7, 2023 21:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Orchestrator function doesn't wait for external event if timer is more than 3days

4 participants