Skip to content

Apply modern event loop algorithm with new SDK flag #432

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 19, 2025

Conversation

cretz
Copy link
Member

@cretz cretz commented Mar 18, 2025

What was changed

(description as mostly taken from a comment above the SDK flag)

Before this flag, workflow async logic did the following (in no particular order):

  • Reorder jobs itself
  • Schedule primary workflow method when the initialize job was seen
  • Tick the event loop after each job processed
  • Check all conditions at once and resolve all that are satisfied

We have since learned that all of these are wrong in one way or another. So if this flag is set on the first activation/task of a workflow, the following logic now applies respectively:

  • Leave Core's job ordering as is
  • Do not schedule the primary workflow method until after all jobs have been applied and it's not already present.
  • Tick the event loop only once after everything applied
  • Only resolve the first condition that is satisfied and re-run the entire event loop

We know this solves at least #427 where all-handlers-finished was improperly assumed true too early.

Checklist

  1. Closes [Bug] Replay after UnhandledCommand can cause main workflow body to complete before signals are handled #427
  2. Closes [Feature Request] Align activation job application with TS changes #375

@cretz cretz requested a review from a team March 18, 2025 17:28
{
Apply(job);
// Run scheduler once. Do not check conditions when patching or querying.
var checkConditions = job.NotifyHasPatch == null && job.QueryWorkflow == null;
// We only run the scheduler after each job in legacy event loop logic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// We only run the scheduler after each job in legacy event loop logic
// We run the scheduler after each job in legacy event loop logic

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meant to say "we only do X in legacy" to clarify we don't do it in non-legacy

Copy link

@THardy98 THardy98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking, small comments/changes.

and just to verify my understanding:
Check all conditions at once and resolve all that are satisfied.
->
Only resolve the first condition that is satisfied and re-run the entire event loop.

we're doing this so that we can give subsequent signals (i.e. if multiple signals are sent) a chance to be handled, in the case that resolved conditions from the first signal cause the workflow to complete/terminate/finish

@cretz
Copy link
Member Author

cretz commented Mar 19, 2025

and just to verify my understanding:
Check all conditions at once and resolve all that are satisfied.
->
Only resolve the first condition that is satisfied and re-run the entire event loop.

we're doing this so that we can give subsequent signals (i.e. if multiple signals are sent) a chance to be handled, in the case that resolved conditions from the first signal cause the workflow to complete/terminate/finish

There's a few reasons why you don't want to tick the event loop after each job. I should have linked this to #375 which I am doing now, but also see temporalio/sdk-python#606 and we have had internal discussions on this. Basically waking up the event loop mid-job-apply causes conditions and other things to be confusingly applied too early. It becomes hard to reason about when a coroutine in a handler may run. This also aligns the event loop with other SDKs (in addition to fixing a few other things).

@THardy98
Copy link

THardy98 commented Mar 19, 2025

There's a few reasons why you don't want to tick the event loop after each job. I should have linked this to #375 which I am doing now, but also see temporalio/sdk-python#606 and we have had internal discussions on this. Basically waking up the event loop mid-job-apply causes conditions and other things to be confusingly applied too early. It becomes hard to reason about when a coroutine in a handler may run. This also aligns the event loop with other SDKs (in addition to fixing a few other things).

Yeah I found these issues trying to gather some more context on this, very helpful. In effect, this boils down to predictable handler behavior (at least wrt to the principle bug fix). Thanks for the explanation :)

@cretz cretz merged commit bac42d3 into temporalio:main Mar 19, 2025
8 checks passed
@cretz cretz deleted the async-approach-fix branch March 19, 2025 19:10
@vchirikov
Copy link

Hi @cretz , when it will be released?

@cretz
Copy link
Member Author

cretz commented Mar 31, 2025

Next .NET SDK release, but we do not have a firm date at this time, should not be too long

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants