Skip to content

Update graceful worker shutdown #627

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 14, 2025

Conversation

yuandrew
Copy link
Contributor

@yuandrew yuandrew commented May 28, 2025

What was changed

Updated existing graceful worker shutdown spec

Why?

Existing spec was outdated

Checklist

  1. Closes [Feature Request] Write spec for graceful worker shutdown #261

  2. How was this tested:
    Tests will be added in a separate Issue/PR, see parent issue for more details

  3. Any docs updates needed?

a cancel. It must be possible for the user to determine if this cancel was issued
by server, or is the result of worker shutdown
* When a worker shutdown is initiated, the activity context isn't canceled until shutdown
has succeeded or the shutdown timeout has elapsed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should call this "graceful shutdown timeout" so it's not confused with some actual timeout for shutting down?

Comment on lines 13 to 14
* When a worker shutdown is initiated, the activity context isn't canceled until shutdown
has succeeded or the shutdown timeout has elapsed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to clarify that this shutdown timeout defaults as 0/no-timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good idea to clarify this

* It must be possible for the user to determine whether the activity context cancel was the result of worker shutdown,
or issued by server
* TODO: Java - need to add a way for Activities to know when the worker is being shutdown.
* TODO: Typescript - figure out what happens in TS
* Activities may handle the cancel however they like (including continuing running)
* Heartbeating is possible while the shutdown process is ongoing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heartbeating may notify the activity the worker is shutting down, but there should be an option to continue heartbeating will the graceful shutdown process is ongoing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for Core-based SDKs, heartbeating does not notify about the grace period, that is delivered in a separate manner, and heartbeating continues to work regardless of activity status (during or after grace period) IIUC

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that is why I put the "may" in there

* It must be possible for the user to determine whether the activity context cancel was the result of worker shutdown,
or issued by server
* TODO: Java - need to add a way for Activities to know when the worker is being shutdown.
* TODO: Typescript - figure out what happens in TS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TS: The activity will receive a cancellation with message 'WORKER_SHUTDOWN'. See this test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!

* When a worker shutdown is initiated, the activity context isn't canceled until the
graceful shutdown timeout has elapsed
* Graceful shutdown language behavior
* Core based SDKs - when graceful shutdown timeout isn't specified, that is treated as no-timeout
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right that Core defaults to no-timeout.

However, TS actually defaults the graceful shutdown timeout to 0. From a quick look, it seems like Python, .Net and Ruby are all doing the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No timeout and 0 are effectively the same I believe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not quite, if you remember from a few weeks (or months ago), Spencer mentioned when graceful_shutdown_period is not set, there is no timeout, so cancels are never issued to activities.

But sounds like from James looking at the core SDKs, each SDK will instead send 0 when the language specific timeout isn't set, meaning no SDK actually has default == no-timeout

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, we only made that one optional for backward compatibility with TS back then, because the TS Worker was emitting activity cancellation by itself and we didn't want to shake the TS worker's logic at that moment. So we instead made it possible for the TS SDK to disable Core's activity cancellation on shutdown.

TS is now relying on Core for that behavior, just like other SDKs, so I think it would be safe to make that option mandatory in Core's worker config. Definitely not high priority and very low impact, of course.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 So from a user POV, "no timeout and 0 are effectively the same"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if "no timeout" here means not specifying "graceful shutdown", then yes

* It must be possible for an activity to determine that a worker is being shut down even before graceful timeout elapses
* It must be possible for the user to determine whether the activity context cancel was the result of worker shutdown,
or issued by server
* TODO: Java - need to add a way for Activities to know when the worker is being shutdown.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Quinn-With-Two-Ns can you remind me, is this still a TODO? I can't remember when/where you last mentioned this.

If it's still true, I can create an GH issue and link the issue to this TODO

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think temporalio/sdk-java#1005 (comment) covers this too

@yuandrew yuandrew marked this pull request as ready for review July 8, 2025 22:55
@yuandrew yuandrew enabled auto-merge (squash) July 14, 2025 17:21
@yuandrew yuandrew merged commit c9d352c into temporalio:main Jul 14, 2025
21 of 22 checks passed
@yuandrew yuandrew deleted the graceful-worker-shutdown branch July 14, 2025 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Write spec for graceful worker shutdown
4 participants