Skip to content

Conversation

@pdoerner
Copy link
Contributor

@pdoerner pdoerner commented Apr 1, 2025

What changed?
Added two new history events: NexusOperationCancelRequestCompleted and NexusOperationCancelRequestFailed

Why?
To support Nexus operation cancellations that should wait only until the cancellation has been delivered to the handler.

Breaking changes
No

Server PR

@pdoerner pdoerner requested review from a team as code owners April 1, 2025 22:33
Comment on lines +1059 to +1060
NexusOperationCancelRequestCompletedEventAttributes nexus_operation_cancel_request_completed_event_attributes = 61;
NexusOperationCancelRequestFailedEventAttributes nexus_operation_cancel_request_failed_event_attributes = 62;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will these be marked as worker_may_ignore? I am curious about the backwards-compatible properties from an SDK POV.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it will need to be

Comment on lines +1059 to +1060
NexusOperationCancelRequestCompletedEventAttributes nexus_operation_cancel_request_completed_event_attributes = 61;
NexusOperationCancelRequestFailedEventAttributes nexus_operation_cancel_request_failed_event_attributes = 62;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what happens on current server version on Nexus cancel requested? Will any changes we make server side still work with older SDKs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we record a NEXUS_OPERATION_CANCEL_REQUESTED event when we receive the command. Then we actually send the request to the handler. We do not record the operation as cancelled until the completion callback is invoked with a cancelled error. None of this behavior will change. We will just start recording a history event after the handler responds to the cancel request.

Copy link
Member

@cretz cretz Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, so this means that we are effectively just sending more events that we usually would for the same situation, and if we properly mark as worker_may_ignore, we're good.

My only concern now is from the SDK side, how is an SDK that is supposed to wait for this event supposed to know it is never coming on an older server? So at least for Core-based SDKs, the default cancel type for activities is "try cancel" (i.e. fire and forget) and the default cancel type for child workflows is "wait cancel completed". I think Nexus operations are going to have to be "try cancel" by default to make "wait cancel completed" an opt-in behavior since it is not supported on all GA Nexus servers. In Go and Java, I think "try cancel" is effectively the default for activities, and "wait cancel requested" or "try cancel" is for child workflows, but unsure exactly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently SDKs are doing "wait cancel completed" which would continue to be the default. These new events are for supporting the "try cancel" case. When a newer SDK is talking to an older server, it will never receive these new events and will effectively fall back to "wait cancel completed". I think this behavior is okay, do you see an issue with it? I believe this is the same as how child workflows are handled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see...

I think this behavior is okay, do you see an issue with it?

No issue if we make the SDK-side default of this new enum to be "wait cancel completed" which we've done for child workflows in Core-based SDKs, but never have done in Go and Java based on my quick reading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I misspoke. These events are for "wait cancel requested" and "try cancel" won't be waiting on anything from the server. So it is okay to make either the default but I still lean toward "wait cancel completed" since this is the current behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Whatever is the default today I think we need to make the default when we make it configurable in SDKs for compat reasons.

Copy link
Contributor

@rodrigozhou rodrigozhou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit comments.

Comment on lines +960 to +961
// The `WORKFLOW_TASK_COMPLETED` event that the corresponding RequestCancelNexusOperation command was reported
// with.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The `WORKFLOW_TASK_COMPLETED` event that the corresponding RequestCancelNexusOperation command was reported
// with.
// The `WORKFLOW_TASK_COMPLETED` event that the corresponding RequestCancelNexusOperation command was reported.

Comment on lines +968 to +969
// The `WORKFLOW_TASK_COMPLETED` event that the corresponding RequestCancelNexusOperation command was reported
// with.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The `WORKFLOW_TASK_COMPLETED` event that the corresponding RequestCancelNexusOperation command was reported
// with.
// The `WORKFLOW_TASK_COMPLETED` event that the corresponding RequestCancelNexusOperation command was reported.

@pdoerner pdoerner merged commit eb80371 into master Apr 3, 2025
4 checks passed
@pdoerner pdoerner deleted the nexus-cancel-delivered-events branch April 3, 2025 15:55
pdoerner added a commit to temporalio/temporal that referenced this pull request Apr 8, 2025
## What changed?
<!-- Describe what has changed in this PR -->
Added support for the new `NexusOperationCancelRequestCompleted` and
`NexusOperationCancelRequestFailed` events which were added in
temporalio/api#564
These events are marked with `WorkerMayIgnore=true` for compatibility
with older SDKs.

## Why?
<!-- Tell your future self why have you made these changes -->
These events are used to support the new "wait cancellation requested"
Nexus operation cancellation type.

## How did you test it?
<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
Updated functional tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants