Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(workflow): Workflow Activation Encoder was discarding SDK flags #1530

Merged
merged 9 commits into from
Sep 25, 2024

Conversation

mjameswh
Copy link
Contributor

@mjameswh mjameswh commented Sep 21, 2024

What was changed

  • The logic that applies payload encoding on Workflow Activation Completions was missing some object spread operators, causing lang's SDK flags to be discarded. This PR fixes that logic.

  • Introduced a mechanism that allows retroactively enabling SDK Flags on Workflow Task being replayed, based on the SDK version that produced that originally processed the WFT.

Why?

  • Due to SDK flags not being properly recorded to the Workflow Task, this bug would result in Non Determinism Errors on attempts to replay a Workflow where the presence or absence of a flag would effectively impact subsequent commands produced by the Workflow.

  • The most common symptoms of this issue would be an NDE that involves:

    • A TimerCanceled command; or
    • A change in the ordering of some commands.
  • Problematic Workflow Tasks will have been produced specifically by either TS SDK 1.11.0 or 1.11.1. Similar symptoms on Workflow Tasks emitted by other releases of the TS SDK would not be related to this bug.

  • Given that existing Workflow Histories may already be affected by this omission, there's a high probability that updating Workers to an SDK release that fixes this issue would cause storms of NDEs errors, for which the only way forward would be to reset or terminate those workflows. To avoid that situation, we introduced a mechanism to retroactively enable the affected SDK Flags on WFT emitted by SDK 1.11.0 or 1.11.1.

Fixes #1533.

@mjameswh mjameswh requested a review from a team as a code owner September 21, 2024 18:43
@ikonst
Copy link
Contributor

ikonst commented Sep 23, 2024

Consider adding a comment to Activator.hasFlag to describe how flags work in that regard:

+   // During replays, sdkMetadata should dictate flags, as to respect their state in previous runs
    if (!this.info.unsafe.isReplaying && flag.default) {
      this.knownFlags.add(flag.id);
      return true;
    }

Copy link
Member

@Sushisource Sushisource left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little unfortunate to have to look at version strings, but, makes sense to me.

packages/workflow/src/flags.ts Outdated Show resolved Hide resolved
packages/workflow/src/internals.ts Outdated Show resolved Hide resolved
@mjameswh mjameswh merged commit e91df70 into temporalio:main Sep 25, 2024
70 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] NDE while replaying Workflows in 1.11.1 with same code
3 participants