Skip to content

Conversation

@Shivs11
Copy link
Member

@Shivs11 Shivs11 commented Nov 13, 2025

What changed?

  • Add revision number mechanics for child workflows and CAN workflows.
  • Additionally, this PR introduces the following change conceptually: Child workflows of AutoUpgrade parents and CAN workflow executions of AutoUpgrade workflows have their first task as AutoUpgrade by default. Previously, it was unversioned.

Why?

  • Resolving eventual consistency issues.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Potential risks

  • None since this feature is gated by a dc

Note

Adds inherited_auto_upgrade_info propagation and revision-number-based dispatch so child and continue-as-new workflows start with AutoUpgrade behavior when appropriate; updates matching/history logic and tests.

  • API/Proto:
    • Add StartWorkflowExecutionRequest.inherited_auto_upgrade_info (and getter) to carry deployment version and revision from parent/previous run.
  • History Service:
    • Populate/propagate InheritedAutoUpgradeInfo when starting child workflows and during ContinueAsNew; avoid when pinned inheritance applies or namespaces differ.
    • On start, set execution VersioningInfo (deployment, behavior AutoUpgrade) and revision number from inherited info.
    • Minor fixes: use request context for version checks; guard SetVersioningRevisionNumber against nil.
  • Matching:
    • Add test hook MatchingIgnoreRoutingConfigRevisionCheck and allow bypassing routing-config revision gate in updates.
    • Minor cleanup in partition manager; keep revision-based queue selection logic.
  • Tests:
    • Add/rename functional tests covering child/CAN inheritance, cross-TQ behavior, retry no-bounce with revision lag, and propagation helpers; introduce rollback utility for task queue user data.
  • Deps:
    • Bump go.temporal.io/api to latest prerelease.

Written by Cursor Bugbot for commit 8a342ab. This will update automatically on new commits. Configure here.

@Shivs11 Shivs11 changed the title Add revision number mechanics for child Add revision number mechanics for child and CAN workflows Nov 13, 2025
@Shivs11 Shivs11 marked this pull request as ready for review November 14, 2025 03:46
@Shivs11 Shivs11 requested review from a team as code owners November 14, 2025 03:46

subComponent1, err := testComponent.SubComponent1.Get(chasmContext)
s.NoError(err)
subComponent1 := testComponent.SubComponent1.Get(chasmContext)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[for reviewer]: ignore this as I have another PR open.
#8699

Comment on lines 78 to +79
useRevisionNumbers bool
// TODO: this is always false. clean it up.
useNewDeploymentData bool
useNewDeploymentData bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two always have same values, we should just use one var. IMO useRevisionNumbers describes the flag better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Sorry, I should have kept a comment there but basically, after this PR goes in, I am sending another one which will just deal with versioning_3_test refactors.

That PR shall remove this variable completely and replace wherever we are using this with useRevisionNumbers.

I didn't wanna clutter this PR since it's anyways kinda big and is making some fundamental changes.

})

t.Run("async_deployment", func(t *testing.T) {
t.Run("async_new_deployment_data", func(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: async comes with new deployment data anyway so not sure if we need to mention it I tried to avoid making test names so long.

}

// TODO (Shivam): I think one may need to slightly modify this now that we are actually "inheriting" the autoupgrade version.
func (s *Versioning3Suite) TestWorkflowRetry_Unpinned_ExpectNoInherit() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, if the logic is changing in this PR, shouldn't we update the test at the same time? otherwise, it'd fail, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, that comment is misleading imo - what i meant to add there was change the name of this test since we are inheriting the version from the parent!!

What this test is testing will still pass with my changes. The reason being as follows:

This test is testing that a retry of an autoUpgrade workflow bounces "ahead" to the next current deployment version of the task-queue. This, in theory, shall also work with my changes since the retried workflow shall inherit the revision number (and deployment version) prior to the retry, but will always be scheduled on the current deployment version if it is more recent (in this case it will be since we wait for tv2 to be current prior to kickstarting the retry)

}

// Fallback of revision number mechanics: Independent activities are eventually consistent.
func (s *Versioning3Suite) TestActivityTQLags_IndependentActivityDispatchesToItsCurrentDeployment() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no point in having this imho

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something that revision numbers cannot solve. It didn't make sense to have a test for something for revision numbers where it's a limitation and also this is not something which we wanna solve in the near future either. I think it was decided to let customers know that while running indepedant activities, one could expect some bouncing of tasks so it is important to keep activity defs compatible

Copy link
Contributor

@ShahabT ShahabT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get @carlydf approval for history side? she's more familiar with this part and also this is related to trampolining.

Copy link
Contributor

@carlydf carlydf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. I think a good chunk of repeated test code can be avoided if we just wait for propagation from inside the setCurrent helper function.

Also, I wonder if TestChildStartsWithParentRevision_SameTQ_TQLags, TestChildStartsWithNoInheritedAutoUpgradeInfo_CrossTQ, and TestChildStartsWithParentRevision_SameTQ_TQAhead could share more code. If it ends up being flaky or we want to fix the test, it's nice to only have to fix code in one place

}

// Fallback of revision number mechanics: Independent activities are eventually consistent.
func (s *Versioning3Suite) TestActivityTQLags_IndependentActivityDispatchesToItsCurrentDeployment() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

return resp.GetWorkerDeploymentInfo().GetRoutingConfigUpdateState() == enumspb.ROUTING_CONFIG_UPDATE_STATE_COMPLETED
}, 10*time.Second, 100*time.Millisecond)

// Update the userData for the workflow TQ *only* by setting the current version to v1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait im confused, doesn't removing this completely change the test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the diff is misleading!
This is comparing the diff line by line, and on these lines stood the test TestActivityTQLags_IndependentActivityDispatchesToItsCurrentDeployment which I am removing in this PR.

The diff you are seeing here is just showing the outdated test's content. This test that I am adding is brand new.

Copy link
Contributor

@carlydf carlydf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant propagation checks

Status: enumspb.WORKER_DEPLOYMENT_VERSION_STATUS_CURRENT,
}}, []string{}, tqTypeWf, tqTypeAct)

// Wait until all task queue partitions know that v1 is current.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean to comment this out?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is gonna undergo some changes so these changes can be ignored.

@Shivs11 Shivs11 enabled auto-merge (squash) November 27, 2025 02:20
@Shivs11 Shivs11 merged commit ada80b8 into main Nov 27, 2025
96 of 98 checks passed
@Shivs11 Shivs11 deleted the ss/revision-number-for-child-CAN branch November 27, 2025 02:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants