Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle applyParentClose target domain failover #4533

Merged
merged 5 commits into from
Oct 16, 2021

Conversation

demirkayaender
Copy link
Contributor

@demirkayaender demirkayaender commented Oct 4, 2021

What changed?
With the previous implementation, there was a race condition where we could skip sending parentClosePolicy to one of the cross cluster children if that child was failing over at the same time. This diff ensures that we will track all the children and retry applyParentClosePolicy task if this happens.

In a later diff, I'm planning to split processCloseExecution into 3 functions as processCloseExecution, recordChildCompletion and applyParentPolicy; then, planning to create 2 new transfer tasks for the 2 new later functions. This will help us retry only the failed parts of closeExecutions instead of retrying the whole thing. processCloseExecution will still call the other 2 as it's doing today.

Depends on cadence-workflow/cadence-idl#89

Why?
Without this diff, we might leak some child workflows or have unintended child workflows running.

How did you test it?
Unit tests.

Potential risks

Release notes

Documentation Changes

schema/cassandra/cadence/schema.cql Outdated Show resolved Hide resolved
schema/cassandra/cadence/versioned/v0.33/domain_list.cql Outdated Show resolved Hide resolved
@@ -585,6 +586,7 @@ type (

// ApplyParentClosePolicyTask identifies a task for applying parent close policy
ApplyParentClosePolicyTask struct {
DomainList []string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to also add this field to CloseExecutionTransfer task? If some children's domains are active in the current cluster, then we can record them and only process child workflows from those domains when executing the transfer task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that will be in a separate PR, right now I'm just fixing the failover issue we talked about so source won't skip any child.

common/persistence/dataManagerInterfaces.go Outdated Show resolved Hide resolved
common/persistence/dataManagerInterfaces.go Outdated Show resolved Hide resolved
service/history/execution/mutable_state_task_generator.go Outdated Show resolved Hide resolved
@demirkayaender demirkayaender force-pushed the applyParent branch 4 times, most recently from aaf7aae to ec8173f Compare October 14, 2021 02:11
@coveralls
Copy link

coveralls commented Oct 14, 2021

Pull Request Test Coverage Report for Build 0333f658-228e-4722-bf52-1f7fa43436dc

  • 46 of 82 (56.1%) changed or added relevant lines in 11 files are covered.
  • 107 unchanged lines in 21 files lost coverage.
  • Overall coverage increased (+0.03%) to 56.85%

Changes Missing Coverage Covered Lines Changed/Added Lines %
common/persistence/nosql/nosqlplugin/cassandra/workflowUtils.go 1 2 50.0%
common/persistence/sql/sqlExecutionStore.go 1 2 50.0%
common/persistence/nosql/nosqlplugin/cassandra/workflowParsingUtils.go 5 7 71.43%
common/persistence/persistence-tests/persistenceTestBase.go 0 2 0.0%
service/history/task/transfer_active_task_executor.go 7 9 77.78%
common/persistence/dataManagerInterfaces.go 1 4 25.0%
common/persistence/nosql/nosqlExecutionStoreUtil.go 0 3 0.0%
common/persistence/serialization/getters.go 5 8 62.5%
common/persistence/sql/sqlExecutionStoreUtil.go 0 4 0.0%
common/persistence/persistence-tests/executionManagerTest.go 0 15 0.0%
Files with Coverage Reduction New Missed Lines %
common/persistence/sql/sqlExecutionStoreUtil.go 1 68.47%
service/history/queue/timer_queue_processor_base.go 1 79.5%
service/matching/matcher.go 1 91.06%
client/history/client.go 2 40.32%
client/history/metricClient.go 2 45.94%
common/persistence/executionManager.go 2 76.51%
common/persistence/statsComputer.go 2 96.43%
common/task/fifoTaskScheduler.go 2 87.63%
service/history/execution/mutable_state_builder.go 2 69.77%
service/history/handler.go 2 48.32%
Totals Coverage Status
Change from base Build fbf37564-3fa7-4a18-a162-dbe1ae1e90ca: 0.03%
Covered Lines: 81464
Relevant Lines: 143296

💛 - Coveralls

common/persistence/dataManagerInterfaces.go Outdated Show resolved Hide resolved
schema/cassandra/cadence/schema.cql Outdated Show resolved Hide resolved
common/persistence/serialization/interfaces.go Outdated Show resolved Hide resolved
service/history/task/transfer_active_task_executor.go Outdated Show resolved Hide resolved
common/persistence/dataManagerInterfaces.go Outdated Show resolved Hide resolved
@demirkayaender demirkayaender merged commit d9e5003 into cadence-workflow:master Oct 16, 2021
@demirkayaender demirkayaender deleted the applyParent branch October 16, 2021 01:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants