-
Notifications
You must be signed in to change notification settings - Fork 816
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle applyParentClose target domain failover #4533
Handle applyParentClose target domain failover #4533
Conversation
3e735a6
to
b8f0af8
Compare
@@ -585,6 +586,7 @@ type ( | |||
|
|||
// ApplyParentClosePolicyTask identifies a task for applying parent close policy | |||
ApplyParentClosePolicyTask struct { | |||
DomainList []string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you plan to also add this field to CloseExecutionTransfer task? If some children's domains are active in the current cluster, then we can record them and only process child workflows from those domains when executing the transfer task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that will be in a separate PR, right now I'm just fixing the failover issue we talked about so source won't skip any child.
common/persistence/nosql/nosqlplugin/cassandra/workflowUtils.go
Outdated
Show resolved
Hide resolved
aaf7aae
to
ec8173f
Compare
service/history/task/cross_cluster_source_task_executor_test.go
Outdated
Show resolved
Hide resolved
ec8173f
to
2e6965b
Compare
2e6965b
to
6664fe6
Compare
3e9401e
to
9fd179a
Compare
9fd179a
to
0caef5f
Compare
What changed?
With the previous implementation, there was a race condition where we could skip sending parentClosePolicy to one of the cross cluster children if that child was failing over at the same time. This diff ensures that we will track all the children and retry applyParentClosePolicy task if this happens.
In a later diff, I'm planning to split processCloseExecution into 3 functions as
processCloseExecution
,recordChildCompletion
andapplyParentPolicy
; then, planning to create 2 new transfer tasks for the 2 new later functions. This will help us retry only the failed parts of closeExecutions instead of retrying the whole thing. processCloseExecution will still call the other 2 as it's doing today.Depends on cadence-workflow/cadence-idl#89
Why?
Without this diff, we might leak some child workflows or have unintended child workflows running.
How did you test it?
Unit tests.
Potential risks
Release notes
Documentation Changes