Subworkflow completes but main workflow is not starting next task #3089
Comments
@akash0996 It would be helpful if you could provide the workflow JSON and check the Conductor server logs for any errors. I assume the next task is a decision task, based on the image provided? |
Please also include which persistence, queue, and lock implementations are used. |
@jxu-nflx I have attached the workflows below. Main WF
Sub WF
|
The same issue has happened several times, usually after retrying a failed subworkflow.
The workflow continues after restarting the Conductor server. |
@jxu-nflx Can you reproduce this issue, or do you have any suggestions for avoiding this serious problem? I do not have enough time to read through all the related code. @akash0996 I am having trouble reproducing this issue; can you reproduce it? Any information would help me reproduce it and try to fix the problem. |
Hi @BrandonDotLin, can you please check whether the workflow sweeper is running? Ideally, async system tasks should be polled every 60 seconds. |
Yes, with the following config:
|
Any updates on this issue? @aravindanr |
We are looking into a fix for this issue and will release a fix soon. |
Great. Have you been able to reproduce this issue so far? If so, would you please share how? When this happened, the deciderQueue message was popped but nothing happened with the workflow. Everything worked well after I changed the popped flag to 0 (MySQL persistence). |
Do we have any update on this issue? |
This seems like an issue with the conductor-mysql-persistence module (maintained by conductor-community). There was a similar issue (#3183) with the sub_workflow task, for which a fix was released in v3.10.7. |
This issue (the next task not starting) can happen even when the task that just finished is not a sub-workflow. |
After upgrading to Server logs do not yield anything particular. Is there a way where the next task can be 'forced'? |
This issue is fixed here: #3197 |
Hi @apanicker-nflx, sorry to mention you again. Would you please give some instructions for reproducing the issue so I can try to fix it myself? We need to fix this problem as soon as possible. |
@BrandonDotLin We came across this issue when running regression tests at scale. The likely cause was attributed to a race condition between updating the subworkflow task in the main workflow and the subworkflow itself completing. We have since fixed this race condition with the latest fix in v3.11.3. |
Would you please advise which part of the changes in v3.11.3 is related to the race condition fix? They all seem to be about the output of the subworkflow. |
Part of the changes were made in v3.10.7 and part of the changes are in v3.11.3. |
We have created a simple workflow with version 3.11.3, and it turns out the output of the sub workflow is not being fed back to the main flow. More specifically, the join does not resume after the preceding fork tasks are completed. Here is a JSON { Maybe you can let us know what configuration we are missing. |
@apanicker-nflx Can you paste an example JSON of the problem in question and explain how version 3.11.3 fixed the race condition? Thank you. |
The JOIN task does not complete immediately; rather, it is evaluated asynchronously by the workflow reconciler.
Unfortunately, I do not.
This version along with some other changes in an earlier version fixed the consequences of race conditions. In this specific case, the workflow repair service was tasked with identifying and fixing cases where the subworkflow status/output was not reflected correctly in the parent workflow's subworkflow task. |
"joinOn":[ In my case the join is contingent on task 0 and task1. Does that mean that if the tasks are complete, the join will evaluate to "TRUE"? |
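To make the question above concrete, here is a minimal sketch of how a join over a "joinOn" list could be evaluated. It is not Conductor's actual implementation: the function name `evaluate_join`, the status sets, and the task reference names `task0` and `task1` are assumptions made for illustration, with status names only loosely modeled on Conductor task statuses.

```python
# Hypothetical status groups, loosely modeled on Conductor task statuses.
SUCCESSFUL = {"COMPLETED", "COMPLETED_WITH_ERRORS"}
FAILED = {"FAILED", "FAILED_WITH_TERMINAL_ERROR", "TIMED_OUT", "CANCELED"}


def evaluate_join(join_on, task_statuses):
    """Return a JOIN status from the statuses of the forked tasks.

    join_on: task reference names listed under "joinOn".
    task_statuses: mapping of task reference name -> current status;
        a missing key means the task has not been scheduled yet.
    """
    # In this sketch, any failed branch fails the join.
    for ref in join_on:
        if task_statuses.get(ref) in FAILED:
            return "FAILED"
    # The join completes only when every listed task finished successfully.
    if all(task_statuses.get(ref) in SUCCESSFUL for ref in join_on):
        return "COMPLETED"
    # Otherwise the join stays pending until the next evaluation pass.
    return "IN_PROGRESS"


# Both forked tasks are done, so the join can complete on the next pass.
print(evaluate_join(["task0", "task1"],
                    {"task0": "COMPLETED", "task1": "COMPLETED"}))
```

Under these assumptions, the join reports completion only once every task in "joinOn" has finished. Because the check is rerun asynchronously, the JOIN may lag slightly behind the last forked task.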
This issue is stale, because it has been open for 45 days with no activity. Remove the stale label or comment, or this will be closed in 7 days. |
This issue was closed, because it has been stalled for 7 days with no activity. |