failFast field doesn't work #10312

Open
2 of 3 tasks
gocpplua opened this issue Jan 5, 2023 · 11 comments · May be fixed by #11992
Labels: area/templates/dag, P2, type/bug, type/regression

Comments

gocpplua commented Jan 5, 2023

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

The failFast field has no effect.

Version

v3.4.2

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

I have a workflow YAML like this:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-primay-branch-
spec:
  entrypoint: statis
  templates:
  - name: a
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]
  - name: b
    retryStrategy:
      limit: "2"
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["sleep 30; echo haha"]
  - name: c
    retryStrategy:
      limit: "3"
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["echo intentional failure; exit 2"]
  - name: d
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]
  - name: statis
    dag:
      failFast: false
      tasks:
      - name: A
        template: a
      - name: B
        depends: "A"
        template: b
      - name: C
        depends: "A"
        template: c
      - name: D
        depends: "B"
        template: d
      - name: E
        depends: "D"
        template: d
```

The dependencies are as follows:

```text
step1:      A
          /   \
step2:   B     C
         |
step3:   D
         |
step4:   E
```

When I used failFast, I found that the field has no effect. Whether it is set to true or false, tasks B, D, and E still run even though task C fails.
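To make the expected difference concrete, here is a toy model of the DAG above. It is not the Argo controller's code, only a sketch under stated assumptions: `failFast: true` is taken to mean that no new task is started once any task has failed, and `failFast: false` lets every task whose dependencies succeeded keep running. The `run_dag` function, its round-based scheduling, and the `DEPS` table are hypothetical names introduced for illustration, and retries are ignored.

```python
from typing import Dict, List, Set


def run_dag(deps: Dict[str, List[str]], failing: Set[str], fail_fast: bool) -> List[str]:
    """Return the tasks that get started, in scheduling order (toy model)."""
    started: List[str] = []
    succeeded: Set[str] = set()
    failed: Set[str] = set()
    while True:
        # Assumed failFast: true semantics: once anything has failed,
        # nothing new is scheduled.
        if fail_fast and failed:
            break
        # A task is ready when it has not started and all of its
        # dependencies have succeeded.
        ready = sorted(
            task for task, needs in deps.items()
            if task not in started and all(n in succeeded for n in needs)
        )
        if not ready:
            break
        # Every ready task in a round starts together, as B and C do after A.
        for task in ready:
            started.append(task)
            (failed if task in failing else succeeded).add(task)
    return started


# Dependency table taken from the workflow in this issue.
DEPS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"], "E": ["D"]}

if __name__ == "__main__":
    print("failFast=true :", run_dag(DEPS, {"C"}, fail_fast=True))
    print("failFast=false:", run_dag(DEPS, {"C"}, fail_fast=False))
```

In this model B starts in the same round as C, so B runs in both cases. The two settings differ only in whether D and E are started after C has failed.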



### Logs from the workflow controller

```text
$ kubectl logs -n argo deploy/workflow-controller | grep dag-primay-branch-txrwg
time="2023-01-05T08:18:01.061Z" level=info msg="Processing workflow" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:18:01.065Z" level=info msg="Updated phase  -> Running" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:18:01.065Z" level=info msg="DAG node dag-primay-branch-txrwg initialized Running" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:18:01.065Z" level=info msg="All of node dag-primay-branch-txrwg.A dependencies [] completed" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:18:01.065Z" level=info msg="Pod node dag-primay-branch-txrwg-4194302311 initialized Pending" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:18:01.087Z" level=info msg="Created pod: dag-primay-branch-txrwg.A (dag-primay-branch-txrwg-a-4194302311)" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:18:01.087Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:18:01.087Z" level=info msg=reconcileAgentPod namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:18:01.113Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=2534322 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:18:11.090Z" level=info msg="Processing workflow" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:18:11.090Z" level=info msg="Task-result reconciliation" namespace=default numObjs=0 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:18:11.090Z" level=info msg="node changed" namespace=default new.message=PodInitializing new.phase=Pending new.progress=0/1 nodeID=dag-primay-branch-txrwg-4194302311 old.message= old.phase=Pending old.progress=0/1 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:18:11.090Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:18:11.090Z" level=info msg=reconcileAgentPod namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:18:11.108Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=2534353 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:33.112Z" level=info msg="Processing workflow" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:33.112Z" level=info msg="Task-result reconciliation" namespace=default numObjs=0 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:33.112Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=dag-primay-branch-txrwg-4194302311 old.message=PodInitializing old.phase=Pending old.progress=0/1 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:33.112Z" level=info msg="All of node dag-primay-branch-txrwg.C dependencies [A] completed" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:33.113Z" level=info msg="Retry node dag-primay-branch-txrwg-4227857549 initialized Running" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:33.113Z" level=info msg="Pod node dag-primay-branch-txrwg-3127743452 initialized Pending" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:33.136Z" level=info msg="Created pod: dag-primay-branch-txrwg.C(0) (dag-primay-branch-txrwg-c-3127743452)" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:33.136Z" level=info msg="All of node dag-primay-branch-txrwg.B dependencies [A] completed" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:33.136Z" level=info msg="Retry node dag-primay-branch-txrwg-4211079930 initialized Running" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:33.136Z" level=info msg="Pod node dag-primay-branch-txrwg-1127430305 initialized Pending" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:33.157Z" level=info msg="Created pod: dag-primay-branch-txrwg.B(0) (dag-primay-branch-txrwg-b-1127430305)" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:33.157Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:33.157Z" level=info msg=reconcileAgentPod namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:33.196Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=2534522 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:33.202Z" level=info msg="cleaning up pod" action=labelPodCompleted key=default/dag-primay-branch-txrwg-a-4194302311/labelPodCompleted
time="2023-01-05T08:19:43.138Z" level=info msg="Processing workflow" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:43.138Z" level=info msg="Task-result reconciliation" namespace=default numObjs=0 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:43.138Z" level=info msg="node changed" namespace=default new.message= new.phase=Running new.progress=0/1 nodeID=dag-primay-branch-txrwg-1127430305 old.message= old.phase=Pending old.progress=0/1 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:43.138Z" level=info msg="node changed" namespace=default new.message= new.phase=Running new.progress=0/1 nodeID=dag-primay-branch-txrwg-3127743452 old.message= old.phase=Pending old.progress=0/1 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:43.139Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:43.139Z" level=info msg=reconcileAgentPod namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:43.144Z" level=info msg="cleaning up pod" action=terminateContainers key=default/dag-primay-branch-txrwg-c-3127743452/terminateContainers
time="2023-01-05T08:19:43.151Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=2534582 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:53.223Z" level=info msg="Processing workflow" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:53.223Z" level=info msg="Task-result reconciliation" namespace=default numObjs=0 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:53.224Z" level=info msg="node unchanged" namespace=default nodeID=dag-primay-branch-txrwg-1127430305 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:53.224Z" level=info msg="node changed" namespace=default new.message="Error (exit code 2)" new.phase=Failed new.progress=0/1 nodeID=dag-primay-branch-txrwg-3127743452 old.message= old.phase=Running old.progress=0/1 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:53.224Z" level=info msg="1 child nodes of dag-primay-branch-txrwg.C failed. Trying again..." namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:53.224Z" level=info msg="Pod node dag-primay-branch-txrwg-2523602073 initialized Pending" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:53.236Z" level=info msg="Created pod: dag-primay-branch-txrwg.C(1) (dag-primay-branch-txrwg-c-2523602073)" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:53.236Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:53.236Z" level=info msg=reconcileAgentPod namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:53.249Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=2534603 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:19:53.255Z" level=info msg="cleaning up pod" action=labelPodCompleted key=default/dag-primay-branch-txrwg-c-3127743452/labelPodCompleted
time="2023-01-05T08:20:03.240Z" level=info msg="Processing workflow" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:03.241Z" level=info msg="Task-result reconciliation" namespace=default numObjs=0 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:03.241Z" level=info msg="node changed" namespace=default new.message="Error (exit code 2)" new.phase=Failed new.progress=0/1 nodeID=dag-primay-branch-txrwg-2523602073 old.message= old.phase=Pending old.progress=0/1 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:03.241Z" level=info msg="node unchanged" namespace=default nodeID=dag-primay-branch-txrwg-1127430305 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:03.241Z" level=info msg="2 child nodes of dag-primay-branch-txrwg.C failed. Trying again..." namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:03.241Z" level=info msg="Pod node dag-primay-branch-txrwg-1047024506 initialized Pending" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:03.266Z" level=info msg="Created pod: dag-primay-branch-txrwg.C(2) (dag-primay-branch-txrwg-c-1047024506)" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:03.266Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:03.266Z" level=info msg=reconcileAgentPod namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:03.289Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=2534649 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:03.295Z" level=info msg="cleaning up pod" action=labelPodCompleted key=default/dag-primay-branch-txrwg-c-2523602073/labelPodCompleted
time="2023-01-05T08:20:13.145Z" level=info msg="cleaning up pod" action=killContainers key=default/dag-primay-branch-txrwg-c-3127743452/killContainers
time="2023-01-05T08:20:13.268Z" level=info msg="Processing workflow" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:13.268Z" level=info msg="Task-result reconciliation" namespace=default numObjs=0 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:13.268Z" level=info msg="node changed" namespace=default new.message= new.phase=Running new.progress=0/1 nodeID=dag-primay-branch-txrwg-1127430305 old.message= old.phase=Running old.progress=0/1 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:13.268Z" level=info msg="node changed" namespace=default new.message="Error (exit code 2)" new.phase=Failed new.progress=0/1 nodeID=dag-primay-branch-txrwg-1047024506 old.message= old.phase=Pending old.progress=0/1 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:13.268Z" level=info msg="3 child nodes of dag-primay-branch-txrwg.C failed. Trying again..." namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:13.268Z" level=info msg="Pod node dag-primay-branch-txrwg-442883127 initialized Pending" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:13.273Z" level=info msg="cleaning up pod" action=terminateContainers key=default/dag-primay-branch-txrwg-b-1127430305/terminateContainers
time="2023-01-05T08:20:13.281Z" level=info msg="Created pod: dag-primay-branch-txrwg.C(3) (dag-primay-branch-txrwg-c-442883127)" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:13.281Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:13.281Z" level=info msg=reconcileAgentPod namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:13.304Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=2534705 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:13.311Z" level=info msg="cleaning up pod" action=labelPodCompleted key=default/dag-primay-branch-txrwg-c-1047024506/labelPodCompleted
time="2023-01-05T08:20:23.283Z" level=info msg="Processing workflow" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:23.284Z" level=info msg="Task-result reconciliation" namespace=default numObjs=0 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:23.284Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=dag-primay-branch-txrwg-1127430305 old.message= old.phase=Running old.progress=0/1 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:23.284Z" level=info msg="node changed" namespace=default new.message="Error (exit code 2)" new.phase=Failed new.progress=0/1 nodeID=dag-primay-branch-txrwg-442883127 old.message= old.phase=Pending old.progress=0/1 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:23.284Z" level=info msg="No more retries left. Failing..." namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:23.284Z" level=info msg="node dag-primay-branch-txrwg-4227857549 phase Running -> Failed" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:23.284Z" level=info msg="node dag-primay-branch-txrwg-4227857549 message: No more retries left" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:23.284Z" level=info msg="node dag-primay-branch-txrwg-4227857549 finished: 2023-01-05 08:20:23.284703468 +0000 UTC" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:23.284Z" level=info msg="node dag-primay-branch-txrwg-4211079930 phase Running -> Succeeded" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:23.284Z" level=info msg="node dag-primay-branch-txrwg-4211079930 finished: 2023-01-05 08:20:23.284844065 +0000 UTC" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:23.284Z" level=info msg="All of node dag-primay-branch-txrwg.D dependencies [B] completed" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:23.285Z" level=info msg="Pod node dag-primay-branch-txrwg-4110414216 initialized Pending" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:23.297Z" level=info msg="Created pod: dag-primay-branch-txrwg.D (dag-primay-branch-txrwg-d-4110414216)" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:23.297Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:23.297Z" level=info msg=reconcileAgentPod namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:23.317Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=2534761 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:23.323Z" level=info msg="cleaning up pod" action=labelPodCompleted key=default/dag-primay-branch-txrwg-b-1127430305/labelPodCompleted
time="2023-01-05T08:20:23.323Z" level=info msg="cleaning up pod" action=labelPodCompleted key=default/dag-primay-branch-txrwg-c-442883127/labelPodCompleted
time="2023-01-05T08:20:33.299Z" level=info msg="Processing workflow" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:33.299Z" level=info msg="Task-result reconciliation" namespace=default numObjs=0 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:33.299Z" level=info msg="node changed" namespace=default new.message= new.phase=Running new.progress=0/1 nodeID=dag-primay-branch-txrwg-4110414216 old.message= old.phase=Pending old.progress=0/1 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:33.299Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:33.299Z" level=info msg=reconcileAgentPod namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:33.304Z" level=info msg="cleaning up pod" action=terminateContainers key=default/dag-primay-branch-txrwg-d-4110414216/terminateContainers
time="2023-01-05T08:20:33.320Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=2534803 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:43.274Z" level=info msg="cleaning up pod" action=killContainers key=default/dag-primay-branch-txrwg-b-1127430305/killContainers
time="2023-01-05T08:20:43.647Z" level=info msg="Processing workflow" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:43.647Z" level=info msg="Task-result reconciliation" namespace=default numObjs=0 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:43.647Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=dag-primay-branch-txrwg-4110414216 old.message= old.phase=Running old.progress=0/1 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:43.648Z" level=info msg="All of node dag-primay-branch-txrwg.E dependencies [D] completed" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:43.648Z" level=info msg="Pod node dag-primay-branch-txrwg-4127191835 initialized Pending" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:43.665Z" level=info msg="Created pod: dag-primay-branch-txrwg.E (dag-primay-branch-txrwg-d-4127191835)" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:43.665Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:43.665Z" level=info msg=reconcileAgentPod namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:43.686Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=2534827 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:43.692Z" level=info msg="cleaning up pod" action=labelPodCompleted key=default/dag-primay-branch-txrwg-d-4110414216/labelPodCompleted
time="2023-01-05T08:20:53.667Z" level=info msg="Processing workflow" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:53.667Z" level=info msg="Task-result reconciliation" namespace=default numObjs=0 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:53.667Z" level=info msg="node changed" namespace=default new.message= new.phase=Running new.progress=0/1 nodeID=dag-primay-branch-txrwg-4127191835 old.message= old.phase=Pending old.progress=0/1 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:53.671Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:53.671Z" level=info msg=reconcileAgentPod namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:20:53.673Z" level=info msg="cleaning up pod" action=terminateContainers key=default/dag-primay-branch-txrwg-d-4127191835/terminateContainers
time="2023-01-05T08:20:53.691Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=2534863 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:21:03.305Z" level=info msg="cleaning up pod" action=killContainers key=default/dag-primay-branch-txrwg-d-4110414216/killContainers
time="2023-01-05T08:21:03.788Z" level=info msg="Processing workflow" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:21:03.789Z" level=info msg="Task-result reconciliation" namespace=default numObjs=0 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:21:03.789Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=dag-primay-branch-txrwg-4127191835 old.message= old.phase=Running old.progress=0/1 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:21:03.789Z" level=info msg="Outbound nodes of dag-primay-branch-txrwg set to [dag-primay-branch-txrwg-442883127 dag-primay-branch-txrwg-4127191835]" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:21:03.789Z" level=info msg="node dag-primay-branch-txrwg phase Running -> Failed" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:21:03.789Z" level=info msg="node dag-primay-branch-txrwg finished: 2023-01-05 08:21:03.789608909 +0000 UTC" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:21:03.789Z" level=info msg="Checking daemoned children of dag-primay-branch-txrwg" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:21:03.789Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:21:03.789Z" level=info msg=reconcileAgentPod namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:21:03.789Z" level=info msg="Updated phase Running -> Failed" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:21:03.789Z" level=info msg="Marking workflow completed" namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:21:03.789Z" level=info msg="Checking daemoned children of " namespace=default workflow=dag-primay-branch-txrwg
time="2023-01-05T08:21:03.795Z" level=info msg="cleaning up pod" action=deletePod key=default/dag-primay-branch-txrwg-1340600742-agent/deletePod
time="2023-01-05T08:21:03.812Z" level=info msg="Workflow update successful" namespace=default phase=Failed resourceVersion=2534886 workflow=dag-primay-branch-txrwg
time="2023-01-05T08:21:03.826Z" level=info msg="cleaning up pod" action=labelPodCompleted key=default/dag-primay-branch-txrwg-d-4127191835/labelPodCompleted
time="2023-01-05T08:21:23.674Z" level=info msg="cleaning up pod" action=killContainers key=default/dag-primay-branch-txrwg-d-4127191835/killContainers

```

Logs from your workflow's wait container

$ kubectl logs -c wait -l workflows.argoproj.io/workflow="dag-primay-branch-txrwg",workflow.argoproj.io/phase!=Succeeded
time="2023-01-05T08:18:03.985Z" level=info msg="Starting Workflow Executor" version=v3.4.2
time="2023-01-05T08:18:03.987Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2023-01-05T08:18:03.987Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=default podName=dag-primay-branch-txrwg-a-4194302311 template="{\"name\":\"a\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"docker/whalesay:latest\",\"command\":[\"cowsay\"],\"args\":[\"hello world\"],\"resources\":{}}}" version="&Version{Version:v3.4.2,BuildDate:2022-10-23T04:49:38Z,GitCommit:c08563baf7bafafe3aeb3284cd3410308603cad4,GitTag:v3.4.2,GitTreeState:clean,GoVersion:go1.18.7,Compiler:gc,Platform:linux/amd64,}"
time="2023-01-05T08:18:03.987Z" level=info msg="Starting deadline monitor"
time="2023-01-05T08:19:24.036Z" level=info msg="Main container completed" error="<nil>"
time="2023-01-05T08:19:24.036Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2023-01-05T08:19:24.036Z" level=info msg="No output parameters"
time="2023-01-05T08:19:24.036Z" level=info msg="No output artifacts"
time="2023-01-05T08:19:24.036Z" level=info msg="Alloc=6736 TotalAlloc=12007 Sys=18898 NumGC=4 Goroutines=7"
time="2023-01-05T08:19:36.448Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2023-01-05T08:19:36.448Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=default podName=dag-primay-branch-txrwg-c-3127743452 template="{\"name\":\"c\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"alpine:latest\",\"command\":[\"sh\",\"-c\"],\"args\":[\"echo intentional failure; exit 2\"],\"resources\":{}},\"retryStrategy\":{\"limit\":3}}" version="&Version{Version:v3.4.2,BuildDate:2022-10-23T04:49:38Z,GitCommit:c08563baf7bafafe3aeb3284cd3410308603cad4,GitTag:v3.4.2,GitTreeState:clean,GoVersion:go1.18.7,Compiler:gc,Platform:linux/amd64,}"
time="2023-01-05T08:19:36.449Z" level=info msg="Starting deadline monitor"
time="2023-01-05T08:19:40.451Z" level=info msg="Main container completed" error="<nil>"
time="2023-01-05T08:19:40.451Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2023-01-05T08:19:40.451Z" level=info msg="No output parameters"
time="2023-01-05T08:19:40.451Z" level=info msg="No output artifacts"
time="2023-01-05T08:19:40.451Z" level=info msg="stopping progress monitor (context done)" error="context canceled"
time="2023-01-05T08:19:40.451Z" level=info msg="Deadline monitor stopped"
time="2023-01-05T08:19:40.451Z" level=info msg="Alloc=7969 TotalAlloc=12272 Sys=18642 NumGC=3 Goroutines=4"
time="2023-01-05T08:19:55.536Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2023-01-05T08:19:55.537Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=default podName=dag-primay-branch-txrwg-c-2523602073 template="{\"name\":\"c\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"alpine:latest\",\"command\":[\"sh\",\"-c\"],\"args\":[\"echo intentional failure; exit 2\"],\"resources\":{}},\"retryStrategy\":{\"limit\":3}}" version="&Version{Version:v3.4.2,BuildDate:2022-10-23T04:49:38Z,GitCommit:c08563baf7bafafe3aeb3284cd3410308603cad4,GitTag:v3.4.2,GitTreeState:clean,GoVersion:go1.18.7,Compiler:gc,Platform:linux/amd64,}"
time="2023-01-05T08:19:55.537Z" level=info msg="Starting deadline monitor"
time="2023-01-05T08:19:58.538Z" level=info msg="Main container completed" error="<nil>"
time="2023-01-05T08:19:58.538Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2023-01-05T08:19:58.538Z" level=info msg="No output parameters"
time="2023-01-05T08:19:58.538Z" level=info msg="No output artifacts"
time="2023-01-05T08:19:58.538Z" level=info msg="stopping progress monitor (context done)" error="context canceled"
time="2023-01-05T08:19:58.538Z" level=info msg="Deadline monitor stopped"
time="2023-01-05T08:19:58.539Z" level=info msg="Alloc=6415 TotalAlloc=12236 Sys=19666 NumGC=4 Goroutines=5"
time="2023-01-05T08:20:05.584Z" level=info msg="Starting Workflow Executor" version=v3.4.2
time="2023-01-05T08:20:05.589Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2023-01-05T08:20:05.589Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=default podName=dag-primay-branch-txrwg-c-1047024506 template="{\"name\":\"c\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"alpine:latest\",\"command\":[\"sh\",\"-c\"],\"args\":[\"echo intentional failure; exit 2\"],\"resources\":{}},\"retryStrategy\":{\"limit\":3}}" version="&Version{Version:v3.4.2,BuildDate:2022-10-23T04:49:38Z,GitCommit:c08563baf7bafafe3aeb3284cd3410308603cad4,GitTag:v3.4.2,GitTreeState:clean,GoVersion:go1.18.7,Compiler:gc,Platform:linux/amd64,}"
time="2023-01-05T08:20:05.589Z" level=info msg="Starting deadline monitor"
time="2023-01-05T08:20:08.589Z" level=info msg="Main container completed" error="<nil>"
time="2023-01-05T08:20:08.589Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2023-01-05T08:20:08.589Z" level=info msg="No output parameters"
time="2023-01-05T08:20:08.589Z" level=info msg="No output artifacts"
time="2023-01-05T08:20:08.589Z" level=info msg="stopping progress monitor (context done)" error="context canceled"
time="2023-01-05T08:20:08.589Z" level=info msg="Alloc=6368 TotalAlloc=12222 Sys=19154 NumGC=4 Goroutines=7"
time="2023-01-05T08:20:15.595Z" level=info msg="Starting Workflow Executor" version=v3.4.2
time="2023-01-05T08:20:15.599Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2023-01-05T08:20:15.599Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=default podName=dag-primay-branch-txrwg-c-442883127 template="{\"name\":\"c\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"alpine:latest\",\"command\":[\"sh\",\"-c\"],\"args\":[\"echo intentional failure; exit 2\"],\"resources\":{}},\"retryStrategy\":{\"limit\":3}}" version="&Version{Version:v3.4.2,BuildDate:2022-10-23T04:49:38Z,GitCommit:c08563baf7bafafe3aeb3284cd3410308603cad4,GitTag:v3.4.2,GitTreeState:clean,GoVersion:go1.18.7,Compiler:gc,Platform:linux/amd64,}"
time="2023-01-05T08:20:15.599Z" level=info msg="Starting deadline monitor"
time="2023-01-05T08:20:18.599Z" level=info msg="Main container completed" error="<nil>"
time="2023-01-05T08:20:18.599Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2023-01-05T08:20:18.599Z" level=info msg="No output parameters"
time="2023-01-05T08:20:18.599Z" level=info msg="No output artifacts"
time="2023-01-05T08:20:18.599Z" level=info msg="Deadline monitor stopped"
time="2023-01-05T08:20:18.599Z" level=info msg="Alloc=6222 TotalAlloc=12188 Sys=23506 NumGC=4 Goroutines=7"
time="2023-01-05T08:19:36.457Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2023-01-05T08:19:36.457Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=default podName=dag-primay-branch-txrwg-b-1127430305 template="{\"name\":\"b\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"alpine:latest\",\"command\":[\"sh\",\"-c\"],\"args\":[\"sleep 30; echo haha\"],\"resources\":{}},\"retryStrategy\":{\"limit\":2}}" version="&Version{Version:v3.4.2,BuildDate:2022-10-23T04:49:38Z,GitCommit:c08563baf7bafafe3aeb3284cd3410308603cad4,GitTag:v3.4.2,GitTreeState:clean,GoVersion:go1.18.7,Compiler:gc,Platform:linux/amd64,}"
time="2023-01-05T08:19:36.457Z" level=info msg="Starting deadline monitor"
time="2023-01-05T08:20:10.473Z" level=info msg="Main container completed" error="<nil>"
time="2023-01-05T08:20:10.473Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2023-01-05T08:20:10.473Z" level=info msg="No output parameters"
time="2023-01-05T08:20:10.473Z" level=info msg="No output artifacts"
time="2023-01-05T08:20:10.473Z" level=info msg="Deadline monitor stopped"
time="2023-01-05T08:20:10.473Z" level=info msg="stopping progress monitor (context done)" error="context canceled"
time="2023-01-05T08:20:10.473Z" level=info msg="Alloc=6103 TotalAlloc=12035 Sys=19154 NumGC=4 Goroutines=5"
time="2023-01-05T08:20:25.689Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2023-01-05T08:20:25.689Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=default podName=dag-primay-branch-txrwg-d-4110414216 template="{\"name\":\"d\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"docker/whalesay:latest\",\"command\":[\"cowsay\"],\"args\":[\"hello world\"],\"resources\":{}}}" version="&Version{Version:v3.4.2,BuildDate:2022-10-23T04:49:38Z,GitCommit:c08563baf7bafafe3aeb3284cd3410308603cad4,GitTag:v3.4.2,GitTreeState:clean,GoVersion:go1.18.7,Compiler:gc,Platform:linux/amd64,}"
time="2023-01-05T08:20:25.690Z" level=info msg="Starting deadline monitor"
time="2023-01-05T08:20:31.693Z" level=info msg="Main container completed" error="<nil>"
time="2023-01-05T08:20:31.693Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2023-01-05T08:20:31.693Z" level=info msg="No output parameters"
time="2023-01-05T08:20:31.693Z" level=info msg="No output artifacts"
time="2023-01-05T08:20:31.693Z" level=info msg="Deadline monitor stopped"
time="2023-01-05T08:20:31.693Z" level=info msg="stopping progress monitor (context done)" error="context canceled"
time="2023-01-05T08:20:31.693Z" level=info msg="Alloc=6874 TotalAlloc=12123 Sys=18898 NumGC=3 Goroutines=4"
time="2023-01-05T08:20:45.802Z" level=info msg="Starting Workflow Executor" version=v3.4.2
time="2023-01-05T08:20:45.807Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2023-01-05T08:20:45.807Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=default podName=dag-primay-branch-txrwg-d-4127191835 template="{\"name\":\"d\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"docker/whalesay:latest\",\"command\":[\"cowsay\"],\"args\":[\"hello world\"],\"resources\":{}}}" version="&Version{Version:v3.4.2,BuildDate:2022-10-23T04:49:38Z,GitCommit:c08563baf7bafafe3aeb3284cd3410308603cad4,GitTag:v3.4.2,GitTreeState:clean,GoVersion:go1.18.7,Compiler:gc,Platform:linux/amd64,}"
time="2023-01-05T08:20:45.807Z" level=info msg="Starting deadline monitor"
time="2023-01-05T08:20:51.807Z" level=info msg="Main container completed" error="<nil>"
time="2023-01-05T08:20:51.807Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2023-01-05T08:20:51.807Z" level=info msg="No output parameters"
time="2023-01-05T08:20:51.807Z" level=info msg="No output artifacts"
time="2023-01-05T08:20:51.807Z" level=info msg="Deadline monitor stopped"
time="2023-01-05T08:20:51.807Z" level=info msg="Alloc=6129 TotalAlloc=12208 Sys=19154 NumGC=4 Goroutines=6"
@sarabala1979 sarabala1979 added P3 Low priority type/regression Regression from previous behavior (a specific type of bug) labels Jan 5, 2023
@NikeNano
Contributor

NikeNano commented Jan 8, 2023

@sarabala1979 I would be happy to see if I can find the issue.

@NikeNano NikeNano self-assigned this Jan 8, 2023
@sarabala1979
Member

@NikeNano Thanks Nike. Please submit the PR

@aaaaahaaaaa

@sarabala1979 Could you clarify what the status of the failFast functionality for DAGs is? Is it supposed to be fully functional, or is it known to be broken?

I'm asking because it doesn't seem to work on our side either, and I found hints in past GH issues (although quite old) that there are known bugs that won't be fixed, in favour of a different feature.

@dablelv

dablelv commented Mar 28, 2023

I encountered the same issue. Has it been fixed? Can anyone help me? Thanks.

@sarabala1979
Member

@aaaaahaaaaa @dablelv Would you like to submit the fix? We are happy to have more contributors.

@dablelv

dablelv commented Mar 28, 2023

@aaaaahaaaaa @dablelv Would you like to submit the fix? we are happy to have more contributors

Thank you for the invitation. If I have free time, I'd be very happy to contribute to Argo.

Besides, I found a temporary workaround in our business code. When one node of the workflow fails, the workflow phase becomes "Failed", so a node can poll the current workflow phase and stop itself once it sees that value. It's not elegant, but it solves the problem.
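The polling workaround described in this comment could look roughly like the sketch below. It is only an illustration under assumptions: `get_phase` and `work_step` are hypothetical callables supplied by the task's own code, and how the phase is actually fetched (for example by reading `.status.phase` of the Workflow resource) is left to the caller.

```python
import time
from typing import Callable


def run_until_done_or_workflow_failed(
    get_phase: Callable[[], str],
    work_step: Callable[[], bool],
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run the task in small steps and stop early if the workflow has failed.

    get_phase  -- hypothetical: returns the current workflow phase as a string
    work_step  -- hypothetical: does one unit of work, returns True when finished
    Returns "aborted" if the workflow phase became "Failed", else "completed".
    """
    while True:
        # Check the workflow phase before doing any further work.
        if get_phase() == "Failed":
            return "aborted"
        if work_step():
            return "completed"
        sleep(poll_interval)
```

The task has to be split into steps short enough that the phase check runs often; a single long-running command would need the check in a separate thread or sidecar instead.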

@capacman

This also affects version 3.4.7.

@stale

stale bot commented Sep 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the problem/stale This has not had a response in some time label Sep 17, 2023
@terrytangyuan terrytangyuan removed the problem/stale This has not had a response in some time label Sep 20, 2023
@JasonChen86899

Hi @agilgur5, please assign this to me.

I have found where the bug occurs, but the fix affects the execution and node statuses of the entire DAG workflow, so I have a question about it.
Because a failed pod causes the DAG workflow's failure and completion status to be set early, pods in NodeRunning status stay in NodeRunning, and unscheduled pods have no record at all in the Argo UI.

So should we mark the unscheduled pods as NodeOmitted? And should the workflow wait until all unscheduled pods are marked NodeOmitted before the DAG workflow's completion status is set?

@agilgur5
Member

Mentioned on Slack that you can submit a PR directly, no need to be assigned.

Regarding unscheduled nodes, good question. I believe they should be marked as "Skipped". When a Workflow is shut down (terminated or stopped), all unscheduled nodes are currently marked as "Skipped", so that would match the existing behavior.
There is also an open feature request for a "Cancelled" status that may better represent these kinds of edge cases.
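As a rough illustration of the "Skipped" suggestion, here is a toy model, not the Argo controller code. It assumes a plain dictionary from task name to phase, where `None` stands for a task that was never scheduled; the function name `apply_fail_fast` is made up for this sketch.

```python
from typing import Dict, Optional


def apply_fail_fast(statuses: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Return a copy of statuses with unscheduled tasks marked "Skipped"
    once any task has failed. Tasks that already have a phase are untouched.

    statuses -- hypothetical mapping: task name -> phase, None = never scheduled
    """
    any_failed = any(phase == "Failed" for phase in statuses.values())
    if not any_failed:
        # Nothing failed, so fail-fast does not apply.
        return dict(statuses)
    return {
        name: ("Skipped" if phase is None else phase)
        for name, phase in statuses.items()
    }
```

With tasks `a` to `d` as in the reproduction workflow, a state such as `{"a": "Succeeded", "b": "Running", "c": "Failed", "d": None}` would leave `b` as "Running" and turn `d` into "Skipped". What should happen to the still-running `b` is exactly the open question raised above, so the sketch deliberately leaves it alone.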

JasonChen86899 added a commit to JasonChen86899/argo-workflows that referenced this issue Oct 13, 2023
Signed-off-by: Goober <chenhao86899@gmail.com>
@JasonChen86899 JasonChen86899 linked a pull request Oct 13, 2023 that will close this issue
JasonChen86899 added a commit to JasonChen86899/argo-workflows that referenced this issue Oct 13, 2023
Signed-off-by: Goober <chenhao86899@gmail.com>
JasonChen86899 added a commit to JasonChen86899/argo-workflows that referenced this issue Oct 16, 2023
Signed-off-by: Goober <chenhao86899@gmail.com>
JasonChen86899 added a commit to JasonChen86899/argo-workflows that referenced this issue Oct 18, 2023
Signed-off-by: Goober <chenhao86899@gmail.com>
JasonChen86899 added a commit to JasonChen86899/argo-workflows that referenced this issue Oct 18, 2023
Signed-off-by: Goober <chenhao86899@gmail.com>
@agilgur5 agilgur5 added P2 Important. All bugs with >=3 thumbs up that aren’t P0 or P1, plus: Any other bugs deemed important and removed P3 Low priority labels Oct 24, 2023
JasonChen86899 added a commit to JasonChen86899/argo-workflows that referenced this issue Oct 30, 2023
Signed-off-by: Goober <chenhao86899@gmail.com>
JasonChen86899 added a commit to JasonChen86899/argo-workflows that referenced this issue Dec 23, 2023
Signed-off-by: Goober <chenhao86899@gmail.com>
JasonChen86899 added a commit to JasonChen86899/argo-workflows that referenced this issue Dec 24, 2023
Signed-off-by: Goober <chenhao86899@gmail.com>
@agilgur5 agilgur5 changed the title failFast flag is invalid failFast field doesn't work Jun 6, 2024
JasonChen86899 added a commit to JasonChen86899/argo-workflows that referenced this issue Jun 15, 2024
Signed-off-by: Goober <chenhao86899@gmail.com>
@tooptoop4
Contributor

#10430
