activeDeadlineSeconds at workflow level not working as expected #5796

akshaybhatt14495 · 2021-05-03T12:39:27Z

Summary

What happened?
Say activeDeadlineSeconds at the workflow level is 60 seconds and the step sleeps for 60 seconds. After 60 seconds the workflow fails with a deadline error, which is expected.
Response:

Namespace:           flow-argo-stage
ServiceAccount:      flow-account
Status:              Failed
Message:             Pod was active on the node longer than the specified deadline
Conditions:          
 PodRunning          False
 Completed           True
Created:             Mon May 03 18:18:19 +0530 (3 minutes ago)
Started:             Mon May 03 18:18:19 +0530 (3 minutes ago)
Finished:            Mon May 03 18:19:28 +0530 (2 minutes ago)
Duration:            1 minute 9 seconds
ResourcesDuration:   0s*cpu,0s*memory

STEP               TEMPLATE      PODNAME         DURATION  MESSAGE
 ✖ retry-workflow  retryExample  retry-workflow  1m        Pod was active on the node longer than the specified deadline

But on retrying the workflow, it fails immediately with an error and does not even create a pod.
Response:

Namespace:           flow-argo-stage
ServiceAccount:      flow-account
Status:              Failed
Message:             Step exceeded its deadline
Conditions:          
 PodRunning          False
 Completed           True
Created:             Mon May 03 18:18:19 +0530 (4 minutes ago)
Started:             Mon May 03 18:18:19 +0530 (4 minutes ago)
Finished:            Mon May 03 18:22:35 +0530 (1 second ago)
Duration:            4 minutes 16 seconds

STEP               TEMPLATE      PODNAME         DURATION  MESSAGE
 ✖ retry-workflow  retryExample  retry-workflow  11s       Step exceeded its deadline

What did you expect to happen?

On workflow retry, the pod should run for 60 seconds again and only then fail.

Diagnostics

What Kubernetes provider are you using?

clientVersion:
  buildDate: "2020-02-13T18:06:54Z"
  compiler: gc
  gitCommit: 06ad960bfd03b39c8310aaf92d1e7c12ce618213
  gitTreeState: clean
  gitVersion: v1.17.3
  goVersion: go1.13.8
  major: "1"
  minor: "17"
  platform: darwin/amd64
serverVersion:
  buildDate: "2020-06-17T11:33:59Z"
  compiler: gc
  gitCommit: c96aede7b5205121079932896c4ad89bb93260af
  gitTreeState: clean
  gitVersion: v1.18.4
  goVersion: go1.13.9
  major: "1"
  minor: "18"
  platform: linux/amd64

What version of Argo Workflows are you running?

argo: v2.12.7
  BuildDate: 2021-02-01T22:11:06Z
  GitCommit: 5f5150730c644865a5867bf017100732f55811dd
  GitTreeState: clean
  GitTag: v2.12.7
  GoVersion: go1.13
  Compiler: gc
  Platform: darwin/amd64

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: retry-workflow
spec:
  activeDeadlineSeconds: 60
  entrypoint: retryExample
  templates:
  - name: retryExample
    container:
      resources:
        limits:
          cpu: "1"
          memory: "2e9"
        requests:
          cpu: "1"
          memory: "2e9"
      image: alpine
      imagePullPolicy: Always
      name: ""
      args:
      - |-
        echo sleeping for 60 sec;
        sleep 60;
        echo sleep complete;
      command:
      - /bin/sh
      - -c

Logs

https://gist.github.com/akshaybhatt14495/8a9f4f00cdb91f8f1f3bd840d9be1385

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

terrytangyuan · 2021-05-03T14:51:49Z

This is because activeDeadlineSeconds is evaluated against wf.Status.StartedAt, which still holds the timestamp of the initial submission and is not reset to the current time when the workflow is retried. I've submitted #5798 to fix this.
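
To illustrate the explanation above, here is a minimal sketch of the deadline check in Python. It is not the controller's actual code: the function `deadline_exceeded` and the retry timestamp are hypothetical and chosen only for illustration, while the 60-second deadline and the original start time are taken from the report.

```python
from datetime import datetime, timedelta, timezone

ACTIVE_DEADLINE_SECONDS = 60  # spec.activeDeadlineSeconds from the reported workflow


def deadline_exceeded(started_at, active_deadline_seconds, now):
    """Hypothetical helper: True once `now` is past started_at + deadline."""
    return now > started_at + timedelta(seconds=active_deadline_seconds)


ist = timezone(timedelta(hours=5, minutes=30))
# "Started" timestamp from the CLI output in the report.
submitted = datetime(2021, 5, 3, 18, 18, 19, tzinfo=ist)
# Assumed moment of the retry, a few minutes later (illustrative only).
retried = datetime(2021, 5, 3, 18, 22, 24, tzinfo=ist)

# Reported behaviour: the start time still points at the first submission,
# so the deadline has already passed when the retry begins.
stale = deadline_exceeded(submitted, ACTIVE_DEADLINE_SECONDS, retried)

# Behaviour described by the fix: the start time is reset at retry,
# so the full 60 seconds are available again.
fresh = deadline_exceeded(retried, ACTIVE_DEADLINE_SECONDS, retried)

print(stale, fresh)
```

With the stale start time the check is already true at the moment of the retry, which matches the immediate "Step exceeded its deadline" failure without a pod being created.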

akshaybhatt14495 added the type/bug label May 3, 2021

terrytangyuan mentioned this issue May 3, 2021

fix: Reset workflow started time to current when retrying workflow. Fixes #5796 #5798

Merged

1 task

alexec closed this as completed in #5798 May 3, 2021

alexec pushed a commit that referenced this issue May 3, 2021

fix: Reset workflow started time to current when retrying workflow. F…

4b3a30f

…ixes #5796 (#5798) Signed-off-by: terrytangyuan <terrytangyuan@gmail.com>

terrytangyuan mentioned this issue May 3, 2021

fix: Reset started time for each node to current when retrying workflow #5801

Merged

1 task

sarabala1979 mentioned this issue May 4, 2021

v3.0.3 cherry-pick #5820

Closed

33 tasks

sarabala1979 pushed a commit that referenced this issue May 5, 2021

fix: Reset workflow started time to current when retrying workflow. F…

0d3ad80

…ixes #5796 (#5798) Signed-off-by: terrytangyuan <terrytangyuan@gmail.com>
