activeDeadlineSeconds at workflow level not working as expected #5796

Closed
akshaybhatt14495 opened this issue May 3, 2021 · 1 comment · Fixed by #5798

akshaybhatt14495 commented May 3, 2021

Summary

What happened?
Say activeDeadlineSeconds at the workflow level is 60 seconds and the step sleeps for 60 seconds. After about 60 seconds the workflow fails with a deadline error, which is the expected behaviour.
Response:

Namespace:           flow-argo-stage
ServiceAccount:      flow-account
Status:              Failed
Message:             Pod was active on the node longer than the specified deadline
Conditions:          
 PodRunning          False
 Completed           True
Created:             Mon May 03 18:18:19 +0530 (3 minutes ago)
Started:             Mon May 03 18:18:19 +0530 (3 minutes ago)
Finished:            Mon May 03 18:19:28 +0530 (2 minutes ago)
Duration:            1 minute 9 seconds
ResourcesDuration:   0s*cpu,0s*memory

STEP               TEMPLATE      PODNAME         DURATION  MESSAGE
 ✖ retry-workflow  retryExample  retry-workflow  1m        Pod was active on the node longer than the specified deadline

But on retrying the workflow, it fails immediately with an error and does not even create a pod.
Response:

Namespace:           flow-argo-stage
ServiceAccount:      flow-account
Status:              Failed
Message:             Step exceeded its deadline
Conditions:          
 PodRunning          False
 Completed           True
Created:             Mon May 03 18:18:19 +0530 (4 minutes ago)
Started:             Mon May 03 18:18:19 +0530 (4 minutes ago)
Finished:            Mon May 03 18:22:35 +0530 (1 second ago)
Duration:            4 minutes 16 seconds

STEP               TEMPLATE      PODNAME         DURATION  MESSAGE
 ✖ retry-workflow  retryExample  retry-workflow  11s       Step exceeded its deadline  
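
The timestamps in the two outputs above can be checked against the 60-second deadline. The sketch below is only an arithmetic check on the Started and Finished values shown. The helper name `elapsed_seconds` is made up for illustration, and the step assumes both timestamps fall on the same day, as they do here.

```python
from datetime import datetime

DEADLINE_SECONDS = 60  # spec.activeDeadlineSeconds from the workflow


def elapsed_seconds(started: str, finished: str) -> int:
    """Seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(finished, fmt) - datetime.strptime(started, fmt)
    return int(delta.total_seconds())


# First run: Started 18:18:19, Finished 18:19:28
first_run = elapsed_seconds("18:18:19", "18:19:28")

# After retry: Started is still 18:18:19, Finished 18:22:35
after_retry = elapsed_seconds("18:18:19", "18:22:35")

print(first_run, first_run > DEADLINE_SECONDS)      # 69 True
print(after_retry, after_retry > DEADLINE_SECONDS)  # 256 True
```

The Started value is the same in both outputs, so by the time the retry runs, the time elapsed since Started is already well beyond 60 seconds.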

What did you expect to happen?

On workflow retry, the pod should run for 60 seconds again and then fail.

Diagnostics

What Kubernetes provider are you using?

clientVersion:
  buildDate: "2020-02-13T18:06:54Z"
  compiler: gc
  gitCommit: 06ad960bfd03b39c8310aaf92d1e7c12ce618213
  gitTreeState: clean
  gitVersion: v1.17.3
  goVersion: go1.13.8
  major: "1"
  minor: "17"
  platform: darwin/amd64
serverVersion:
  buildDate: "2020-06-17T11:33:59Z"
  compiler: gc
  gitCommit: c96aede7b5205121079932896c4ad89bb93260af
  gitTreeState: clean
  gitVersion: v1.18.4
  goVersion: go1.13.9
  major: "1"
  minor: "18"
  platform: linux/amd64

What version of Argo Workflows are you running?

argo: v2.12.7
  BuildDate: 2021-02-01T22:11:06Z
  GitCommit: 5f5150730c644865a5867bf017100732f55811dd
  GitTreeState: clean
  GitTag: v2.12.7
  GoVersion: go1.13
  Compiler: gc
  Platform: darwin/amd64

Workflow used to reproduce:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: retry-workflow
spec:
  activeDeadlineSeconds: 60
  entrypoint: retryExample
  templates:
  - name: retryExample
    container:
      resources:
        limits:
          cpu: "1"
          memory: "2e9"
        requests:
          cpu: "1"
          memory: "2e9"
      image: alpine
      imagePullPolicy: Always
      name: ""
      args:
      - |-
        echo sleeping for 60 sec;
        sleep 60;
        echo sleep complete;
      command:
      - /bin/sh
      - -c

Logs

https://gist.github.com/akshaybhatt14495/8a9f4f00cdb91f8f1f3bd840d9be1385


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

terrytangyuan (Member) commented

This happens because activeDeadlineSeconds is measured from wf.Status.StartedAt, which still holds the timestamp of the initial submission and is not reset to the current time when the workflow is retried. I've submitted #5798 to fix this.
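
The effect described above can be modelled in a few lines. This is a simplified sketch, not the controller's code and not the actual change in #5798: `WorkflowStatus`, `deadline_exceeded`, and `retry` are hypothetical names, and the 4-minute gap before the retry is an assumed value chosen to roughly match the report.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta


@dataclass(frozen=True)
class WorkflowStatus:
    # Stands in for wf.Status.StartedAt (hypothetical model, not Argo's type).
    started_at: datetime


def deadline_exceeded(status: WorkflowStatus, deadline_seconds: int, now: datetime) -> bool:
    """True once `now` is at or past started_at + deadline."""
    return now >= status.started_at + timedelta(seconds=deadline_seconds)


def retry(status: WorkflowStatus, now: datetime, reset_start: bool) -> WorkflowStatus:
    """Model a retry; optionally reset the start time to `now`."""
    return replace(status, started_at=now) if reset_start else status


submitted = datetime(2021, 5, 3, 18, 18, 19)
retried_at = submitted + timedelta(minutes=4)  # assumed retry time
original = WorkflowStatus(started_at=submitted)

# Start time kept from the first submission: the deadline is already past.
stale = retry(original, retried_at, reset_start=False)
fails_immediately = deadline_exceeded(stale, 60, retried_at)

# Start time reset on retry: a fresh 60-second window begins.
fresh = retry(original, retried_at, reset_start=True)
fails_at_retry = deadline_exceeded(fresh, 60, retried_at)
fails_after_window = deadline_exceeded(fresh, 60, retried_at + timedelta(seconds=60))

print(fails_immediately, fails_at_retry, fails_after_window)  # True False True
```

With the start time left unchanged, the check is true the moment the retry begins, which matches the "Step exceeded its deadline" message appearing without a pod being created.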

alexec pushed a commit that referenced this issue May 3, 2021
…ixes #5796 (#5798)

Signed-off-by: terrytangyuan <terrytangyuan@gmail.com>
sarabala1979 mentioned this issue May 4, 2021
33 tasks
sarabala1979 pushed a commit that referenced this issue May 5, 2021
…ixes #5796 (#5798)

Signed-off-by: terrytangyuan <terrytangyuan@gmail.com>