Error (exit code 64): failed to read exit-code of dependency #11490

Closed
1 of 3 tasks
rajaie-sg opened this issue Jul 31, 2023 · 1 comment · Fixed by #11496

rajaie-sg commented Jul 31, 2023

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

FYI I am not able to reproduce this issue. We run ~2k workflows a day and it randomly shows up ~1 time a day.

We have a containerSet with multiple containers that run in series (each container depends on the previous container completing). Occasionally (about once in every 2,000 workflow runs), the Workflow fails with a message similar to the one below:

MESSAGE
Error (exit code 64): failed to read exit-code of dependency "my-container": strconv.Atoi: parsing "": invalid syntax

I have not been able to find any consistent pattern for when this happens. I checked the relevant code at

return fmt.Errorf("failed to read exit-code of dependency %q: %w", y, err)
but can't think of why the exitcode file would be empty.
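
For reference, the message can be reproduced by parsing an empty string, which suggests the exit-code file existed but held no data when it was read. This is a minimal sketch, not the project's actual code: `parseExitCode` is a hypothetical helper that only mirrors the `fmt.Errorf` line quoted above.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseExitCode is a hypothetical helper (not part of Argo Workflows) that
// parses the contents of an exit-code file and wraps any error in the same
// format as the line quoted from the issue.
func parseExitCode(dependency, data string) (int, error) {
	code, err := strconv.Atoi(data)
	if err != nil {
		return 0, fmt.Errorf("failed to read exit-code of dependency %q: %w", dependency, err)
	}
	return code, nil
}

func main() {
	// An empty string stands in for an exit-code file that was created
	// but not yet written to.
	_, err := parseExitCode("my-container", "")
	fmt.Println(err)
}
```

Running this prints the same text as the workflow message, minus the `Error (exit code 64):` prefix.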

Version

v3.4.3

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

n/a

Logs from the workflow controller

Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.401Z" level=info msg="node workflowname-78471234 phase Running -> Error" namespace=argo workflow=workflowname
  Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.401Z" level=info msg="node workflowname-78471234 message: Error (exit code 64): failed to read exit-code of dependency \"init-crawler\": strconv.Atoi: parsing \"\": invalid syntax" namespace=argo workflow=workflowname
  Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.401Z" level=info msg="node workflowname-78471234 finished: 2023-07-30 17:11:00.401717151 +0000 UTC" namespace=argo workflow=workflowname
    Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.401Z" level=info msg="node workflowname-2749256213 phase Running -> Succeeded" namespace=argo workflow=workflowname
  Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.401Z" level=info msg="node workflowname-2749256213 finished: 2023-07-30 17:11:00.401748698 +0000 UTC" namespace=argo workflow=workflowname
    Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.401Z" level=info msg="node workflowname-4278990123 phase Running -> Error" namespace=argo workflow=workflowname
  Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.401Z" level=info msg="node workflowname-4278990123 message: Error (exit code 64): dependency \"upload-pex\" exited with non-zero code: 64" namespace=argo workflow=workflowname
  Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.401Z" level=info msg="node workflowname-4278990123 finished: 2023-07-30 17:11:00.401781632 +0000 UTC" namespace=argo workflow=workflowname
    Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.401Z" level=info msg="node workflowname-2357450446 phase Running -> Error" namespace=argo workflow=workflowname
  Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.401Z" level=info msg="node workflowname-2357450446 message: Error (exit code 64): dependency \"upload-pex\" exited with non-zero code: 64" namespace=argo workflow=workflowname
  Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.401Z" level=info msg="node workflowname-2357450446 finished: 2023-07-30 17:11:00.401815432 +0000 UTC" namespace=argo workflow=workflowname
    Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.401Z" level=info msg="node workflowname-578399247 phase Running -> Error" namespace=argo workflow=workflowname
  Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.401Z" level=info msg="node workflowname-578399247 message: Error (exit code 64): dependency \"build-pex\" exited with non-zero code: 64" namespace=argo workflow=workflowname
  Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.401Z" level=info msg="node workflowname-578399247 finished: 2023-07-30 17:11:00.401849414 +0000 UTC" namespace=argo workflow=workflowname
  Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.401Z" level=info msg="node changed" namespace=argo new.message="Error (exit code 64): failed to read exit-code of dependency \"init-crawler\": strconv.Atoi: parsing \"\": invalid syntax" new.phase=Failed new.progress=0/1 nodeID=workflowname-312986104 old.message= old.phase=Running old.progress=0/1 workflow=workflowname
  Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.402Z" level=info msg="Node not set to be retried after status: Failed" namespace=argo workflow=workflowname
    Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.402Z" level=info msg="node workflowname phase Running -> Failed" namespace=argo workflow=workflowname
  Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.402Z" level=info msg="node workflowname message: Error (exit code 64): failed to read exit-code of dependency \"init-crawler\": strconv.Atoi: parsing \"\": invalid syntax" namespace=argo workflow=workflowname
  Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.402Z" level=info msg="node workflowname finished: 2023-07-30 17:11:00.402826076 +0000 UTC" namespace=argo workflow=workflowname
    Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.402Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=workflowname
    Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.402Z" level=info msg=reconcileAgentPod namespace=argo workflow=workflowname
    Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.402Z" level=info msg="Updated phase Running -> Failed" namespace=argo workflow=workflowname
  Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.402Z" level=info msg="Updated message  -> Error (exit code 64): failed to read exit-code of dependency \"init-crawler\": strconv.Atoi: parsing \"\": invalid syntax" namespace=argo workflow=workflowname
    Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.402Z" level=info msg="Marking workflow completed" namespace=argo workflow=workflowname
    Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.402Z" level=info msg="Marking workflow as pending archiving" namespace=argo workflow=workflowname
    Jul 30 13:11:00 argo-workflow-controller-5659bbf4c6-vs5bk controller info time="2023-07-30T17:11:00.408Z" level=info msg="cleaning up pod" action=deletePod key=argo/workflowname-1340600742-agent/deletePod


Logs from your workflow's wait container

Was not able to capture these logs.

alexec commented Aug 1, 2023

I imagine this is a race condition: the file is created, but the exit code has not been written to it by the time it is read.

To fix this, we should make sure not just that the file exists, but that it has content.
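
A rough sketch of that idea, under stated assumptions: `readExitCode` and `demo` are hypothetical names, the polling interval and timeout are arbitrary, and this is not the change that was merged. It treats "file missing" and "file empty" the same way, retrying until content appears or a deadline passes.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"time"
)

// readExitCode (hypothetical) polls path until the file exists AND has
// non-whitespace content, then parses that content as an integer.
func readExitCode(path string, interval, timeout time.Duration) (int, error) {
	deadline := time.Now().Add(timeout)
	for {
		data, err := os.ReadFile(path)
		if err != nil && !os.IsNotExist(err) {
			return 0, err // unexpected I/O error: give up immediately
		}
		if s := strings.TrimSpace(string(data)); s != "" {
			return strconv.Atoi(s)
		}
		if time.Now().After(deadline) {
			return 0, fmt.Errorf("timed out waiting for exit code in %s", path)
		}
		time.Sleep(interval)
	}
}

// demo simulates the suspected race: the file is created empty first and
// the exit code is written shortly afterwards by another goroutine.
func demo() (int, error) {
	dir, err := os.MkdirTemp("", "exitcode")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "exitcode")
	if err := os.WriteFile(path, nil, 0o600); err != nil {
		return 0, err
	}
	go func() {
		time.Sleep(50 * time.Millisecond)
		_ = os.WriteFile(path, []byte("64"), 0o600)
	}()
	return readExitCode(path, 10*time.Millisecond, 2*time.Second)
}

func main() {
	code, err := demo()
	fmt.Println(code, err)
}
```

Checking for non-empty content narrows the window but does not rule out reading a partially written value. Having the writer write to a temporary file and rename it into place would be a stricter alternative.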

alexec added a commit that referenced this issue Aug 1, 2023
Signed-off-by: Alex Collins <alex_collins@intuit.com>
@alexec alexec linked a pull request Aug 1, 2023 that will close this issue