Cannot use workflows resource create on ARM #10789

Open · 2 of 3 tasks
tico24 opened this issue Mar 31, 2023 · 4 comments
Labels
area/executor · area/templates/resource · P2 (Important. All bugs with >=3 thumbs up that aren't P0 or P1, plus any other bugs deemed important) · type/bug

Comments

tico24 (Member) commented Mar 31, 2023

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

It seems that the resource creation step requires an AMD64 node to run on. It shouldn't.

Running the example below, with a nodeSelector added to ensure it is scheduled on ARM, results in the workflow failing:

main time="2023-04-03T06:32:19.477Z" level=info msg="capturing logs" argo=true
main time="2023-04-03T06:32:19.508Z" level=info msg="Starting Workflow Executor" version=v3.4.6
main time="2023-04-03T06:32:19.511Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
main time="2023-04-03T06:32:19.511Z" level=info msg="Executor initialized" deadline="2023-04-03 07:32:17 +0000 UTC" includeScriptOutput=false namespace=ci podName=k8s-owner-reference-rwh9g-main-1526760976 template="{\"name\":\"main\",\"inputs\":{},\"outputs\":{},\"nodeSelector\":{\
init time="2023-04-03T06:32:19.095Z" level=info msg="Starting Workflow Executor" version=v3.4.6
init time="2023-04-03T06:32:19.099Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
init time="2023-04-03T06:32:19.099Z" level=info msg="Executor initialized" deadline="2023-04-03 07:32:17 +0000 UTC" includeScriptOutput=false namespace=ci podName=k8s-owner-reference-rwh9g-main-1526760976 template="{\"name\":\"main\",\"inputs\":{},\"outputs\":{},\"nodeSelector\":{\
init time="2023-04-03T06:32:19.139Z" level=info msg="Loading manifest to /tmp/manifest.yaml"
init time="2023-04-03T06:32:19.139Z" level=info msg="Start loading input artifacts..."
init time="2023-04-03T06:32:19.139Z" level=info msg="Alloc=6701 TotalAlloc=12554 Sys=30418 NumGC=4 Goroutines=4"
main time="2023-04-03T06:32:19.511Z" level=info msg="Loading manifest to /tmp/manifest.yaml"
main time="2023-04-03T06:32:19.512Z" level=info msg="kubectl create -f /tmp/manifest.yaml -o json"
main time="2023-04-03T06:32:19.513Z" level=warning msg="Non-transient error: fork/exec /bin/kubectl: exec format error"
main time="2023-04-03T06:32:19.513Z" level=error msg="executor error: no more retries fork/exec /bin/kubectl: exec format error"
main time="2023-04-03T06:32:19.513Z" level=fatal msg="no more retries fork/exec /bin/kubectl: exec format error"
main time="2023-04-03T06:32:20.478Z" level=info msg="sub-process exited" argo=true error="<nil>"
main Error: exit status 1

I would surmise that the kubectl binary bundled in the Workflows executor image is pinned to AMD64, and should instead be selected dynamically to match the executor pod's architecture, whether ARM or AMD64 (or probably Windows?).
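As a quick check and a rough sketch of a fix (not from the original report; the version pin and arch mapping below are illustrative, and the real argoexec image build may differ):

# "exec format error" means the binary targets a different CPU architecture
# than the node. Running this inside the executor image on an arm64 node
# would show an x86-64 ELF binary if the surmise above is right:
file /bin/kubectl

# Architecture-aware download using the official dl.k8s.io release URLs;
# KUBECTL_VERSION here is illustrative, not the version Argo actually pins:
KUBECTL_VERSION=v1.26.3
ARCH=$(uname -m | sed -e 's/x86_64/amd64/' -e 's/aarch64/arm64/')
curl -fsSLo /bin/kubectl \
  "https://dl.k8s.io/release/${KUBECTL_VERSION}/bin/linux/${ARCH}/kubectl"
chmod +x /bin/kubectl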

Version

v3.4.6, latest

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: k8s-owner-reference-
spec:
  entrypoint: main
  templates:
    - name: main
      nodeSelector:
        kubernetes.io/arch: arm64
      resource:
        action: create
        setOwnerReference: true
        manifest: |
          apiVersion: v1
          kind: Service
          metadata:
            name: "test"
          spec:
            selector:
              workflows.argoproj.io/workflow: "test"
            clusterIP: None
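For reference (not in the original report): assuming the manifest above is saved as repro.yaml and the Argo CLI is installed, it can be submitted to the ci namespace seen in the logs with:

argo submit -n ci repro.yaml --watch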

Logs from the workflow controller

time="2023-04-03T06:33:27.544Z" level=info msg="Update leases 200"
time="2023-04-03T06:33:28.583Z" level=info msg="Processing workflow" namespace=ci workflow=k8s-owner-reference-h5mj2
time="2023-04-03T06:33:28.584Z" level=info msg="Task-result reconciliation" namespace=ci numObjs=0 workflow=k8s-owner-reference-h5mj2
time="2023-04-03T06:33:28.584Z" level=info msg="Pod failed: Error (exit code 1): no more retries fork/exec /bin/kubectl: exec format error" displayName="k8s-owner-reference-h5mj2(0)" namespace=ci pod=k8s-owner-reference-h5mj2-main-366178901 templateName=main workflow=k8s-owner-refe
rence-h5mj2
time="2023-04-03T06:33:28.584Z" level=info msg="node changed" namespace=ci new.message="Error (exit code 1): no more retries fork/exec /bin/kubectl: exec format error" new.phase=Failed new.progress=0/1 nodeID=k8s-owner-reference-h5mj2-366178901 old.message= old.phase=Pending old.pr
ogress=0/1 workflow=k8s-owner-reference-h5mj2
time="2023-04-03T06:33:28.585Z" level=info msg="Node not set to be retried after status: Failed" namespace=ci workflow=k8s-owner-reference-h5mj2
time="2023-04-03T06:33:28.585Z" level=info msg="node k8s-owner-reference-h5mj2 phase Running -> Failed" namespace=ci workflow=k8s-owner-reference-h5mj2
time="2023-04-03T06:33:28.585Z" level=info msg="node k8s-owner-reference-h5mj2 message: Error (exit code 1): no more retries fork/exec /bin/kubectl: exec format error" namespace=ci workflow=k8s-owner-reference-h5mj2
time="2023-04-03T06:33:28.585Z" level=info msg="node k8s-owner-reference-h5mj2 finished: 2023-04-03 06:33:28.585154318 +0000 UTC" namespace=ci workflow=k8s-owner-reference-h5mj2
time="2023-04-03T06:33:28.585Z" level=info msg="TaskSet Reconciliation" namespace=ci workflow=k8s-owner-reference-h5mj2
time="2023-04-03T06:33:28.585Z" level=info msg=reconcileAgentPod namespace=ci workflow=k8s-owner-reference-h5mj2
time="2023-04-03T06:33:28.585Z" level=info msg="Updated phase Running -> Failed" namespace=ci workflow=k8s-owner-reference-h5mj2
time="2023-04-03T06:33:28.585Z" level=info msg="Updated message  -> Error (exit code 1): no more retries fork/exec /bin/kubectl: exec format error" namespace=ci workflow=k8s-owner-reference-h5mj2
time="2023-04-03T06:33:28.585Z" level=info msg="Marking workflow completed" namespace=ci workflow=k8s-owner-reference-h5mj2
time="2023-04-03T06:33:28.585Z" level=info msg="Checking daemoned children of " namespace=ci workflow=k8s-owner-reference-h5mj2
time="2023-04-03T06:33:28.585Z" level=info msg="Workflow to be dehydrated" Workflow Size=2248

Logs from your workflow's wait container

No wait container is created.

tico24 (Member, Author) commented Apr 3, 2023

Re-tested on 3.4.6 and this is not resolved. It was supposed to have been fixed by #10550, but this seems not to be the case.

See also #10538.

tico24 reopened this Apr 3, 2023
terrytangyuan (Member) commented

a862ea1 could not be cherry-picked into 3.4.6 due to too many conflicts.
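For context, a rough sketch of that backport attempt (assuming the repo's release-3.4 branch naming; not a verified transcript):

git checkout release-3.4
git cherry-pick a862ea1   # stops on conflicts; too many to resolve cleanly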

kahirokunn commented

I got the same issue.

sarabala1979 added the P2 label Apr 6, 2023
stale bot commented Sep 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale bot added the problem/stale label Sep 17, 2023
terrytangyuan removed the problem/stale label Sep 20, 2023