Wrong Pod name in argo get command result from CLI #9906

Comments
I am having the same issue here. The pod names when performing …
@JPZ13 @rohankmr414 Can you take a look?
I am having the same issue here 👍🏻
I'm OOO this week @sarabala1979. How's your capacity @rohankmr414 or @isubasinghe?
I have the same issue here. It has been happening since version 3.4.0 (unfortunately I only upgraded this week, directly to 3.4.3, but I traced it back to 3.4.0). It seems to only happen when a retry strategy is set; the hello-world.yaml example does not suffer from the same issue.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  annotations:
    workflows.argoproj.io/pod-name-format: v2
  creationTimestamp: "2022-11-04T12:08:12Z"
  generateName: retry-on-error-
  generation: 2
  labels:
    workflows.argoproj.io/phase: Running
  name: retry-on-error-khzpg
  namespace: default
  resourceVersion: "16229"
  uid: 1d4e6dc4-e2be-475c-9c32-f3aaaef1cdf1
spec:
  arguments: {}
  entrypoint: error-container
  templates:
  - container:
      args:
      - import random; import sys; exit_code = random.choice(range(0, 5)); sys.exit(exit_code)
      command:
      - python
      - -c
      image: python
      name: ""
      resources: {}
    inputs: {}
    metadata: {}
    name: error-container
    outputs: {}
    retryStrategy:
      limit: "2"
      retryPolicy: Always
status:
  artifactGCStatus:
    notSpecified: true
  artifactRepositoryRef:
    artifactRepository: {}
    default: true
  finishedAt: null
  nodes:
    retry-on-error-khzpg:
      children:
      - retry-on-error-khzpg-550301540
      displayName: retry-on-error-khzpg
      finishedAt: null
      id: retry-on-error-khzpg
      name: retry-on-error-khzpg
      phase: Running
      progress: 0/1
      startedAt: "2022-11-04T12:08:12Z"
      templateName: error-container
      templateScope: local/retry-on-error-khzpg
      type: Retry
    retry-on-error-khzpg-550301540:
      displayName: retry-on-error-khzpg(0)
      finishedAt: null
      id: retry-on-error-khzpg-550301540
      name: retry-on-error-khzpg(0)
      phase: Pending
      progress: 0/1
      startedAt: "2022-11-04T12:08:12Z"
      templateName: error-container
      templateScope: local/retry-on-error-khzpg
      type: Pod
  phase: Running
  progress: 0/1
startedAt: "2022-11-04T12:08:12Z" It might be related to #6712 and #8748 but I'm not sure why it only happens for retry enabled workflows. FWIW: Pretty important for us, since we gather data based on the status of workflows and we can't match them to pods right now. |
I believe the retry strategy to be relevant because of … and I believe the nodeID in status does get calculated wrongly here: … Since I'm not sure what the proper course of action is to fix this, I won't create a PR for it.
@JPZ13 I should be able to handle it first thing Monday. @sarabala1979 feel free to assign me if that timeline is okay with you.
Commit cc9d14c introduces the bug, I believe, or rather makes it appear (it could just be the canary in the coal mine); I checked this with … This is interesting because the JSON output from …
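To make the suspected miscalculation concrete, here is a hedged sketch: if the controller derives the status node ID by hashing one string while the pod was named from a hash of another, the suffixes diverge and argo get resolves a PODNAME that doesn't exist. Which string is hashed where is my guess, not confirmed from the source.

```go
// Hedged sketch: hashing two different candidate strings for the same
// retry child produces two different suffixes. If status.nodes is keyed
// by one and the pod is named from the other, the CLI lookup fails.
package main

import (
	"fmt"
	"hash/fnv"
)

// suffix computes an FNV-32a hash, the style of suffix seen in the names above.
func suffix(s string) uint32 {
	h := fnv.New32a()
	_, _ = h.Write([]byte(s))
	return h.Sum32()
}

func main() {
	parent := "retry-on-error-khzpg"   // the Retry node's name
	child := "retry-on-error-khzpg(0)" // its Pod child's name

	// Two different inputs, two different suffixes: any disagreement about
	// which name to hash yields a pod name the status can't point to.
	fmt.Println("hash(parent):", suffix(parent))
	fmt.Println("hash(child): ", suffix(child))
}
```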
@isubasinghe @terrytangyuan Unfortunately there is still a bug with the workflow status. Example workflow:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: nodename-
spec:
  arguments: {}
  entrypoint: render
  templates:
  - inputs: {}
    metadata: {}
    name: render
    steps:
    - - arguments:
          parameters:
          - name: frames
            value: '{{item.frames}}'
        name: run-blender
        template: blender
        withItems:
        - frames: 1
  - container:
      image: argoproj/argosay:v2
      command: ["/bin/sh", "-c"]
      args:
      - /argosay echo 0/100 $ARGO_PROGRESS_FILE && /argosay sleep 10s && /argosay echo 50/100 $ARGO_PROGRESS_FILE && /argosay sleep 10s
      name: ""
    inputs:
      parameters:
      - name: frames
    name: blender
    retryStrategy:
      limit: 2
      retryPolicy: Always
```

yields the following status:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  annotations:
    workflows.argoproj.io/pod-name-format: v2
  creationTimestamp: "2022-11-25T11:33:41Z"
  generateName: nodename-
  generation: 3
  labels:
    workflows.argoproj.io/phase: Running
  name: nodename-bvd45
  namespace: argo
  resourceVersion: "15649"
  uid: ea233eef-210d-4394-a238-ef847b104458
spec:
  activeDeadlineSeconds: 300
  arguments: {}
  entrypoint: render
  podSpecPatch: |
    terminationGracePeriodSeconds: 3
  templates:
  - inputs: {}
    metadata: {}
    name: render
    outputs: {}
    steps:
    - - arguments:
          parameters:
          - name: frames
            value: '{{item.frames}}'
        name: run-blender
        template: blender
        withItems:
        - frames: 1
  - container:
      args:
      - /argosay echo 0/100 $ARGO_PROGRESS_FILE && /argosay sleep 10s && /argosay
        echo 50/100 $ARGO_PROGRESS_FILE && /argosay sleep 10s
      command:
      - /bin/sh
      - -c
      image: argoproj/argosay:v2
      name: ""
      resources: {}
    inputs:
      parameters:
      - name: frames
    metadata: {}
    name: blender
    outputs: {}
    retryStrategy:
      limit: 2
      retryPolicy: Always
status:
  artifactGCStatus:
    notSpecified: true
  artifactRepositoryRef:
    artifactRepository:
      archiveLogs: true
      s3:
        accessKeySecret:
          key: accesskey
          name: my-minio-cred
        bucket: my-bucket
        endpoint: minio:9000
        insecure: true
        secretKeySecret:
          key: secretkey
          name: my-minio-cred
    configMap: artifact-repositories
    key: default-v1
    namespace: argo
  conditions:
  - status: "False"
    type: PodRunning
  finishedAt: null
  nodes:
    nodename-bvd45:
      children:
      - nodename-bvd45-701773242
      displayName: nodename-bvd45
      finishedAt: null
      id: nodename-bvd45
      name: nodename-bvd45
      phase: Running
      progress: 0/1
      startedAt: "2022-11-25T11:33:41Z"
      templateName: render
      templateScope: local/nodename-bvd45
      type: Steps
    nodename-bvd45-701773242:
      boundaryID: nodename-bvd45
      children:
      - nodename-bvd45-3728066428
      displayName: '[0]'
      finishedAt: null
      id: nodename-bvd45-701773242
      name: nodename-bvd45[0]
      phase: Running
      progress: 0/1
      startedAt: "2022-11-25T11:33:41Z"
      templateScope: local/nodename-bvd45
      type: StepGroup
    nodename-bvd45-3728066428:
      boundaryID: nodename-bvd45
      children:
      - nodename-bvd45-3928099255
      displayName: run-blender(0:frames:1)
      finishedAt: null
      id: nodename-bvd45-3728066428
      inputs:
        parameters:
        - name: frames
          value: "1"
      name: nodename-bvd45[0].run-blender(0:frames:1)
      phase: Running
      progress: 0/1
      startedAt: "2022-11-25T11:33:41Z"
      templateName: blender
      templateScope: local/nodename-bvd45
      type: Retry
    nodename-bvd45-3928099255:
      boundaryID: nodename-bvd45
      displayName: run-blender(0:frames:1)(0)
      finishedAt: null
      hostNodeName: k3d-argowf-server-0
      id: nodename-bvd45-3928099255
      inputs:
        parameters:
        - name: frames
          value: "1"
      message: PodInitializing
      name: nodename-bvd45[0].run-blender(0:frames:1)(0)
      phase: Pending
      progress: 0/1
      startedAt: "2022-11-25T11:33:41Z"
      templateName: blender
      templateScope: local/nodename-bvd45
      type: Pod
  phase: Running
  progress: 0/1
startedAt: "2022-11-25T11:33:41Z" The pod is named Can you please reopen or should I create a new issue? |
@mweibel Could you please tell me what the desired pod name should be? I have strong suspicions this is a controller/operator issue, different from the issue initially created, which was formatting-based. If so, this issue is distinct from the original one; is it better to create a new issue to keep them atomic?
Yeah, I suspected that the issue at hand is that the Argo workflow status doesn't contain the right node IDs, which is why the CLI is unable to access them. I'll create a new issue with the details.
See #10107
Pre-requisites

- I can confirm the issue exists when I tested with :latest
What happened/what you expected to happen?
Run the example workflow https://github.com/argoproj/argo-workflows/blob/master/examples/retry-on-error.yaml

The pod names in the result of argo get are wrong.
```
argo get retry-on-error-v2pk2 -n workflow

Name:                retry-on-error-v2pk2
...
STEP                          TEMPLATE         PODNAME                                          DURATION  MESSAGE
 ✖ retry-on-error-v2pk2       error-container                                                             No more retries left
 ├─⚠ retry-on-error-v2pk2(0)  error-container  retry-on-error-v2pk2-error-container-2869263017  26s       Error (exit code 1): failed to put file: 404 Not Found
 ├─✖ retry-on-error-v2pk2(1)  error-container  retry-on-error-v2pk2-error-container-2427568992  4s        Error (exit code 3)
 └─✖ retry-on-error-v2pk2(2)  error-container  retry-on-error-v2pk2-error-container-816476283   4s        Error (exit code 4)
```
```
kubectl get pods -n workflow

NAME                                              READY   STATUS      RESTARTS   AGE
retry-on-error-v2pk2-error-container-1195955417   0/2     Completed   0          6m17s
retry-on-error-v2pk2-error-container-1800096796   0/2     Error       0          5m41s
retry-on-error-v2pk2-error-container-3410203767   0/2     Error       0          5m31s
```
The UI works fine:

```
NAME      retry-on-error-v2pk2(0)
ID        retry-on-error-v2pk2-1195955417
POD NAME  retry-on-error-v2pk2-error-container-1195955417
```
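A short, hedged sketch of what the UI panel above implies: the correct v2 pod name can be recovered from the node ID by splicing the template name between the workflow name and the ID's hash suffix. podNameFromNodeID is an illustrative helper of mine, not Argo's API.

```go
// Sketch of the relation visible in the UI panel above: node ID
// "retry-on-error-v2pk2-1195955417" plus template "error-container"
// yields pod name "retry-on-error-v2pk2-error-container-1195955417".
package main

import (
	"fmt"
	"strings"
)

// podNameFromNodeID splices the template name between the workflow name
// and the node ID's hash suffix (illustrative helper, not Argo's API).
func podNameFromNodeID(wfName, templateName, nodeID string) string {
	suffix := strings.TrimPrefix(nodeID, wfName+"-") // e.g. "1195955417"
	return fmt.Sprintf("%s-%s-%s", wfName, templateName, suffix)
}

func main() {
	// Values taken from the UI panel above.
	fmt.Println(podNameFromNodeID(
		"retry-on-error-v2pk2", "error-container",
		"retry-on-error-v2pk2-1195955417",
	)) // retry-on-error-v2pk2-error-container-1195955417
}
```

This matches the actual pod names from kubectl above, while the PODNAME column of argo get shows different hash suffixes, which is the bug being reported.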
Version
v3.4.1
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Logs from the workflow controller
Not related
Logs from your workflow's wait container
Not related