Lifecycle hooks are broken when applying via ConfigMap #11095

Closed
2 of 3 tasks
adrvolan opened this issue May 17, 2023 · 3 comments · Fixed by #11214
Labels: P3 Low priority, type/bug

adrvolan commented May 17, 2023

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

Hi, I've been trying to add a new lifecycle hook to all my Workflows using ConfigMaps.

To keep it simple: when I add the deprecated onExit hook, everything works smoothly, but when I apply the same thing via the new hooks: {exit:{...}} syntax, every Workflow fails with Error: merging object in json but data type is not struct, instead is: map

You can see the hooks in the manifest and in the Argo UI, but the Tasks end in an Error and never leave the Pending state.
I've dug into the code a little and found that the error comes from setExecWorkflow and its call stack, but that's as far as my non-existent Go knowledge takes me.

The above is just an example of two implementations that do the same thing, where one works and the other does not. I would prefer to stick with the new one, because the new hooks with expressions are what interest me the most.

I will attach the ConfigMap instead of a Workflow, as you can test it with any simple or default one that the UI proposes. Any other ConfigMap outside of hooks in workflowDefaults works just fine on my local setup.

If there is a workaround to run a separate node 'next to' the main workflow without knowing what's inside of it, I would be more than glad to hear about it.

Version

Tested on 3.4.7 and 3.3.9

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo
data:
  workflowDefaults: |
    metadata:
      labels:
        abc: abc
      annotations:
        argo: workflows
      namespace: argo
    spec:
      hooks:
        running: 
          template: heads
          expression: workflow.status == "Pending"
      templates:
        - name: heads
          container:
            image: alpine:3.6
            command: [sh, -c]
            args: ["echo \"it was heads\""]

Logs from the workflow controller

time="2023-05-17T12:19:08.196Z" level=info msg="Processing workflow" namespace=argo workflow=lovely-python
time="2023-05-17T12:19:08.203Z" level=info msg="Get configmaps 404"
time="2023-05-17T12:19:08.204Z" level=warning msg="Non-transient error: configmaps \"artifact-repositories\" not found"
time="2023-05-17T12:19:08.204Z" level=info msg="resolved artifact repository" artifactRepositoryRef=default-artifact-repository
time="2023-05-17T12:19:08.204Z" level=info msg="Updated phase  -> Running" namespace=argo workflow=lovely-python
time="2023-05-17T12:19:08.204Z" level=info msg="Pod node lovely-python initialized Pending" namespace=argo workflow=lovely-python
time="2023-05-17T12:19:08.214Z" level=info msg="Create events 201"
time="2023-05-17T12:19:08.219Z" level=info msg="Create pods 201"
time="2023-05-17T12:19:08.220Z" level=info msg="Created pod: lovely-python (lovely-python)" namespace=argo workflow=lovely-python
time="2023-05-17T12:19:08.221Z" level=info msg="Running workflow level hooks" lifeCycleHook=running namespace=argo node=lovely-python.hooks.running workflow=lovely-python
time="2023-05-17T12:19:08.221Z" level=info msg="Pod node lovely-python-2604268888 initialized Pending" namespace=argo workflow=lovely-python
time="2023-05-17T12:19:08.231Z" level=info msg="Create pods 201"
time="2023-05-17T12:19:08.232Z" level=info msg="Created pod: lovely-python.hooks.running (lovely-python-2604268888)" namespace=argo workflow=lovely-python
time="2023-05-17T12:19:08.232Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=lovely-python
time="2023-05-17T12:19:08.232Z" level=info msg=reconcileAgentPod namespace=argo workflow=lovely-python
time="2023-05-17T12:19:08.232Z" level=info msg="Workflow to be dehydrated" Workflow Size=1392
time="2023-05-17T12:19:08.242Z" level=info msg="Update workflows 200"
time="2023-05-17T12:19:08.244Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=40795 workflow=lovely-python
time="2023-05-17T12:19:10.523Z" level=info msg="Get leases 200"
time="2023-05-17T12:19:10.526Z" level=info msg="Update leases 200"
time="2023-05-17T12:19:15.529Z" level=info msg="Get leases 200"
time="2023-05-17T12:19:15.533Z" level=info msg="Update leases 200"
time="2023-05-17T12:19:16.307Z" level=info msg="List workflows 200"
time="2023-05-17T12:19:16.307Z" level=info msg=healthz age=5m0s err="<nil>" instanceID= labelSelector="!workflows.argoproj.io/phase,!workflows.argoproj.io/controller-instanceid" managedNamespace=
time="2023-05-17T12:19:18.228Z" level=info msg="Processing workflow" namespace=argo workflow=lovely-python
time="2023-05-17T12:19:18.228Z" level=info msg="Updated phase Running -> Error" namespace=argo workflow=lovely-python
time="2023-05-17T12:19:18.228Z" level=info msg="Updated message  -> merging an object in json but data type is not struct, instead is: map" namespace=argo workflow=lovely-python
time="2023-05-17T12:19:18.228Z" level=info msg="Marking workflow completed" namespace=argo workflow=lovely-python
time="2023-05-17T12:19:18.228Z" level=error msg="Unable to set ExecWorkflow" error="merging an object in json but data type is not struct, instead is: map" namespace=argo workflow=lovely-python
time="2023-05-17T12:19:18.228Z" level=info msg="Checking daemoned children of " namespace=argo workflow=lovely-python
time="2023-05-17T12:19:18.228Z" level=info msg="Workflow to be dehydrated" Workflow Size=1768
time="2023-05-17T12:19:18.233Z" level=info msg="Create events 201"
time="2023-05-17T12:19:18.233Z" level=info msg="cleaning up pod" action=deletePod key=argo/lovely-python-1340600742-agent/deletePod
time="2023-05-17T12:19:18.234Z" level=info msg="Update workflows 200"
time="2023-05-17T12:19:18.235Z" level=info msg="Workflow update successful" namespace=argo phase=Error resourceVersion=40840 workflow=lovely-python
time="2023-05-17T12:19:18.235Z" level=info msg="Queueing Error workflow argo/lovely-python for delete in 5m0s due to TTL"
time="2023-05-17T12:19:18.235Z" level=info msg="Delete pods 404"
time="2023-05-17T12:19:18.237Z" level=info msg="DeleteCollection workflowtaskresults 200"
time="2023-05-17T12:19:19.013Z" level=info msg="Watch configmaps 200"

Logs from your workflow's wait container

time="2023-05-17T12:19:10.310Z" level=info msg="Creating a emissary executor"
time="2023-05-17T12:19:10.310Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2023-05-17T12:19:10.310Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=argo podName=lovely-python template="{\"name\":\"argosay\",\"inputs\":{\"parameters\":[{\"name\":\"message\",\"value\":\"hello argo\"}]},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"main\",\"image\":\"argoproj/argosay:v2\",\"command\":[\"/argosay\"],\"args\":[\"echo\",\"hello argo\"],\"resources\":{}}}" version="&Version{Version:v3.3.9,BuildDate:2022-08-09T23:59:38Z,GitCommit:5db53aa0ca54e51ca69053e1d3272e37064559d7,GitTag:v3.3.9,GitTreeState:clean,GoVersion:go1.17.13,Compiler:gc,Platform:linux/amd64,}"
time="2023-05-17T12:19:10.310Z" level=info msg="Starting deadline monitor"
time="2023-05-17T12:19:12.311Z" level=info msg="Main container completed"
time="2023-05-17T12:19:12.311Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2023-05-17T12:19:12.311Z" level=info msg="No output parameters"
time="2023-05-17T12:19:12.311Z" level=info msg="No output artifacts"
time="2023-05-17T12:19:12.311Z" level=info msg="Killing sidecars []"
time="2023-05-17T12:19:12.311Z" level=info msg="Alloc=6055 TotalAlloc=11379 Sys=19666 NumGC=4 Goroutines=6"
time="2023-05-17T12:19:10.308Z" level=info msg="Creating a emissary executor"
time="2023-05-17T12:19:10.308Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2023-05-17T12:19:10.308Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=argo podName=lovely-python-2604268888 template="{\"name\":\"heads\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"alpine:3.6\",\"command\":[\"sh\",\"-c\"],\"args\":[\"echo \\\"it was heads\\\"\"],\"resources\":{}}}" version="&Version{Version:v3.3.9,BuildDate:2022-08-09T23:59:38Z,GitCommit:5db53aa0ca54e51ca69053e1d3272e37064559d7,GitTag:v3.3.9,GitTreeState:clean,GoVersion:go1.17.13,Compiler:gc,Platform:linux/amd64,}"
time="2023-05-17T12:19:10.308Z" level=info msg="Starting deadline monitor"
time="2023-05-17T12:19:12.310Z" level=info msg="Main container completed"
time="2023-05-17T12:19:12.310Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2023-05-17T12:19:12.310Z" level=info msg="No output parameters"
time="2023-05-17T12:19:12.310Z" level=info msg="No output artifacts"
time="2023-05-17T12:19:12.310Z" level=info msg="Killing sidecars []"
time="2023-05-17T12:19:12.310Z" level=info msg="Alloc=6395 TotalAlloc=11321 Sys=19666 NumGC=3 Goroutines=6"

sarabala1979 added the P3 Low priority label May 25, 2023

@sarabala1979
Member

@adrvolan it looks like a problem with merging the default workflow into the original workflow.
Can you provide the original workflow spec?

@adrvolan
Author

Hi @sarabala1979, thank you for your reply.
It's the default workflow from the Argo UI (used to test the configuration):

metadata:
  name: fantastic-bear
  namespace: argo
  labels:
    example: 'true'
spec:
  arguments:
    parameters:
      - name: message
        value: hello argo
  entrypoint: argosay
  templates:
    - name: argosay
      inputs:
        parameters:
          - name: message
            value: '{{workflow.parameters.message}}'
      container:
        name: main
        image: 'argoproj/argosay:v2'
        command:
          - /argosay
        args:
          - echo
          - '{{inputs.parameters.message}}'
  ttlStrategy:
    secondsAfterCompletion: 300
  podGC:
    strategy: OnPodCompletion

And after creating it:

spec:
  templates:
    - name: argosay
      inputs:
        parameters:
          - name: message
            value: '{{workflow.parameters.message}}'
      outputs: {}
      metadata: {}
      container:
        name: main
        image: 'argoproj/argosay:v2'
        command:
          - /argosay
        args:
          - echo
          - '{{inputs.parameters.message}}'
        resources: {}
    - name: heads
      inputs: {}
      outputs: {}
      metadata: {}
      container:
        name: ''
        image: 'alpine:3.6'
        command:
          - sh
          - '-c'
        args:
          - echo "it was heads"
        resources: {}
  entrypoint: argosay
  arguments:
    parameters:
      - name: message
        value: hello argo
  ttlStrategy:
    secondsAfterCompletion: 300
  podGC:
    strategy: OnPodCompletion
  hooks:
    running:
      template: heads
      arguments: {}
      expression: workflow.status == "Pending"

The hook node even shows up in the UI as Pending, and the manifest doesn't seem to have any obvious errors. If, instead of this hook, I add any other template through the defaults (such as a DAG and its tasks, or the deprecated onExit mentioned earlier), it works just fine. Things only break when I apply hooks, even if they are not used.

@Joibel
Member

Joibel commented Jun 13, 2023

Error: merging object in json but data type is not struct, instead is: map is the key problem here: it comes from merging the default workflow into the actual workflow. This is caused by golang/go#33487 (other issues documenting the problem are available). I have a fix for it that still needs test cases written. It is necessarily a bit ugly, though.

Joibel added a commit to Joibel/argo-workflows that referenced this issue Jun 14, 2023
Any hook in workflowDefaults will cause `Error: merging object in json
but data type is not struct, instead is: map`.

This is down to how StrategicMergePatch handles maps of objects and
limitations in merging them (a starting point for understanding this is
golang/go#33487).

Instead, we copy the map of hooks, patch it out of the patch, perform
the merge as before and then manually apply it.

fixes: argoproj#11095

Signed-off-by: Alan Clucas <alan@clucas.org>