argo submit slow when calling multiple templates. suggest improving caching #7418

Open
rwong2888 opened this issue Dec 15, 2021 · 6 comments
@rwong2888
Contributor

I'm trying argo -n argo submit --from clusterworkflowtemplate/<template> -v --gloglevel=9 ... and it takes about 10-12 seconds to complete. It looks like it spends most of its time resolving all the templates...

Submit from UI is also slow
https://slack-files.com/T08PSQ7BQ-F02QV220QM8-f33032ca57

Logs
https://cloud-native.slack.com/archives/C01QW9QSSSK/p1639579998354400?thread_ts=1639420502.300600&cid=C01QW9QSSSK

Slack Archive for reference
https://cloud-native.slack.com/archives/C01QW9QSSSK/p1639420502300600

@rwong2888 rwong2888 added the type/feature Feature request label Dec 15, 2021
@crenshaw-dev
Member

A couple things I noticed from the logs:

  1. Requests are being throttled. Maybe tweaking the --qps and --burst flags on the workflow-controller could help?
  2. Templates are being fetched multiple times. Is there a bug in the caching mechanism / is any caching happening?

@terrytangyuan
Member

Which version are you using? Are you also seeing this issue in older versions?

@rwong2888
Contributor Author

@terrytangyuan v3.2.4

@alexec alexec added the area/controller Controller issues, panics label Feb 7, 2022
@pnerg

pnerg commented Jul 5, 2024

A ping to bump interest in this issue.
After all these years, it is still very much a problem.

You don't need any load to reproduce the issue.
Just create a simple template that refers to n other templates. For each reference added, the submit time increases linearly, so it's definitely something with the lookup/caching of referenced templates.

I'd be willing to take a stab at helping out if someone would point me in the right direction.

@rwong2888
Contributor Author

Hi @pnerg, this is a 3-year-old ticket. I have since refactored my workflow, gone through many Argo Workflows upgrades, and rarely submit from the UI, so I am not sure if this is still applicable. I do intend to tweak the default qps/burst, which Michael recommended, as part of improving the overall workflow speed.

@emilebui

emilebui commented Aug 15, 2024

Hi @rwong2888 @terrytangyuan

The issue still exists; I've tested this on the latest version, v3.5.10.

Here is the test case.

I've created 3 simple templates like this:

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: template-ref-1
spec:
  templates:
    - name: runner
      container:
        image: argoproj/argosay:v2
        command:
          - /argosay
        args:
          - echo
          - finished running template 1
  entrypoint: runner

I then created other templates that reference these 3 templates via templateRef.

Template using-template-ref

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: using-template-ref
spec:
  templates:
    - name: runner
      steps:
        - - name: template-run
            templateRef:
              name: template-ref-1
              template: runner
        - - name: template-run2
            templateRef:
              name: template-ref-2
              template: runner
        - - name: template-run3
            templateRef:
              name: template-ref-3
              template: runner
  entrypoint: runner

And another one for the scale-up test:
Template using-ref-template-x15

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: using-ref-template-x15
spec:
  templates:
    - name: runner
      steps:
        - - name: template-run
            templateRef:
              name: template-ref-1
              template: runner
        - - name: template-run2
            templateRef:
              name: template-ref-2
              template: runner
        - - name: template-run3
            templateRef:
              name: template-ref-3
              template: runner
        - - name: template-run4
            templateRef:
              name: template-ref-1
              template: runner
        - - name: template-run5
            templateRef:
              name: template-ref-2
              template: runner
        - - name: template-run6
            templateRef:
              name: template-ref-3
              template: runner
        - - name: template-run7
            templateRef:
              name: template-ref-1
              template: runner
        - - name: template-run8
            templateRef:
              name: template-ref-2
              template: runner
        - - name: template-run9
            templateRef:
              name: template-ref-3
              template: runner
        - - name: template-run10
            templateRef:
              name: template-ref-1
              template: runner
        - - name: template-run11
            templateRef:
              name: template-ref-2
              template: runner
        - - name: template-run12
            templateRef:
              name: template-ref-3
              template: runner
        - - name: template-run13
            templateRef:
              name: template-ref-1
              template: runner
        - - name: template-run14
            templateRef:
              name: template-ref-2
              template: runner
        - - name: template-run15
            templateRef:
              name: template-ref-3
              template: runner
  entrypoint: runner

Here is the result when I run these templates using template-ref:

The API POST /v1/workflows/argo/submit took around 40ms (34ms/42ms/45ms) for the using-template-ref template

image

And when I run the using-ref-template-x15 template, it took around 430ms (422ms/426ms/450ms) for the API POST /v1/workflows/argo/submit to finish

image
image

This shows that submit latency grows significantly with the number of template-refs used (roughly 10x slower for 5x as many references), even though in this test case it is just the same 3 templates being referenced multiple times.
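As a back-of-the-envelope check, the two measurements can be turned into a per-reference cost. This sketch assumes latency grows linearly between the two data points and uses the rounded averages from my tests (about 40ms for 3 refs, about 430ms for 15 refs); the helper name `perRefCostMs` is just for illustration.

```go
package main

import "fmt"

// perRefCostMs returns the extra submit latency per additional templateRef,
// assuming latency grows linearly between the two measurements.
func perRefCostMs(refsA int, msA float64, refsB int, msB float64) float64 {
	return (msB - msA) / float64(refsB-refsA)
}

func main() {
	// Rounded figures from the tests above: 3 refs ~ 40ms, 15 refs ~ 430ms.
	fmt.Printf("%.1f ms per extra templateRef\n", perRefCostMs(3, 40, 15, 430))
}
```

That works out to about 32.5ms per additional templateRef, which is close to the cost of the whole 3-ref submit.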

However, the weird thing is that if we use inline templates rather than template-refs, the performance of the API is significantly better. Here is the test case:

Template without-using-templateref-x15

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: without-using-templateref-x15
spec:
  templates:
    - name: runner
      steps:
        - - name: run1
            template: template-1
        - - name: run2
            template: template-2
        - - name: run3
            template: template-3
        - - name: run4
            template: template-1
        - - name: run5
            template: template-2
        - - name: run6
            template: template-3
        - - name: run7
            template: template-1
        - - name: run8
            template: template-2
        - - name: run9
            template: template-3
        - - name: run10
            template: template-1
        - - name: run11
            template: template-2
        - - name: run12
            template: template-3
        - - name: run13
            template: template-1
        - - name: run14
            template: template-2
        - - name: run15
            template: template-3
    - name: template-1
      container:
        image: argoproj/argosay:v2
        command:
          - /argosay
        args:
          - echo
          - finished running template 1
    - name: template-2
      container:
        image: argoproj/argosay:v2
        command:
          - /argosay
        args:
          - echo
          - finished running template 2
    - name: template-3
      container:
        image: argoproj/argosay:v2
        command:
          - /argosay
        args:
          - echo
          - finished running template 3
  entrypoint: runner

Here is the result of running this template:

image
image

It only took around 50ms (40ms/60ms/48ms) to finish submitting the workflow, which is roughly 1/10 of the time taken with template-refs, even though it does literally the same thing.

I have looked at the code, and I think these lines may be the cause. I don't see any caching here, so it may validate the same template-ref again and again, but please do correct me if I'm wrong:

// Check if all templates can be resolved.
// If the Workflow is using a WorkflowTemplateRef, then the templates of the referred WorkflowTemplate will be validated.
if hasWorkflowTemplateRef {
	for _, template := range wfSpecHolder.GetWorkflowSpec().Templates {
		_, err := ctx.validateTemplateHolder(&wfv1.WorkflowStep{TemplateRef: wf.Spec.WorkflowTemplateRef.ToTemplateRef(template.Name)}, tmplCtx, &FakeArguments{}, opts.WorkflowTemplateValidation)
		if err != nil {
			return errors.Errorf(errors.CodeBadRequest, "templates.%s %s", template.Name, err.Error())
		}
	}
	return nil
}

Could you please take another look at this issue, @terrytangyuan @rwong2888?
Thank you so much in advance!

@agilgur5 agilgur5 changed the title argo submit slow when calling multiple templates. suggest improving caching argo submit slow when calling multiple templates. suggest improving caching Aug 18, 2024
@agilgur5 agilgur5 added area/workflow-templates area/api Argo Server API and removed area/controller Controller issues, panics labels Aug 18, 2024