Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Link memoization and work-avoidance #11257

Merged
merged 3 commits into from
Jun 30, 2023

Conversation

Joibel
Copy link
Member

@Joibel Joibel commented Jun 23, 2023

Despite #10769 and #10426 both having examples of memoization not working with the examples having no output, no-one has picked up on this.

To address this improve the documentation for memoization and work-avoidance, linking the two ideas and pointing people who want to skip steps towards work-avoidance unless they are really doing what memoization was designed to do.

Issue #10426 is problematic in that some steps get memoized when perhaps they should't, so this commit shouldn't close it.

Fixes #10769

Despite argoproj#10769 and argoproj#10426 both having examples of memoization not
working with the examples having no output, no-one has picked up on
this.

To address this improve the documentation for memoization and
work-avoidance, linking the two ideas and pointing people who want to
skip steps towards work-avoidance unless they are really doing what
memoization was designed to do.

Issue argoproj#10426 is problematic in that some steps get memoized when
perhaps they should't, so this commit shouldn't close it.

Fixes argoproj#10769

Signed-off-by: Alan Clucas <alan@clucas.org>
@Joibel Joibel marked this pull request as ready for review June 26, 2023 13:44
Copy link
Member

@tico24 tico24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits

it stores the outputs of a template into a specified cache with a variable key.

Memoization only works for steps which have outputs, if you attempt to use it on steps which do not it should not work (there are some cases where it does, but they shouldn't). It is designed for 'pure' steps, where the purpose of running the step is to calculate some outputs based upon the steps inputs, and only the inputs. Pure steps should not interact with the outside world, but workflows won't enforce this on you.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"steps inputs" needs an appropriate apostrophe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -2,7 +2,11 @@

> v2.9 and after
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we ever work out why this is a v2.9 feature?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea, unless PVCs were introduced then.

Signed-off-by: Alan Clucas <alan@clucas.org>
@moleskin-smile
Copy link

Thanks for clarifying on intended usage of Argo Workflows features.

Since an issue I've submitted triggered this change, let me describe our use case.

We're using Argo Workflows for CI and we're using memoization for work avoidance and I think it's valid usage. We don't want to run tests again for the same sets of inputs. So our expensive to compute output is... the information that tests pass for the given set of inputs. And we're happy that hooks run for memoized steps, because what we really need is green GitHub status check.

"Work avoidance" is not really Argo Workflows feature. It's just a technique that could be used to implement caching anywhere (which we also use with Argo in some places). But in order to not "abuse" memoization (replace memoization with "work avoidance"), we would have to repeat the same (part of) shell script in every template we want to cache, add input and output artifacts, handle artifact lifecycle etc. That's a lot of repeated YAML and shell code. And also pods would still be scheduled, run, they would fetch all input artifacts (cloning repositories for example) and would waste requested resources, increasing costs by a lot (especially for templates requesting a lot of cpu and memory). Why do that if we could just say "don't run this again for this set of inputs" which is exactly what memoization does?

Why not treat the success of template run part of its outputs and allow to memoize templates without outputs?

Memoization works really good when it works (when we don't run into 1MB limit of configmaps or this). It makes cached steps to be instant and makes Argo Workflows possible with monorepos.

(Let me know if this is not the right place to raise this concern).

@mnarozny
Copy link

I would also argue that this default approach is a bit odd to me. The memoization feature seems extremely powerful and restricting it to only steps with output seems counterintuitive when in a lot of cases it is the outcome with a set of parameters that matters, not necessarily the pure output. So if I memoized on my inputs and knew that the steps will produce the same output every time, I was expecting it not to have to run but give me a stamp of approval on the top level.

I mean I could put all the steps inside into a single template and memoize it but it seems counter intuitive when I can divide it into smaller reusable templates and then mix and match them and memoize the whole thing because the outcome of that whole template will never change for the same set of inputs.

@Joibel
Copy link
Member Author

Joibel commented Jun 30, 2023

I understand that people are unhappy that memoization doesn't work without outputs, and would like it to work.

This PR doesn't change any behaviour, it just tries to document the current behaviour, which even in the event of a PR to change the behaviour of memoziation needs documenting for users prior to that change.

@terrytangyuan, I'd like you to consider merging this as is unless you feel it's factually incorrect.

I have raised #11280 as a proposal to implement memoization as @moleskin-smile and @mnarozny would like it to work to see what the feelings of the community are.

Copy link
Member

@terrytangyuan terrytangyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@terrytangyuan terrytangyuan merged commit 476cd7a into argoproj:master Jun 30, 2023
JPZ13 pushed a commit to pipekit/argo-workflows that referenced this pull request Jul 4, 2023
You can make workflows faster and more robust by employing **work avoidance**. A workflow that utilizes this is simply a workflow containing steps that do not run if the work has already been done. This simplest way to do this is to use **marker files**.
You can make workflows faster and more robust by employing **work avoidance**. A workflow that utilizes this is simply a workflow containing steps that do not run if the work has already been done.

This technique is similar to [memoization](memoization.md) but they have distinct use cases. Work avoidance is totally in your control and you make the decisions as to have to skip the work. [Memoization](memoization.md) is a feature of Argo Workflows to automatically skip steps which generate outputs - it is designed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last sentence here was incomplete; it ends with "it is designed". Not sure how it was intended to be written

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, erm... yeah, I'll finish that off. Thanks @agilgur5

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #11357 . Thanks!

Joibel added a commit to Joibel/argo-workflows that referenced this pull request Jul 14, 2023
Complete my sentence from PR argoproj#11257

Signed-off-by: Alan Clucas <alan@clucas.org>
Joibel added a commit to Joibel/argo-workflows that referenced this pull request Jul 14, 2023
Complete my sentence from PR argoproj#11257

Signed-off-by: Alan Clucas <alan@clucas.org>
@Joibel Joibel deleted the memo-workavoidance branch November 20, 2023 21:38
@agilgur5 agilgur5 added the area/docs Incorrect, missing, or mistakes in docs label Jan 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docs Incorrect, missing, or mistakes in docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memoization doesn't work in templates that are steps
6 participants