Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More aggressive cleanup of old AnalysisRuns and Experiments #1214

Closed
jessesuen opened this issue May 22, 2021 · 7 comments · Fixed by #1342
Closed

More aggressive cleanup of old AnalysisRuns and Experiments #1214

jessesuen opened this issue May 22, 2021 · 7 comments · Fixed by #1342
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@jessesuen
Copy link
Member

Summary

The current default behavior for deleting old AnalysisRuns and Experiments, is that a Rollout will keep around the old objects for the same amount of spec.revisionHistoryLimit, which defaults to 10. This seems to be too much for users, who don't really care to keep these around that long.

I think the default should be changed to delete the old objects when the Rollout reaches Healthy (a.k.a. Completed) state. This will declutter the namespace and things like the Argo UI.

If we change the default to delete old objects more aggressively, one question is if should we provide a knob to increase the retention of old AnalysisRuns/Experiments (e.g. for debugging purposes)?


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

@jessesuen jessesuen added the enhancement New feature or request label May 22, 2021
@jessesuen jessesuen added this to the v1.1 milestone May 22, 2021
@MarkSRobinson
Copy link
Contributor

IMO, a successful experiments/analysis has basically zero value after the service has been rolled out. A failed run actually does. I would prefer something like keep up to X failed runs before recycling them, and just delete successful runs soon after a deployment has completed.

@huikang
Copy link
Member

huikang commented May 25, 2021

Sometimes we may need to keep some successful analysis run for debugging purpose (e.g., to compare the results between successful run and failed one). The proposed change could be

  1. decouple the retention of analysisrun from spec.revisionHistoryLimit
  2. introduce two retention knobs:
        - analysis:
               retentionSuccessfulRuns: 1 # number of successful run to keep, default is 0
               retentionFailedRuns: 2 # number of failed run to keep, default is 2

What do you think?

@MarkSRobinson
Copy link
Contributor

That would work for me

@huikang
Copy link
Member

huikang commented May 25, 2021

That would work for me

Cool! Let me try drafting a PR.

@huikang
Copy link
Member

huikang commented May 25, 2021

Here is one example

Suppose that a rollout has 6 revisions (each revision contains some analysis runs), and revision history is 3.

  • Before reconciling the revisions
rev-6 (3 analysis run)
rev-5 (3 analysis run)
rev-4 (3 analysis run)
rev-3 (3 analysis run)
rev-2 (3 analysis run)
rev-1 (3 analysis run)

The pseudo-code of reconciling the revision would be

 1. since the revisionhistorylimit is 3, the current code deletes the replicasets of rev-1, rev-2, and rev3
     along with their analysis runs.
 2. if the rollout status is healthy
         for rev := range {rev-6, 5, 4} {
             keep retentionSuccessfulRuns for successful annalysis runs;
             keep retentionUnSuccessfulRuns for runs of other types;
         }

If retentionSuccessfulRuns and retentionUnSuccessfulRuns are 0, the code will remove analysis runs for the retained revision.

I am not sure if having two knobs is an overkill, maybe a single analysisRunHistoryLimit is enough.

@jessesuen, what do you think?

@perenesenko
Copy link
Member

I'll take this one

@perenesenko
Copy link
Member

PR #1342

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
4 participants