More aggressive cleanup of old AnalysisRuns and Experiments #1214

jessesuen · 2021-05-22T00:25:05Z

Summary

By default, a Rollout currently keeps old AnalysisRuns and Experiments for as many revisions as spec.revisionHistoryLimit, which defaults to 10. That seems like more than users need; most don't care to keep these objects around that long.

I think the default should be changed to delete the old objects when the Rollout reaches Healthy (a.k.a. Completed) state. This will declutter the namespace and things like the Argo UI.

If we change the default to delete old objects more aggressively, one question is whether we should provide a knob to increase the retention of old AnalysisRuns/Experiments (e.g. for debugging purposes).

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.


MarkSRobinson · 2021-05-23T03:12:26Z

IMO, a successful experiment/analysis run has basically zero value after the service has been rolled out, while a failed run does have value. I would prefer something like: keep up to X failed runs before recycling them, and delete successful runs soon after a deployment has completed.

huikang · 2021-05-25T02:09:58Z

Sometimes we may need to keep some successful analysis runs for debugging purposes (e.g., to compare the results of a successful run and a failed one). The proposed change could be:

- decouple the retention of AnalysisRuns from spec.revisionHistoryLimit
- introduce two retention knobs:

        - analysis:
               retentionSuccessfulRuns: 1 # number of successful runs to keep, default is 0
               retentionFailedRuns: 2 # number of failed runs to keep, default is 2
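
To make the defaulting behavior concrete, here is a rough Go sketch of how the two knobs could be modeled. Only the field names and the defaults (0 and 2) come from the proposal; the type name `AnalysisRetention`, the `Limits` method, and the use of pointer fields (to tell "unset" apart from an explicit 0) are assumptions for illustration, not the actual Argo Rollouts API.

```go
package main

import "fmt"

// AnalysisRetention is a hypothetical type mirroring the two proposed knobs.
// Pointer fields let us distinguish "not set" from an explicit 0.
type AnalysisRetention struct {
	RetentionSuccessfulRuns *int32
	RetentionFailedRuns     *int32
}

// Defaults as stated in the proposal.
const (
	defaultRetentionSuccessfulRuns int32 = 0
	defaultRetentionFailedRuns     int32 = 2
)

// Limits returns the effective retention limits, falling back to the
// proposed defaults for any knob that is unset (or if r itself is nil).
func (r *AnalysisRetention) Limits() (successful, failed int32) {
	successful, failed = defaultRetentionSuccessfulRuns, defaultRetentionFailedRuns
	if r == nil {
		return
	}
	if r.RetentionSuccessfulRuns != nil {
		successful = *r.RetentionSuccessfulRuns
	}
	if r.RetentionFailedRuns != nil {
		failed = *r.RetentionFailedRuns
	}
	return
}

func main() {
	one := int32(1)
	// Only the successful-run knob is set; the failed-run knob uses its default.
	r := &AnalysisRetention{RetentionSuccessfulRuns: &one}
	s, f := r.Limits()
	fmt.Println(s, f) // prints "1 2"
}
```

With this shape, a user who wants no failed runs retained can set the knob to 0 explicitly, which a plain integer field with a non-zero default could not express.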

What do you think?

MarkSRobinson · 2021-05-25T05:39:51Z

That would work for me

huikang · 2021-05-25T14:17:11Z

> That would work for me

Cool! Let me try drafting a PR.

huikang · 2021-05-25T18:17:06Z

Here is one example

Suppose that a rollout has 6 revisions (each revision contains some analysis runs), and revisionHistoryLimit is 3.

Before reconciling the revisions

rev-6 (3 analysis runs)
rev-5 (3 analysis runs)
rev-4 (3 analysis runs)
rev-3 (3 analysis runs)
rev-2 (3 analysis runs)
rev-1 (3 analysis runs)

The pseudo-code for reconciling the revisions would be:

 1. since revisionHistoryLimit is 3, the current code deletes the replicasets of rev-1, rev-2, and rev-3
     along with their analysis runs.
 2. if the rollout status is healthy
         for rev := range {rev-6, rev-5, rev-4} {
             keep retentionSuccessfulRuns for successful analysis runs;
             keep retentionUnSuccessfulRuns for runs of other types;
         }

If retentionSuccessfulRuns and retentionUnSuccessfulRuns are both 0, the code will remove all analysis runs for the retained revisions.

I am not sure whether having two knobs is overkill; maybe a single analysisRunHistoryLimit is enough.

@jessesuen, what do you think?

perenesenko · 2021-06-17T20:24:02Z

I'll take this one

perenesenko · 2021-07-14T19:30:21Z

PR #1342

jessesuen added the enhancement label May 22, 2021

jessesuen added this to the v1.1 milestone May 22, 2021

huikang mentioned this issue May 23, 2021

AR doesn't clean up old analysis run objects #1181

Closed

jessesuen assigned perenesenko Jun 17, 2021

jessesuen mentioned this issue Aug 5, 2021

feat: configurable and more aggressive cleanup of old AnalysisRuns and Experiments #1342

Merged

6 tasks

jessesuen closed this as completed in #1342 Aug 5, 2021
