More aggressive cleanup of old AnalysisRuns and Experiments #1214
Comments
IMO, a successful experiment or analysis run has basically zero value after the service has been rolled out. A failed run actually does. I would prefer something like keeping up to X failed runs before recycling them, and just deleting successful runs soon after a deployment has completed.
Sometimes we may need to keep some successful analysis runs for debugging purposes (e.g., to compare the results of a successful run with those of a failed one). The proposed change could be:
  - analysis:
      retentionSuccessfulRuns: 1 # number of successful runs to keep, default is 0
      retentionFailedRuns: 2 # number of failed runs to keep, default is 2
What do you think?
That would work for me.
Cool! Let me try drafting a PR.
Here is one example. Suppose that a rollout has 6 revisions (each revision contains some analysis runs), and the revision history limit is 3.
The pseudo-code for reconciling the revisions would be
If retentionSuccessfulRuns and retentionUnSuccessfulRuns are both 0, the code will remove the analysis runs for the retained revisions. I am not sure whether having two knobs is overkill, maybe a single @jessesuen, what do you think?
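For readers following along, here is a rough Python sketch of how that reconciliation could work for the 6-revision example. It is not the controller code and not what the PR implements. It assumes the two limits apply per retained revision, that revisions are derived from the runs themselves, and that the newest runs are the ones kept. `AnalysisRun`, `runs_to_delete`, and all field and parameter names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisRun:
    # Hypothetical, simplified stand-in for a real AnalysisRun object.
    name: str
    revision: int     # rollout revision that created the run
    successful: bool  # True if the run passed, False otherwise
    created: int      # creation order; a larger value means newer


def runs_to_delete(runs, revision_history_limit,
                   retention_successful_runs, retention_failed_runs):
    """Return the sorted names of the runs that would be garbage-collected."""
    # Newest revisions first; only the first `revision_history_limit` are retained.
    revisions = sorted({r.revision for r in runs}, reverse=True)
    retained = set(revisions[:revision_history_limit])

    doomed = []
    for rev in revisions:
        rev_runs = sorted((r for r in runs if r.revision == rev),
                          key=lambda r: r.created, reverse=True)
        if rev not in retained:
            # Revision fell out of the history window: drop all of its runs.
            doomed.extend(r.name for r in rev_runs)
            continue
        # Assumption: limits are applied per retained revision, newest kept.
        passed = [r for r in rev_runs if r.successful]
        failed = [r for r in rev_runs if not r.successful]
        doomed.extend(r.name for r in passed[retention_successful_runs:])
        doomed.extend(r.name for r in failed[retention_failed_runs:])
    return sorted(doomed)


# 6 revisions, each with two successful runs and one failed run.
runs = []
for rev in range(1, 7):
    runs.append(AnalysisRun(f"r{rev}-ok1", rev, True, rev * 10 + 1))
    runs.append(AnalysisRun(f"r{rev}-ok2", rev, True, rev * 10 + 2))
    runs.append(AnalysisRun(f"r{rev}-fail", rev, False, rev * 10 + 3))

doomed = runs_to_delete(runs, revision_history_limit=3,
                        retention_successful_runs=1, retention_failed_runs=2)
```

With both limits set to 0, the same function returns every run, including those of the three retained revisions, which matches the behavior described above.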
I'll take this one
PR #1342
Summary
The current default behavior for deleting old AnalysisRuns and Experiments is that a Rollout keeps the old objects around for as many revisions as spec.revisionHistoryLimit, which defaults to 10. This seems to be too much for users, who don't really care to keep these around that long.
I think the default should be changed to delete the old objects when the Rollout reaches the Healthy (a.k.a. Completed) state. This will declutter the namespace and things like the Argo UI.
If we change the default to delete old objects more aggressively, one question is whether we should provide a knob to increase the retention of old AnalysisRuns/Experiments (e.g. for debugging purposes).
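To make the proposed default concrete, here is a small Python sketch of the decision. Nothing is deleted until the Rollout is Healthy, and after that everything older than the current revision goes unless a retention knob says otherwise. It is only an illustration under assumptions: `objects_to_delete` and the `keep_previous_revisions` knob are hypothetical names, and objects are modeled as plain (name, revision) pairs.

```python
def objects_to_delete(objects, current_revision, phase,
                      keep_previous_revisions=0):
    """Return names of old AnalysisRuns/Experiments that could be deleted.

    objects: iterable of (name, revision) pairs.
    keep_previous_revisions: hypothetical knob; how many revisions before
    the current one should still keep their objects (0 = delete them all).
    """
    if phase != "Healthy":
        # Still rolling out (or degraded): keep everything for now.
        return []
    oldest_kept = current_revision - keep_previous_revisions
    return [name for name, revision in objects if revision < oldest_kept]


objects = [("ar-4", 4), ("exp-5", 5), ("ar-6", 6)]
aggressive = objects_to_delete(objects, current_revision=6, phase="Healthy")
in_progress = objects_to_delete(objects, current_revision=6, phase="Progressing")
with_knob = objects_to_delete(objects, current_revision=6, phase="Healthy",
                              keep_previous_revisions=1)
```

Here `aggressive` lists the objects of revisions 4 and 5, `in_progress` is empty, and `with_knob` lists only the revision 4 object.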