Skip to content

Conversation

david-thrower
Copy link
Owner

Item 1: Bug fix:

When a Keras metric returns a non-float value like "inf", it would break casting to float / computing .min() / .max() for finding the best performing model.

Item 2:

  • A model for each sub-trial are stored in the experiment folder Cerebros creates.
  • In most cases only the best one from that meta-trial needs preserved.
  • The models cached can hold up a lot of disk space, especially for NLP / CV problems.
  • When running in a container [unless you have a mounted volume] these are stored in memory / in RAM, and the RAM pressure, therefore becomes cumulative.
  • With the new feature, by default, this cache is not purged to maintain backward compatibility. When you explicitly set purge_model_storage_files=True when calling cerebros.simile_rabdom_search.SimpleRandomSearch().get_best_model(), the the storage will be purged after returning the best model.

Item 3:

Appended the 2 Ames CICD tests with a positive case and negative case test to endure this feature works when explicitly called and is not triggered unless called explicitly / proving backward compatibility.

Duplicate vetted bug fix (keras metrics returning "inf" or other non-float types) from #234  , #230 without other non-vetted changes.
Trigger CICD suite to run tests.
Add functionality from #230 to purge model storage after each meta - trail.
Add negative case test for purge_model_storage.
Trigger tests.
Add positive case test for assert purge_model_storage_files.
Added corrected CICD test for positive case for purge_model_storage_files.
Add better CICD test for negative case for purge_model_storage_files.
Refactor best metric error handling to allow +/- inf metrics (these actually will work in  current versions of pd.Series.min() / ....max() ), but still exclude other arbitrary types (str, Exception)...
Copy link
Collaborator

@Thunderblok Thunderblok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ What’s good

Fix for non-float metrics (e.g., "inf") avoids best-metric crashes.

New purge_model_storage_files=True option frees disk/RAM after picking best sub-trial model.

Tests include positive & negative cases to prove opt-in and back-compat.

⚠️ Small asks before merge

Metric safety

Coerce values via a helper that treats NaN/±inf as non-finite and falls back cleanly.

Add a tiny table test for ["1.0","inf","-inf","nan", None].

Purge safety

Guard against symlinks and .. paths; don’t follow links.

Wrap purge in try/except so failure doesn’t break get_best_model().

Log simple stats: {dirs_deleted, files_deleted, bytes_freed, duration_ms}.

DX/Docs

One line in README: “Use purge_model_storage_files=True to reclaim space (especially in containers).”

👍 Nice-to-have (don’t block)

Warn if artifacts are on Docker overlay (no volume mount).

Env toggle CEREBROS_PURGE_DEFAULT for ops environments.

✅ Verdict

Approve with the above nits. The change is backward-compatible, fixes a real footgun, and gives operators a clean way to control disk/RAM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

copy-purge-model-storage-functionality-for-main
2 participants