Resumable osmosis eval with Cache Management #77

@JoyboyBrian

Description

Problem or Use Case

osmosis eval runs are often long (large datasets, multiple runs per row, external model calls) and can be interrupted by network issues, process restarts, machine shutdowns, or quota/auth failures.
When that happens, users may need to restart from scratch. The current workflow has several costs:

  • Repeated API/computation cost for already-completed runs
  • Longer experiment iteration cycles
  • Higher risk of conflicting state when duplicate evals run concurrently
  • No clear built-in workflow to inspect, filter, and clean old cache entries

Proposed Solution

Add first-class resumable execution and cache lifecycle management for osmosis eval:

  • Persist eval progress/results to disk with a stable task ID derived from config + source/data fingerprints
  • Auto-resume when re-running the same command after interruption
  • Add --fresh to force a clean rerun and --retry-failed to rerun only failed runs
  • Add osmosis eval cache subcommands for cache inspection and cleanup (dir, ls, rm)
  • Use file locking + atomic writes to ensure consistency and prevent concurrent corruption
  • Detect dataset changes during/after runs and warn or fail with actionable guidance
  • Support --log-samples and structured output directories for better debugging/auditing
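
A minimal sketch of how the task ID, atomic state writes, and resume selection could fit together. Everything here is an assumption for illustration: the function names (task_id, atomic_write_json, pending_runs), the state layout ({"runs": {row_index: status}}), and the hashing scheme are hypothetical and not part of the existing osmosis codebase. File locking is left out to keep the example short. The sketch also reads --retry-failed as "rerun failed rows in addition to rows that never ran", which is only one possible interpretation.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


def task_id(config: dict, dataset_path: Path, length: int = 16) -> str:
    """Derive a stable ID from the eval config and the dataset bytes (hypothetical scheme)."""
    h = hashlib.sha256()
    # sort_keys makes the digest independent of dict key order
    h.update(json.dumps(config, sort_keys=True, separators=(",", ":")).encode("utf-8"))
    h.update(b"\0")
    # Fingerprint the dataset content, so edits to the file yield a new ID
    h.update(hashlib.sha256(dataset_path.read_bytes()).digest())
    return h.hexdigest()[:length]


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write JSON so readers see either the old file or the new one, never a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the final rename stays on one filesystem
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomically swaps the new file into place
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def pending_runs(state: dict, total_rows: int, retry_failed: bool = False) -> list[int]:
    """Rows still to run: those missing from the state, plus failed ones if requested."""
    done = state.get("runs", {})
    todo = []
    for i in range(total_rows):
        status = done.get(str(i))
        if status is None or (retry_failed and status == "failed"):
            todo.append(i)
    return todo
```

With this layout, re-running the same command would recompute the same task ID, load the saved state, and execute only the rows returned by pending_runs. A --fresh flag would amount to ignoring or deleting the saved state before starting.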

Alternatives Considered

No response

SDK Component

None

Additional Context

No response

Metadata

Assignees

Labels

enhancement (New feature or request), eval (Eval/Rubric evaluation)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
