[ENH]: Dead letter queuing for compaction jobs (#5023)
## Description of changes
This change adds a dead letter queue to the compaction scheduler. If a compaction job on a collection fails `max_failure_count` consecutive times, the collection is moved to a dead set and is not compacted while it remains in that set. As of this change, the only way to clear the set is to restart the compaction process.
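On the scheduling side, the effect of the dead set can be pictured as a filter applied to the compaction candidates. The sketch below is hypothetical: the `CollectionId` alias and the `schedulable` function are illustrative names, not the scheduler's actual types or API. Because the set lives only in process memory, a restart is what empties it.

```rust
use std::collections::HashSet;

// Hypothetical collection identifier; the real scheduler uses its own ID type.
type CollectionId = u64;

/// Returns the candidates that are still eligible for compaction,
/// skipping every collection that is currently in the dead set.
fn schedulable(candidates: &[CollectionId], dead_jobs: &HashSet<CollectionId>) -> Vec<CollectionId> {
    candidates
        .iter()
        .copied()
        .filter(|id| !dead_jobs.contains(id))
        .collect()
}
```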
- Improvements & Bug fixes
  - Added a `failing_jobs` map to the `CompactionManager` to track jobs that have failed on consecutive attempts.
  - Added a `dead_jobs` set to the `CompactionManager` to record "dead" jobs.
- New functionality
  - Dead letter queuing for compaction jobs, as described above.
  - Added a metric `compactor_dead_jobs_count` to track the size of the dead jobs set.
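The bookkeeping described in the list can be sketched as follows, assuming that a successful run resets a collection's consecutive-failure count and that reaching `max_failure_count` moves the collection from `failing_jobs` into `dead_jobs`. `DeadLetterTracker`, `on_success`, `on_failure`, and `dead_jobs_count` are hypothetical names for illustration; they are not the `CompactionManager`'s actual methods.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical collection identifier used for illustration only.
type CollectionId = u64;

/// Hypothetical tracker: consecutive failures per collection, plus an
/// in-memory dead set that is only cleared when the process restarts.
struct DeadLetterTracker {
    max_failure_count: u32,
    failing_jobs: HashMap<CollectionId, u32>,
    dead_jobs: HashSet<CollectionId>,
}

impl DeadLetterTracker {
    fn new(max_failure_count: u32) -> Self {
        Self {
            max_failure_count,
            failing_jobs: HashMap::new(),
            dead_jobs: HashSet::new(),
        }
    }

    /// A successful compaction resets the consecutive-failure counter.
    fn on_success(&mut self, id: CollectionId) {
        self.failing_jobs.remove(&id);
    }

    /// Records a failure; once the count reaches `max_failure_count`,
    /// the collection moves from `failing_jobs` into `dead_jobs`.
    fn on_failure(&mut self, id: CollectionId) {
        let count = {
            let c = self.failing_jobs.entry(id).or_insert(0);
            *c += 1;
            *c
        };
        if count >= self.max_failure_count {
            self.failing_jobs.remove(&id);
            self.dead_jobs.insert(id);
        }
    }

    /// The value a gauge such as `compactor_dead_jobs_count` would report.
    fn dead_jobs_count(&self) -> usize {
        self.dead_jobs.len()
    }
}
```

Resetting the counter on success is what makes the threshold apply to consecutive failures, so a collection that fails only intermittently is never dead-lettered.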
## Test plan
Added a test in `scheduler.rs`.
Also tested manually by injecting failures into selected compaction jobs and tracking the dead set size metric locally.
- [x] Tests pass locally with `pytest` for python, `yarn test` for js, `cargo test` for rust
## Documentation Changes
_Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the [docs section](https://github.com/chroma-core/chroma/tree/main/docs/docs.trychroma.com)?_