-
Notifications
You must be signed in to change notification settings - Fork 1.7k
[BUG] Coalesce when multiple collections return the same info to compact #4946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
Deduplicate Collection Info During Compaction Logic in grpc_log.rs This PR addresses a bug in the compaction process where duplicate collection information could result in non-deterministic behavior. The core logic in grpc_log.rs has been modified to consistently coalesce and deduplicate collection info entries, ensuring that each collection appears only once, and the entry with the smallest (oldest) log offset is selected if duplicates exist. Additional comments have been added to clarify the deduplication approach, and unit testing has been improved to cover these changes. Key Changes: Affected Areas: This summary was automatically generated by @propel-code-bot |
all_collections.sort_by_key(|x| (x.collection_id, x.first_log_offset)); | ||
all_collections.dedup_by_key(|x| x.collection_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind leave a note on what is happening here?
…act (#4946) ## Description of changes If the same collection was returned twice, the results would be non-deterministic. ## Test plan CI + a new passing unit test. - [X] Tests pass locally with `pytest` for python, `yarn test` for js, `cargo test` for rust ## Documentation Changes N/A
## Description of changes Included changes - **[ENH] Purge dirty log in background at the end of scheduled compaction (#4915)** - **[ENH] Move Log GC to operator (#4919)** - **[BUG] Do not leak tokio tasks in the log service. (#4936)** - **[BUG] Log GC offset should be one above minimum compaction offset (#4938)** - **[ENH] Make roll dirty log always converge to coalesce everything. (#4927)** - **[BUG] Coalesce when multiple collections return the same info to compact (#4946)** - **[BUG] Enrich from the manifest if a cursor doesn't exist. (#4947)** - **[ENH] If the dirty log fails with LogContentionDurable, do not fail the operation. (#4953)** - **[ENH] Warn, not error, if dirty log has no cursor. (#4952)** - **[ENH] Cancellation safety for append_batch. (#4959)** ## Test plan CI ## Documentation Changes N/A --------- Co-authored-by: Macronova <60079945+Sicheng-Pan@users.noreply.github.com>
…act (chroma-core#4946) ## Description of changes If the same collection was returned twice, the results would be non-deterministic. ## Test plan CI + a new passing unit test. - [X] Tests pass locally with `pytest` for python, `yarn test` for js, `cargo test` for rust ## Documentation Changes N/A
Description of changes
If the same collection was returned twice, the results would be non-deterministic.
Test plan
CI + a new passing unit test.
pytest
for python,yarn test
for js,cargo test
for rustDocumentation Changes
N/A