Skip to content

[ENH]: add config param to garbage collector to control how many collections are fetched from SysDb #5275

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 14, 2025

Conversation

codetheweb
Copy link
Contributor

@codetheweb codetheweb commented Aug 14, 2025

Description of changes

When max_collections_to_gc is small, all collections fetched by GC may be disabled by the GC config, preventing any non-disabled collections from being GC'ed. This adds a new config parameter to allow GC to initially overfetch collections before then filtering and limiting to max_collections_to_gc.

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

Backwards compatible because new config param is optional.

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@codetheweb codetheweb marked this pull request as ready for review August 14, 2025 18:32
Copy link
Contributor

propel-code-bot bot commented Aug 14, 2025

Add Config Parameter to Control SysDb Collection Fetch Limit in GC

This pull request introduces a new optional configuration parameter, max_collections_to_fetch, to the garbage collector service. This parameter determines the maximum number of collections that can be initially fetched from SysDb for garbage collection, which may be larger than the existing max_collections_to_gc, allowing the GC to improve its chances of finding eligible collections when many are disabled or filtered. The implementation updates both the configuration struct and the collection-fetching logic, ensuring backward compatibility by defaulting to the prior behavior if the new parameter is unset.

Key Changes

• Added max_collections_to_fetch`: Option<u32>` to GarbageCollectorConfigstruct. • Updated fetch logic ingarbage_collector_component.rs to use max_collections_to_fetch (or fallback to max_collections_to_gc).
• Altered post-filtering logic to apply max_collections_to_gc after filtering collections.
• Updated all internal struct initializations and default configs to support the new parameter.
• Renamed and consistently used the plural form (max_collections_to_fetch) for improved clarity.

Affected Areas

• rust/garbage_collector/src/config.rs
• rust/garbage_collector/src/garbage_collector_component.rs

This summary was automatically generated by @propel-code-bot

@@ -41,6 +41,7 @@ pub struct GarbageCollectorConfig {
)]
pub(super) version_cutoff_time: Duration,
pub(super) max_collections_to_gc: u32,
pub(super) max_collection_to_fetch: Option<u32>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BestPractice]

For consistency with the existing max_collections_to_gc parameter, consider renaming this to max_collections_to_fetch (plural collections). This would improve readability and maintain consistency in naming conventions for related configuration parameters. This change would need to be applied in rust/garbage_collector/src/garbage_collector_component.rs as well.

Suggested change
pub(super) max_collection_to_fetch: Option<u32>,
pub(super) max_collections_to_fetch: Option<u32>,

Committable suggestion

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

@codetheweb codetheweb enabled auto-merge (squash) August 14, 2025 19:47
@codetheweb codetheweb merged commit e8ed50f into main Aug 14, 2025
59 checks passed
@codetheweb codetheweb deleted the feat-gc-max-collection-fetch-config-param branch August 14, 2025 20:17
chroma-droid pushed a commit that referenced this pull request Aug 14, 2025
codetheweb added a commit that referenced this pull request Aug 14, 2025
Co-authored-by: Max Isom <codetheweb@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants