Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[filestorage] - Make things simpler in compactino #31024

Closed
VihasMakwana opened this issue Feb 3, 2024 · 3 comments
Closed

[filestorage] - Make things simpler in compactino #31024

VihasMakwana opened this issue Feb 3, 2024 · 3 comments
Labels
enhancement New feature or request extension/storage/filestorage needs triage New item requiring triage

Comments

@VihasMakwana
Copy link
Contributor

VihasMakwana commented Feb 3, 2024

Component(s)

extension/storage/filestorage

Is your feature request related to a problem? Please describe.

  • The two params rebound_trigger_threshold_mib and rebound_needed_threshold_mib are confusing for users not familiar with the terminology.
  • They talk about setting/clearing a flag, but in the backend, there's no such flag.

Describe the solution you'd like

  • Instead of maintaining two different parameters to check for compaction, we can achieve similar things more simply.
  • Instead of checking for totalSize against rebound_needed_threshold_mib and dataSize against rebound_trigger_threshold_mib we can check for free space (i.e. total size - data size)
  • It would mean:
    • If claimable space reaches above xyz MiBs, trigger a compaction.
    • Else, no need.

Describe alternatives you've considered

Haven't thought of any alternative, any thoughts are welcomed.

Additional context

No response

@VihasMakwana VihasMakwana added enhancement New feature or request needs triage New item requiring triage labels Feb 3, 2024
Copy link
Contributor

github-actions bot commented Feb 3, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@djaglowski
Copy link
Member

The two params rebound_trigger_threshold_mib and rebound_needed_threshold_mib are confusing for users not familiar with the terminology.

I agree it requires some understanding but the documentation is quite clear about how it works. Feel free to suggest improvements to the documentation if you think there is a clearer way to present the model.

They talk about setting/clearing a flag, but in the backend, there's no such flag.

This is not explicit flag variable in the code but the effective behavior is described accurately. The code just computes the conditions as needed, which would be a lot less clear if explained in documentation.

Instead of maintaining two different parameters to check for compaction, we can achieve similar things more simply.
Instead of checking for totalSize against rebound_needed_threshold_mib and dataSize against rebound_trigger_threshold_mib we can check for free space (i.e. total size - data size)
It would mean:
If claimable space reaches above xyz MiBs, trigger a compaction.
Else, no need.

There are downsides to your approach which are worth noting.

  1. Compaction may run excessively. Consider for example if someone is using the current default settings (10 and 100 mibs). If the data size grows to 1000 mibs, then falls below 10 mibs, compaction will trigger exactly once. However, with your suggestion, they might set the "free size trigger" to be 100mibs, which would trigger at least 9 times as it falls below 10mibs. Additionally, any fluctuation as it falls would potentially add additional compactions. e.g Sequential data sizes of 1000, 900, 950, 850, etc would cause ~20 compactions.
  2. Compaction may run at the worst possible time. Say the user expects typically there is little to no data in the file, and any accumulation of data essentially indicates that something is wrong and the system is under excess load. In the current design, they can set rebound_trigger_threshold_mib and know that compaction is not going to run until the system has effectively recovered. In your proposal, there is no way to be sure. If they set a "free size trigger" of 100 mibs, but the data grows to 1000 mibs, then compaction will run when the data size is 900 mibs, which is still under stress according to their expectations. You might suggest as an alternative that they set the "free size trigger" to ~900 mibs, but see the next point.
  3. Compaction may be difficult to trigger. In the previous example, if the user expects that under stress the data may grow to 1000 mibs, they may want to prevent compaction during this stressed time by setting the "free size trigger" to ~900 mibs. Unfortunately, then an event which leads to "only" 800 mibs for a brief time will never trigger compaction. Additionally, if their system were to ever reach 5000 mibs, they still run into problems 1 and 2.

@djaglowski
Copy link
Member

Closing as not planned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request extension/storage/filestorage needs triage New item requiring triage
Projects
None yet
Development

No branches or pull requests

2 participants