[filestorage] - Make things simpler in compactino #31024

VihasMakwana · 2024-02-03T10:59:13Z

Component(s)

extension/storage/filestorage

Is your feature request related to a problem? Please describe.

The two params rebound_trigger_threshold_mib and rebound_needed_threshold_mib are confusing for users not familiar with the terminology.
They talk about setting/clearing a flag, but in the backend, there's no such flag.

Describe the solution you'd like

Instead of maintaining two different parameters to check for compaction, we can achieve similar things more simply.
Instead of checking for totalSize against rebound_needed_threshold_mib and dataSize against rebound_trigger_threshold_mib we can check for free space (i.e. total size - data size)
It would mean:
- If claimable space reaches above xyz MiBs, trigger a compaction.
- Else, no need.

Describe alternatives you've considered

Haven't thought of any alternative, any thoughts are welcomed.

Additional context

No response

The text was updated successfully, but these errors were encountered:

github-actions · 2024-02-03T10:59:28Z

Pinging code owners:

extension/storage/filestorage: @djaglowski

See Adding Labels via Comments if you do not have permissions to add labels yourself.

djaglowski · 2024-02-05T15:47:20Z

The two params rebound_trigger_threshold_mib and rebound_needed_threshold_mib are confusing for users not familiar with the terminology.

I agree it requires some understanding but the documentation is quite clear about how it works. Feel free to suggest improvements to the documentation if you think there is a clearer way to present the model.

They talk about setting/clearing a flag, but in the backend, there's no such flag.

This is not explicit flag variable in the code but the effective behavior is described accurately. The code just computes the conditions as needed, which would be a lot less clear if explained in documentation.

Instead of maintaining two different parameters to check for compaction, we can achieve similar things more simply.
Instead of checking for totalSize against rebound_needed_threshold_mib and dataSize against rebound_trigger_threshold_mib we can check for free space (i.e. total size - data size)
It would mean:
If claimable space reaches above xyz MiBs, trigger a compaction.
Else, no need.

There are downsides to your approach which are worth noting.

Compaction may run excessively. Consider for example if someone is using the current default settings (10 and 100 mibs). If the data size grows to 1000 mibs, then falls below 10 mibs, compaction will trigger exactly once. However, with your suggestion, they might set the "free size trigger" to be 100mibs, which would trigger at least 9 times as it falls below 10mibs. Additionally, any fluctuation as it falls would potentially add additional compactions. e.g Sequential data sizes of 1000, 900, 950, 850, etc would cause ~20 compactions.
Compaction may run at the worst possible time. Say the user expects typically there is little to no data in the file, and any accumulation of data essentially indicates that something is wrong and the system is under excess load. In the current design, they can set rebound_trigger_threshold_mib and know that compaction is not going to run until the system has effectively recovered. In your proposal, there is no way to be sure. If they set a "free size trigger" of 100 mibs, but the data grows to 1000 mibs, then compaction will run when the data size is 900 mibs, which is still under stress according to their expectations. You might suggest as an alternative that they set the "free size trigger" to ~900 mibs, but see the next point.
Compaction may be difficult to trigger. In the previous example, if the user expects that under stress the data may grow to 1000 mibs, they may want to prevent compaction during this stressed time by setting the "free size trigger" to ~900 mibs. Unfortunately, then an event which leads to "only" 800 mibs for a brief time will never trigger compaction. Additionally, if their system were to ever reach 5000 mibs, they still run into problems 1 and 2.

djaglowski · 2024-02-16T18:35:12Z

Closing as not planned.

VihasMakwana added enhancement New feature or request needs triage New item requiring triage labels Feb 3, 2024

github-actions bot added the extension/storage/filestorage label Feb 3, 2024

github-actions bot mentioned this issue Feb 6, 2024

Weekly Report: 2024-01-30 - 2024-02-06 #31055

Closed

github-actions bot mentioned this issue Feb 13, 2024

Weekly Report: 2024-02-06 - 2024-02-13 #31192

Closed

djaglowski closed this as completed Feb 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[filestorage] - Make things simpler in compactino #31024

[filestorage] - Make things simpler in compactino #31024

VihasMakwana commented Feb 3, 2024 •

edited

Loading

github-actions bot commented Feb 3, 2024

djaglowski commented Feb 5, 2024

djaglowski commented Feb 16, 2024

[filestorage] - Make things simpler in compactino #31024

[filestorage] - Make things simpler in compactino #31024

Comments

VihasMakwana commented Feb 3, 2024 • edited Loading

Component(s)

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

github-actions bot commented Feb 3, 2024

djaglowski commented Feb 5, 2024

djaglowski commented Feb 16, 2024

VihasMakwana commented Feb 3, 2024 •

edited

Loading