Skip to content

Modify AdaptiveSizeBasedWriter based on available capacity on disk#16803

Merged
KKcorps merged 11 commits intoapache:masterfrom
krishan1390:adaptive_disk_usage_for_mappers
Sep 23, 2025
Merged

Modify AdaptiveSizeBasedWriter based on available capacity on disk#16803
KKcorps merged 11 commits intoapache:masterfrom
krishan1390:adaptive_disk_usage_for_mappers

Conversation

@krishan1390
Copy link
Contributor

@krishan1390 krishan1390 commented Sep 12, 2025

Description:
Currently AdaptiveSizeBasedWriter (added in #12220) checks based on configured max bytes to be written. This config doesn't take into account other writers writing to the same file store. And thus, it becomes harder to manage.

In this PR, I am adding a check while takes into account the free disk space available and thus is more adaptive and easier to manage. The config added defines the maximum allowed disk usage percentage for the underlying file store.

To preserve backward compatibility the default is kept to 100%

Release Notes:

  1. New configuration per task type - maxDiskUsagePercentage (0-100) - which stops mapper phase of merge rollup task from adding new files if the minion disk usage has cross the defined threshold

Testing Notes :

Logs from test run

2025/09/18 08:31:25.589 INFO [SegmentMapper] [TaskStateModelFactory-task_thread-0] Initialized mapper with 30 record readers, output dir: <outputdir>, timeHandler: class org.apache.pinot.core.segment.processing.timehandler.NoOpTimeHandler, partitioners:

2025/09/18 08:31:45.597 WARN [AdaptiveSizeBasedWriter] [TaskStateModelFactory-task_thread-0] Disk usage percentage: 2% has exceeded the max limit: 2%. Will stop writing more data

2025/09/18 08:31:45.597 INFO [SegmentMapper] [TaskStateModelFactory-task_thread-0] Stopping record readers at index: 24 out of 30 passed to mapper as size limit reached, bytes written = 532326285, bytes limit = 9223372036854775807

2025/09/18 08:33:03.036 INFO [SegmentMapper] [TaskStateModelFactory-task_thread-0] Initialized mapper with 7 record readers, output dir: <outputdir>, timeHandler: class org.apache.pinot.core.segment.processing.timehandler.NoOpTimeHandler, partitioners:

@codecov-commenter
Copy link

codecov-commenter commented Sep 12, 2025

Codecov Report

❌ Patch coverage is 56.36364% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.40%. Comparing base (6182260) to head (e56730f).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
...processing/genericrow/AdaptiveSizeBasedWriter.java 32.25% 19 Missing and 2 partials ⚠️
...re/segment/processing/framework/SegmentConfig.java 66.66% 2 Missing ⚠️
...rocessing/framework/SegmentProcessorFramework.java 80.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #16803      +/-   ##
============================================
+ Coverage     63.38%   63.40%   +0.02%     
  Complexity     1410     1410              
============================================
  Files          3061     3062       +1     
  Lines        179692   179781      +89     
  Branches      27510    27517       +7     
============================================
+ Hits         113893   113994     +101     
+ Misses        57011    56997      -14     
- Partials       8788     8790       +2     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.37% <56.36%> (+0.03%) ⬆️
java-21 63.36% <56.36%> (+0.02%) ⬆️
temurin 63.40% <56.36%> (+0.02%) ⬆️
unittests 63.40% <56.36%> (+0.02%) ⬆️
unittests1 56.37% <45.45%> (+0.03%) ⬆️
unittests2 33.47% <20.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@swaminathanmanish swaminathanmanish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise !

@KKcorps KKcorps merged commit 4abe0ac into apache:master Sep 23, 2025
35 of 36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants