Skip to content

Throttle SSE & MSE Tasks if Server heap usage is above a threshold#16271

Merged
gortiz merged 5 commits intoapache:masterfrom
vrajat:rv-tracing-throttle
Jul 4, 2025
Merged

Throttle SSE & MSE Tasks if Server heap usage is above a threshold#16271
gortiz merged 5 commits intoapache:masterfrom
vrajat:rv-tracing-throttle

Conversation

@vrajat
Copy link
Contributor

@vrajat vrajat commented Jul 3, 2025

PerQueryCPUMemAccountant defines critical heap usage based on its configuration. This PR adds the ability to disallow anymore SSE & MSE tasks if heap usage is above the critical level.

A new configuration parameter is added pinot.server.query.executor.enableThrottlingOnHeapUsage. Default is false.

If enabled, the executor checks with ThreadResourceUsageAccountant if heap level is alarm. In the default implementation, false is returned.

PerQueryCPUMemAccountant compares the configuration with current heap usage. Heap usage is updated every time watcher task runs. By default, it runs every 30ms.

Closes #16270

@vrajat vrajat requested review from Copilot, gortiz and yashmayya July 3, 2025 10:20
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds a new mechanism to stop submitting multi-stage execution (MSE) tasks when heap usage exceeds a configured critical threshold.

  • Introduce ThrottleOnCriticalHeapUsageExecutor that throws when throttleQuerySubmission() is true.
  • Add a default throttleQuerySubmission in ThreadResourceUsageAccountant and implement it in PerQueryCPUMemAccountantFactory.
  • Wire the new executor through server startup (BaseServerStarterServerInstanceWorkerQueryServerQueryRunner) and expose a config flag.

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
pinot-spi/src/test/java/org/apache/pinot/spi/executor/ThrottleOnCriticalHeapUsageExecutorTest.java Add test for throttling logic
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java Define new config key and default
pinot-spi/src/main/java/org/apache/pinot/spi/executor/ThrottleOnCriticalHeapUsageExecutor.java Implement executor wrapper that checks heap usage threshold
pinot-spi/src/main/java/org/apache/pinot/spi/accounting/ThreadResourceUsageAccountant.java Provide a default throttleQuerySubmission()
pinot-server/src/main/java/org/apache/pinot/server/worker/WorkerQueryServer.java Pass ThreadResourceUsageAccountant into QueryRunner
pinot-server/src/main/java/org/apache/pinot/server/starter/helix/BaseServerStarter.java Inject ThreadResourceUsageAccountant when creating ServerInstance
pinot-server/src/main/java/org/apache/pinot/server/starter/ServerInstance.java Update constructor and WorkerQueryServer instantiation to accept accountant
pinot-query-runtime/src/test/java/org/apache/pinot/query/QueryServerEnclosure.java Update test to supply a default accountant instance
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/QueryRunner.java Extend init signature and wrap executor with ThrottleOnCriticalHeapUsageExecutor based on config
pinot-core/src/main/java/org/apache/pinot/core/accounting/PerQueryCPUMemAccountantFactory.java Implement throttleQuerySubmission() and expose current heap usage
Comments suppressed due to low confidence (1)

pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/QueryRunner.java:213

  • Server.CONFIG_OF_ENABLE_MSE_THROTTLING_ON_CRITICAL_HEAP_USAGE is not in scope; it should be referenced as CommonConstants.Server.CONFIG_OF_ENABLE_MSE_THROTTLING_ON_CRITICAL_HEAP_USAGE.
    if (serverConf.getProperty(Server.CONFIG_OF_ENABLE_MSE_THROTTLING_ON_CRITICAL_HEAP_USAGE,


@Override
public boolean throttleQuerySubmission() {
return _numCalls.getAndIncrement() > 1;
Copy link

Copilot AI Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stubbed throttleQuerySubmission() returns true only after two submissions, but the test expects throttling on the second submission. Change the condition to > 0 to block the second task as intended.

Suggested change
return _numCalls.getAndIncrement() > 1;
return _numCalls.getAndIncrement() > 0;

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine. The function is called twice for each task in the executor

@codecov-commenter
Copy link

codecov-commenter commented Jul 3, 2025

Codecov Report

Attention: Patch coverage is 50.94340% with 26 lines in your changes missing coverage. Please review.

Project coverage is 63.21%. Comparing base (1a476de) to head (77c05dd).
Report is 410 commits behind head on master.

Files with missing lines Patch % Lines
.../pinot/server/starter/helix/BaseServerStarter.java 0.00% 7 Missing ⚠️
...ore/query/scheduler/resources/ResourceManager.java 55.55% 2 Missing and 2 partials ⚠️
.../executor/ThrottleOnCriticalHeapUsageExecutor.java 73.33% 4 Missing ⚠️
...va/org/apache/pinot/query/runtime/QueryRunner.java 0.00% 2 Missing and 1 partial ⚠️
...re/accounting/PerQueryCPUMemAccountantFactory.java 0.00% 2 Missing ⚠️
...rg/apache/pinot/server/starter/ServerInstance.java 0.00% 2 Missing ⚠️
.../apache/pinot/server/worker/WorkerQueryServer.java 0.00% 2 Missing ⚠️
...ot/core/query/scheduler/QuerySchedulerFactory.java 80.00% 1 Missing ⚠️
.../spi/accounting/ThreadResourceUsageAccountant.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #16271      +/-   ##
============================================
+ Coverage     62.90%   63.21%   +0.31%     
+ Complexity     1386     1365      -21     
============================================
  Files          2867     2963      +96     
  Lines        163354   171523    +8169     
  Branches      24952    26277    +1325     
============================================
+ Hits         102755   108431    +5676     
- Misses        52847    54848    +2001     
- Partials       7752     8244     +492     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.19% <50.94%> (+0.32%) ⬆️
java-21 63.17% <50.94%> (+0.35%) ⬆️
skip-bytebuffers-false ?
skip-bytebuffers-true ?
temurin 63.21% <50.94%> (+0.31%) ⬆️
unittests 63.21% <50.94%> (+0.31%) ⬆️
unittests1 56.35% <64.28%> (+0.53%) ⬆️
unittests2 33.38% <0.00%> (-0.19%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@vrajat vrajat changed the title Throttle MSE Tasks if Server heap usage is above critical level. Throttle SSE & MSE Tasks if Server heap usage is above critical level. Jul 3, 2025
vrajat added 5 commits July 4, 2025 09:43
PerQueryCPUMemAccountant defines critical heap usage based on its configuration. This PR adds the ability to disallow anymore MSE tasks if heap usage is above the critical level.

A new configuration parameter is added `pinot.server.query.executor.mse.enableThrottlingOnCriticalHeapUsage`. Default is false.

If enabled, the executor checks with `ThreadResourceUsageAccountant` if heap level is critical. In the default implementation, `false` is returned.

`PerQueryCPUMemAccountant` compares the configuration with current heap usage. Heap usage is updated every time watcher task runs. By default, it runs every 30ms.

Closes apache#16270
@vrajat vrajat force-pushed the rv-tracing-throttle branch from 68c3862 to 77c05dd Compare July 4, 2025 04:14
@vrajat vrajat changed the title Throttle SSE & MSE Tasks if Server heap usage is above critical level. Throttle SSE & MSE Tasks if Server heap usage is above a threshold Jul 4, 2025
@gortiz gortiz merged commit d40731f into apache:master Jul 4, 2025
18 checks passed
@Jackie-Jiang Jackie-Jiang added documentation release-notes Referenced by PRs that need attention when compiling the next release notes Configuration Config changes (addition/deletion/change in behavior) query multi-stage Related to the multi-stage query engine labels Jul 7, 2025
@vrajat vrajat deleted the rv-tracing-throttle branch July 9, 2025 04:49
Tracing.getThreadAccountant().getClusterConfigChangeListener());
}

SendStatsPredicate sendStatsPredicate = SendStatsPredicate.create(_serverConf, _helixManager);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not move this behind thread accountant initialization since server metrics need to be initialized.
cc @priyen-stripe who helped find the issue

mqliang pushed a commit to mqliang/pinot that referenced this pull request Feb 10, 2026
* [Query Resource Isolation] Workload Configs (apache#15109)

* Workload Configs

* workload config

* Add API

* config

* Change config structure

* Propagation strategy

* Fix style check

* Cost spliting on update

* Table addition propagation

* perf

* Tests

* test

* test 2

* Review comments 1

* review comments 3

* review comments 3

* name change

* review comments 4

* Fix TableDoesNotExistError for hybrid tables in MSE queries in controller API (apache#16102)

* Make ThreadResourceUsageProvider a Helper/Utility Class. (apache#16051)

* ThreadResourceUsageProvider is a helper class. ThreadResourceContext tracks resource usage.

Fix updateConcurrently

* Rename to ThreadResourceSnapshot

* Clean up

* Add javadoc

* Done use auto closeable

* Checkstyle

* Fix compilation error

* Add back removed functions in SPI

* Remove private constructor because japicmp complains.

* Add setThreadResourceUsageProvider because of backward-incompatible checks

* Add setThreadResourceUsageProvider because of backward-incompatible checks

* Fix test

* Fix ThreadResourceSnapshot usage and tests

* Store cpu sample in nanoseconds.

* Reduce logs and improve logging when queries are terminated due to OOM. (apache#16172)

* Dynamic PerQueryCPUMemAccountant Config on Servers  (apache#16219)

* Checkpoint

* Register change handler

* Fix bugs. Manually tested

* Checkstyle

* Tests

* Add pre-check that values are default

* Undo typo fix

* Update QueryRunner to make use of window function overflow handling server configurations (apache#16108)

* Add multistage thread limiting configs at the broker and server level (apache#16080)

* Adding changes for supporting RLS (apache#16043)

* Use stats cache on error instead of the chained mechanism (apache#15992)

* Improve broker error messaging when broker is the one reporting the failure (apache#16076)

* Introduce MSE active and passive timeouts (apache#16075)

* Throttle SSE & MSE Tasks if Server heap usage is above a threshold (apache#16271)

* Fix QueryScheduler constructor using class name. (apache#16280)

* Fix QueryScheduler constructor using class name.

* Fix test

* [Query Resource Isolation] WorkloadBudgetManager and Host enforcement (apache#15798)

* QRI - WorkloadBudgetManager implementation

* Address review comments

* Remove singleton & signature fix

* Fix compatibility checker

* Review comments

* Move WorkloadBudgetManager to core.

---------

Co-authored-by: praveenc7 <praveenkchaganlal@gmail.com>

* Eliminate duplicate cancel attempts in PerQueryCPUMemAccountant (apache#16299)

* Add basic 1 query tests

* Add more tests

* Add ability to remember cancel queries.

* Clean up if conditions in killMostExpensiveQuery

* Fix test failures.

* Address review comments.

* Use QueryCancelCallback to cancel queries from ThreadResourceUsageAccountant (apache#16142)

* Remove all calls to System.gc() in PerQueryCPUMemAccountantFactory (apache#16374)

* Initialize thread accountant just before serving queries (apache#16326)

* Allow Reset of ThreadResourceUsageAccountant in Tracing.java (apache#16360)

* Queries now self terminate if in panic mode. (apache#16380)

* Queries now self terminate if in panic mode.

* Add config test

* Hard kill on critical level.

* Fix configs

* Separate anchor thread interruption.

* Checkstyle

* Review comments

* remove code for critical level

---------

Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com>

* [Query Resource Isolation] Additonal Sampling for Broker and Server (apache#16164)

* fix

* sampling

* Broker sampling

* revert integ-test

* Fix test failures

* review comments

* remove MSE

* broker auth

* remove per pruner & planner sample

* Use Broker's accountant to sample in the request handler. (apache#16439)

* [Query Resource Isolation] Workload Scheduler (apache#16018)

* QRI - WorkloadBudgetManager implementation

* Address review comments

* scheduler

* unit test

* review comments: metrics, secondary, resource-manager

* remove broker admission

* Remove default budget

---------

Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com>

* Cleanup deprecated methods in ThreadResourceUsageAccountant (apache#16479)

* Remove unnecessary methods and config for ThreadResourceUsageAccountant (apache#16490)

* Add tests for OOM Termination of MSE queries. (apache#16514)

* Fix a flaky assert when testing OOM Cancellation of MSE Queries (apache#16533)

* Disable Flaky Tests (apache#16554)

This is a follow-up to apache#16533
The fix for a flaky test did not work. This PR disables these tests temporarily.

* Use correlation ID instead of request id in PerQueryCpuMemAccountant (apache#16040)

* [Query Resource Isolation]Interface for Workload Stats Collection (apache#16340)

* Interface for Stats Collection

* Additional comments

* inherit

* additional class comments

* [Query Resource Isolation] Fix Refresh message (apache#16636)

* Fix Refresh message

* delete queryworkload message handler

* info -> debug logs

* reduce logging (apache#16698)

* style check

* [Query Workload Isolation] Cost-split support  (apache#16672)

* splits

* Cost split

* test

* propagation entity change & java doc

* Propagation scheme review comments

* empty commit to trigger build

* Reduce log for PerQueryCPUMemResourceUsageAccountant (apache#16642)

---------

Co-authored-by: Rajat Venkatesh <1638298+vrajat@users.noreply.github.com>
Co-authored-by: Yash Mayya <yash.mayya@gmail.com>
Co-authored-by: Satwik Pachigolla <40644097+satwik-pachigolla@users.noreply.github.com>
Co-authored-by: 9aman <35227405+9aman@users.noreply.github.com>
Co-authored-by: Gonzalo Ortiz Jaureguizar <gortiz@users.noreply.github.com>
Co-authored-by: Vivek Iyer Vaidyanathan <vvivekiyer@gmail.com>
Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com>
Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com>
Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com>
mqliang pushed a commit to mqliang/pinot that referenced this pull request Feb 10, 2026
* [Query Resource Isolation] Workload Configs (apache#15109)

* Workload Configs

* workload config

* Add API

* config

* Change config structure

* Propagation strategy

* Fix style check

* Cost spliting on update

* Table addition propagation

* perf

* Tests

* test

* test 2

* Review comments 1

* review comments 3

* review comments 3

* name change

* review comments 4

* Fix TableDoesNotExistError for hybrid tables in MSE queries in controller API (apache#16102)

* Make ThreadResourceUsageProvider a Helper/Utility Class. (apache#16051)

* ThreadResourceUsageProvider is a helper class. ThreadResourceContext tracks resource usage.

Fix updateConcurrently

* Rename to ThreadResourceSnapshot

* Clean up

* Add javadoc

* Done use auto closeable

* Checkstyle

* Fix compilation error

* Add back removed functions in SPI

* Remove private constructor because japicmp complains.

* Add setThreadResourceUsageProvider because of backward-incompatible checks

* Add setThreadResourceUsageProvider because of backward-incompatible checks

* Fix test

* Fix ThreadResourceSnapshot usage and tests

* Store cpu sample in nanoseconds.

* Reduce logs and improve logging when queries are terminated due to OOM. (apache#16172)

* Dynamic PerQueryCPUMemAccountant Config on Servers  (apache#16219)

* Checkpoint

* Register change handler

* Fix bugs. Manually tested

* Checkstyle

* Tests

* Add pre-check that values are default

* Undo typo fix

* Update QueryRunner to make use of window function overflow handling server configurations (apache#16108)

* Add multistage thread limiting configs at the broker and server level (apache#16080)

* Adding changes for supporting RLS (apache#16043)

* Use stats cache on error instead of the chained mechanism (apache#15992)

* Improve broker error messaging when broker is the one reporting the failure (apache#16076)

* Introduce MSE active and passive timeouts (apache#16075)

* Throttle SSE & MSE Tasks if Server heap usage is above a threshold (apache#16271)

* Fix QueryScheduler constructor using class name. (apache#16280)

* Fix QueryScheduler constructor using class name.

* Fix test

* [Query Resource Isolation] WorkloadBudgetManager and Host enforcement (apache#15798)

* QRI - WorkloadBudgetManager implementation

* Address review comments

* Remove singleton & signature fix

* Fix compatibility checker

* Review comments

* Move WorkloadBudgetManager to core.

---------

Co-authored-by: praveenc7 <praveenkchaganlal@gmail.com>

* Eliminate duplicate cancel attempts in PerQueryCPUMemAccountant (apache#16299)

* Add basic 1 query tests

* Add more tests

* Add ability to remember cancel queries.

* Clean up if conditions in killMostExpensiveQuery

* Fix test failures.

* Address review comments.

* Use QueryCancelCallback to cancel queries from ThreadResourceUsageAccountant (apache#16142)

* Remove all calls to System.gc() in PerQueryCPUMemAccountantFactory (apache#16374)

* Initialize thread accountant just before serving queries (apache#16326)

* Allow Reset of ThreadResourceUsageAccountant in Tracing.java (apache#16360)

* Queries now self terminate if in panic mode. (apache#16380)

* Queries now self terminate if in panic mode.

* Add config test

* Hard kill on critical level.

* Fix configs

* Separate anchor thread interruption.

* Checkstyle

* Review comments

* remove code for critical level

---------

Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com>

* [Query Resource Isolation] Additonal Sampling for Broker and Server (apache#16164)

* fix

* sampling

* Broker sampling

* revert integ-test

* Fix test failures

* review comments

* remove MSE

* broker auth

* remove per pruner & planner sample

* Use Broker's accountant to sample in the request handler. (apache#16439)

* [Query Resource Isolation] Workload Scheduler (apache#16018)

* QRI - WorkloadBudgetManager implementation

* Address review comments

* scheduler

* unit test

* review comments: metrics, secondary, resource-manager

* remove broker admission

* Remove default budget

---------

Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com>

* Cleanup deprecated methods in ThreadResourceUsageAccountant (apache#16479)

* Remove unnecessary methods and config for ThreadResourceUsageAccountant (apache#16490)

* Add tests for OOM Termination of MSE queries. (apache#16514)

* Fix a flaky assert when testing OOM Cancellation of MSE Queries (apache#16533)

* Disable Flaky Tests (apache#16554)

This is a follow-up to apache#16533
The fix for a flaky test did not work. This PR disables these tests temporarily.

* Use correlation ID instead of request id in PerQueryCpuMemAccountant (apache#16040)

* [Query Resource Isolation]Interface for Workload Stats Collection (apache#16340)

* Interface for Stats Collection

* Additional comments

* inherit

* additional class comments

* [Query Resource Isolation] Fix Refresh message (apache#16636)

* Fix Refresh message

* delete queryworkload message handler

* info -> debug logs

* reduce logging (apache#16698)

* style check

* [Query Workload Isolation] Cost-split support  (apache#16672)

* splits

* Cost split

* test

* propagation entity change & java doc

* Propagation scheme review comments

* empty commit to trigger build

* Reduce log for PerQueryCPUMemResourceUsageAccountant (apache#16642)

* [refactor] Switching to RoutingManager for broker request handlers (apache#16442)

Co-authored-by: Shaurya Chaturvedi <shauryachats@uber.com>

* Fix broker request id generator to avoid generating same id (apache#16661)

* Introduce QueryExecutionContext to manage query life cycle (apache#16728)

* Introduce QueryExecutionContext to manage query life cycle 2 (apache#16728)

---------

Co-authored-by: Rajat Venkatesh <1638298+vrajat@users.noreply.github.com>
Co-authored-by: Yash Mayya <yash.mayya@gmail.com>
Co-authored-by: Satwik Pachigolla <40644097+satwik-pachigolla@users.noreply.github.com>
Co-authored-by: 9aman <35227405+9aman@users.noreply.github.com>
Co-authored-by: Gonzalo Ortiz Jaureguizar <gortiz@users.noreply.github.com>
Co-authored-by: Vivek Iyer Vaidyanathan <vvivekiyer@gmail.com>
Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com>
Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com>
Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com>
Co-authored-by: Shaurya Chaturvedi <shauryachats@gmail.com>
Co-authored-by: Shaurya Chaturvedi <shauryachats@uber.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Configuration Config changes (addition/deletion/change in behavior) documentation multi-stage Related to the multi-stage query engine query release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Throttle Task Submission in MSE if heap usage is at critical level

5 participants