Throttle SSE & MSE Tasks if Server heap usage is above a threshold#16271
Throttle SSE & MSE Tasks if Server heap usage is above a threshold#16271gortiz merged 5 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Pull Request Overview
Adds a new mechanism to stop submitting multi-stage execution (MSE) tasks when heap usage exceeds a configured critical threshold.
- Introduce
ThrottleOnCriticalHeapUsageExecutorthat throws whenthrottleQuerySubmission()is true. - Add a default
throttleQuerySubmissioninThreadResourceUsageAccountantand implement it inPerQueryCPUMemAccountantFactory. - Wire the new executor through server startup (
BaseServerStarter→ServerInstance→WorkerQueryServer→QueryRunner) and expose a config flag.
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| pinot-spi/src/test/java/org/apache/pinot/spi/executor/ThrottleOnCriticalHeapUsageExecutorTest.java | Add test for throttling logic |
| pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java | Define new config key and default |
| pinot-spi/src/main/java/org/apache/pinot/spi/executor/ThrottleOnCriticalHeapUsageExecutor.java | Implement executor wrapper that checks heap usage threshold |
| pinot-spi/src/main/java/org/apache/pinot/spi/accounting/ThreadResourceUsageAccountant.java | Provide a default throttleQuerySubmission() |
| pinot-server/src/main/java/org/apache/pinot/server/worker/WorkerQueryServer.java | Pass ThreadResourceUsageAccountant into QueryRunner |
| pinot-server/src/main/java/org/apache/pinot/server/starter/helix/BaseServerStarter.java | Inject ThreadResourceUsageAccountant when creating ServerInstance |
| pinot-server/src/main/java/org/apache/pinot/server/starter/ServerInstance.java | Update constructor and WorkerQueryServer instantiation to accept accountant |
| pinot-query-runtime/src/test/java/org/apache/pinot/query/QueryServerEnclosure.java | Update test to supply a default accountant instance |
| pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/QueryRunner.java | Extend init signature and wrap executor with ThrottleOnCriticalHeapUsageExecutor based on config |
| pinot-core/src/main/java/org/apache/pinot/core/accounting/PerQueryCPUMemAccountantFactory.java | Implement throttleQuerySubmission() and expose current heap usage |
Comments suppressed due to low confidence (1)
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/QueryRunner.java:213
Server.CONFIG_OF_ENABLE_MSE_THROTTLING_ON_CRITICAL_HEAP_USAGEis not in scope; it should be referenced asCommonConstants.Server.CONFIG_OF_ENABLE_MSE_THROTTLING_ON_CRITICAL_HEAP_USAGE.
if (serverConf.getProperty(Server.CONFIG_OF_ENABLE_MSE_THROTTLING_ON_CRITICAL_HEAP_USAGE,
|
|
||
| @Override | ||
| public boolean throttleQuerySubmission() { | ||
| return _numCalls.getAndIncrement() > 1; |
There was a problem hiding this comment.
The stubbed throttleQuerySubmission() returns true only after two submissions, but the test expects throttling on the second submission. Change the condition to > 0 to block the second task as intended.
| return _numCalls.getAndIncrement() > 1; | |
| return _numCalls.getAndIncrement() > 0; |
There was a problem hiding this comment.
This is fine. The function is called twice for each task in the executor
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #16271 +/- ##
============================================
+ Coverage 62.90% 63.21% +0.31%
+ Complexity 1386 1365 -21
============================================
Files 2867 2963 +96
Lines 163354 171523 +8169
Branches 24952 26277 +1325
============================================
+ Hits 102755 108431 +5676
- Misses 52847 54848 +2001
- Partials 7752 8244 +492
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
PerQueryCPUMemAccountant defines critical heap usage based on its configuration. This PR adds the ability to disallow anymore MSE tasks if heap usage is above the critical level. A new configuration parameter is added `pinot.server.query.executor.mse.enableThrottlingOnCriticalHeapUsage`. Default is false. If enabled, the executor checks with `ThreadResourceUsageAccountant` if heap level is critical. In the default implementation, `false` is returned. `PerQueryCPUMemAccountant` compares the configuration with current heap usage. Heap usage is updated every time watcher task runs. By default, it runs every 30ms. Closes apache#16270
68c3862 to
77c05dd
Compare
| Tracing.getThreadAccountant().getClusterConfigChangeListener()); | ||
| } | ||
|
|
||
| SendStatsPredicate sendStatsPredicate = SendStatsPredicate.create(_serverConf, _helixManager); |
There was a problem hiding this comment.
We should not move this behind thread accountant initialization since server metrics need to be initialized.
cc @priyen-stripe who helped find the issue
* [Query Resource Isolation] Workload Configs (apache#15109) * Workload Configs * workload config * Add API * config * Change config structure * Propagation strategy * Fix style check * Cost spliting on update * Table addition propagation * perf * Tests * test * test 2 * Review comments 1 * review comments 3 * review comments 3 * name change * review comments 4 * Fix TableDoesNotExistError for hybrid tables in MSE queries in controller API (apache#16102) * Make ThreadResourceUsageProvider a Helper/Utility Class. (apache#16051) * ThreadResourceUsageProvider is a helper class. ThreadResourceContext tracks resource usage. Fix updateConcurrently * Rename to ThreadResourceSnapshot * Clean up * Add javadoc * Done use auto closeable * Checkstyle * Fix compilation error * Add back removed functions in SPI * Remove private constructor because japicmp complains. * Add setThreadResourceUsageProvider because of backward-incompatible checks * Add setThreadResourceUsageProvider because of backward-incompatible checks * Fix test * Fix ThreadResourceSnapshot usage and tests * Store cpu sample in nanoseconds. * Reduce logs and improve logging when queries are terminated due to OOM. (apache#16172) * Dynamic PerQueryCPUMemAccountant Config on Servers (apache#16219) * Checkpoint * Register change handler * Fix bugs. Manually tested * Checkstyle * Tests * Add pre-check that values are default * Undo typo fix * Update QueryRunner to make use of window function overflow handling server configurations (apache#16108) * Add multistage thread limiting configs at the broker and server level (apache#16080) * Adding changes for supporting RLS (apache#16043) * Use stats cache on error instead of the chained mechanism (apache#15992) * Improve broker error messaging when broker is the one reporting the failure (apache#16076) * Introduce MSE active and passive timeouts (apache#16075) * Throttle SSE & MSE Tasks if Server heap usage is above a threshold (apache#16271) * Fix QueryScheduler constructor using class name. (apache#16280) * Fix QueryScheduler constructor using class name. * Fix test * [Query Resource Isolation] WorkloadBudgetManager and Host enforcement (apache#15798) * QRI - WorkloadBudgetManager implementation * Address review comments * Remove singleton & signature fix * Fix compatibility checker * Review comments * Move WorkloadBudgetManager to core. --------- Co-authored-by: praveenc7 <praveenkchaganlal@gmail.com> * Eliminate duplicate cancel attempts in PerQueryCPUMemAccountant (apache#16299) * Add basic 1 query tests * Add more tests * Add ability to remember cancel queries. * Clean up if conditions in killMostExpensiveQuery * Fix test failures. * Address review comments. * Use QueryCancelCallback to cancel queries from ThreadResourceUsageAccountant (apache#16142) * Remove all calls to System.gc() in PerQueryCPUMemAccountantFactory (apache#16374) * Initialize thread accountant just before serving queries (apache#16326) * Allow Reset of ThreadResourceUsageAccountant in Tracing.java (apache#16360) * Queries now self terminate if in panic mode. (apache#16380) * Queries now self terminate if in panic mode. * Add config test * Hard kill on critical level. * Fix configs * Separate anchor thread interruption. * Checkstyle * Review comments * remove code for critical level --------- Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> * [Query Resource Isolation] Additonal Sampling for Broker and Server (apache#16164) * fix * sampling * Broker sampling * revert integ-test * Fix test failures * review comments * remove MSE * broker auth * remove per pruner & planner sample * Use Broker's accountant to sample in the request handler. (apache#16439) * [Query Resource Isolation] Workload Scheduler (apache#16018) * QRI - WorkloadBudgetManager implementation * Address review comments * scheduler * unit test * review comments: metrics, secondary, resource-manager * remove broker admission * Remove default budget --------- Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com> * Cleanup deprecated methods in ThreadResourceUsageAccountant (apache#16479) * Remove unnecessary methods and config for ThreadResourceUsageAccountant (apache#16490) * Add tests for OOM Termination of MSE queries. (apache#16514) * Fix a flaky assert when testing OOM Cancellation of MSE Queries (apache#16533) * Disable Flaky Tests (apache#16554) This is a follow-up to apache#16533 The fix for a flaky test did not work. This PR disables these tests temporarily. * Use correlation ID instead of request id in PerQueryCpuMemAccountant (apache#16040) * [Query Resource Isolation]Interface for Workload Stats Collection (apache#16340) * Interface for Stats Collection * Additional comments * inherit * additional class comments * [Query Resource Isolation] Fix Refresh message (apache#16636) * Fix Refresh message * delete queryworkload message handler * info -> debug logs * reduce logging (apache#16698) * style check * [Query Workload Isolation] Cost-split support (apache#16672) * splits * Cost split * test * propagation entity change & java doc * Propagation scheme review comments * empty commit to trigger build * Reduce log for PerQueryCPUMemResourceUsageAccountant (apache#16642) --------- Co-authored-by: Rajat Venkatesh <1638298+vrajat@users.noreply.github.com> Co-authored-by: Yash Mayya <yash.mayya@gmail.com> Co-authored-by: Satwik Pachigolla <40644097+satwik-pachigolla@users.noreply.github.com> Co-authored-by: 9aman <35227405+9aman@users.noreply.github.com> Co-authored-by: Gonzalo Ortiz Jaureguizar <gortiz@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan <vvivekiyer@gmail.com> Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com> Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com>
* [Query Resource Isolation] Workload Configs (apache#15109) * Workload Configs * workload config * Add API * config * Change config structure * Propagation strategy * Fix style check * Cost spliting on update * Table addition propagation * perf * Tests * test * test 2 * Review comments 1 * review comments 3 * review comments 3 * name change * review comments 4 * Fix TableDoesNotExistError for hybrid tables in MSE queries in controller API (apache#16102) * Make ThreadResourceUsageProvider a Helper/Utility Class. (apache#16051) * ThreadResourceUsageProvider is a helper class. ThreadResourceContext tracks resource usage. Fix updateConcurrently * Rename to ThreadResourceSnapshot * Clean up * Add javadoc * Done use auto closeable * Checkstyle * Fix compilation error * Add back removed functions in SPI * Remove private constructor because japicmp complains. * Add setThreadResourceUsageProvider because of backward-incompatible checks * Add setThreadResourceUsageProvider because of backward-incompatible checks * Fix test * Fix ThreadResourceSnapshot usage and tests * Store cpu sample in nanoseconds. * Reduce logs and improve logging when queries are terminated due to OOM. (apache#16172) * Dynamic PerQueryCPUMemAccountant Config on Servers (apache#16219) * Checkpoint * Register change handler * Fix bugs. Manually tested * Checkstyle * Tests * Add pre-check that values are default * Undo typo fix * Update QueryRunner to make use of window function overflow handling server configurations (apache#16108) * Add multistage thread limiting configs at the broker and server level (apache#16080) * Adding changes for supporting RLS (apache#16043) * Use stats cache on error instead of the chained mechanism (apache#15992) * Improve broker error messaging when broker is the one reporting the failure (apache#16076) * Introduce MSE active and passive timeouts (apache#16075) * Throttle SSE & MSE Tasks if Server heap usage is above a threshold (apache#16271) * Fix QueryScheduler constructor using class name. (apache#16280) * Fix QueryScheduler constructor using class name. * Fix test * [Query Resource Isolation] WorkloadBudgetManager and Host enforcement (apache#15798) * QRI - WorkloadBudgetManager implementation * Address review comments * Remove singleton & signature fix * Fix compatibility checker * Review comments * Move WorkloadBudgetManager to core. --------- Co-authored-by: praveenc7 <praveenkchaganlal@gmail.com> * Eliminate duplicate cancel attempts in PerQueryCPUMemAccountant (apache#16299) * Add basic 1 query tests * Add more tests * Add ability to remember cancel queries. * Clean up if conditions in killMostExpensiveQuery * Fix test failures. * Address review comments. * Use QueryCancelCallback to cancel queries from ThreadResourceUsageAccountant (apache#16142) * Remove all calls to System.gc() in PerQueryCPUMemAccountantFactory (apache#16374) * Initialize thread accountant just before serving queries (apache#16326) * Allow Reset of ThreadResourceUsageAccountant in Tracing.java (apache#16360) * Queries now self terminate if in panic mode. (apache#16380) * Queries now self terminate if in panic mode. * Add config test * Hard kill on critical level. * Fix configs * Separate anchor thread interruption. * Checkstyle * Review comments * remove code for critical level --------- Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> * [Query Resource Isolation] Additonal Sampling for Broker and Server (apache#16164) * fix * sampling * Broker sampling * revert integ-test * Fix test failures * review comments * remove MSE * broker auth * remove per pruner & planner sample * Use Broker's accountant to sample in the request handler. (apache#16439) * [Query Resource Isolation] Workload Scheduler (apache#16018) * QRI - WorkloadBudgetManager implementation * Address review comments * scheduler * unit test * review comments: metrics, secondary, resource-manager * remove broker admission * Remove default budget --------- Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com> * Cleanup deprecated methods in ThreadResourceUsageAccountant (apache#16479) * Remove unnecessary methods and config for ThreadResourceUsageAccountant (apache#16490) * Add tests for OOM Termination of MSE queries. (apache#16514) * Fix a flaky assert when testing OOM Cancellation of MSE Queries (apache#16533) * Disable Flaky Tests (apache#16554) This is a follow-up to apache#16533 The fix for a flaky test did not work. This PR disables these tests temporarily. * Use correlation ID instead of request id in PerQueryCpuMemAccountant (apache#16040) * [Query Resource Isolation]Interface for Workload Stats Collection (apache#16340) * Interface for Stats Collection * Additional comments * inherit * additional class comments * [Query Resource Isolation] Fix Refresh message (apache#16636) * Fix Refresh message * delete queryworkload message handler * info -> debug logs * reduce logging (apache#16698) * style check * [Query Workload Isolation] Cost-split support (apache#16672) * splits * Cost split * test * propagation entity change & java doc * Propagation scheme review comments * empty commit to trigger build * Reduce log for PerQueryCPUMemResourceUsageAccountant (apache#16642) * [refactor] Switching to RoutingManager for broker request handlers (apache#16442) Co-authored-by: Shaurya Chaturvedi <shauryachats@uber.com> * Fix broker request id generator to avoid generating same id (apache#16661) * Introduce QueryExecutionContext to manage query life cycle (apache#16728) * Introduce QueryExecutionContext to manage query life cycle 2 (apache#16728) --------- Co-authored-by: Rajat Venkatesh <1638298+vrajat@users.noreply.github.com> Co-authored-by: Yash Mayya <yash.mayya@gmail.com> Co-authored-by: Satwik Pachigolla <40644097+satwik-pachigolla@users.noreply.github.com> Co-authored-by: 9aman <35227405+9aman@users.noreply.github.com> Co-authored-by: Gonzalo Ortiz Jaureguizar <gortiz@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan <vvivekiyer@gmail.com> Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com> Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com> Co-authored-by: Shaurya Chaturvedi <shauryachats@gmail.com> Co-authored-by: Shaurya Chaturvedi <shauryachats@uber.com>
PerQueryCPUMemAccountant defines critical heap usage based on its configuration. This PR adds the ability to disallow anymore SSE & MSE tasks if heap usage is above the critical level.
A new configuration parameter is added
pinot.server.query.executor.enableThrottlingOnHeapUsage. Default is false.If enabled, the executor checks with
ThreadResourceUsageAccountantif heap level is alarm. In the default implementation,falseis returned.PerQueryCPUMemAccountantcompares the configuration with current heap usage. Heap usage is updated every time watcher task runs. By default, it runs every 30ms.Closes #16270