Use QueryCancelCallback to cancel queries from ThreadResourceUsageAccountant#16142
Use QueryCancelCallback to cancel queries from ThreadResourceUsageAccountant#16142gortiz merged 7 commits intoapache:masterfrom
Conversation
f7c132c to
6ec81e7
Compare
6ec81e7 to
70d19e2
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #16142 +/- ##
============================================
+ Coverage 62.90% 63.12% +0.22%
+ Complexity 1386 1363 -23
============================================
Files 2867 2973 +106
Lines 163354 172548 +9194
Branches 24952 26429 +1477
============================================
+ Hits 102755 108926 +6171
- Misses 52847 55294 +2447
- Partials 7752 8328 +576
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull Request Overview
This PR introduces a callback-based mechanism for cancelling queries via the ThreadResourceUsageAccountant, falling back to thread interrupts only when no callback is registered.
- Adds a
registerQueryCancelCallbackmethod to the SPI interface and implements it inPerQueryCPUMemResourceUsageAccountant. - Hooks up callback registration in
QueryRunnerandQuerySchedulerand wires the accountant through constructors. - Extends tests and test utilities to verify callback registration.
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| pinot-spi/.../ThreadResourceUsageAccountant.java | Added default registerQueryCancelCallback method |
| pinot-spi/.../QueryCancelCallback.java | Introduced new QueryCancelCallback interface |
| pinot-query-runtime/.../QueryRunnerAccountingTest.java | Removed unnecessary ServerMetrics.register call in test |
| pinot-query-runtime/.../QueryAccountantCancelTest.java | New test exercising per-query callback registration |
| pinot-query-runtime/.../QueryServerEnclosure.java | Overloaded constructors to accept the accountant |
| pinot-query-runtime/.../QueryRunner.java | Injects and registers cancel callback in processQuery |
| pinot-core/.../ResourceManager.java | Stores and exposes the accountant via getter |
| pinot-core/.../QueryScheduler.java | Registers interrupt callback on the anchor thread |
| pinot-core/.../PerQueryCPUMemAccountantFactory.java | Implements callback storage, cleanup, and cancelQuery |
Comments suppressed due to low confidence (3)
pinot-spi/src/main/java/org/apache/pinot/spi/accounting/ThreadResourceUsageAccountant.java:97
- The Javadoc for
registerQueryCancelCallbackis missing a@param queryIddescription. Please add it to explain the purpose of thequeryIdparameter.
default void registerQueryCancelCallback(String queryId, QueryCancelCallback queryCancelCallback) {
pinot-spi/src/main/java/org/apache/pinot/spi/accounting/QueryCancelCallback.java:23
- The Javadoc refers to a
queryIdparameter, but thecancelQuery()method takes no arguments. Consider updating the comment to clarify that the callback is bound to a specific query when registered.
* Cancels the query with the given queryId.
pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/queries/QueryRunnerAccountingTest.java:179
- Removing the
ServerMetrics.registercall may lead to uninitialized metrics in this test and cause failures. If metrics are used in the accountant, re-add or mock registration here.
HashMap<String, Object> configs = new HashMap<>();
pinot-core/src/main/java/org/apache/pinot/core/accounting/PerQueryCPUMemAccountantFactory.java
Outdated
Show resolved
Hide resolved
| /** | ||
| * Cancels the query with the given queryId. | ||
| */ | ||
| void cancelQuery(); |
There was a problem hiding this comment.
I like the idea of using an interface here, but I think we need to specify whether this method is blocking (and therefore once the call finishes the query is guarantee to be killed) or could be async
There was a problem hiding this comment.
It will be async. I'll add javadocs.
pinot-core/src/main/java/org/apache/pinot/core/accounting/PerQueryCPUMemAccountantFactory.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/accounting/PerQueryCPUMemAccountantFactory.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/accounting/PerQueryCPUMemAccountantFactory.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/accounting/PerQueryCPUMemAccountantFactory.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/accounting/PerQueryCPUMemAccountantFactory.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/accounting/PerQueryCPUMemAccountantFactory.java
Outdated
Show resolved
Hide resolved
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/QueryRunner.java
Outdated
Show resolved
Hide resolved
| ResultTable resultTable = queryRunner("SELECT * FROM a LIMIT 2", false).getResultTable(); | ||
| Assert.assertEquals(resultTable.getRows().size(), 2); | ||
| // TODO: How to programmatically get the request id ? | ||
| assertNotNull(_accountant.getQueryCancelCallback("0")); |
There was a problem hiding this comment.
We are using corretlations id here, right? in that case you can set the CID by using query options. See https://docs.pinot.apache.org/users/user-guide-query/query-correlation-id
There was a problem hiding this comment.
Not yet. The feature works with correlation id. However the test dont work with correlation id - sigh
pinot-core/src/main/java/org/apache/pinot/core/accounting/PerQueryCPUMemAccountantFactory.java
Show resolved
Hide resolved
|
There is a conflict that has to be resolved, mark it as ready for review and add that nullable notation. Apart from these minor tasks, I think it is ready to merge |
Will fix. I'll get #16299 merged first and then fix conflicts and get the branch ready for merge. |
remove setupRunner from MSE Use QueryCancelCallback Minor cleanups Cleanup unnecessary code. Undo removal of parentContext More undo work Implement query cancel callback in PerQueryCPUMemAccountantFactory Address review comments Remove callback for SSE
1574448 to
af6d04f
Compare
* [Query Resource Isolation] Workload Configs (apache#15109) * Workload Configs * workload config * Add API * config * Change config structure * Propagation strategy * Fix style check * Cost spliting on update * Table addition propagation * perf * Tests * test * test 2 * Review comments 1 * review comments 3 * review comments 3 * name change * review comments 4 * Fix TableDoesNotExistError for hybrid tables in MSE queries in controller API (apache#16102) * Make ThreadResourceUsageProvider a Helper/Utility Class. (apache#16051) * ThreadResourceUsageProvider is a helper class. ThreadResourceContext tracks resource usage. Fix updateConcurrently * Rename to ThreadResourceSnapshot * Clean up * Add javadoc * Done use auto closeable * Checkstyle * Fix compilation error * Add back removed functions in SPI * Remove private constructor because japicmp complains. * Add setThreadResourceUsageProvider because of backward-incompatible checks * Add setThreadResourceUsageProvider because of backward-incompatible checks * Fix test * Fix ThreadResourceSnapshot usage and tests * Store cpu sample in nanoseconds. * Reduce logs and improve logging when queries are terminated due to OOM. (apache#16172) * Dynamic PerQueryCPUMemAccountant Config on Servers (apache#16219) * Checkpoint * Register change handler * Fix bugs. Manually tested * Checkstyle * Tests * Add pre-check that values are default * Undo typo fix * Update QueryRunner to make use of window function overflow handling server configurations (apache#16108) * Add multistage thread limiting configs at the broker and server level (apache#16080) * Adding changes for supporting RLS (apache#16043) * Use stats cache on error instead of the chained mechanism (apache#15992) * Improve broker error messaging when broker is the one reporting the failure (apache#16076) * Introduce MSE active and passive timeouts (apache#16075) * Throttle SSE & MSE Tasks if Server heap usage is above a threshold (apache#16271) * Fix QueryScheduler constructor using class name. (apache#16280) * Fix QueryScheduler constructor using class name. * Fix test * [Query Resource Isolation] WorkloadBudgetManager and Host enforcement (apache#15798) * QRI - WorkloadBudgetManager implementation * Address review comments * Remove singleton & signature fix * Fix compatibility checker * Review comments * Move WorkloadBudgetManager to core. --------- Co-authored-by: praveenc7 <praveenkchaganlal@gmail.com> * Eliminate duplicate cancel attempts in PerQueryCPUMemAccountant (apache#16299) * Add basic 1 query tests * Add more tests * Add ability to remember cancel queries. * Clean up if conditions in killMostExpensiveQuery * Fix test failures. * Address review comments. * Use QueryCancelCallback to cancel queries from ThreadResourceUsageAccountant (apache#16142) * Remove all calls to System.gc() in PerQueryCPUMemAccountantFactory (apache#16374) * Initialize thread accountant just before serving queries (apache#16326) * Allow Reset of ThreadResourceUsageAccountant in Tracing.java (apache#16360) * Queries now self terminate if in panic mode. (apache#16380) * Queries now self terminate if in panic mode. * Add config test * Hard kill on critical level. * Fix configs * Separate anchor thread interruption. * Checkstyle * Review comments * remove code for critical level --------- Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> * [Query Resource Isolation] Additonal Sampling for Broker and Server (apache#16164) * fix * sampling * Broker sampling * revert integ-test * Fix test failures * review comments * remove MSE * broker auth * remove per pruner & planner sample * Use Broker's accountant to sample in the request handler. (apache#16439) * [Query Resource Isolation] Workload Scheduler (apache#16018) * QRI - WorkloadBudgetManager implementation * Address review comments * scheduler * unit test * review comments: metrics, secondary, resource-manager * remove broker admission * Remove default budget --------- Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com> * Cleanup deprecated methods in ThreadResourceUsageAccountant (apache#16479) * Remove unnecessary methods and config for ThreadResourceUsageAccountant (apache#16490) * Add tests for OOM Termination of MSE queries. (apache#16514) * Fix a flaky assert when testing OOM Cancellation of MSE Queries (apache#16533) * Disable Flaky Tests (apache#16554) This is a follow-up to apache#16533 The fix for a flaky test did not work. This PR disables these tests temporarily. * Use correlation ID instead of request id in PerQueryCpuMemAccountant (apache#16040) * [Query Resource Isolation]Interface for Workload Stats Collection (apache#16340) * Interface for Stats Collection * Additional comments * inherit * additional class comments * [Query Resource Isolation] Fix Refresh message (apache#16636) * Fix Refresh message * delete queryworkload message handler * info -> debug logs * reduce logging (apache#16698) * style check * [Query Workload Isolation] Cost-split support (apache#16672) * splits * Cost split * test * propagation entity change & java doc * Propagation scheme review comments * empty commit to trigger build * Reduce log for PerQueryCPUMemResourceUsageAccountant (apache#16642) --------- Co-authored-by: Rajat Venkatesh <1638298+vrajat@users.noreply.github.com> Co-authored-by: Yash Mayya <yash.mayya@gmail.com> Co-authored-by: Satwik Pachigolla <40644097+satwik-pachigolla@users.noreply.github.com> Co-authored-by: 9aman <35227405+9aman@users.noreply.github.com> Co-authored-by: Gonzalo Ortiz Jaureguizar <gortiz@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan <vvivekiyer@gmail.com> Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com> Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com>
* [Query Resource Isolation] Workload Configs (apache#15109) * Workload Configs * workload config * Add API * config * Change config structure * Propagation strategy * Fix style check * Cost spliting on update * Table addition propagation * perf * Tests * test * test 2 * Review comments 1 * review comments 3 * review comments 3 * name change * review comments 4 * Fix TableDoesNotExistError for hybrid tables in MSE queries in controller API (apache#16102) * Make ThreadResourceUsageProvider a Helper/Utility Class. (apache#16051) * ThreadResourceUsageProvider is a helper class. ThreadResourceContext tracks resource usage. Fix updateConcurrently * Rename to ThreadResourceSnapshot * Clean up * Add javadoc * Done use auto closeable * Checkstyle * Fix compilation error * Add back removed functions in SPI * Remove private constructor because japicmp complains. * Add setThreadResourceUsageProvider because of backward-incompatible checks * Add setThreadResourceUsageProvider because of backward-incompatible checks * Fix test * Fix ThreadResourceSnapshot usage and tests * Store cpu sample in nanoseconds. * Reduce logs and improve logging when queries are terminated due to OOM. (apache#16172) * Dynamic PerQueryCPUMemAccountant Config on Servers (apache#16219) * Checkpoint * Register change handler * Fix bugs. Manually tested * Checkstyle * Tests * Add pre-check that values are default * Undo typo fix * Update QueryRunner to make use of window function overflow handling server configurations (apache#16108) * Add multistage thread limiting configs at the broker and server level (apache#16080) * Adding changes for supporting RLS (apache#16043) * Use stats cache on error instead of the chained mechanism (apache#15992) * Improve broker error messaging when broker is the one reporting the failure (apache#16076) * Introduce MSE active and passive timeouts (apache#16075) * Throttle SSE & MSE Tasks if Server heap usage is above a threshold (apache#16271) * Fix QueryScheduler constructor using class name. (apache#16280) * Fix QueryScheduler constructor using class name. * Fix test * [Query Resource Isolation] WorkloadBudgetManager and Host enforcement (apache#15798) * QRI - WorkloadBudgetManager implementation * Address review comments * Remove singleton & signature fix * Fix compatibility checker * Review comments * Move WorkloadBudgetManager to core. --------- Co-authored-by: praveenc7 <praveenkchaganlal@gmail.com> * Eliminate duplicate cancel attempts in PerQueryCPUMemAccountant (apache#16299) * Add basic 1 query tests * Add more tests * Add ability to remember cancel queries. * Clean up if conditions in killMostExpensiveQuery * Fix test failures. * Address review comments. * Use QueryCancelCallback to cancel queries from ThreadResourceUsageAccountant (apache#16142) * Remove all calls to System.gc() in PerQueryCPUMemAccountantFactory (apache#16374) * Initialize thread accountant just before serving queries (apache#16326) * Allow Reset of ThreadResourceUsageAccountant in Tracing.java (apache#16360) * Queries now self terminate if in panic mode. (apache#16380) * Queries now self terminate if in panic mode. * Add config test * Hard kill on critical level. * Fix configs * Separate anchor thread interruption. * Checkstyle * Review comments * remove code for critical level --------- Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> * [Query Resource Isolation] Additonal Sampling for Broker and Server (apache#16164) * fix * sampling * Broker sampling * revert integ-test * Fix test failures * review comments * remove MSE * broker auth * remove per pruner & planner sample * Use Broker's accountant to sample in the request handler. (apache#16439) * [Query Resource Isolation] Workload Scheduler (apache#16018) * QRI - WorkloadBudgetManager implementation * Address review comments * scheduler * unit test * review comments: metrics, secondary, resource-manager * remove broker admission * Remove default budget --------- Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com> * Cleanup deprecated methods in ThreadResourceUsageAccountant (apache#16479) * Remove unnecessary methods and config for ThreadResourceUsageAccountant (apache#16490) * Add tests for OOM Termination of MSE queries. (apache#16514) * Fix a flaky assert when testing OOM Cancellation of MSE Queries (apache#16533) * Disable Flaky Tests (apache#16554) This is a follow-up to apache#16533 The fix for a flaky test did not work. This PR disables these tests temporarily. * Use correlation ID instead of request id in PerQueryCpuMemAccountant (apache#16040) * [Query Resource Isolation]Interface for Workload Stats Collection (apache#16340) * Interface for Stats Collection * Additional comments * inherit * additional class comments * [Query Resource Isolation] Fix Refresh message (apache#16636) * Fix Refresh message * delete queryworkload message handler * info -> debug logs * reduce logging (apache#16698) * style check * [Query Workload Isolation] Cost-split support (apache#16672) * splits * Cost split * test * propagation entity change & java doc * Propagation scheme review comments * empty commit to trigger build * Reduce log for PerQueryCPUMemResourceUsageAccountant (apache#16642) * [refactor] Switching to RoutingManager for broker request handlers (apache#16442) Co-authored-by: Shaurya Chaturvedi <shauryachats@uber.com> * Fix broker request id generator to avoid generating same id (apache#16661) * Introduce QueryExecutionContext to manage query life cycle (apache#16728) * Introduce QueryExecutionContext to manage query life cycle 2 (apache#16728) --------- Co-authored-by: Rajat Venkatesh <1638298+vrajat@users.noreply.github.com> Co-authored-by: Yash Mayya <yash.mayya@gmail.com> Co-authored-by: Satwik Pachigolla <40644097+satwik-pachigolla@users.noreply.github.com> Co-authored-by: 9aman <35227405+9aman@users.noreply.github.com> Co-authored-by: Gonzalo Ortiz Jaureguizar <gortiz@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan <vvivekiyer@gmail.com> Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com> Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com> Co-authored-by: Shaurya Chaturvedi <shauryachats@gmail.com> Co-authored-by: Shaurya Chaturvedi <shauryachats@uber.com>
PerQueryCPUMemAccountant uses callbacks to cancel queries. The main motivation is that MSE executor threads have no anchor thread.
OpchainSchedulerService.cancel.Map<String, QueryCancelCallback>.Closes #16060