Make ThreadResourceUsageProvider a Helper/Utility Class.#16051
Make ThreadResourceUsageProvider a Helper/Utility Class.#16051swaminathanmanish merged 14 commits intoapache:masterfrom
Conversation
440c792 to
cee9e56
Compare
There was a problem hiding this comment.
Pull Request Overview
This PR refactors ThreadResourceUsageProvider into a purely static helper and introduces ThreadResourceSnapshot for start/end snapshots, removing the thread-local provider and simplifying per-thread accounting.
- Split provider into a static utility (
ThreadResourceUsageProvider) and a snapshot class (ThreadResourceSnapshot) - Updated all call sites to create/use a snapshot and pass both CPU and memory deltas explicitly
- Removed the thread-local provider plumbing in
PerQueryCPUMemAccountantFactoryand related classes
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| pinot-spi/src/main/java/org/apache/pinot/spi/trace/Tracing.java | Removed provider setter; updated concurrent update signature |
| pinot-spi/src/main/java/org/apache/pinot/spi/accounting/ThreadResourceUsageProvider.java | Made constructor private; added static getters |
| pinot-spi/src/main/java/org/apache/pinot/spi/accounting/ThreadResourceUsageAccountant.java | Deprecated setter; extended updateConcurrent signature |
| pinot-spi/src/main/java/org/apache/pinot/spi/accounting/ThreadResourceSnapshot.java | New snapshot class for start/end resource captures |
| pinot-core/src/test/java/org/apache/pinot/core/accounting/TestThreadMXBean.java | Tests updated to use snapshot instead of provider |
| pinot-core/src/main/java/org/apache/pinot/core/transport/DataTableHandler.java | Inserted snapshot before updating query usage |
| pinot-core/src/main/java/org/apache/pinot/core/operator/combine/BaseCombineOperator.java | Replaced provider with snapshot in combine operator |
| pinot-core/src/main/java/org/apache/pinot/core/operator/InstanceResponseOperator.java | Replaced provider with snapshot in response operator |
| pinot-core/src/main/java/org/apache/pinot/core/accounting/PerQueryCPUMemAccountantFactory.java | Removed thread-local provider; use passed deltas |
| pinot-core/src/main/java/org/apache/pinot/core/accounting/CPUMemThreadLevelAccountingObjects.java | Added snapshot field; reset/update methods |
| pinot-common/src/main/java/org/apache/pinot/common/datatable/DataTableImplV4.java | Switched to snapshot for serialization metadata |
Comments suppressed due to low confidence (2)
pinot-spi/src/main/java/org/apache/pinot/spi/accounting/ThreadResourceSnapshot.java:51
- The Javadoc for
getCpuTimeNs()suggests it auto-snapshots if not closed, but the method simply returns the last snapshot delta. Consider updating the comment to clarify that callers must invoketakeSnapshot()before calling this.
public long getCpuTimeNs() {
pinot-core/src/test/java/org/apache/pinot/core/accounting/TestThreadMXBean.java:58
- This test is named
testThreadMXBeanSimpleMemAllocTrackingbut readsgetCpuTimeNs()instead ofgetAllocatedBytes(). It should assert on memory allocation, e.g., usegetAllocatedBytes().
long result = threadResourceSnapshot.getCpuTimeNs();
b8e6ad3 to
a14968b
Compare
| /** | ||
| * Gets the CPU time used so far in nanoseconds. | ||
| * Takes a current snapshot if not yet closed. | ||
| */ | ||
| public long getCpuTimeNs() { | ||
| return _endCpuTime - _startCpuTime; | ||
| } |
There was a problem hiding this comment.
Javadoc is not correct, right? It isn't taking a snapshot. Also, what is the meaning of closed here?
There was a problem hiding this comment.
I've fixed the javadoc and the code. This was left over from an initial version.
| @Deprecated | ||
| public long getThreadTimeNs() { | ||
| return 0; | ||
| } | ||
|
|
||
| @Deprecated | ||
| public long getThreadAllocatedBytes() { | ||
| return 0; | ||
| } |
There was a problem hiding this comment.
Perhaps we should consider breaking backward compatibility with this change.
There was a problem hiding this comment.
I had a similar conversation with @vvivekiyer as well. I had suggested a separate PR to remove all the unnecessary code in a single PR. There will be a few more code blocks deprecated as well.
|
I have approved the PR, but I would like to address some comments before merging. Also tests are failing. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #16051 +/- ##
============================================
+ Coverage 62.90% 63.15% +0.25%
+ Complexity 1386 1352 -34
============================================
Files 2867 2952 +85
Lines 163354 169619 +6265
Branches 24952 25940 +988
============================================
+ Hits 102755 107123 +4368
- Misses 52847 54384 +1537
- Partials 7752 8112 +360
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
4bee47e to
4839b97
Compare
|
Test failures seem unrelated to the diff. in I am seeing the same failures in |
…tracks resource usage. Fix updateConcurrently
4839b97 to
9fc7356
Compare
|
All tests have passed |
* [Query Resource Isolation] Workload Configs (apache#15109) * Workload Configs * workload config * Add API * config * Change config structure * Propagation strategy * Fix style check * Cost spliting on update * Table addition propagation * perf * Tests * test * test 2 * Review comments 1 * review comments 3 * review comments 3 * name change * review comments 4 * Fix TableDoesNotExistError for hybrid tables in MSE queries in controller API (apache#16102) * Make ThreadResourceUsageProvider a Helper/Utility Class. (apache#16051) * ThreadResourceUsageProvider is a helper class. ThreadResourceContext tracks resource usage. Fix updateConcurrently * Rename to ThreadResourceSnapshot * Clean up * Add javadoc * Done use auto closeable * Checkstyle * Fix compilation error * Add back removed functions in SPI * Remove private constructor because japicmp complains. * Add setThreadResourceUsageProvider because of backward-incompatible checks * Add setThreadResourceUsageProvider because of backward-incompatible checks * Fix test * Fix ThreadResourceSnapshot usage and tests * Store cpu sample in nanoseconds. * Reduce logs and improve logging when queries are terminated due to OOM. (apache#16172) * Dynamic PerQueryCPUMemAccountant Config on Servers (apache#16219) * Checkpoint * Register change handler * Fix bugs. Manually tested * Checkstyle * Tests * Add pre-check that values are default * Undo typo fix * Update QueryRunner to make use of window function overflow handling server configurations (apache#16108) * Add multistage thread limiting configs at the broker and server level (apache#16080) * Adding changes for supporting RLS (apache#16043) * Use stats cache on error instead of the chained mechanism (apache#15992) * Improve broker error messaging when broker is the one reporting the failure (apache#16076) * Introduce MSE active and passive timeouts (apache#16075) * Throttle SSE & MSE Tasks if Server heap usage is above a threshold (apache#16271) * Fix QueryScheduler constructor using class name. (apache#16280) * Fix QueryScheduler constructor using class name. * Fix test * [Query Resource Isolation] WorkloadBudgetManager and Host enforcement (apache#15798) * QRI - WorkloadBudgetManager implementation * Address review comments * Remove singleton & signature fix * Fix compatibility checker * Review comments * Move WorkloadBudgetManager to core. --------- Co-authored-by: praveenc7 <praveenkchaganlal@gmail.com> * Eliminate duplicate cancel attempts in PerQueryCPUMemAccountant (apache#16299) * Add basic 1 query tests * Add more tests * Add ability to remember cancel queries. * Clean up if conditions in killMostExpensiveQuery * Fix test failures. * Address review comments. * Use QueryCancelCallback to cancel queries from ThreadResourceUsageAccountant (apache#16142) * Remove all calls to System.gc() in PerQueryCPUMemAccountantFactory (apache#16374) * Initialize thread accountant just before serving queries (apache#16326) * Allow Reset of ThreadResourceUsageAccountant in Tracing.java (apache#16360) * Queries now self terminate if in panic mode. (apache#16380) * Queries now self terminate if in panic mode. * Add config test * Hard kill on critical level. * Fix configs * Separate anchor thread interruption. * Checkstyle * Review comments * remove code for critical level --------- Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> * [Query Resource Isolation] Additonal Sampling for Broker and Server (apache#16164) * fix * sampling * Broker sampling * revert integ-test * Fix test failures * review comments * remove MSE * broker auth * remove per pruner & planner sample * Use Broker's accountant to sample in the request handler. (apache#16439) * [Query Resource Isolation] Workload Scheduler (apache#16018) * QRI - WorkloadBudgetManager implementation * Address review comments * scheduler * unit test * review comments: metrics, secondary, resource-manager * remove broker admission * Remove default budget --------- Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com> * Cleanup deprecated methods in ThreadResourceUsageAccountant (apache#16479) * Remove unnecessary methods and config for ThreadResourceUsageAccountant (apache#16490) * Add tests for OOM Termination of MSE queries. (apache#16514) * Fix a flaky assert when testing OOM Cancellation of MSE Queries (apache#16533) * Disable Flaky Tests (apache#16554) This is a follow-up to apache#16533 The fix for a flaky test did not work. This PR disables these tests temporarily. * Use correlation ID instead of request id in PerQueryCpuMemAccountant (apache#16040) * [Query Resource Isolation]Interface for Workload Stats Collection (apache#16340) * Interface for Stats Collection * Additional comments * inherit * additional class comments * [Query Resource Isolation] Fix Refresh message (apache#16636) * Fix Refresh message * delete queryworkload message handler * info -> debug logs * reduce logging (apache#16698) * style check * [Query Workload Isolation] Cost-split support (apache#16672) * splits * Cost split * test * propagation entity change & java doc * Propagation scheme review comments * empty commit to trigger build * Reduce log for PerQueryCPUMemResourceUsageAccountant (apache#16642) --------- Co-authored-by: Rajat Venkatesh <1638298+vrajat@users.noreply.github.com> Co-authored-by: Yash Mayya <yash.mayya@gmail.com> Co-authored-by: Satwik Pachigolla <40644097+satwik-pachigolla@users.noreply.github.com> Co-authored-by: 9aman <35227405+9aman@users.noreply.github.com> Co-authored-by: Gonzalo Ortiz Jaureguizar <gortiz@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan <vvivekiyer@gmail.com> Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com> Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com>
* [Query Resource Isolation] Workload Configs (apache#15109) * Workload Configs * workload config * Add API * config * Change config structure * Propagation strategy * Fix style check * Cost spliting on update * Table addition propagation * perf * Tests * test * test 2 * Review comments 1 * review comments 3 * review comments 3 * name change * review comments 4 * Fix TableDoesNotExistError for hybrid tables in MSE queries in controller API (apache#16102) * Make ThreadResourceUsageProvider a Helper/Utility Class. (apache#16051) * ThreadResourceUsageProvider is a helper class. ThreadResourceContext tracks resource usage. Fix updateConcurrently * Rename to ThreadResourceSnapshot * Clean up * Add javadoc * Done use auto closeable * Checkstyle * Fix compilation error * Add back removed functions in SPI * Remove private constructor because japicmp complains. * Add setThreadResourceUsageProvider because of backward-incompatible checks * Add setThreadResourceUsageProvider because of backward-incompatible checks * Fix test * Fix ThreadResourceSnapshot usage and tests * Store cpu sample in nanoseconds. * Reduce logs and improve logging when queries are terminated due to OOM. (apache#16172) * Dynamic PerQueryCPUMemAccountant Config on Servers (apache#16219) * Checkpoint * Register change handler * Fix bugs. Manually tested * Checkstyle * Tests * Add pre-check that values are default * Undo typo fix * Update QueryRunner to make use of window function overflow handling server configurations (apache#16108) * Add multistage thread limiting configs at the broker and server level (apache#16080) * Adding changes for supporting RLS (apache#16043) * Use stats cache on error instead of the chained mechanism (apache#15992) * Improve broker error messaging when broker is the one reporting the failure (apache#16076) * Introduce MSE active and passive timeouts (apache#16075) * Throttle SSE & MSE Tasks if Server heap usage is above a threshold (apache#16271) * Fix QueryScheduler constructor using class name. (apache#16280) * Fix QueryScheduler constructor using class name. * Fix test * [Query Resource Isolation] WorkloadBudgetManager and Host enforcement (apache#15798) * QRI - WorkloadBudgetManager implementation * Address review comments * Remove singleton & signature fix * Fix compatibility checker * Review comments * Move WorkloadBudgetManager to core. --------- Co-authored-by: praveenc7 <praveenkchaganlal@gmail.com> * Eliminate duplicate cancel attempts in PerQueryCPUMemAccountant (apache#16299) * Add basic 1 query tests * Add more tests * Add ability to remember cancel queries. * Clean up if conditions in killMostExpensiveQuery * Fix test failures. * Address review comments. * Use QueryCancelCallback to cancel queries from ThreadResourceUsageAccountant (apache#16142) * Remove all calls to System.gc() in PerQueryCPUMemAccountantFactory (apache#16374) * Initialize thread accountant just before serving queries (apache#16326) * Allow Reset of ThreadResourceUsageAccountant in Tracing.java (apache#16360) * Queries now self terminate if in panic mode. (apache#16380) * Queries now self terminate if in panic mode. * Add config test * Hard kill on critical level. * Fix configs * Separate anchor thread interruption. * Checkstyle * Review comments * remove code for critical level --------- Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> * [Query Resource Isolation] Additonal Sampling for Broker and Server (apache#16164) * fix * sampling * Broker sampling * revert integ-test * Fix test failures * review comments * remove MSE * broker auth * remove per pruner & planner sample * Use Broker's accountant to sample in the request handler. (apache#16439) * [Query Resource Isolation] Workload Scheduler (apache#16018) * QRI - WorkloadBudgetManager implementation * Address review comments * scheduler * unit test * review comments: metrics, secondary, resource-manager * remove broker admission * Remove default budget --------- Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com> * Cleanup deprecated methods in ThreadResourceUsageAccountant (apache#16479) * Remove unnecessary methods and config for ThreadResourceUsageAccountant (apache#16490) * Add tests for OOM Termination of MSE queries. (apache#16514) * Fix a flaky assert when testing OOM Cancellation of MSE Queries (apache#16533) * Disable Flaky Tests (apache#16554) This is a follow-up to apache#16533 The fix for a flaky test did not work. This PR disables these tests temporarily. * Use correlation ID instead of request id in PerQueryCpuMemAccountant (apache#16040) * [Query Resource Isolation]Interface for Workload Stats Collection (apache#16340) * Interface for Stats Collection * Additional comments * inherit * additional class comments * [Query Resource Isolation] Fix Refresh message (apache#16636) * Fix Refresh message * delete queryworkload message handler * info -> debug logs * reduce logging (apache#16698) * style check * [Query Workload Isolation] Cost-split support (apache#16672) * splits * Cost split * test * propagation entity change & java doc * Propagation scheme review comments * empty commit to trigger build * Reduce log for PerQueryCPUMemResourceUsageAccountant (apache#16642) * [refactor] Switching to RoutingManager for broker request handlers (apache#16442) Co-authored-by: Shaurya Chaturvedi <shauryachats@uber.com> * Fix broker request id generator to avoid generating same id (apache#16661) * Introduce QueryExecutionContext to manage query life cycle (apache#16728) * Introduce QueryExecutionContext to manage query life cycle 2 (apache#16728) --------- Co-authored-by: Rajat Venkatesh <1638298+vrajat@users.noreply.github.com> Co-authored-by: Yash Mayya <yash.mayya@gmail.com> Co-authored-by: Satwik Pachigolla <40644097+satwik-pachigolla@users.noreply.github.com> Co-authored-by: 9aman <35227405+9aman@users.noreply.github.com> Co-authored-by: Gonzalo Ortiz Jaureguizar <gortiz@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan <vvivekiyer@gmail.com> Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com> Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com> Co-authored-by: Shaurya Chaturvedi <shauryachats@gmail.com> Co-authored-by: Shaurya Chaturvedi <shauryachats@uber.com>
ThreadResourceUsageProviderhas two functions:The dual functionality complicated its usage in
PerQueryCpuMemAccountant. AThreadResourceUsageProviderobject has to be stored in a thread local to remember the initial resource values. When a thread is assigned a task this thread local also has to be initialized. Else queries will fail with NPE. #15045 is an example.This PR splits the functionality into
ThreadResourceUsageProvideris now a pure utility class with only static functionsThreadResourceSnapshotmaintains start values.With these changes, the thread-local for
ThreadResourceUsageProviderhas been removed. The snapshot of resources is stored in theThreadEntrythread-local itself. So a class of issues have been eliminated along with yet another thread-local.Closes #16042