Make ThreadResourceUsageProvider a Helper/Utility Class. #16051

Merged
swaminathanmanish merged 14 commits into apache:master from vrajat:rv-tracing-to-helper
Jun 20, 2025

Conversation

@vrajat
Contributor

@vrajat vrajat commented Jun 9, 2025

ThreadResourceUsageProvider has two functions:

  • Provide utility functions to capture thread CPU time and allocated bytes.
  • Snapshot start values for CPU time and allocated bytes, since the MXBean functions return cumulative values.

The dual functionality complicated its usage in PerQueryCpuMemAccountant. A ThreadResourceUsageProvider object had to be stored in a thread-local to remember the initial resource values, and that thread-local had to be initialized whenever a thread was assigned a task. Otherwise, queries failed with an NPE; #15045 is an example.
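The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not Pinot code: `StartValues` and `ThreadLocalPitfall` are hypothetical names. It only shows that a `ThreadLocal` without an initial value returns `null` on any thread that never called `set()`, which is how an uninitialized worker thread can end up throwing an NPE.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a per-thread provider object; not Pinot's class. */
class StartValues {
  final long _startNs;

  StartValues(long startNs) {
    _startNs = startNs;
  }
}

class ThreadLocalPitfall {
  // No initial value is supplied: get() returns null on any thread that never called set().
  static final ThreadLocal<StartValues> START = new ThreadLocal<>();

  /** Runs a task on a fresh pool thread and reports whether the thread-local was unset there. */
  static boolean isUnsetOnWorker() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Future<Boolean> unset = pool.submit(() -> START.get() == null);
      return unset.get();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    // Initialized on the main thread only; the worker thread never sees this value.
    START.set(new StartValues(System.nanoTime()));
    System.out.println("unset on worker: " + isUnsetOnWorker());
  }
}
```

Any code on the worker that dereferences `START.get()` without a null check would fail, so every task entry point has to remember to initialize the thread-local first.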

This PR splits the functionality in two:

  • ThreadResourceUsageProvider is now a pure utility class with only static functions.
  • ThreadResourceSnapshot maintains the start values.

With these changes, the thread-local for ThreadResourceUsageProvider has been removed. The snapshot of resources is stored in the ThreadEntry thread-local itself, so a class of issues has been eliminated along with yet another thread-local.
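A rough sketch of this kind of split, assuming the standard `java.lang.management.ThreadMXBean` and the HotSpot-specific `com.sun.management.ThreadMXBean` extension. The class names `ResourceUsageUtil`, `ResourceSnapshot`, and `SnapshotDemo`, and all method names, are illustrative stand-ins and not the actual Pinot API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Stateless helper (hypothetical name): static reads of cumulative per-thread counters. */
final class ResourceUsageUtil {
  private static final ThreadMXBean MX_BEAN = ManagementFactory.getThreadMXBean();

  private ResourceUsageUtil() {
  }

  /** Cumulative CPU time of the calling thread in nanoseconds, or 0 if unavailable. */
  static long currentThreadCpuTimeNs() {
    if (!MX_BEAN.isCurrentThreadCpuTimeSupported()) {
      return 0L;
    }
    // getCurrentThreadCpuTime() returns -1 when measurement is disabled.
    return Math.max(0L, MX_BEAN.getCurrentThreadCpuTime());
  }

  /** Cumulative bytes allocated by the calling thread, or 0 if unavailable. */
  static long currentThreadAllocatedBytes() {
    if (MX_BEAN instanceof com.sun.management.ThreadMXBean) {
      com.sun.management.ThreadMXBean sunBean = (com.sun.management.ThreadMXBean) MX_BEAN;
      if (sunBean.isThreadAllocatedMemorySupported()) {
        // Returns -1 when allocation measurement is disabled.
        return Math.max(0L, sunBean.getThreadAllocatedBytes(Thread.currentThread().getId()));
      }
    }
    return 0L;
  }
}

/** Holds the start values (hypothetical name) so deltas can be computed later on the same thread. */
final class ResourceSnapshot {
  private long _startCpuNs;
  private long _startBytes;

  ResourceSnapshot() {
    reset();
  }

  /** Re-captures the start values from the cumulative counters. */
  void reset() {
    _startCpuNs = ResourceUsageUtil.currentThreadCpuTimeNs();
    _startBytes = ResourceUsageUtil.currentThreadAllocatedBytes();
  }

  long cpuTimeNs() {
    return ResourceUsageUtil.currentThreadCpuTimeNs() - _startCpuNs;
  }

  long allocatedBytes() {
    return ResourceUsageUtil.currentThreadAllocatedBytes() - _startBytes;
  }
}

class SnapshotDemo {
  public static void main(String[] args) {
    ResourceSnapshot snapshot = new ResourceSnapshot();
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 100_000; i++) {
      sb.append(i);
    }
    System.out.println("chars=" + sb.length() + " cpuNs=" + snapshot.cpuTimeNs()
        + " bytes=" + snapshot.allocatedBytes());
  }
}
```

In this sketch the snapshot is a plain object, so it can live as a field of whatever per-thread entry already exists instead of needing a thread-local of its own. It reads the calling thread's counters, so it must be created and queried on the same thread.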

Closes #16042

@vrajat vrajat force-pushed the rv-tracing-to-helper branch 2 times, most recently from 440c792 to cee9e56 Compare June 11, 2025 16:38
@vrajat vrajat marked this pull request as ready for review June 12, 2025 05:01
Contributor

Copilot AI left a comment

Pull Request Overview

This PR refactors ThreadResourceUsageProvider into a purely static helper and introduces ThreadResourceSnapshot for start/end snapshots, removing the thread-local provider and simplifying per-thread accounting.

  • Split provider into a static utility (ThreadResourceUsageProvider) and a snapshot class (ThreadResourceSnapshot)
  • Updated all call sites to create/use a snapshot and pass both CPU and memory deltas explicitly
  • Removed the thread-local provider plumbing in PerQueryCPUMemAccountantFactory and related classes

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| pinot-spi/src/main/java/org/apache/pinot/spi/trace/Tracing.java | Removed provider setter; updated concurrent update signature |
| pinot-spi/src/main/java/org/apache/pinot/spi/accounting/ThreadResourceUsageProvider.java | Made constructor private; added static getters |
| pinot-spi/src/main/java/org/apache/pinot/spi/accounting/ThreadResourceUsageAccountant.java | Deprecated setter; extended updateConcurrent signature |
| pinot-spi/src/main/java/org/apache/pinot/spi/accounting/ThreadResourceSnapshot.java | New snapshot class for start/end resource captures |
| pinot-core/src/test/java/org/apache/pinot/core/accounting/TestThreadMXBean.java | Tests updated to use snapshot instead of provider |
| pinot-core/src/main/java/org/apache/pinot/core/transport/DataTableHandler.java | Inserted snapshot before updating query usage |
| pinot-core/src/main/java/org/apache/pinot/core/operator/combine/BaseCombineOperator.java | Replaced provider with snapshot in combine operator |
| pinot-core/src/main/java/org/apache/pinot/core/operator/InstanceResponseOperator.java | Replaced provider with snapshot in response operator |
| pinot-core/src/main/java/org/apache/pinot/core/accounting/PerQueryCPUMemAccountantFactory.java | Removed thread-local provider; use passed deltas |
| pinot-core/src/main/java/org/apache/pinot/core/accounting/CPUMemThreadLevelAccountingObjects.java | Added snapshot field; reset/update methods |
| pinot-common/src/main/java/org/apache/pinot/common/datatable/DataTableImplV4.java | Switched to snapshot for serialization metadata |
Comments suppressed due to low confidence (2)

pinot-spi/src/main/java/org/apache/pinot/spi/accounting/ThreadResourceSnapshot.java:51

  • The Javadoc for getCpuTimeNs() suggests it auto-snapshots if not closed, but the method simply returns the last snapshot delta. Consider updating the comment to clarify that callers must invoke takeSnapshot() before calling this.
public long getCpuTimeNs() {

pinot-core/src/test/java/org/apache/pinot/core/accounting/TestThreadMXBean.java:58

  • This test is named testThreadMXBeanSimpleMemAllocTracking but reads getCpuTimeNs() instead of getAllocatedBytes(). It should assert on memory allocation, e.g., use getAllocatedBytes().
long result = threadResourceSnapshot.getCpuTimeNs();
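For illustration, a memory-focused check could look like the sketch below. It is not the Pinot test: `AllocTrackingCheck`, `measureAllocation`, and `sink` are hypothetical names, and it assumes a HotSpot JVM where `com.sun.management.ThreadMXBean` is available. The idea is to read the calling thread's cumulative allocated bytes before and after a known allocation and compare the delta with the requested size.

```java
import java.lang.management.ManagementFactory;

class AllocTrackingCheck {
  // Keeps the array reachable so the allocation is not optimized away.
  static volatile byte[] sink;

  /**
   * Returns the bytes the calling thread allocated while creating a byte array of the
   * given size, or -1 if the JVM does not expose per-thread allocation counters.
   */
  static long measureAllocation(int size) {
    java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    if (!(bean instanceof com.sun.management.ThreadMXBean)) {
      return -1L;
    }
    com.sun.management.ThreadMXBean sunBean = (com.sun.management.ThreadMXBean) bean;
    if (!sunBean.isThreadAllocatedMemorySupported() || !sunBean.isThreadAllocatedMemoryEnabled()) {
      return -1L;
    }
    long threadId = Thread.currentThread().getId();
    long before = sunBean.getThreadAllocatedBytes(threadId);
    sink = new byte[size];
    long after = sunBean.getThreadAllocatedBytes(threadId);
    return after - before;
  }

  public static void main(String[] args) {
    int size = 1 << 20;
    long delta = measureAllocation(size);
    if (delta < 0) {
      System.out.println("per-thread allocation tracking is not available on this JVM");
    } else {
      System.out.println("requested=" + size + " measuredDelta=" + delta);
    }
  }
}
```

A test built this way would assert that the measured delta is at least the requested size, rather than asserting on CPU time.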

@vrajat vrajat force-pushed the rv-tracing-to-helper branch from b8e6ad3 to a14968b Compare June 13, 2025 12:38
Comment on lines +47 to +48
/**
 * Gets the CPU time used so far in nanoseconds.
 * Takes a current snapshot if not yet closed.
 */
public long getCpuTimeNs() {
  return _endCpuTime - _startCpuTime;
}
Contributor

Javadoc is not correct, right? It isn't taking a snapshot. Also, what is the meaning of closed here?

Contributor Author

I've fixed the javadoc and the code. This was left over from an initial version.

Comment on lines +54 to +62
@Deprecated
public long getThreadTimeNs() {
  return 0;
}

@Deprecated
public long getThreadAllocatedBytes() {
  return 0;
}
Contributor

Perhaps we should consider breaking backward compatibility with this change.

Contributor Author

I had a similar conversation with @vvivekiyer as well. I suggested a separate follow-up PR that removes all the unnecessary code at once. A few more code blocks will be deprecated as well.

@gortiz
Contributor

gortiz commented Jun 16, 2025

I have approved the PR, but I would like some comments to be addressed before merging. Also, tests are failing.

@codecov-commenter

codecov-commenter commented Jun 17, 2025

Codecov Report

Attention: Patch coverage is 67.39130% with 15 lines in your changes missing coverage. Please review.

Project coverage is 63.15%. Comparing base (1a476de) to head (9fc7356).
Report is 290 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...re/accounting/PerQueryCPUMemAccountantFactory.java | 14.28% | 4 Missing and 2 partials ⚠️ |
| ...ot/spi/accounting/ThreadResourceUsageProvider.java | 42.85% | 4 Missing ⚠️ |
| ...accounting/CPUMemThreadLevelAccountingObjects.java | 66.66% | 2 Missing ⚠️ |
| ...e/pinot/spi/accounting/ThreadResourceSnapshot.java | 80.00% | 2 Missing ⚠️ |
| .../main/java/org/apache/pinot/spi/trace/Tracing.java | 66.66% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #16051      +/-   ##
============================================
+ Coverage     62.90%   63.15%   +0.25%     
+ Complexity     1386     1352      -34     
============================================
  Files          2867     2952      +85     
  Lines        163354   169619    +6265     
  Branches      24952    25940     +988     
============================================
+ Hits         102755   107123    +4368     
- Misses        52847    54384    +1537     
- Partials       7752     8112     +360     
| Flag | Coverage Δ |
| --- | --- |
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.13% <67.39%> (+0.26%) ⬆️ |
| java-21 | 63.13% <67.39%> (+0.31%) ⬆️ |
| skip-bytebuffers-false | ? |
| skip-bytebuffers-true | ? |
| temurin | 63.15% <67.39%> (+0.25%) ⬆️ |
| unittests | 63.15% <67.39%> (+0.25%) ⬆️ |
| unittests1 | 64.66% <67.39%> (+8.84%) ⬆️ |
| unittests2 | 33.29% <0.00%> (-0.28%) ⬇️ |

@vrajat vrajat force-pushed the rv-tracing-to-helper branch from 4bee47e to 4839b97 Compare June 17, 2025 10:36
@vrajat
Contributor Author

vrajat commented Jun 17, 2025

Test failures seem unrelated to the diff.

RefreshSegmentMinionClusterIntegrationTest.checkColumnAddition:386 expected [717.6666666666666] but found [164.66666666666666]

in Pinot Integration Test Set 2 passes when run locally. I think it's flaky.

I am seeing the same failures in Pinot Unit Test Set 1 on master as well.

@vrajat vrajat force-pushed the rv-tracing-to-helper branch from 4839b97 to 9fc7356 Compare June 18, 2025 05:32
@vrajat
Contributor Author

vrajat commented Jun 18, 2025

All tests have passed

@vrajat vrajat changed the title Make ThreadResourceUsageProvide a Helper/Utility Class. Make ThreadResourceUsageProvider a Helper/Utility Class. Jun 18, 2025
@swaminathanmanish swaminathanmanish merged commit b5afb89 into apache:master Jun 20, 2025
18 checks passed
@vrajat vrajat deleted the rv-tracing-to-helper branch June 20, 2025 05:34
mqliang pushed a commit to mqliang/pinot that referenced this pull request Feb 10, 2026
* [Query Resource Isolation] Workload Configs (apache#15109)

* Workload Configs

* workload config

* Add API

* config

* Change config structure

* Propagation strategy

* Fix style check

* Cost spliting on update

* Table addition propagation

* perf

* Tests

* test

* test 2

* Review comments 1

* review comments 3

* review comments 3

* name change

* review comments 4

* Fix TableDoesNotExistError for hybrid tables in MSE queries in controller API (apache#16102)

* Make ThreadResourceUsageProvider a Helper/Utility Class. (apache#16051)

* ThreadResourceUsageProvider is a helper class. ThreadResourceContext tracks resource usage.

Fix updateConcurrently

* Rename to ThreadResourceSnapshot

* Clean up

* Add javadoc

* Done use auto closeable

* Checkstyle

* Fix compilation error

* Add back removed functions in SPI

* Remove private constructor because japicmp complains.

* Add setThreadResourceUsageProvider because of backward-incompatible checks

* Add setThreadResourceUsageProvider because of backward-incompatible checks

* Fix test

* Fix ThreadResourceSnapshot usage and tests

* Store cpu sample in nanoseconds.

* Reduce logs and improve logging when queries are terminated due to OOM. (apache#16172)

* Dynamic PerQueryCPUMemAccountant Config on Servers  (apache#16219)

* Checkpoint

* Register change handler

* Fix bugs. Manually tested

* Checkstyle

* Tests

* Add pre-check that values are default

* Undo typo fix

* Update QueryRunner to make use of window function overflow handling server configurations (apache#16108)

* Add multistage thread limiting configs at the broker and server level (apache#16080)

* Adding changes for supporting RLS (apache#16043)

* Use stats cache on error instead of the chained mechanism (apache#15992)

* Improve broker error messaging when broker is the one reporting the failure (apache#16076)

* Introduce MSE active and passive timeouts (apache#16075)

* Throttle SSE & MSE Tasks if Server heap usage is above a threshold (apache#16271)

* Fix QueryScheduler constructor using class name. (apache#16280)

* Fix QueryScheduler constructor using class name.

* Fix test

* [Query Resource Isolation] WorkloadBudgetManager and Host enforcement (apache#15798)

* QRI - WorkloadBudgetManager implementation

* Address review comments

* Remove singleton & signature fix

* Fix compatibility checker

* Review comments

* Move WorkloadBudgetManager to core.

---------

Co-authored-by: praveenc7 <praveenkchaganlal@gmail.com>

* Eliminate duplicate cancel attempts in PerQueryCPUMemAccountant (apache#16299)

* Add basic 1 query tests

* Add more tests

* Add ability to remember cancel queries.

* Clean up if conditions in killMostExpensiveQuery

* Fix test failures.

* Address review comments.

* Use QueryCancelCallback to cancel queries from ThreadResourceUsageAccountant (apache#16142)

* Remove all calls to System.gc() in PerQueryCPUMemAccountantFactory (apache#16374)

* Initialize thread accountant just before serving queries (apache#16326)

* Allow Reset of ThreadResourceUsageAccountant in Tracing.java (apache#16360)

* Queries now self terminate if in panic mode. (apache#16380)

* Queries now self terminate if in panic mode.

* Add config test

* Hard kill on critical level.

* Fix configs

* Separate anchor thread interruption.

* Checkstyle

* Review comments

* remove code for critical level

---------

Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com>

* [Query Resource Isolation] Additonal Sampling for Broker and Server (apache#16164)

* fix

* sampling

* Broker sampling

* revert integ-test

* Fix test failures

* review comments

* remove MSE

* broker auth

* remove per pruner & planner sample

* Use Broker's accountant to sample in the request handler. (apache#16439)

* [Query Resource Isolation] Workload Scheduler (apache#16018)

* QRI - WorkloadBudgetManager implementation

* Address review comments

* scheduler

* unit test

* review comments: metrics, secondary, resource-manager

* remove broker admission

* Remove default budget

---------

Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com>

* Cleanup deprecated methods in ThreadResourceUsageAccountant (apache#16479)

* Remove unnecessary methods and config for ThreadResourceUsageAccountant (apache#16490)

* Add tests for OOM Termination of MSE queries. (apache#16514)

* Fix a flaky assert when testing OOM Cancellation of MSE Queries (apache#16533)

* Disable Flaky Tests (apache#16554)

This is a follow-up to apache#16533
The fix for a flaky test did not work. This PR disables these tests temporarily.

* Use correlation ID instead of request id in PerQueryCpuMemAccountant (apache#16040)

* [Query Resource Isolation]Interface for Workload Stats Collection (apache#16340)

* Interface for Stats Collection

* Additional comments

* inherit

* additional class comments

* [Query Resource Isolation] Fix Refresh message (apache#16636)

* Fix Refresh message

* delete queryworkload message handler

* info -> debug logs

* reduce logging (apache#16698)

* style check

* [Query Workload Isolation] Cost-split support  (apache#16672)

* splits

* Cost split

* test

* propagation entity change & java doc

* Propagation scheme review comments

* empty commit to trigger build

* Reduce log for PerQueryCPUMemResourceUsageAccountant (apache#16642)

* [refactor] Switching to RoutingManager for broker request handlers (apache#16442)

Co-authored-by: Shaurya Chaturvedi <shauryachats@uber.com>

* Fix broker request id generator to avoid generating same id (apache#16661)

* Introduce QueryExecutionContext to manage query life cycle (apache#16728)

* Introduce QueryExecutionContext to manage query life cycle 2 (apache#16728)

---------

Co-authored-by: Rajat Venkatesh <1638298+vrajat@users.noreply.github.com>
Co-authored-by: Yash Mayya <yash.mayya@gmail.com>
Co-authored-by: Satwik Pachigolla <40644097+satwik-pachigolla@users.noreply.github.com>
Co-authored-by: 9aman <35227405+9aman@users.noreply.github.com>
Co-authored-by: Gonzalo Ortiz Jaureguizar <gortiz@users.noreply.github.com>
Co-authored-by: Vivek Iyer Vaidyanathan <vvivekiyer@gmail.com>
Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com>
Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com>
Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com>
Co-authored-by: Shaurya Chaturvedi <shauryachats@gmail.com>
Co-authored-by: Shaurya Chaturvedi <shauryachats@uber.com>

Development

Successfully merging this pull request may close these issues.

Change ThreadResourceUsageProvider to a utility or helper class

5 participants