Skip to content

[Query Resource Isolation]Interface for Workload Stats Collection#16340

Merged
vvivekiyer merged 6 commits intoapache:masterfrom
praveenc7:interface
Aug 28, 2025
Merged

[Query Resource Isolation]Interface for Workload Stats Collection#16340
vvivekiyer merged 6 commits intoapache:masterfrom
praveenc7:interface

Conversation

@praveenc7
Copy link
Contributor

@praveenc7 praveenc7 commented Jul 14, 2025

Summary

Adding an interface for workload stats collection that could be used to create a custom implementation on how and where the stats could be stored for historical evaluation to estimate the right workload cost over a period before enforcing a set cost

Key points

  • Broker/Server starters create and pass the WorkloadBudgetManager into the thread accountant; this provides the hook for over-ride on custom implementation.

Testing

Only interface addition no additional logic added inside WorkloadBudgetManager

@codecov-commenter
Copy link

codecov-commenter commented Jul 14, 2025

Codecov Report

❌ Patch coverage is 60.71429% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.37%. Comparing base (72bf3bf) to head (8551696).
⚠️ Report is 47 commits behind head on master.

Files with missing lines Patch % Lines
...he/pinot/spi/accounting/WorkloadBudgetManager.java 55.55% 7 Missing and 1 partial ⚠️
.../pinot/server/starter/helix/BaseServerStarter.java 0.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #16340      +/-   ##
============================================
+ Coverage     63.32%   63.37%   +0.04%     
+ Complexity     1380     1379       -1     
============================================
  Files          3026     3036      +10     
  Lines        176434   176924     +490     
  Branches      27073    27139      +66     
============================================
+ Hits         111734   112123     +389     
- Misses        56133    56190      +57     
- Partials       8567     8611      +44     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.33% <60.71%> (+0.04%) ⬆️
java-21 63.33% <60.71%> (+0.03%) ⬆️
temurin 63.37% <60.71%> (+0.04%) ⬆️
unittests 63.37% <60.71%> (+0.04%) ⬆️
unittests1 56.48% <60.00%> (+0.02%) ⬆️
unittests2 33.13% <17.85%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@praveenc7 praveenc7 marked this pull request as ready for review July 30, 2025 20:12
@praveenc7 praveenc7 force-pushed the interface branch 2 times, most recently from 0f8e989 to f517662 Compare August 20, 2025 00:20
* Represents budget stats.
*/
public static class BudgetStats {
public final long _initialCpuBudget;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be added since we need to expose budgetProvisoned as part of cost collection

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) Some comments need to be updated to reflect this change.

}

/**
* Collects workload stats for CPU and memory usage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we create a WorkloadBudgetManager interface with this being the DefaultWorkloadManager implementation?

The internal implementation can have logic for stats collection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thoughts, on prematurely creating an interface when we don't additional implementation yet.

I did explore this commit 6be3218#diff-3c4497dbfe973a90e7a8cc7465bd99d71fc0e930bebe315c409e16036347754e but felt it added additional code without the need for it now.

Given this can be expanded later when we have a additional implementation

* Can be overridden to create a custom WorkloadBudgetManager.
*/
protected WorkloadBudgetManager createWorkloadBudgetManager(PinotConfiguration brokerConf) {
return new WorkloadBudgetManager(brokerConf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to create a factory here that returns the right BudgetManager implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we currently have only one implementation, I avoided introducing a factory prematurely. I was thinking a factory can be added later if we introduce additional implementations.

As a reference, PinotHelixResourceManager also defines multiple custom interfaces without relying on a factory. I borrowed a similar approach here to keep things simpler until we actually need multiple implementations.

_resourceUsageAccountant = Tracing.ThreadAccountantOps.createThreadAccountant(
_serverConf.subset(CommonConstants.PINOT_QUERY_SCHEDULER_PREFIX), _instanceId,
org.apache.pinot.spi.config.instance.InstanceType.SERVER);
org.apache.pinot.spi.config.instance.InstanceType.SERVER, workloadBudgetManager);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this interface change needed? Why can't createAccountant() create an instance of the appropriate BudgetManager based on configs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason is how we define custom implementations of WorkloadBudgetManager. The trade-off here is between:
• Tweaking the interface a little to take in additional parameter by passing in the WorkloadBudgetManager explicitly),
vs.
• Adding new configs (to define which WorkloadBudgetManager to use) and internally handling this in createAccountant() .

Since we only have a single implementation today and don’t anticipate adding another in the near future, introducing config toggles would add unnecessary complexity and maintenance overhead. Passing the WorkloadBudgetManager directly avoids this extra config surface.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. We can make this change later when we deem necessary.

* Represents budget stats.
*/
public static class BudgetStats {
public final long _initialCpuBudget;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) Some comments need to be updated to reflect this change.

@vvivekiyer vvivekiyer merged commit 7b1d881 into apache:master Aug 28, 2025
33 of 36 checks passed
mqliang pushed a commit to mqliang/pinot that referenced this pull request Feb 10, 2026
* [Query Resource Isolation] Workload Configs (apache#15109)

* Workload Configs

* workload config

* Add API

* config

* Change config structure

* Propagation strategy

* Fix style check

* Cost spliting on update

* Table addition propagation

* perf

* Tests

* test

* test 2

* Review comments 1

* review comments 3

* review comments 3

* name change

* review comments 4

* Fix TableDoesNotExistError for hybrid tables in MSE queries in controller API (apache#16102)

* Make ThreadResourceUsageProvider a Helper/Utility Class. (apache#16051)

* ThreadResourceUsageProvider is a helper class. ThreadResourceContext tracks resource usage.

Fix updateConcurrently

* Rename to ThreadResourceSnapshot

* Clean up

* Add javadoc

* Done use auto closeable

* Checkstyle

* Fix compilation error

* Add back removed functions in SPI

* Remove private constructor because japicmp complains.

* Add setThreadResourceUsageProvider because of backward-incompatible checks

* Add setThreadResourceUsageProvider because of backward-incompatible checks

* Fix test

* Fix ThreadResourceSnapshot usage and tests

* Store cpu sample in nanoseconds.

* Reduce logs and improve logging when queries are terminated due to OOM. (apache#16172)

* Dynamic PerQueryCPUMemAccountant Config on Servers  (apache#16219)

* Checkpoint

* Register change handler

* Fix bugs. Manually tested

* Checkstyle

* Tests

* Add pre-check that values are default

* Undo typo fix

* Update QueryRunner to make use of window function overflow handling server configurations (apache#16108)

* Add multistage thread limiting configs at the broker and server level (apache#16080)

* Adding changes for supporting RLS (apache#16043)

* Use stats cache on error instead of the chained mechanism (apache#15992)

* Improve broker error messaging when broker is the one reporting the failure (apache#16076)

* Introduce MSE active and passive timeouts (apache#16075)

* Throttle SSE & MSE Tasks if Server heap usage is above a threshold (apache#16271)

* Fix QueryScheduler constructor using class name. (apache#16280)

* Fix QueryScheduler constructor using class name.

* Fix test

* [Query Resource Isolation] WorkloadBudgetManager and Host enforcement (apache#15798)

* QRI - WorkloadBudgetManager implementation

* Address review comments

* Remove singleton & signature fix

* Fix compatibility checker

* Review comments

* Move WorkloadBudgetManager to core.

---------

Co-authored-by: praveenc7 <praveenkchaganlal@gmail.com>

* Eliminate duplicate cancel attempts in PerQueryCPUMemAccountant (apache#16299)

* Add basic 1 query tests

* Add more tests

* Add ability to remember cancel queries.

* Clean up if conditions in killMostExpensiveQuery

* Fix test failures.

* Address review comments.

* Use QueryCancelCallback to cancel queries from ThreadResourceUsageAccountant (apache#16142)

* Remove all calls to System.gc() in PerQueryCPUMemAccountantFactory (apache#16374)

* Initialize thread accountant just before serving queries (apache#16326)

* Allow Reset of ThreadResourceUsageAccountant in Tracing.java (apache#16360)

* Queries now self terminate if in panic mode. (apache#16380)

* Queries now self terminate if in panic mode.

* Add config test

* Hard kill on critical level.

* Fix configs

* Separate anchor thread interruption.

* Checkstyle

* Review comments

* remove code for critical level

---------

Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com>

* [Query Resource Isolation] Additonal Sampling for Broker and Server (apache#16164)

* fix

* sampling

* Broker sampling

* revert integ-test

* Fix test failures

* review comments

* remove MSE

* broker auth

* remove per pruner & planner sample

* Use Broker's accountant to sample in the request handler. (apache#16439)

* [Query Resource Isolation] Workload Scheduler (apache#16018)

* QRI - WorkloadBudgetManager implementation

* Address review comments

* scheduler

* unit test

* review comments: metrics, secondary, resource-manager

* remove broker admission

* Remove default budget

---------

Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com>

* Cleanup deprecated methods in ThreadResourceUsageAccountant (apache#16479)

* Remove unnecessary methods and config for ThreadResourceUsageAccountant (apache#16490)

* Add tests for OOM Termination of MSE queries. (apache#16514)

* Fix a flaky assert when testing OOM Cancellation of MSE Queries (apache#16533)

* Disable Flaky Tests (apache#16554)

This is a follow-up to apache#16533
The fix for a flaky test did not work. This PR disables these tests temporarily.

* Use correlation ID instead of request id in PerQueryCpuMemAccountant (apache#16040)

* [Query Resource Isolation]Interface for Workload Stats Collection (apache#16340)

* Interface for Stats Collection

* Additional comments

* inherit

* additional class comments

* [Query Resource Isolation] Fix Refresh message (apache#16636)

* Fix Refresh message

* delete queryworkload message handler

* info -> debug logs

* reduce logging (apache#16698)

* style check

* [Query Workload Isolation] Cost-split support  (apache#16672)

* splits

* Cost split

* test

* propagation entity change & java doc

* Propagation scheme review comments

* empty commit to trigger build

* Reduce log for PerQueryCPUMemResourceUsageAccountant (apache#16642)

---------

Co-authored-by: Rajat Venkatesh <1638298+vrajat@users.noreply.github.com>
Co-authored-by: Yash Mayya <yash.mayya@gmail.com>
Co-authored-by: Satwik Pachigolla <40644097+satwik-pachigolla@users.noreply.github.com>
Co-authored-by: 9aman <35227405+9aman@users.noreply.github.com>
Co-authored-by: Gonzalo Ortiz Jaureguizar <gortiz@users.noreply.github.com>
Co-authored-by: Vivek Iyer Vaidyanathan <vvivekiyer@gmail.com>
Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com>
Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com>
Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com>
mqliang pushed a commit to mqliang/pinot that referenced this pull request Feb 10, 2026
* [Query Resource Isolation] Workload Configs (apache#15109)

* Workload Configs

* workload config

* Add API

* config

* Change config structure

* Propagation strategy

* Fix style check

* Cost spliting on update

* Table addition propagation

* perf

* Tests

* test

* test 2

* Review comments 1

* review comments 3

* review comments 3

* name change

* review comments 4

* Fix TableDoesNotExistError for hybrid tables in MSE queries in controller API (apache#16102)

* Make ThreadResourceUsageProvider a Helper/Utility Class. (apache#16051)

* ThreadResourceUsageProvider is a helper class. ThreadResourceContext tracks resource usage.

Fix updateConcurrently

* Rename to ThreadResourceSnapshot

* Clean up

* Add javadoc

* Done use auto closeable

* Checkstyle

* Fix compilation error

* Add back removed functions in SPI

* Remove private constructor because japicmp complains.

* Add setThreadResourceUsageProvider because of backward-incompatible checks

* Add setThreadResourceUsageProvider because of backward-incompatible checks

* Fix test

* Fix ThreadResourceSnapshot usage and tests

* Store cpu sample in nanoseconds.

* Reduce logs and improve logging when queries are terminated due to OOM. (apache#16172)

* Dynamic PerQueryCPUMemAccountant Config on Servers  (apache#16219)

* Checkpoint

* Register change handler

* Fix bugs. Manually tested

* Checkstyle

* Tests

* Add pre-check that values are default

* Undo typo fix

* Update QueryRunner to make use of window function overflow handling server configurations (apache#16108)

* Add multistage thread limiting configs at the broker and server level (apache#16080)

* Adding changes for supporting RLS (apache#16043)

* Use stats cache on error instead of the chained mechanism (apache#15992)

* Improve broker error messaging when broker is the one reporting the failure (apache#16076)

* Introduce MSE active and passive timeouts (apache#16075)

* Throttle SSE & MSE Tasks if Server heap usage is above a threshold (apache#16271)

* Fix QueryScheduler constructor using class name. (apache#16280)

* Fix QueryScheduler constructor using class name.

* Fix test

* [Query Resource Isolation] WorkloadBudgetManager and Host enforcement (apache#15798)

* QRI - WorkloadBudgetManager implementation

* Address review comments

* Remove singleton & signature fix

* Fix compatibility checker

* Review comments

* Move WorkloadBudgetManager to core.

---------

Co-authored-by: praveenc7 <praveenkchaganlal@gmail.com>

* Eliminate duplicate cancel attempts in PerQueryCPUMemAccountant (apache#16299)

* Add basic 1 query tests

* Add more tests

* Add ability to remember cancel queries.

* Clean up if conditions in killMostExpensiveQuery

* Fix test failures.

* Address review comments.

* Use QueryCancelCallback to cancel queries from ThreadResourceUsageAccountant (apache#16142)

* Remove all calls to System.gc() in PerQueryCPUMemAccountantFactory (apache#16374)

* Initialize thread accountant just before serving queries (apache#16326)

* Allow Reset of ThreadResourceUsageAccountant in Tracing.java (apache#16360)

* Queries now self terminate if in panic mode. (apache#16380)

* Queries now self terminate if in panic mode.

* Add config test

* Hard kill on critical level.

* Fix configs

* Separate anchor thread interruption.

* Checkstyle

* Review comments

* remove code for critical level

---------

Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com>

* [Query Resource Isolation] Additonal Sampling for Broker and Server (apache#16164)

* fix

* sampling

* Broker sampling

* revert integ-test

* Fix test failures

* review comments

* remove MSE

* broker auth

* remove per pruner & planner sample

* Use Broker's accountant to sample in the request handler. (apache#16439)

* [Query Resource Isolation] Workload Scheduler (apache#16018)

* QRI - WorkloadBudgetManager implementation

* Address review comments

* scheduler

* unit test

* review comments: metrics, secondary, resource-manager

* remove broker admission

* Remove default budget

---------

Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com>

* Cleanup deprecated methods in ThreadResourceUsageAccountant (apache#16479)

* Remove unnecessary methods and config for ThreadResourceUsageAccountant (apache#16490)

* Add tests for OOM Termination of MSE queries. (apache#16514)

* Fix a flaky assert when testing OOM Cancellation of MSE Queries (apache#16533)

* Disable Flaky Tests (apache#16554)

This is a follow-up to apache#16533
The fix for a flaky test did not work. This PR disables these tests temporarily.

* Use correlation ID instead of request id in PerQueryCpuMemAccountant (apache#16040)

* [Query Resource Isolation]Interface for Workload Stats Collection (apache#16340)

* Interface for Stats Collection

* Additional comments

* inherit

* additional class comments

* [Query Resource Isolation] Fix Refresh message (apache#16636)

* Fix Refresh message

* delete queryworkload message handler

* info -> debug logs

* reduce logging (apache#16698)

* style check

* [Query Workload Isolation] Cost-split support  (apache#16672)

* splits

* Cost split

* test

* propagation entity change & java doc

* Propagation scheme review comments

* empty commit to trigger build

* Reduce log for PerQueryCPUMemResourceUsageAccountant (apache#16642)

* [refactor] Switching to RoutingManager for broker request handlers (apache#16442)

Co-authored-by: Shaurya Chaturvedi <shauryachats@uber.com>

* Fix broker request id generator to avoid generating same id (apache#16661)

* Introduce QueryExecutionContext to manage query life cycle (apache#16728)

* Introduce QueryExecutionContext to manage query life cycle 2 (apache#16728)

---------

Co-authored-by: Rajat Venkatesh <1638298+vrajat@users.noreply.github.com>
Co-authored-by: Yash Mayya <yash.mayya@gmail.com>
Co-authored-by: Satwik Pachigolla <40644097+satwik-pachigolla@users.noreply.github.com>
Co-authored-by: 9aman <35227405+9aman@users.noreply.github.com>
Co-authored-by: Gonzalo Ortiz Jaureguizar <gortiz@users.noreply.github.com>
Co-authored-by: Vivek Iyer Vaidyanathan <vvivekiyer@gmail.com>
Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com>
Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com>
Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com>
Co-authored-by: Shaurya Chaturvedi <shauryachats@gmail.com>
Co-authored-by: Shaurya Chaturvedi <shauryachats@uber.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants