
Add multistage thread limiting configs at the broker and server level #16080

Merged
Jackie-Jiang merged 12 commits into apache:master from satwik-pachigolla:github-fork/satwik/configurable-thread-limiting on Jun 18, 2025

@satwik-pachigolla
Contributor

@satwik-pachigolla satwik-pachigolla commented Jun 11, 2025

Summary

This makes the multistage thread limits at the broker and server, which were previously configurable only through cluster configs, configurable through broker and server instance configs as well. This enables thread limiting on heterogeneous infrastructure, instead of forcing a single value across the entire cluster.

The previously existing cluster configs are:

  • pinot.beta.multistage.engine.max.server.query.threads (broker limit)
  • pinot.beta.multistage.engine.max.server.query.threads.hardlimit.factor (server limit)

and we are now adding:

  • pinot.broker.mse.max.server.query.threads (broker limit)
  • pinot.server.query.executor.mse.max.execution.threads (server limit)
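On the server, the new key caps MSE execution threads per instance. The following is a rough illustration only. It assumes the cap is a hard limit that rejects work once reached; the `HardLimitExecutor` class named in the coverage report below hints at this, but the sketch is not its code. A minimal Java version could wrap an `ExecutorService` with a `Semaphore`. The class name `HardLimitSketch` and the exception message are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not Pinot's HardLimitExecutor): at most maxThreads tasks may run
 * concurrently, and a submission beyond that is rejected instead of queued.
 */
final class HardLimitSketch {
  private final ExecutorService _delegate;
  private final Semaphore _permits;

  HardLimitSketch(ExecutorService delegate, int maxThreads) {
    _delegate = delegate;
    _permits = new Semaphore(maxThreads);
  }

  void execute(Runnable task) {
    // Reserve a slot synchronously so the limit holds even before the task starts.
    if (!_permits.tryAcquire()) {
      throw new RejectedExecutionException("MSE thread limit exceeded");
    }
    try {
      _delegate.execute(() -> {
        try {
          task.run();
        } finally {
          _permits.release(); // free the slot when the task finishes
        }
      });
    } catch (RejectedExecutionException e) {
      _permits.release(); // the delegate refused the task, so give the slot back
      throw e;
    }
  }

  int availableSlots() {
    return _permits.availablePermits();
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = Executors.newCachedThreadPool();
    HardLimitSketch limited = new HardLimitSketch(pool, 2);
    CountDownLatch release = new CountDownLatch(1);
    Runnable blocked = () -> {
      try {
        release.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
    limited.execute(blocked);
    limited.execute(blocked);
    try {
      limited.execute(blocked); // a third concurrent task exceeds the limit of 2
    } catch (RejectedExecutionException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    release.countDown();
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
    System.out.println("free slots: " + limited.availableSlots());
  }
}
```

Rejecting rather than queuing keeps the thread count strictly bounded. A real implementation may trade that off differently.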

Compatibility

This is backward compatible because the feature is opt-in: limiting takes effect only when at least one of the configs is set. If both the cluster-level and instance-level configs are set, the instance-level ones take precedence.
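The precedence rule can be sketched for the broker limit, where both keys are thread counts. The server-side cluster key is a factor, so it is left out here. `ThreadLimitResolver` is a hypothetical name. Treating any value <= 0 as "not set" is an assumption based on the `-1` default in the PR's constants.

```java
/**
 * Hypothetical helper (not Pinot's API) resolving the effective broker-side MSE thread limit.
 * Assumption: a value <= 0 means "not set", mirroring DEFAULT_MSE_MAX_SERVER_QUERY_THREADS = -1.
 */
final class ThreadLimitResolver {
  /** Sentinel meaning no limiting is applied. */
  static final int NO_LIMIT = -1;

  private ThreadLimitResolver() {
  }

  /**
   * @param instanceValue value of pinot.broker.mse.max.server.query.threads on this broker
   * @param clusterValue  value of pinot.beta.multistage.engine.max.server.query.threads
   */
  static int resolve(int instanceValue, int clusterValue) {
    if (instanceValue > 0) {
      return instanceValue; // instance-level config takes precedence
    }
    if (clusterValue > 0) {
      return clusterValue; // fall back to the cluster-wide value
    }
    return NO_LIMIT; // neither config is set: the feature stays off
  }

  public static void main(String[] args) {
    System.out.println(resolve(-1, -1));   // -1: limiting disabled
    System.out.println(resolve(-1, 200));  // 200: cluster config only
    System.out.println(resolve(100, 200)); // 100: instance config wins
  }
}
```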

This is not forward compatible once either of the new configs is in use.

Testing

  1. Set broker and server limits to 100 using the new configs
  2. Load tested
  3. Observed expected limits being hit

[Screenshot 2025-06-11 at 11:51:11 AM: broker metric]

The broker metric stayed under the per-broker limit of 222 (100 threads per server * 20 servers / 9 brokers). It plateaued at ~170 because any further query we sent would have pushed it past 222.
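The per-broker arithmetic above, plus a simplified admission check, might look like the following. `BrokerThreadBudget` and `admits` are hypothetical. The 40-thread and 60-thread requests are made-up values, chosen only to show one query fitting under 222 and one crossing it from the observed ~170.

```java
/**
 * Illustrative arithmetic only (not Pinot's throttler): a broker's share of the
 * cluster-wide server thread budget, and a simple "does this query still fit" check.
 */
final class BrokerThreadBudget {
  private BrokerThreadBudget() {
  }

  /** Per-broker share, rounded down by integer division: 100 * 20 / 9 = 2000 / 9 = 222. */
  static int perBrokerLimit(int maxServerQueryThreads, int numServers, int numBrokers) {
    if (numBrokers <= 0) {
      throw new IllegalArgumentException("numBrokers must be positive");
    }
    return maxServerQueryThreads * numServers / numBrokers;
  }

  /** A query is admitted only if its estimated server threads still fit under the limit. */
  static boolean admits(int threadsInUse, int threadsRequested, int limit) {
    return threadsInUse + threadsRequested <= limit;
  }

  public static void main(String[] args) {
    int limit = perBrokerLimit(100, 20, 9);
    System.out.println(limit);                  // 222
    System.out.println(admits(170, 40, limit)); // true: 210 <= 222
    System.out.println(admits(170, 60, limit)); // false: 230 > 222
  }
}
```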

[Screenshot 2025-06-11 at 11:51:49 AM: server JVM thread count]

The server JVM thread count increased by ~130 against a configured limit of 100. The extra ~30% comes from threads other than those performing query execution.

@Jackie-Jiang Jackie-Jiang added enhancement Configuration Config changes (addition/deletion/change in behavior) labels Jun 11, 2025
@gortiz gortiz requested review from gortiz and yashmayya June 12, 2025 12:37
Comment on lines +385 to +402
public static final String CONFIG_OF_MSE_MAX_SERVER_QUERY_THREADS = "pinot.broker.mse.max.server.query.threads";
public static final int DEFAULT_MSE_MAX_SERVER_QUERY_THREADS = -1;
Contributor

I think we are not 100% committed to the current max thread solution, which is better than the previous unlimited mode but still has some issues.

Therefore I would recommend keeping the beta label on the new configs.

Contributor Author

Not adding beta to the config name, to avoid future migrations, as Jackie suggested below.

@gortiz
Contributor

gortiz commented Jun 13, 2025

I'm fine with this PR, but I would suggest including the beta prefix, given that this is something we may want to change and deprecate in the future. We are discussing how to apply this limit locally (per server) instead of per broker. Additionally, in the future we should use custom coroutines or virtual threads instead of hundreds of native threads, which are always problematic.

Contributor

@Jackie-Jiang Jackie-Jiang left a comment

LGTM.

@gortiz IMO we can keep the config key as is, without beta, but call this feature out as beta in the documentation. We can still remove the config if desired, and we save the migration effort if we decide to keep it.

@Jackie-Jiang
Contributor

@satwik-pachigolla Can you rebase onto the latest master? The test failures are not related to this PR and are already fixed.

@satwik-pachigolla satwik-pachigolla force-pushed the github-fork/satwik/configurable-thread-limiting branch from 77f3c62 to bf62557 June 16, 2025 19:10
@satwik-pachigolla
Contributor Author

@Jackie-Jiang I rebased to master and added the last commit to improve code style.

@satwik-pachigolla satwik-pachigolla requested a review from gortiz June 17, 2025 15:45
@codecov-commenter

codecov-commenter commented Jun 18, 2025

Codecov Report

Attention: Patch coverage is 75.30864% with 20 lines in your changes missing coverage. Please review.

Project coverage is 63.18%. Comparing base (1a476de) to head (adcd6f0).
Report is 296 commits behind head on master.

Files with missing lines                                 Patch %   Lines
...va/org/apache/pinot/query/runtime/QueryRunner.java    68.42%    11 Missing and 1 partial ⚠️
.../apache/pinot/server/worker/WorkerQueryServer.java    0.00%     5 Missing ⚠️
...g/apache/pinot/spi/executor/HardLimitExecutor.java    76.92%    2 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #16080      +/-   ##
============================================
+ Coverage     62.90%   63.18%   +0.27%     
+ Complexity     1386     1357      -29     
============================================
  Files          2867     2951      +84     
  Lines        163354   169774    +6420     
  Branches      24952    25958    +1006     
============================================
+ Hits         102755   107268    +4513     
- Misses        52847    54390    +1543     
- Partials       7752     8116     +364     
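The Codecov figures can be cross-checked with a few lines of arithmetic. This assumes Codecov's usual ratio, hits / (hits + misses + partials), with partially covered lines counted as not covered. The 81-line patch total is inferred, not stated in the report: it is the only total consistent with 20 uncovered lines at 75.30864%.

```java
import java.util.Locale;

/** Illustrative cross-check of the coverage numbers in this report (not a Codecov API). */
final class CoverageCheck {
  private CoverageCheck() {
  }

  /** Coverage percentage, counting partially covered lines as not covered. */
  static double percent(long hits, long misses, long partials) {
    return 100.0 * hits / (hits + misses + partials);
  }

  /** Locale-independent formatting so the decimal separator is always '.'. */
  static String format(double value, int decimals) {
    return String.format(Locale.ROOT, "%." + decimals + "f", value);
  }

  public static void main(String[] args) {
    // Project coverage: hits, misses and partials from the coverage diff.
    System.out.println(format(percent(102755, 52847, 7752), 2)); // 62.90 (base)
    System.out.println(format(percent(107268, 54390, 8116), 2)); // 63.18 (head)

    // Patch coverage: missing and partial lines from the per-file table.
    long patchMisses = 11 + 5 + 2; // 18
    long patchPartials = 1 + 1;    // 2
    long patchHits = 81 - patchMisses - patchPartials; // 61, using the inferred total of 81
    System.out.println(format(percent(patchHits, patchMisses, patchPartials), 5)); // 75.30864
  }
}
```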
Flag                    Coverage Δ
custom-integration1     100.00% <ø> (ø)
integration             100.00% <ø> (ø)
integration1            100.00% <ø> (ø)
integration2            0.00% <ø> (ø)
java-11                 63.16% <75.30%> (+0.29%) ⬆️
java-21                 63.15% <75.30%> (+0.33%) ⬆️
skip-bytebuffers-false  ?
skip-bytebuffers-true   ?
temurin                 63.18% <75.30%> (+0.27%) ⬆️
unittests               63.17% <75.30%> (+0.27%) ⬆️
unittests1              64.69% <70.58%> (+8.87%) ⬆️
unittests2              33.33% <30.86%> (-0.24%) ⬇️

Flags with carried forward coverage won't be shown.


@Jackie-Jiang Jackie-Jiang merged commit 21b50af into apache:master Jun 18, 2025
18 checks passed
@Jackie-Jiang Jackie-Jiang added documentation release-notes Referenced by PRs that need attention when compiling the next release notes labels Jun 18, 2025
@Jackie-Jiang
Contributor

Can you help update the Pinot documentation for these new configs?

mqliang pushed a commit to mqliang/pinot that referenced this pull request Feb 10, 2026
* [Query Resource Isolation] Workload Configs (apache#15109)

* Workload Configs

* workload config

* Add API

* config

* Change config structure

* Propagation strategy

* Fix style check

* Cost spliting on update

* Table addition propagation

* perf

* Tests

* test

* test 2

* Review comments 1

* review comments 3

* review comments 3

* name change

* review comments 4

* Fix TableDoesNotExistError for hybrid tables in MSE queries in controller API (apache#16102)

* Make ThreadResourceUsageProvider a Helper/Utility Class. (apache#16051)

* ThreadResourceUsageProvider is a helper class. ThreadResourceContext tracks resource usage.

Fix updateConcurrently

* Rename to ThreadResourceSnapshot

* Clean up

* Add javadoc

* Done use auto closeable

* Checkstyle

* Fix compilation error

* Add back removed functions in SPI

* Remove private constructor because japicmp complains.

* Add setThreadResourceUsageProvider because of backward-incompatible checks

* Add setThreadResourceUsageProvider because of backward-incompatible checks

* Fix test

* Fix ThreadResourceSnapshot usage and tests

* Store cpu sample in nanoseconds.

* Reduce logs and improve logging when queries are terminated due to OOM. (apache#16172)

* Dynamic PerQueryCPUMemAccountant Config on Servers  (apache#16219)

* Checkpoint

* Register change handler

* Fix bugs. Manually tested

* Checkstyle

* Tests

* Add pre-check that values are default

* Undo typo fix

* Update QueryRunner to make use of window function overflow handling server configurations (apache#16108)

* Add multistage thread limiting configs at the broker and server level (apache#16080)

* Adding changes for supporting RLS (apache#16043)

* Use stats cache on error instead of the chained mechanism (apache#15992)

* Improve broker error messaging when broker is the one reporting the failure (apache#16076)

* Introduce MSE active and passive timeouts (apache#16075)

* Throttle SSE & MSE Tasks if Server heap usage is above a threshold (apache#16271)

* Fix QueryScheduler constructor using class name. (apache#16280)

* Fix QueryScheduler constructor using class name.

* Fix test

* [Query Resource Isolation] WorkloadBudgetManager and Host enforcement (apache#15798)

* QRI - WorkloadBudgetManager implementation

* Address review comments

* Remove singleton & signature fix

* Fix compatibility checker

* Review comments

* Move WorkloadBudgetManager to core.

---------

Co-authored-by: praveenc7 <praveenkchaganlal@gmail.com>

* Eliminate duplicate cancel attempts in PerQueryCPUMemAccountant (apache#16299)

* Add basic 1 query tests

* Add more tests

* Add ability to remember cancel queries.

* Clean up if conditions in killMostExpensiveQuery

* Fix test failures.

* Address review comments.

* Use QueryCancelCallback to cancel queries from ThreadResourceUsageAccountant (apache#16142)

* Remove all calls to System.gc() in PerQueryCPUMemAccountantFactory (apache#16374)

* Initialize thread accountant just before serving queries (apache#16326)

* Allow Reset of ThreadResourceUsageAccountant in Tracing.java (apache#16360)

* Queries now self terminate if in panic mode. (apache#16380)

* Queries now self terminate if in panic mode.

* Add config test

* Hard kill on critical level.

* Fix configs

* Separate anchor thread interruption.

* Checkstyle

* Review comments

* remove code for critical level

---------

Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com>

* [Query Resource Isolation] Additonal Sampling for Broker and Server (apache#16164)

* fix

* sampling

* Broker sampling

* revert integ-test

* Fix test failures

* review comments

* remove MSE

* broker auth

* remove per pruner & planner sample

* Use Broker's accountant to sample in the request handler. (apache#16439)

* [Query Resource Isolation] Workload Scheduler (apache#16018)

* QRI - WorkloadBudgetManager implementation

* Address review comments

* scheduler

* unit test

* review comments: metrics, secondary, resource-manager

* remove broker admission

* Remove default budget

---------

Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com>

* Cleanup deprecated methods in ThreadResourceUsageAccountant (apache#16479)

* Remove unnecessary methods and config for ThreadResourceUsageAccountant (apache#16490)

* Add tests for OOM Termination of MSE queries. (apache#16514)

* Fix a flaky assert when testing OOM Cancellation of MSE Queries (apache#16533)

* Disable Flaky Tests (apache#16554)

This is a follow-up to apache#16533
The fix for a flaky test did not work. This PR disables these tests temporarily.

* Use correlation ID instead of request id in PerQueryCpuMemAccountant (apache#16040)

* [Query Resource Isolation]Interface for Workload Stats Collection (apache#16340)

* Interface for Stats Collection

* Additional comments

* inherit

* additional class comments

* [Query Resource Isolation] Fix Refresh message (apache#16636)

* Fix Refresh message

* delete queryworkload message handler

* info -> debug logs

* reduce logging (apache#16698)

* style check

* [Query Workload Isolation] Cost-split support  (apache#16672)

* splits

* Cost split

* test

* propagation entity change & java doc

* Propagation scheme review comments

* empty commit to trigger build

* Reduce log for PerQueryCPUMemResourceUsageAccountant (apache#16642)

---------

Co-authored-by: Rajat Venkatesh <1638298+vrajat@users.noreply.github.com>
Co-authored-by: Yash Mayya <yash.mayya@gmail.com>
Co-authored-by: Satwik Pachigolla <40644097+satwik-pachigolla@users.noreply.github.com>
Co-authored-by: 9aman <35227405+9aman@users.noreply.github.com>
Co-authored-by: Gonzalo Ortiz Jaureguizar <gortiz@users.noreply.github.com>
Co-authored-by: Vivek Iyer Vaidyanathan <vvivekiyer@gmail.com>
Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com>
Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com>
Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com>
mqliang pushed a commit to mqliang/pinot that referenced this pull request Feb 10, 2026
* [Query Resource Isolation] Workload Configs (apache#15109)

* Workload Configs

* workload config

* Add API

* config

* Change config structure

* Propagation strategy

* Fix style check

* Cost spliting on update

* Table addition propagation

* perf

* Tests

* test

* test 2

* Review comments 1

* review comments 3

* review comments 3

* name change

* review comments 4

* Fix TableDoesNotExistError for hybrid tables in MSE queries in controller API (apache#16102)

* Make ThreadResourceUsageProvider a Helper/Utility Class. (apache#16051)

* ThreadResourceUsageProvider is a helper class. ThreadResourceContext tracks resource usage.

Fix updateConcurrently

* Rename to ThreadResourceSnapshot

* Clean up

* Add javadoc

* Done use auto closeable

* Checkstyle

* Fix compilation error

* Add back removed functions in SPI

* Remove private constructor because japicmp complains.

* Add setThreadResourceUsageProvider because of backward-incompatible checks

* Add setThreadResourceUsageProvider because of backward-incompatible checks

* Fix test

* Fix ThreadResourceSnapshot usage and tests

* Store cpu sample in nanoseconds.

* Reduce logs and improve logging when queries are terminated due to OOM. (apache#16172)

* Dynamic PerQueryCPUMemAccountant Config on Servers  (apache#16219)

* Checkpoint

* Register change handler

* Fix bugs. Manually tested

* Checkstyle

* Tests

* Add pre-check that values are default

* Undo typo fix

* Update QueryRunner to make use of window function overflow handling server configurations (apache#16108)

* Add multistage thread limiting configs at the broker and server level (apache#16080)

* Adding changes for supporting RLS (apache#16043)

* Use stats cache on error instead of the chained mechanism (apache#15992)

* Improve broker error messaging when broker is the one reporting the failure (apache#16076)

* Introduce MSE active and passive timeouts (apache#16075)

* Throttle SSE & MSE Tasks if Server heap usage is above a threshold (apache#16271)

* Fix QueryScheduler constructor using class name. (apache#16280)

* Fix QueryScheduler constructor using class name.

* Fix test

* [Query Resource Isolation] WorkloadBudgetManager and Host enforcement (apache#15798)

* QRI - WorkloadBudgetManager implementation

* Address review comments

* Remove singleton & signature fix

* Fix compatibility checker

* Review comments

* Move WorkloadBudgetManager to core.

---------

Co-authored-by: praveenc7 <praveenkchaganlal@gmail.com>

* Eliminate duplicate cancel attempts in PerQueryCPUMemAccountant (apache#16299)

* Add basic 1 query tests

* Add more tests

* Add ability to remember cancel queries.

* Clean up if conditions in killMostExpensiveQuery

* Fix test failures.

* Address review comments.

* Use QueryCancelCallback to cancel queries from ThreadResourceUsageAccountant (apache#16142)

* Remove all calls to System.gc() in PerQueryCPUMemAccountantFactory (apache#16374)

* Initialize thread accountant just before serving queries (apache#16326)

* Allow Reset of ThreadResourceUsageAccountant in Tracing.java (apache#16360)

* Queries now self terminate if in panic mode. (apache#16380)

* Queries now self terminate if in panic mode.

* Add config test

* Hard kill on critical level.

* Fix configs

* Separate anchor thread interruption.

* Checkstyle

* Review comments

* remove code for critical level

---------

Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com>

* [Query Resource Isolation] Additonal Sampling for Broker and Server (apache#16164)

* fix

* sampling

* Broker sampling

* revert integ-test

* Fix test failures

* review comments

* remove MSE

* broker auth

* remove per pruner & planner sample

* Use Broker's accountant to sample in the request handler. (apache#16439)

* [Query Resource Isolation] Workload Scheduler (apache#16018)

* QRI - WorkloadBudgetManager implementation

* Address review comments

* scheduler

* unit test

* review comments: metrics, secondary, resource-manager

* remove broker admission

* Remove default budget

---------

Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com>

* Cleanup deprecated methods in ThreadResourceUsageAccountant (apache#16479)

* Remove unnecessary methods and config for ThreadResourceUsageAccountant (apache#16490)

* Add tests for OOM Termination of MSE queries. (apache#16514)

* Fix a flaky assert when testing OOM Cancellation of MSE Queries (apache#16533)

* Disable Flaky Tests (apache#16554)

This is a follow-up to apache#16533
The fix for a flaky test did not work. This PR disables these tests temporarily.

* Use correlation ID instead of request id in PerQueryCpuMemAccountant (apache#16040)

* [Query Resource Isolation]Interface for Workload Stats Collection (apache#16340)

* Interface for Stats Collection

* Additional comments

* inherit

* additional class comments

* [Query Resource Isolation] Fix Refresh message (apache#16636)

* Fix Refresh message

* delete queryworkload message handler

* info -> debug logs

* reduce logging (apache#16698)

* style check

* [Query Workload Isolation] Cost-split support  (apache#16672)

* splits

* Cost split

* test

* propagation entity change & java doc

* Propagation scheme review comments

* empty commit to trigger build

* Reduce log for PerQueryCPUMemResourceUsageAccountant (apache#16642)

* [refactor] Switching to RoutingManager for broker request handlers (apache#16442)

Co-authored-by: Shaurya Chaturvedi <shauryachats@uber.com>

* Fix broker request id generator to avoid generating same id (apache#16661)

* Introduce QueryExecutionContext to manage query life cycle (apache#16728)

* Introduce QueryExecutionContext to manage query life cycle 2 (apache#16728)

---------

Co-authored-by: Rajat Venkatesh <1638298+vrajat@users.noreply.github.com>
Co-authored-by: Yash Mayya <yash.mayya@gmail.com>
Co-authored-by: Satwik Pachigolla <40644097+satwik-pachigolla@users.noreply.github.com>
Co-authored-by: 9aman <35227405+9aman@users.noreply.github.com>
Co-authored-by: Gonzalo Ortiz Jaureguizar <gortiz@users.noreply.github.com>
Co-authored-by: Vivek Iyer Vaidyanathan <vvivekiyer@gmail.com>
Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com>
Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com>
Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com>
Co-authored-by: Shaurya Chaturvedi <shauryachats@gmail.com>
Co-authored-by: Shaurya Chaturvedi <shauryachats@uber.com>

Labels

Configuration Config changes (addition/deletion/change in behavior) documentation enhancement release-notes Referenced by PRs that need attention when compiling the next release notes


5 participants