[Query Resource Isolation] WorkloadBudgetManager and Host enforcement by vvivekiyer · Pull Request #15798 · apache/pinot

vvivekiyer · 2025-05-14T21:17:59Z

Summary

In #14551, we proposed a feature to build Query Isolation in Pinot.

#15109 contains the logic for

Creating Workload Configs with Costs
Splitting the Cost across various instances
Propagating the costs to individual hosts.

This PR contains the the logic for Workload Cost Enforcement on each individual host. It contains:

WorkloadBudgetManager - a budget tracker per workload
ResourceUsageAccountant - an accountant that consistently samples query execution costs and charges it against the query workload.

Implementation Notes

Created a generic accountant called ResourceUsageAccountant that can support both query based accounting and Workload based accounting. It contains all function of the existing account (PerQueryCPUMemAccountantFactory) in addition to workload support. The reason for creating a generic accountant is to make sure Thread&Task state is managed in one place.
Benchmarked the performance of WorkloadBudgetManager implementation. On my mac, it can support 15K ops/ms at a concurrency of 10 threads. This should be sufficient to handle the calls. Note that during query execution, multistage query processing and broker reduce, one call is made every few ms to the WorkloadBudgetManager. In additional, we'll add more sampling and accounting for other parts of query execution like planning, pruning, etc.

TODOs (in subsequent PRs):

Add cost estimation and enforcement for missing pieces in broker/server. eg: broker compilation/optimization, server pruning, etc.
Workload Aware Scheduler that constrains CPU resources for workloads depending on budget usage.
Add relevant metric/debug APIs

praveenc7

Did a initial pass, agree with the overall approach

pinot-core/src/main/java/org/apache/pinot/core/accounting/QueryAggregator.java

praveenc7 · 2025-05-15T16:58:27Z

pinot-core/src/main/java/org/apache/pinot/core/accounting/PerQueryCPUMemAccountantFactory.java

    @Override
-    public void updateQueryUsageConcurrently(String queryId) {
+    public void updateResourceUsageConcurrently(String queryId, TrackingScope resourceType) {
+      if (!resourceType.equals(TrackingScope.QUERY)) {


Can we assert?

We can't assert here. This function is called for all scopes. It's a no-op if the associated scope is not enabled.

Can you please help me understand scopes ? IIUC, scopes will be relevant to the accountant added in this pR ? Why will that affect this implementation ?

pinot-core/src/main/java/org/apache/pinot/core/accounting/ResourceUsageAccountantFactory.java

praveenc7 · 2025-05-15T17:32:35Z

pinot-core/src/main/java/org/apache/pinot/core/accounting/ResourceUsageAccountantFactory.java

+              int sleepTime = Integer.MAX_VALUE;
+              for (ResourceAggregator resourceAggregator : _resourceAggregators.values()) {
+                sleepTime = Math.min(sleepTime, resourceAggregator.getAggregationSleepTimeMs());
+              }


PerQueryCPUMemResourceUsageAccountant emitted metrics like _publishHeapUsageMetric , are we planning on having some metrics like those in a later PR?

Yes, plan to create a followup PR for this as stated in the description.

praveenc7 · 2025-05-15T17:55:25Z

pinot-core/src/main/java/org/apache/pinot/core/accounting/WorkloadBudgetManager.java

+  }
+
+  private WorkloadBudgetManager(PinotConfiguration config) {
+    _isEnabled = config.getProperty(CommonConstants.Accounting.CONFIG_OF_WORKLOAD_ENABLE_COST_COLLECTION,


Do we need a config for this given we are only calling this internally in WorkloadAggregator when ResourceUsageAccountant is used ?

We need a config to not enable the BudgetResetTask. This WorkloadBudgetManager will be used in other areas of our code like Scheduler when we build a scheduler based on workload budgets.

praveenc7 · 2025-05-15T18:08:51Z

pinot-core/src/main/java/org/apache/pinot/core/accounting/WorkloadBudgetManager.java

+    public Budget(long cpuBudgetNs, long memoryBudgetBytes) {
+      _initialCpuBudget = cpuBudgetNs;
+      _initialMemoryBudget = memoryBudgetBytes;
+      _cpuRemaining = new AtomicLong(cpuBudgetNs);


Because CPU and memory are tracked in two different AtomicLongs, updates to one budget are invisible to the other at the exact moment they happen.

So the guarantee to “admit a query only when both CPU and memory are available" might be less. So in some cases we might oversubscribe. But it is fine, since this is best-effort

Yes, that's a known compromise where we are trading off accuracy for performance.

vrajat · 2025-05-16T10:12:29Z

Couple of requests w.r.t architecture:

I plan to propose moving away from using globals. It hard to test with globals. For example, it is hard to inject an error/cancellation/budget reached in a server in an integration test. Ref: #15231

Going forward, can you use QueryContext to set thread locals in query execution ? This is provably correct in making sure that the thread locals are setup correctly. It avoids issues like #15045 and safeguards like #15051

vrajat · 2025-05-16T10:17:29Z

...-core/src/main/java/org/apache/pinot/core/accounting/CPUMemThreadLevelAccountingObjects.java


+
+    public String getWorkloadName() {
+      return _workloadName == null ? CommonConstants.Accounting.DEFAULT_WORKLOAD_NAME : _workloadName;


Can the default workload name be forced for a query when a runner/worker is setup ? That will remove this null-check.

I read QueryOptionsUtil after this file. default workload name is chosen if not set. So is this null check required ?

vrajat · 2025-05-16T10:24:35Z

pinot-core/src/main/java/org/apache/pinot/core/accounting/ResourceUsageAccountantFactory.java

+import org.slf4j.LoggerFactory;
+
+
+public class ResourceUsageAccountantFactory implements ThreadAccountantFactory {


Is the watcher task and its usage of WorkloadAggregator and QueryAggregator the main difference wrt to PerQueryCPUMemAccountantFactory ?

vrajat · 2025-05-30T10:59:59Z

pinot-broker/src/main/java/org/apache/pinot/broker/broker/helix/BaseBrokerStarter.java

            timeSeriesRequestHandler, _responseStore);
    _brokerRequestHandler.start();

+    // Initialize WorkloadBudgetManager for Query Workload Isolation.


As a follow up to our offline discussion, what is the effort to NOT make the workload manager a global ? Can it be instantiated in the broker or server and then propogated through dependency injection or function parameters ?

One more follow up question. Is the budget manager available only in the broker ?

Initializing it as part of Tracing class. Given it would kind of go hand-in-hand with Tracing. When we decide to refactor the Tracing class to be a non-singlenton we can move this along as well

vrajat · 2025-06-02T14:34:24Z

pinot-core/src/main/java/org/apache/pinot/core/accounting/PerQueryCPUMemAccountantFactory.java

 * Design and algorithm see
 * https://docs.google.com/document/d/1Z9DYAfKznHQI9Wn8BjTWZYTcNRVGiPP0B8aEP3w_1jQ
+ *
+ * TODO: Functionalities in this Accountant are now supported in a more generic ResourceUsageAccountantFactory. Keeping


Based on the offline conversation, lets make sure we collaborate as we make improvements to the framework.

vrajat · 2025-06-02T14:35:58Z

pinot-core/src/main/java/org/apache/pinot/core/accounting/PerQueryCPUMemAccountantFactory.java

    @Override
-    public void updateQueryUsageConcurrently(String queryId) {
+    public void updateResourceUsageConcurrently(String queryId, TrackingScope resourceType) {
+      if (!resourceType.equals(TrackingScope.QUERY)) {


Can you please help me understand scopes ? IIUC, scopes will be relevant to the accountant added in this pR ? Why will that affect this implementation ?

vrajat · 2025-06-02T14:39:09Z

pinot-spi/src/main/java/org/apache/pinot/spi/trace/Tracing.java

    public final void createExecutionContext(@Nullable String queryId, int taskId,
-        ThreadExecutionContext.TaskType taskType, @Nullable ThreadExecutionContext parentContext) {
+        ThreadExecutionContext.TaskType taskType, @Nullable ThreadExecutionContext parentContext,
+        @Nullable String workloadName) {


Can you added a overloaded function that does not accept workloadName ? That will reduce the diff in files that are not affected by this change.

vrajat · 2025-06-02T14:39:46Z

pinot-spi/src/main/java/org/apache/pinot/spi/trace/Tracing.java


-    public static void setupRunner(@Nonnull String queryId) {
-      setupRunner(queryId, ThreadExecutionContext.TaskType.SSE);
+    public static void setupRunner(@Nonnull String queryId, @Nullable String workloadName) {


Same request here. If you add an overloaded fn. that does not accept workloadName, that will reduce the diff.

vrajat · 2025-06-02T14:40:33Z

pinot-spi/src/main/java/org/apache/pinot/spi/trace/Tracing.java


-    public static void updateQueryUsageConcurrently(String queryId) {
-      Tracing.getThreadAccountant().updateQueryUsageConcurrently(queryId);
+    public static void updateResourceUsageConcurrently(String resourceName, TrackingScope trackingScope) {


Is TrackingScope a feature specific to WorkloadBudgetManager ? Can it be restricted to APIs for that class ?

Our goal is to migrate PerQueryCpuMemAccountant into ResourceManagementAccountant, then retire the former entirely. During the transition we’ll keep both in place so nothing breaks.

TrackingScope lets each sample tell the system which aggregator should record its resource usage. Other way is to have if (accountant instanceof …) branching; the scope tag routes the data to the right place without extra runtime checks.

codecov-commenter · 2025-06-21T00:00:32Z

Codecov Report

Attention: Patch coverage is 57.76772% with 280 lines in your changes missing coverage. Please review.

Project coverage is 63.17%. Comparing base (1a476de) to head (1d7ea11).
Report is 416 commits behind head on master.

Files with missing lines	Patch %	Lines
.../apache/pinot/core/accounting/QueryAggregator.java	47.49%	116 Missing and 41 partials ⚠️
...ore/accounting/ResourceUsageAccountantFactory.java	71.89%	31 Missing and 12 partials ⚠️
...ache/pinot/core/accounting/WorkloadAggregator.java	51.31%	34 Missing and 3 partials ⚠️
...he/pinot/spi/accounting/WorkloadBudgetManager.java	65.78%	22 Missing and 4 partials ⚠️
...re/accounting/PerQueryCPUMemAccountantFactory.java	30.76%	5 Missing and 4 partials ⚠️
...che/pinot/core/query/scheduler/QueryScheduler.java	63.63%	4 Missing ⚠️
...requesthandler/MultiStageBrokerRequestHandler.java	0.00%	2 Missing ⚠️
.../main/java/org/apache/pinot/spi/trace/Tracing.java	80.00%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #15798      +/-   ##
============================================
+ Coverage     62.90%   63.17%   +0.27%     
+ Complexity     1386     1365      -21     
============================================
  Files          2867     2970     +103     
  Lines        163354   172288    +8934     
  Branches      24952    26379    +1427     
============================================
+ Hits         102755   108841    +6086     
- Misses        52847    55132    +2285     
- Partials       7752     8315     +563

Flag	Coverage Δ
custom-integration1	`100.00% <ø> (ø)`
integration	`100.00% <ø> (ø)`
integration1	`100.00% <ø> (ø)`
integration2	`0.00% <ø> (ø)`
java-11	`63.14% <57.61%> (+0.27%)`	⬆️
java-21	`63.15% <57.76%> (+0.33%)`	⬆️
skip-bytebuffers-false	`?`
skip-bytebuffers-true	`?`
temurin	`63.17% <57.76%> (+0.27%)`	⬆️
unittests	`63.17% <57.76%> (+0.27%)`	⬆️
unittests1	`56.33% <57.81%> (+0.51%)`	⬆️
unittests2	`33.26% <0.30%> (-0.31%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

vrajat · 2025-06-26T04:55:25Z

pinot-core/src/main/java/org/apache/pinot/core/accounting/PerQueryCPUMemAccountantFactory.java

+    @Override
+    public void updateQueryUsageConcurrently(String queryId, long cpuTimeNs, long memoryAllocatedBytes,
+                                             TrackingScope trackingScope) {
+      if (trackingScope != TrackingScope.QUERY) {


Is this change related to this feature ?

Yes, this is needed to make sure the existing query killing works. Once we make the ResourceUsageAccountant the default, this entire file can be nuked. Some explanation for why this is needed - #15798 (comment)

I think I finally the grok the code. Watcher task calls two aggregators that will both decide to kill queries independently - one after the other. QueryAggregator continues to be triggered similar to PerQuery...WatcherTask. WorkloadAggregator is triggered by completely different rules.

Another option was to leave QueryAggregator or classic query monitoring code as is and have watch task call workload aggregator after the current code is called in each run. Do you see the possibility of having more aggregators ?

We are heavily invested in the way these classes work and improving it in steps. A switch to the ResourceUsageAccountant along with QA will be a big effort.

Yeah, the switch will involve Q&A effort. So, we'll be doing the Q&A and enabling it in all of our production. We'll deprecate the PerQueryCPUMem..Acountant only after there is sufficient confidence that it works as expected.

pinot-core/src/main/java/org/apache/pinot/core/accounting/ResourceUsageAccountantFactory.java

pinot-core/src/main/java/org/apache/pinot/core/accounting/WorkloadAggregator.java

pinot-server/src/main/java/org/apache/pinot/server/starter/helix/BaseServerStarter.java

vrajat · 2025-06-27T05:31:38Z

pinot-spi/src/main/java/org/apache/pinot/spi/trace/Tracing.java

    public static void initializeThreadAccountant(PinotConfiguration config, String instanceId,
        InstanceType instanceType) {
      String factoryName = config.getProperty(CommonConstants.Accounting.CONFIG_OF_FACTORY_NAME);
+      _workloadBudgetManager = new WorkloadBudgetManager(config);


Why is _workLoadBudgetManager instantiated here. IIUC it is used only in WorkloadAggregator. Can it be initialized in the the WorkloadAggregator ?

Also Tracing.java doesn't need to know about it as well. It is specific to ResourceUsageAccountantManager

Initialized it here because it needs to be accessed from QuerySchedulers and other places. Of course it can be moved to WorkloadAggregator and indirectly accessed from there. But given that we are going to eventually replace PerQuery..Acountant, it makes sense to keep it here?

I'll start an offline thread about deprecating PerQuery.... Even if it is removed, the only place it is used is in WorkloadAggregator. Will it be used in other places ? Right now, the code is laid such that SPI only focuses on accounting. startWatcherTask is the only exception (and unnecessary in hindsight). Workload and budget management are implementation dependent and seems like core package is better place.

It seems like we will be using PerQuery... for sometime. Initializing workload budget manager is not required for it. Even if we move away from PerQuery..., there is a good chance that we may adopt a system that doesnt use workload budget manager. The option to not used should be available.

We discussed this offline. Summarizing here:

We'll be exercising caution and ensure that ResourceUsage..Acountant has feature parity and stability before deprecating PerQuery...Accountant

Moved WorkloadBudgetManager to core.

WorkloadBudgetManager is not initialized. It fast exits if the config for workload budget collection is not enabled.

pinot-spi/src/main/java/org/apache/pinot/spi/accounting/ThreadResourceUsageAccountant.java

pinot-common/src/main/java/org/apache/pinot/common/utils/config/QueryOptionsUtils.java

vrajat · 2025-06-27T05:51:32Z

pinot-core/src/main/java/org/apache/pinot/core/accounting/PerQueryCPUMemAccountantFactory.java

+    @Override
+    public void updateQueryUsageConcurrently(String queryId, long cpuTimeNs, long memoryAllocatedBytes,
+                                             TrackingScope trackingScope) {
+      if (trackingScope != TrackingScope.QUERY) {


I think I finally the grok the code. Watcher task calls two aggregators that will both decide to kill queries independently - one after the other. QueryAggregator continues to be triggered similar to PerQuery...WatcherTask. WorkloadAggregator is triggered by completely different rules.

gortiz · 2025-06-30T16:04:27Z

Created a generic accountant called ResourceUsageAccountant that can support both query based accounting and Workload based accounting. It contains all function of the existing account (PerQueryCPUMemAccountantFactory) in addition to workload support. The reason for creating a generic accountant is to make sure Thread&Task state is managed in one place.

Why do you need that? Couldn't apply a decorate pattern where PerQueryCPUMemAccountantFactory keeps the logic per query and then a WorkloadAccountant groups the per query map into a per workload map?

vvivekiyer · 2025-07-07T18:23:17Z

Couldn't apply a decorate pattern where PerQueryCPUMemAccountantFactory

Discussed this with Rajat. Decided against decorate pattern because - Both query and workload accountants rely on the same underlying state. Managing this shared state across multiple decorator objects risks race conditions, duplicated storage and increases complexity unnecessarily. We'll eventually be replacing PerQuery..Acountant with ResourceUsage..Acountant

vrajat · 2025-07-08T10:30:31Z

pinot-core/src/main/java/org/apache/pinot/core/accounting/PerQueryCPUMemAccountantFactory.java

    @Override
-    public void setupRunner(@Nullable String queryId, int taskId, ThreadExecutionContext.TaskType taskType) {
+    public void setupRunner(String queryId, int taskId, ThreadExecutionContext.TaskType taskType) {
+      setupRunner(queryId, taskId, taskType, null);


If workloadName should never be null (mentioned in other comments), then the last para, should be default workload name ?

vrajat · 2025-07-08T10:44:25Z

pinot-spi/src/main/java/org/apache/pinot/spi/trace/Tracing.java

    public static void initializeThreadAccountant(PinotConfiguration config, String instanceId,
        InstanceType instanceType) {
      String factoryName = config.getProperty(CommonConstants.Accounting.CONFIG_OF_FACTORY_NAME);
+      _workloadBudgetManager = new WorkloadBudgetManager(config);


I'll start an offline thread about deprecating PerQuery.... Even if it is removed, the only place it is used is in WorkloadAggregator. Will it be used in other places ? Right now, the code is laid such that SPI only focuses on accounting. startWatcherTask is the only exception (and unnecessary in hindsight). Workload and budget management are implementation dependent and seems like core package is better place.

It seems like we will be using PerQuery... for sometime. Initializing workload budget manager is not required for it. Even if we move away from PerQuery..., there is a good chance that we may adopt a system that doesnt use workload budget manager. The option to not used should be available.

* [Query Resource Isolation] Workload Configs (apache#15109) * Workload Configs * workload config * Add API * config * Change config structure * Propagation strategy * Fix style check * Cost spliting on update * Table addition propagation * perf * Tests * test * test 2 * Review comments 1 * review comments 3 * review comments 3 * name change * review comments 4 * Fix TableDoesNotExistError for hybrid tables in MSE queries in controller API (apache#16102) * Make ThreadResourceUsageProvider a Helper/Utility Class. (apache#16051) * ThreadResourceUsageProvider is a helper class. ThreadResourceContext tracks resource usage. Fix updateConcurrently * Rename to ThreadResourceSnapshot * Clean up * Add javadoc * Done use auto closeable * Checkstyle * Fix compilation error * Add back removed functions in SPI * Remove private constructor because japicmp complains. * Add setThreadResourceUsageProvider because of backward-incompatible checks * Add setThreadResourceUsageProvider because of backward-incompatible checks * Fix test * Fix ThreadResourceSnapshot usage and tests * Store cpu sample in nanoseconds. * Reduce logs and improve logging when queries are terminated due to OOM. (apache#16172) * Dynamic PerQueryCPUMemAccountant Config on Servers (apache#16219) * Checkpoint * Register change handler * Fix bugs. Manually tested * Checkstyle * Tests * Add pre-check that values are default * Undo typo fix * Update QueryRunner to make use of window function overflow handling server configurations (apache#16108) * Add multistage thread limiting configs at the broker and server level (apache#16080) * Adding changes for supporting RLS (apache#16043) * Use stats cache on error instead of the chained mechanism (apache#15992) * Improve broker error messaging when broker is the one reporting the failure (apache#16076) * Introduce MSE active and passive timeouts (apache#16075) * Throttle SSE & MSE Tasks if Server heap usage is above a threshold (apache#16271) * Fix QueryScheduler constructor using class name. (apache#16280) * Fix QueryScheduler constructor using class name. * Fix test * [Query Resource Isolation] WorkloadBudgetManager and Host enforcement (apache#15798) * QRI - WorkloadBudgetManager implementation * Address review comments * Remove singleton & signature fix * Fix compatibility checker * Review comments * Move WorkloadBudgetManager to core. --------- Co-authored-by: praveenc7 <praveenkchaganlal@gmail.com> * Eliminate duplicate cancel attempts in PerQueryCPUMemAccountant (apache#16299) * Add basic 1 query tests * Add more tests * Add ability to remember cancel queries. * Clean up if conditions in killMostExpensiveQuery * Fix test failures. * Address review comments. * Use QueryCancelCallback to cancel queries from ThreadResourceUsageAccountant (apache#16142) * Remove all calls to System.gc() in PerQueryCPUMemAccountantFactory (apache#16374) * Initialize thread accountant just before serving queries (apache#16326) * Allow Reset of ThreadResourceUsageAccountant in Tracing.java (apache#16360) * Queries now self terminate if in panic mode. (apache#16380) * Queries now self terminate if in panic mode. * Add config test * Hard kill on critical level. * Fix configs * Separate anchor thread interruption. * Checkstyle * Review comments * remove code for critical level --------- Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> * [Query Resource Isolation] Additonal Sampling for Broker and Server (apache#16164) * fix * sampling * Broker sampling * revert integ-test * Fix test failures * review comments * remove MSE * broker auth * remove per pruner & planner sample * Use Broker's accountant to sample in the request handler. (apache#16439) * [Query Resource Isolation] Workload Scheduler (apache#16018) * QRI - WorkloadBudgetManager implementation * Address review comments * scheduler * unit test * review comments: metrics, secondary, resource-manager * remove broker admission * Remove default budget --------- Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com> * Cleanup deprecated methods in ThreadResourceUsageAccountant (apache#16479) * Remove unnecessary methods and config for ThreadResourceUsageAccountant (apache#16490) * Add tests for OOM Termination of MSE queries. (apache#16514) * Fix a flaky assert when testing OOM Cancellation of MSE Queries (apache#16533) * Disable Flaky Tests (apache#16554) This is a follow-up to apache#16533 The fix for a flaky test did not work. This PR disables these tests temporarily. * Use correlation ID instead of request id in PerQueryCpuMemAccountant (apache#16040) * [Query Resource Isolation]Interface for Workload Stats Collection (apache#16340) * Interface for Stats Collection * Additional comments * inherit * additional class comments * [Query Resource Isolation] Fix Refresh message (apache#16636) * Fix Refresh message * delete queryworkload message handler * info -> debug logs * reduce logging (apache#16698) * style check * [Query Workload Isolation] Cost-split support (apache#16672) * splits * Cost split * test * propagation entity change & java doc * Propagation scheme review comments * empty commit to trigger build * Reduce log for PerQueryCPUMemResourceUsageAccountant (apache#16642) --------- Co-authored-by: Rajat Venkatesh <1638298+vrajat@users.noreply.github.com> Co-authored-by: Yash Mayya <yash.mayya@gmail.com> Co-authored-by: Satwik Pachigolla <40644097+satwik-pachigolla@users.noreply.github.com> Co-authored-by: 9aman <35227405+9aman@users.noreply.github.com> Co-authored-by: Gonzalo Ortiz Jaureguizar <gortiz@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan <vvivekiyer@gmail.com> Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com> Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com>

* [Query Resource Isolation] Workload Configs (apache#15109) * Workload Configs * workload config * Add API * config * Change config structure * Propagation strategy * Fix style check * Cost spliting on update * Table addition propagation * perf * Tests * test * test 2 * Review comments 1 * review comments 3 * review comments 3 * name change * review comments 4 * Fix TableDoesNotExistError for hybrid tables in MSE queries in controller API (apache#16102) * Make ThreadResourceUsageProvider a Helper/Utility Class. (apache#16051) * ThreadResourceUsageProvider is a helper class. ThreadResourceContext tracks resource usage. Fix updateConcurrently * Rename to ThreadResourceSnapshot * Clean up * Add javadoc * Done use auto closeable * Checkstyle * Fix compilation error * Add back removed functions in SPI * Remove private constructor because japicmp complains. * Add setThreadResourceUsageProvider because of backward-incompatible checks * Add setThreadResourceUsageProvider because of backward-incompatible checks * Fix test * Fix ThreadResourceSnapshot usage and tests * Store cpu sample in nanoseconds. * Reduce logs and improve logging when queries are terminated due to OOM. (apache#16172) * Dynamic PerQueryCPUMemAccountant Config on Servers (apache#16219) * Checkpoint * Register change handler * Fix bugs. Manually tested * Checkstyle * Tests * Add pre-check that values are default * Undo typo fix * Update QueryRunner to make use of window function overflow handling server configurations (apache#16108) * Add multistage thread limiting configs at the broker and server level (apache#16080) * Adding changes for supporting RLS (apache#16043) * Use stats cache on error instead of the chained mechanism (apache#15992) * Improve broker error messaging when broker is the one reporting the failure (apache#16076) * Introduce MSE active and passive timeouts (apache#16075) * Throttle SSE & MSE Tasks if Server heap usage is above a threshold (apache#16271) * Fix QueryScheduler constructor using class name. (apache#16280) * Fix QueryScheduler constructor using class name. * Fix test * [Query Resource Isolation] WorkloadBudgetManager and Host enforcement (apache#15798) * QRI - WorkloadBudgetManager implementation * Address review comments * Remove singleton & signature fix * Fix compatibility checker * Review comments * Move WorkloadBudgetManager to core. --------- Co-authored-by: praveenc7 <praveenkchaganlal@gmail.com> * Eliminate duplicate cancel attempts in PerQueryCPUMemAccountant (apache#16299) * Add basic 1 query tests * Add more tests * Add ability to remember cancel queries. * Clean up if conditions in killMostExpensiveQuery * Fix test failures. * Address review comments. * Use QueryCancelCallback to cancel queries from ThreadResourceUsageAccountant (apache#16142) * Remove all calls to System.gc() in PerQueryCPUMemAccountantFactory (apache#16374) * Initialize thread accountant just before serving queries (apache#16326) * Allow Reset of ThreadResourceUsageAccountant in Tracing.java (apache#16360) * Queries now self terminate if in panic mode. (apache#16380) * Queries now self terminate if in panic mode. * Add config test * Hard kill on critical level. * Fix configs * Separate anchor thread interruption. * Checkstyle * Review comments * remove code for critical level --------- Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> * [Query Resource Isolation] Additonal Sampling for Broker and Server (apache#16164) * fix * sampling * Broker sampling * revert integ-test * Fix test failures * review comments * remove MSE * broker auth * remove per pruner & planner sample * Use Broker's accountant to sample in the request handler. (apache#16439) * [Query Resource Isolation] Workload Scheduler (apache#16018) * QRI - WorkloadBudgetManager implementation * Address review comments * scheduler * unit test * review comments: metrics, secondary, resource-manager * remove broker admission * Remove default budget --------- Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com> * Cleanup deprecated methods in ThreadResourceUsageAccountant (apache#16479) * Remove unnecessary methods and config for ThreadResourceUsageAccountant (apache#16490) * Add tests for OOM Termination of MSE queries. (apache#16514) * Fix a flaky assert when testing OOM Cancellation of MSE Queries (apache#16533) * Disable Flaky Tests (apache#16554) This is a follow-up to apache#16533 The fix for a flaky test did not work. This PR disables these tests temporarily. * Use correlation ID instead of request id in PerQueryCpuMemAccountant (apache#16040) * [Query Resource Isolation]Interface for Workload Stats Collection (apache#16340) * Interface for Stats Collection * Additional comments * inherit * additional class comments * [Query Resource Isolation] Fix Refresh message (apache#16636) * Fix Refresh message * delete queryworkload message handler * info -> debug logs * reduce logging (apache#16698) * style check * [Query Workload Isolation] Cost-split support (apache#16672) * splits * Cost split * test * propagation entity change & java doc * Propagation scheme review comments * empty commit to trigger build * Reduce log for PerQueryCPUMemResourceUsageAccountant (apache#16642) * [refactor] Switching to RoutingManager for broker request handlers (apache#16442) Co-authored-by: Shaurya Chaturvedi <shauryachats@uber.com> * Fix broker request id generator to avoid generating same id (apache#16661) * Introduce QueryExecutionContext to manage query life cycle (apache#16728) * Introduce QueryExecutionContext to manage query life cycle 2 (apache#16728) --------- Co-authored-by: Rajat Venkatesh <1638298+vrajat@users.noreply.github.com> Co-authored-by: Yash Mayya <yash.mayya@gmail.com> Co-authored-by: Satwik Pachigolla <40644097+satwik-pachigolla@users.noreply.github.com> Co-authored-by: 9aman <35227405+9aman@users.noreply.github.com> Co-authored-by: Gonzalo Ortiz Jaureguizar <gortiz@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan <vvivekiyer@gmail.com> Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com> Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com> Co-authored-by: Shaurya Chaturvedi <shauryachats@gmail.com> Co-authored-by: Shaurya Chaturvedi <shauryachats@uber.com>

QRI - WorkloadBudgetManager implementation

5ee40a1

vvivekiyer force-pushed the QRI_Test branch from d68e502 to 5ee40a1 Compare May 14, 2025 21:26

vvivekiyer marked this pull request as ready for review May 14, 2025 21:26

vvivekiyer changed the title ~~Query Resource Isolation in Pinot - WorkloadBudgetManager and Host enforcement~~ Query Resource Isolation - WorkloadBudgetManager and Host enforcement May 14, 2025

vvivekiyer changed the title ~~Query Resource Isolation - WorkloadBudgetManager and Host enforcement~~ [Query Resource Isolation] WorkloadBudgetManager and Host enforcement May 14, 2025

praveenc7 reviewed May 15, 2025

View reviewed changes

vrajat reviewed May 16, 2025

View reviewed changes

Jackie-Jiang added the feature label May 19, 2025

Address review comments

09ca1ef

vrajat reviewed Jun 2, 2025

View reviewed changes

Merge branch 'master' into pr-15798

c753371

praveenc7 force-pushed the QRI_Test branch 4 times, most recently from 15ca036 to 1eff1b1 Compare June 23, 2025 23:42

Remove singleton & signature fix

bbdacab

praveenc7 force-pushed the QRI_Test branch from 1eff1b1 to bbdacab Compare June 25, 2025 00:13

Merge branch 'master' into pr-15798

ab415cf

vrajat reviewed Jun 26, 2025

View reviewed changes

vvivekiyer and others added 2 commits June 26, 2025 10:25

Fix compatibility checker

a9a6240

Merge branch 'apache:master' into QRI_Test

30846f0

vrajat reviewed Jun 27, 2025

View reviewed changes

Merge remote-tracking branch 'origin/master' into QRI_Test

08baa19

vvivekiyer force-pushed the QRI_Test branch 2 times, most recently from 2acdf4f to 1d7ea11 Compare July 7, 2025 23:08

vrajat reviewed Jul 8, 2025

View reviewed changes

vvivekiyer added 2 commits July 8, 2025 09:33

Review comments

051baad

Move WorkloadBudgetManager to core.

56f6c37

vvivekiyer force-pushed the QRI_Test branch from 1d7ea11 to 56f6c37 Compare July 9, 2025 16:33

siddharthteotia approved these changes Jul 9, 2025

View reviewed changes

siddharthteotia merged commit 75fb18b into apache:master Jul 9, 2025
35 of 36 checks passed

Jackie-Jiang added the documentation label Jul 16, 2025

Jackie-Jiang mentioned this pull request Feb 24, 2026

Fix wrong interval for OOM protection #17753

Merged



		public String getWorkloadName() {
		return _workloadName == null ? CommonConstants.Accounting.DEFAULT_WORKLOAD_NAME : _workloadName;

		import org.slf4j.LoggerFactory;


		public class ResourceUsageAccountantFactory implements ThreadAccountantFactory {

Conversation

vvivekiyer commented May 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Implementation Notes

TODOs (in subsequent PRs):

Uh oh!

praveenc7 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vrajat commented May 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codecov-commenter commented Jun 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vrajat Jun 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

vvivekiyer commented May 14, 2025 •

edited

Loading

vrajat commented May 16, 2025 •

edited

Loading

codecov-commenter commented Jun 21, 2025 •

edited

Loading

vrajat Jun 27, 2025 •

edited

Loading

vvivekiyer commented Jul 7, 2025 •

edited

Loading