[Query Resource Isolation] WorkloadBudgetManager and Host enforcement#15798
[Query Resource Isolation] WorkloadBudgetManager and Host enforcement#15798siddharthteotia merged 10 commits intoapache:masterfrom
Conversation
praveenc7
left a comment
There was a problem hiding this comment.
Did a initial pass, agree with the overall approach
pinot-core/src/main/java/org/apache/pinot/core/accounting/QueryAggregator.java
Show resolved
Hide resolved
| @Override | ||
| public void updateQueryUsageConcurrently(String queryId) { | ||
| public void updateResourceUsageConcurrently(String queryId, TrackingScope resourceType) { | ||
| if (!resourceType.equals(TrackingScope.QUERY)) { |
There was a problem hiding this comment.
We can't assert here. This function is called for all scopes. It's a no-op if the associated scope is not enabled.
There was a problem hiding this comment.
Can you please help me understand scopes ? IIUC, scopes will be relevant to the accountant added in this pR ? Why will that affect this implementation ?
pinot-core/src/main/java/org/apache/pinot/core/accounting/ResourceUsageAccountantFactory.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/accounting/ResourceUsageAccountantFactory.java
Outdated
Show resolved
Hide resolved
| int sleepTime = Integer.MAX_VALUE; | ||
| for (ResourceAggregator resourceAggregator : _resourceAggregators.values()) { | ||
| sleepTime = Math.min(sleepTime, resourceAggregator.getAggregationSleepTimeMs()); | ||
| } |
There was a problem hiding this comment.
PerQueryCPUMemResourceUsageAccountant emitted metrics like _publishHeapUsageMetric , are we planning on having some metrics like those in a later PR?
There was a problem hiding this comment.
Yes, plan to create a followup PR for this as stated in the description.
| } | ||
|
|
||
| private WorkloadBudgetManager(PinotConfiguration config) { | ||
| _isEnabled = config.getProperty(CommonConstants.Accounting.CONFIG_OF_WORKLOAD_ENABLE_COST_COLLECTION, |
There was a problem hiding this comment.
Do we need a config for this given we are only calling this internally in WorkloadAggregator when ResourceUsageAccountant is used ?
There was a problem hiding this comment.
We need a config to not enable the BudgetResetTask. This WorkloadBudgetManager will be used in other areas of our code like Scheduler when we build a scheduler based on workload budgets.
| public Budget(long cpuBudgetNs, long memoryBudgetBytes) { | ||
| _initialCpuBudget = cpuBudgetNs; | ||
| _initialMemoryBudget = memoryBudgetBytes; | ||
| _cpuRemaining = new AtomicLong(cpuBudgetNs); |
There was a problem hiding this comment.
Because CPU and memory are tracked in two different AtomicLongs, updates to one budget are invisible to the other at the exact moment they happen.
So the guarantee to “admit a query only when both CPU and memory are available" might be less. So in some cases we might oversubscribe. But it is fine, since this is best-effort
There was a problem hiding this comment.
Yes, that's a known compromise where we are trading off accuracy for performance.
|
Couple of requests w.r.t architecture: I plan to propose moving away from using globals. It hard to test with globals. For example, it is hard to inject an error/cancellation/budget reached in a server in an integration test. Ref: #15231 Going forward, can you use QueryContext to set thread locals in query execution ? This is provably correct in making sure that the thread locals are setup correctly. It avoids issues like #15045 and safeguards like #15051 |
|
|
||
|
|
||
| public String getWorkloadName() { | ||
| return _workloadName == null ? CommonConstants.Accounting.DEFAULT_WORKLOAD_NAME : _workloadName; |
There was a problem hiding this comment.
Can the default workload name be forced for a query when a runner/worker is setup ? That will remove this null-check.
There was a problem hiding this comment.
I read QueryOptionsUtil after this file. default workload name is chosen if not set. So is this null check required ?
| import org.slf4j.LoggerFactory; | ||
|
|
||
|
|
||
| public class ResourceUsageAccountantFactory implements ThreadAccountantFactory { |
There was a problem hiding this comment.
Is the watcher task and its usage of WorkloadAggregator and QueryAggregator the main difference wrt to PerQueryCPUMemAccountantFactory ?
| timeSeriesRequestHandler, _responseStore); | ||
| _brokerRequestHandler.start(); | ||
|
|
||
| // Initialize WorkloadBudgetManager for Query Workload Isolation. |
There was a problem hiding this comment.
As a follow up to our offline discussion, what is the effort to NOT make the workload manager a global ? Can it be instantiated in the broker or server and then propogated through dependency injection or function parameters ?
There was a problem hiding this comment.
One more follow up question. Is the budget manager available only in the broker ?
There was a problem hiding this comment.
Initializing it as part of Tracing class. Given it would kind of go hand-in-hand with Tracing. When we decide to refactor the Tracing class to be a non-singlenton we can move this along as well
| * Design and algorithm see | ||
| * https://docs.google.com/document/d/1Z9DYAfKznHQI9Wn8BjTWZYTcNRVGiPP0B8aEP3w_1jQ | ||
| * | ||
| * TODO: Functionalities in this Accountant are now supported in a more generic ResourceUsageAccountantFactory. Keeping |
There was a problem hiding this comment.
Based on the offline conversation, lets make sure we collaborate as we make improvements to the framework.
| @Override | ||
| public void updateQueryUsageConcurrently(String queryId) { | ||
| public void updateResourceUsageConcurrently(String queryId, TrackingScope resourceType) { | ||
| if (!resourceType.equals(TrackingScope.QUERY)) { |
There was a problem hiding this comment.
Can you please help me understand scopes ? IIUC, scopes will be relevant to the accountant added in this pR ? Why will that affect this implementation ?
| public final void createExecutionContext(@Nullable String queryId, int taskId, | ||
| ThreadExecutionContext.TaskType taskType, @Nullable ThreadExecutionContext parentContext) { | ||
| ThreadExecutionContext.TaskType taskType, @Nullable ThreadExecutionContext parentContext, | ||
| @Nullable String workloadName) { |
There was a problem hiding this comment.
Can you added a overloaded function that does not accept workloadName ? That will reduce the diff in files that are not affected by this change.
|
|
||
| public static void setupRunner(@Nonnull String queryId) { | ||
| setupRunner(queryId, ThreadExecutionContext.TaskType.SSE); | ||
| public static void setupRunner(@Nonnull String queryId, @Nullable String workloadName) { |
There was a problem hiding this comment.
Same request here. If you add an overloaded fn. that does not accept workloadName, that will reduce the diff.
|
|
||
| public static void updateQueryUsageConcurrently(String queryId) { | ||
| Tracing.getThreadAccountant().updateQueryUsageConcurrently(queryId); | ||
| public static void updateResourceUsageConcurrently(String resourceName, TrackingScope trackingScope) { |
There was a problem hiding this comment.
Is TrackingScope a feature specific to WorkloadBudgetManager ? Can it be restricted to APIs for that class ?
There was a problem hiding this comment.
Our goal is to migrate PerQueryCpuMemAccountant into ResourceManagementAccountant, then retire the former entirely. During the transition we’ll keep both in place so nothing breaks.
TrackingScope lets each sample tell the system which aggregator should record its resource usage. Other way is to have if (accountant instanceof …) branching; the scope tag routes the data to the right place without extra runtime checks.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #15798 +/- ##
============================================
+ Coverage 62.90% 63.17% +0.27%
+ Complexity 1386 1365 -21
============================================
Files 2867 2970 +103
Lines 163354 172288 +8934
Branches 24952 26379 +1427
============================================
+ Hits 102755 108841 +6086
- Misses 52847 55132 +2285
- Partials 7752 8315 +563
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
15ca036 to
1eff1b1
Compare
| @Override | ||
| public void updateQueryUsageConcurrently(String queryId, long cpuTimeNs, long memoryAllocatedBytes, | ||
| TrackingScope trackingScope) { | ||
| if (trackingScope != TrackingScope.QUERY) { |
There was a problem hiding this comment.
Is this change related to this feature ?
There was a problem hiding this comment.
Yes, this is needed to make sure the existing query killing works. Once we make the ResourceUsageAccountant the default, this entire file can be nuked. Some explanation for why this is needed - #15798 (comment)
There was a problem hiding this comment.
I think I finally the grok the code. Watcher task calls two aggregators that will both decide to kill queries independently - one after the other. QueryAggregator continues to be triggered similar to PerQuery...WatcherTask. WorkloadAggregator is triggered by completely different rules.
There was a problem hiding this comment.
Another option was to leave QueryAggregator or classic query monitoring code as is and have watch task call workload aggregator after the current code is called in each run. Do you see the possibility of having more aggregators ?
There was a problem hiding this comment.
We are heavily invested in the way these classes work and improving it in steps. A switch to the ResourceUsageAccountant along with QA will be a big effort.
There was a problem hiding this comment.
Yeah, the switch will involve Q&A effort. So, we'll be doing the Q&A and enabling it in all of our production. We'll deprecate the PerQueryCPUMem..Acountant only after there is sufficient confidence that it works as expected.
pinot-core/src/main/java/org/apache/pinot/core/accounting/ResourceUsageAccountantFactory.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/accounting/WorkloadAggregator.java
Show resolved
Hide resolved
pinot-server/src/main/java/org/apache/pinot/server/starter/helix/BaseServerStarter.java
Show resolved
Hide resolved
| public static void initializeThreadAccountant(PinotConfiguration config, String instanceId, | ||
| InstanceType instanceType) { | ||
| String factoryName = config.getProperty(CommonConstants.Accounting.CONFIG_OF_FACTORY_NAME); | ||
| _workloadBudgetManager = new WorkloadBudgetManager(config); |
There was a problem hiding this comment.
Why is _workLoadBudgetManager instantiated here. IIUC it is used only in WorkloadAggregator. Can it be initialized in the the WorkloadAggregator ?
There was a problem hiding this comment.
Also Tracing.java doesn't need to know about it as well. It is specific to ResourceUsageAccountantManager
There was a problem hiding this comment.
Initialized it here because it needs to be accessed from QuerySchedulers and other places. Of course it can be moved to WorkloadAggregator and indirectly accessed from there. But given that we are going to eventually replace PerQuery..Acountant, it makes sense to keep it here?
There was a problem hiding this comment.
I'll start an offline thread about deprecating PerQuery.... Even if it is removed, the only place it is used is in WorkloadAggregator. Will it be used in other places ? Right now, the code is laid such that SPI only focuses on accounting. startWatcherTask is the only exception (and unnecessary in hindsight). Workload and budget management are implementation dependent and seems like core package is better place.
It seems like we will be using PerQuery... for sometime. Initializing workload budget manager is not required for it. Even if we move away from PerQuery..., there is a good chance that we may adopt a system that doesnt use workload budget manager. The option to not used should be available.
There was a problem hiding this comment.
We discussed this offline. Summarizing here:
- We'll be exercising caution and ensure that
ResourceUsage..Acountanthas feature parity and stability before deprecatingPerQuery...Accountant - Moved WorkloadBudgetManager to core.
- WorkloadBudgetManager is not initialized. It fast exits if the config for workload budget collection is not enabled.
pinot-spi/src/main/java/org/apache/pinot/spi/accounting/ThreadResourceUsageAccountant.java
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/utils/config/QueryOptionsUtils.java
Outdated
Show resolved
Hide resolved
| @Override | ||
| public void updateQueryUsageConcurrently(String queryId, long cpuTimeNs, long memoryAllocatedBytes, | ||
| TrackingScope trackingScope) { | ||
| if (trackingScope != TrackingScope.QUERY) { |
There was a problem hiding this comment.
I think I finally the grok the code. Watcher task calls two aggregators that will both decide to kill queries independently - one after the other. QueryAggregator continues to be triggered similar to PerQuery...WatcherTask. WorkloadAggregator is triggered by completely different rules.
Why do you need that? Couldn't apply a decorate pattern where PerQueryCPUMemAccountantFactory keeps the logic per query and then a WorkloadAccountant groups the per query map into a per workload map? |
Discussed this with Rajat. Decided against decorate pattern because - Both query and workload accountants rely on the same underlying state. Managing this shared state across multiple decorator objects risks race conditions, duplicated storage and increases complexity unnecessarily. We'll eventually be replacing |
2acdf4f to
1d7ea11
Compare
| @Override | ||
| public void setupRunner(@Nullable String queryId, int taskId, ThreadExecutionContext.TaskType taskType) { | ||
| public void setupRunner(String queryId, int taskId, ThreadExecutionContext.TaskType taskType) { | ||
| setupRunner(queryId, taskId, taskType, null); |
There was a problem hiding this comment.
If workloadName should never be null (mentioned in other comments), then the last para, should be default workload name ?
| public static void initializeThreadAccountant(PinotConfiguration config, String instanceId, | ||
| InstanceType instanceType) { | ||
| String factoryName = config.getProperty(CommonConstants.Accounting.CONFIG_OF_FACTORY_NAME); | ||
| _workloadBudgetManager = new WorkloadBudgetManager(config); |
There was a problem hiding this comment.
I'll start an offline thread about deprecating PerQuery.... Even if it is removed, the only place it is used is in WorkloadAggregator. Will it be used in other places ? Right now, the code is laid such that SPI only focuses on accounting. startWatcherTask is the only exception (and unnecessary in hindsight). Workload and budget management are implementation dependent and seems like core package is better place.
It seems like we will be using PerQuery... for sometime. Initializing workload budget manager is not required for it. Even if we move away from PerQuery..., there is a good chance that we may adopt a system that doesnt use workload budget manager. The option to not used should be available.
* [Query Resource Isolation] Workload Configs (apache#15109) * Workload Configs * workload config * Add API * config * Change config structure * Propagation strategy * Fix style check * Cost spliting on update * Table addition propagation * perf * Tests * test * test 2 * Review comments 1 * review comments 3 * review comments 3 * name change * review comments 4 * Fix TableDoesNotExistError for hybrid tables in MSE queries in controller API (apache#16102) * Make ThreadResourceUsageProvider a Helper/Utility Class. (apache#16051) * ThreadResourceUsageProvider is a helper class. ThreadResourceContext tracks resource usage. Fix updateConcurrently * Rename to ThreadResourceSnapshot * Clean up * Add javadoc * Done use auto closeable * Checkstyle * Fix compilation error * Add back removed functions in SPI * Remove private constructor because japicmp complains. * Add setThreadResourceUsageProvider because of backward-incompatible checks * Add setThreadResourceUsageProvider because of backward-incompatible checks * Fix test * Fix ThreadResourceSnapshot usage and tests * Store cpu sample in nanoseconds. * Reduce logs and improve logging when queries are terminated due to OOM. (apache#16172) * Dynamic PerQueryCPUMemAccountant Config on Servers (apache#16219) * Checkpoint * Register change handler * Fix bugs. Manually tested * Checkstyle * Tests * Add pre-check that values are default * Undo typo fix * Update QueryRunner to make use of window function overflow handling server configurations (apache#16108) * Add multistage thread limiting configs at the broker and server level (apache#16080) * Adding changes for supporting RLS (apache#16043) * Use stats cache on error instead of the chained mechanism (apache#15992) * Improve broker error messaging when broker is the one reporting the failure (apache#16076) * Introduce MSE active and passive timeouts (apache#16075) * Throttle SSE & MSE Tasks if Server heap usage is above a threshold (apache#16271) * Fix QueryScheduler constructor using class name. (apache#16280) * Fix QueryScheduler constructor using class name. * Fix test * [Query Resource Isolation] WorkloadBudgetManager and Host enforcement (apache#15798) * QRI - WorkloadBudgetManager implementation * Address review comments * Remove singleton & signature fix * Fix compatibility checker * Review comments * Move WorkloadBudgetManager to core. --------- Co-authored-by: praveenc7 <praveenkchaganlal@gmail.com> * Eliminate duplicate cancel attempts in PerQueryCPUMemAccountant (apache#16299) * Add basic 1 query tests * Add more tests * Add ability to remember cancel queries. * Clean up if conditions in killMostExpensiveQuery * Fix test failures. * Address review comments. * Use QueryCancelCallback to cancel queries from ThreadResourceUsageAccountant (apache#16142) * Remove all calls to System.gc() in PerQueryCPUMemAccountantFactory (apache#16374) * Initialize thread accountant just before serving queries (apache#16326) * Allow Reset of ThreadResourceUsageAccountant in Tracing.java (apache#16360) * Queries now self terminate if in panic mode. (apache#16380) * Queries now self terminate if in panic mode. * Add config test * Hard kill on critical level. * Fix configs * Separate anchor thread interruption. * Checkstyle * Review comments * remove code for critical level --------- Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> * [Query Resource Isolation] Additonal Sampling for Broker and Server (apache#16164) * fix * sampling * Broker sampling * revert integ-test * Fix test failures * review comments * remove MSE * broker auth * remove per pruner & planner sample * Use Broker's accountant to sample in the request handler. (apache#16439) * [Query Resource Isolation] Workload Scheduler (apache#16018) * QRI - WorkloadBudgetManager implementation * Address review comments * scheduler * unit test * review comments: metrics, secondary, resource-manager * remove broker admission * Remove default budget --------- Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com> * Cleanup deprecated methods in ThreadResourceUsageAccountant (apache#16479) * Remove unnecessary methods and config for ThreadResourceUsageAccountant (apache#16490) * Add tests for OOM Termination of MSE queries. (apache#16514) * Fix a flaky assert when testing OOM Cancellation of MSE Queries (apache#16533) * Disable Flaky Tests (apache#16554) This is a follow-up to apache#16533 The fix for a flaky test did not work. This PR disables these tests temporarily. * Use correlation ID instead of request id in PerQueryCpuMemAccountant (apache#16040) * [Query Resource Isolation]Interface for Workload Stats Collection (apache#16340) * Interface for Stats Collection * Additional comments * inherit * additional class comments * [Query Resource Isolation] Fix Refresh message (apache#16636) * Fix Refresh message * delete queryworkload message handler * info -> debug logs * reduce logging (apache#16698) * style check * [Query Workload Isolation] Cost-split support (apache#16672) * splits * Cost split * test * propagation entity change & java doc * Propagation scheme review comments * empty commit to trigger build * Reduce log for PerQueryCPUMemResourceUsageAccountant (apache#16642) --------- Co-authored-by: Rajat Venkatesh <1638298+vrajat@users.noreply.github.com> Co-authored-by: Yash Mayya <yash.mayya@gmail.com> Co-authored-by: Satwik Pachigolla <40644097+satwik-pachigolla@users.noreply.github.com> Co-authored-by: 9aman <35227405+9aman@users.noreply.github.com> Co-authored-by: Gonzalo Ortiz Jaureguizar <gortiz@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan <vvivekiyer@gmail.com> Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com> Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com>
* [Query Resource Isolation] Workload Configs (apache#15109) * Workload Configs * workload config * Add API * config * Change config structure * Propagation strategy * Fix style check * Cost spliting on update * Table addition propagation * perf * Tests * test * test 2 * Review comments 1 * review comments 3 * review comments 3 * name change * review comments 4 * Fix TableDoesNotExistError for hybrid tables in MSE queries in controller API (apache#16102) * Make ThreadResourceUsageProvider a Helper/Utility Class. (apache#16051) * ThreadResourceUsageProvider is a helper class. ThreadResourceContext tracks resource usage. Fix updateConcurrently * Rename to ThreadResourceSnapshot * Clean up * Add javadoc * Done use auto closeable * Checkstyle * Fix compilation error * Add back removed functions in SPI * Remove private constructor because japicmp complains. * Add setThreadResourceUsageProvider because of backward-incompatible checks * Add setThreadResourceUsageProvider because of backward-incompatible checks * Fix test * Fix ThreadResourceSnapshot usage and tests * Store cpu sample in nanoseconds. * Reduce logs and improve logging when queries are terminated due to OOM. (apache#16172) * Dynamic PerQueryCPUMemAccountant Config on Servers (apache#16219) * Checkpoint * Register change handler * Fix bugs. Manually tested * Checkstyle * Tests * Add pre-check that values are default * Undo typo fix * Update QueryRunner to make use of window function overflow handling server configurations (apache#16108) * Add multistage thread limiting configs at the broker and server level (apache#16080) * Adding changes for supporting RLS (apache#16043) * Use stats cache on error instead of the chained mechanism (apache#15992) * Improve broker error messaging when broker is the one reporting the failure (apache#16076) * Introduce MSE active and passive timeouts (apache#16075) * Throttle SSE & MSE Tasks if Server heap usage is above a threshold (apache#16271) * Fix QueryScheduler constructor using class name. (apache#16280) * Fix QueryScheduler constructor using class name. * Fix test * [Query Resource Isolation] WorkloadBudgetManager and Host enforcement (apache#15798) * QRI - WorkloadBudgetManager implementation * Address review comments * Remove singleton & signature fix * Fix compatibility checker * Review comments * Move WorkloadBudgetManager to core. --------- Co-authored-by: praveenc7 <praveenkchaganlal@gmail.com> * Eliminate duplicate cancel attempts in PerQueryCPUMemAccountant (apache#16299) * Add basic 1 query tests * Add more tests * Add ability to remember cancel queries. * Clean up if conditions in killMostExpensiveQuery * Fix test failures. * Address review comments. * Use QueryCancelCallback to cancel queries from ThreadResourceUsageAccountant (apache#16142) * Remove all calls to System.gc() in PerQueryCPUMemAccountantFactory (apache#16374) * Initialize thread accountant just before serving queries (apache#16326) * Allow Reset of ThreadResourceUsageAccountant in Tracing.java (apache#16360) * Queries now self terminate if in panic mode. (apache#16380) * Queries now self terminate if in panic mode. * Add config test * Hard kill on critical level. * Fix configs * Separate anchor thread interruption. * Checkstyle * Review comments * remove code for critical level --------- Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> * [Query Resource Isolation] Additonal Sampling for Broker and Server (apache#16164) * fix * sampling * Broker sampling * revert integ-test * Fix test failures * review comments * remove MSE * broker auth * remove per pruner & planner sample * Use Broker's accountant to sample in the request handler. (apache#16439) * [Query Resource Isolation] Workload Scheduler (apache#16018) * QRI - WorkloadBudgetManager implementation * Address review comments * scheduler * unit test * review comments: metrics, secondary, resource-manager * remove broker admission * Remove default budget --------- Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com> * Cleanup deprecated methods in ThreadResourceUsageAccountant (apache#16479) * Remove unnecessary methods and config for ThreadResourceUsageAccountant (apache#16490) * Add tests for OOM Termination of MSE queries. (apache#16514) * Fix a flaky assert when testing OOM Cancellation of MSE Queries (apache#16533) * Disable Flaky Tests (apache#16554) This is a follow-up to apache#16533 The fix for a flaky test did not work. This PR disables these tests temporarily. * Use correlation ID instead of request id in PerQueryCpuMemAccountant (apache#16040) * [Query Resource Isolation]Interface for Workload Stats Collection (apache#16340) * Interface for Stats Collection * Additional comments * inherit * additional class comments * [Query Resource Isolation] Fix Refresh message (apache#16636) * Fix Refresh message * delete queryworkload message handler * info -> debug logs * reduce logging (apache#16698) * style check * [Query Workload Isolation] Cost-split support (apache#16672) * splits * Cost split * test * propagation entity change & java doc * Propagation scheme review comments * empty commit to trigger build * Reduce log for PerQueryCPUMemResourceUsageAccountant (apache#16642) * [refactor] Switching to RoutingManager for broker request handlers (apache#16442) Co-authored-by: Shaurya Chaturvedi <shauryachats@uber.com> * Fix broker request id generator to avoid generating same id (apache#16661) * Introduce QueryExecutionContext to manage query life cycle (apache#16728) * Introduce QueryExecutionContext to manage query life cycle 2 (apache#16728) --------- Co-authored-by: Rajat Venkatesh <1638298+vrajat@users.noreply.github.com> Co-authored-by: Yash Mayya <yash.mayya@gmail.com> Co-authored-by: Satwik Pachigolla <40644097+satwik-pachigolla@users.noreply.github.com> Co-authored-by: 9aman <35227405+9aman@users.noreply.github.com> Co-authored-by: Gonzalo Ortiz Jaureguizar <gortiz@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan <vvivekiyer@gmail.com> Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com> Co-authored-by: Rajat Venkatesh <vrajat@users.noreply.github.com> Co-authored-by: Vivek Iyer Vaidyanathan Iyer <vvaidyanathan@linkedin.com> Co-authored-by: Shaurya Chaturvedi <shauryachats@gmail.com> Co-authored-by: Shaurya Chaturvedi <shauryachats@uber.com>
Summary
In #14551, we proposed a feature to build Query Isolation in Pinot.
#15109 contains the logic for
This PR contains the the logic for Workload Cost Enforcement on each individual host. It contains:
Implementation Notes
Created a generic accountant called
ResourceUsageAccountantthat can support both query based accounting and Workload based accounting. It contains all function of the existing account (PerQueryCPUMemAccountantFactory) in addition to workload support. The reason for creating a generic accountant is to make sure Thread&Task state is managed in one place.Benchmarked the performance of WorkloadBudgetManager implementation. On my mac, it can support 15K ops/ms at a concurrency of 10 threads. This should be sufficient to handle the calls. Note that during query execution, multistage query processing and broker reduce, one call is made every few ms to the WorkloadBudgetManager. In additional, we'll add more sampling and accounting for other parts of query execution like planning, pruning, etc.
TODOs (in subsequent PRs):