-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add maximumCapacity to taskRunner #17107
Add maximumCapacity to taskRunner #17107
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, left some minor suggestions.
indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskRunner.java
Outdated
Show resolved
Hide resolved
indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskRunner.java
Outdated
Show resolved
Hide resolved
@@ -1648,6 +1649,30 @@ public int getTotalCapacity() | |||
return getWorkers().stream().mapToInt(workerInfo -> workerInfo.getWorker().getCapacity()).sum(); | |||
} | |||
|
|||
@Override | |||
public int getMaximumCapacity() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a javadoc here listing the cases when this method returns -1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this method body should be commoned out between RemoteTaskRunner
and HttpTaskRunner
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
were you thinking of introducing a new class to do this?
Move logic for calculating maximumCapacity (tatal capacity based on max workers from autoscaling) to task runners
Description
It is possible to get weird responses from the /totalWorkerCapacity endpoint if mmless ingestion is enabled and the overlord dynamic.autoscaler config is set. This is because the TaskQueryTool.getTotalWorkerCapacity gets totalCapacity from the overlord's task runner but gets maximumCapacity directly from the dynamic config.
I think it makes sense to just expose a getMaximumCapacity method on TaskRunner that defaults to -1 (the default value of maximumCapacity) and gets overwritten.
Fixed the bug ...
Renamed the class ...
Added a forbidden-apis entry ...
In order to move getMaximumCapacity logic into each TaskRunner, i had to duplicate logic between HttpRemoteTaskRunner and RemoteTaskRunner since they have the same logic (check the overlord dynamic configuration), but I think this is okay in order for the task runners to be responsible for their own max capacity values.
Release note
Key changed/added classes in this PR
TaskRunner
HttpRemoteTaskRunner/RemoteTaskRunner
KubernetesAndWorkerTaskRunner/KubernetesTaskRunner
TaskQueryTool
This PR has: