Use resource.limit.cpu for --num_cpu gflag #198
The current code uses resources.requests.cpu to set the --num_cpus gflag hint. The problem is that this leaves us no choice but to reserve the cores for YB, which has a few limitations.
For background: Kubernetes can interact with the Linux QoS features of the task scheduler. It does so by interpreting resource declarations in a specific way.
For more information, see: https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/
At a high level, there are three classes to consider:

- 'BestEffort', as the name implies, does nothing special: processes get CPU time if time is available.
- 'Guaranteed' is the exact opposite: a process has time reserved specifically for it.
- 'Burstable' is a hybrid of the two, defined by two thresholds: a lower, guaranteed level (i.e., "requests") and an upper bound (i.e., "limits"), where the difference between the two is best effort.
By using the 'requests' level to set --num_cpus, we create a limitation: it becomes impossible to leverage any best-effort cycles that might be available on the system. Consider the following:
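A resource declaration along these lines illustrates the point (a sketch using the standard Kubernetes container resource fields; the chart's actual values layout may differ, and the 8/16 values come from the discussion below):

```yaml
resources:
  requests:
    cpu: 8    # guaranteed: Kubernetes reserves 8 cores for the tserver
  limits:
    cpu: 16   # burstable upper bound: up to 16 cores when the node has spare cycles
```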
In this configuration, Kubernetes would assign the Burstable QoS class to the tserver pod and reserve 8 cores, allowing it to burst up to 16. However, since the gflag would be set to --num_cpus=8, the tserver will not try to use more than 8, leaving potential headroom on the table.
The limitation makes things more challenging because the user is left to find the exact number of available millicores that can be dedicated to YB. While ample dedicated resources are a good idea for a production deployment, there are a few scenarios where this is undesirable:
Scenario 1 - Development
During development, the user likely doesn't need guaranteed performance but would like YB to use as much as is free at a given moment. For an 8-core development cluster, the user may want to set requests to 1m, limits to 8, and let it compete for CPU time, depending on what is happening.
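Such a development setup might look like the following (hypothetical values, standard Kubernetes resource fields):

```yaml
resources:
  requests:
    cpu: 1m   # effectively no reservation; scheduler places the pod almost anywhere
  limits:
    cpu: 8    # with this proposal, --num_cpus would be derived from this value
```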
Scenario 2 - "Bin packing" in production
Say a user has 3 16-core nodes dedicated to YB via taints/tolerations. However, each node has a few system daemons (CSI, metric scrapers, etc.) that take up 765m CPU, leaving 15235m available.
Today, the user would have to compute that number and set YB's requests to 15235m. If something changes, the user might leave headroom on the table, or Kubernetes may fail to schedule the pod, depending on how the value shifts. Instead, we would like to be able to configure it simply with something like
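the following (a sketch using standard Kubernetes resource fields; the 15/16 split comes from the scenario above):

```yaml
resources:
  requests:
    cpu: 15   # comfortably below the ~15235m actually free; scheduling always succeeds
  limits:
    cpu: 16   # burst to the full node when the system daemons are idle
```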
This sets the pod to Burstable with 15 guaranteed cores, but with the option to burst to the full 16 when YB is busy but other services are not.
So, with this proposal, users can still achieve the previous result by simply setting requests = limits, effectively getting the Guaranteed QoS class for the full --num_cpus, while also being able to leverage Burstable in the scenarios where it has advantages.