
Conversation

@ghaskins


The current code uses resources.requests.cpu for setting the --num_cpus gflag hint.  The problem
is that it leaves us no choice but to reserve the cores for YB, which has a few limitations.

For background: Kubernetes can interact with the Linux QoS features of the task scheduler.  It
does so by interpreting resource declarations in a specific way.

For more information, see:  https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/

At a high level, there are three classes to consider:

- Guaranteed
- Burstable
- BestEffort

'BestEffort', as the name implies, does nothing special: processes get time if time is available.
'Guaranteed' is the exact opposite, where a process has time reserved specifically for it.
'Burstable' is a hybrid of the two, where two thresholds are defined: a lower, guaranteed level
("requests") and a "burstable" upper bound ("limits"), where the difference between the two is
best effort.
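
As a sketch of how these classes are assigned (resource field names per the Kubernetes pod spec;
the specific quantities are illustrative):

```yaml
# Illustrative container resource fragments; Kubernetes derives the QoS
# class for the pod from declarations like these.

# Guaranteed: requests == limits for every resource in every container.
resources:
  requests:
    cpu: "8"
    memory: 16Gi
  limits:
    cpu: "8"
    memory: 16Gi

# Burstable: requests set lower than limits; the gap is best effort.
resources:
  requests:
    cpu: "8"
  limits:
    cpu: "16"

# BestEffort: no requests or limits declared at all.
resources: {}
```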

By using the 'requests' level to set --num_cpus, we create a limitation: it becomes impossible
to leverage any best-effort cycles that might be available on the system.  Consider the following:

```
  tserver:
    requests:
      cpu: 8
    limits:
      cpu: 16
```

In this configuration, Kubernetes would assign the Burstable QOS to the tserver pod and reserve
8 cores, allowing it to burst up to 16.  However, since the gflags would be set to --num_cpus=8,
the tserver will not try to use more than 8, leaving potential headroom on the table.

The limitation makes things more challenging because the user is left to find the exact number
of available millicores that can be dedicated to YB.  While ample dedicated resources are a
good idea for a production deployment, there are a few scenarios where this is undesirable:

1. Development systems
2. Reducing "bin-packing" complexity in production

## Scenario 1 - Development

During development, the user likely doesn't need guaranteed performance but would like to have
YB use as much as is free at a given moment.  For an 8-core development cluster, the user may
want to set requests to 1m and limits to 8, and let it compete for CPU time depending on what
is happening.
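
For example, a development values fragment along these lines (following the same layout as the
examples in this description):

```yaml
tserver:
  requests:
    cpu: 1m    # near-zero guarantee; the scheduler can place the pod anywhere
  limits:
    cpu: 8     # allow bursting up to the full 8-core machine when idle cycles exist
```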

## Scenario 2 - "Bin packing" in production

Say a user has three 16-core nodes dedicated to YB via taints/tolerations.  However, each node
has a few system daemons (CSI, metric scrapers, etc.) that take up 765m of CPU, leaving 15235m
available.

Today, the user would have to figure out that number above and set YB to request 15235m.  If
something changes, the user might leave headroom on the table, or Kubernetes may fail to
schedule the pod depending on how the value shifts.  Instead, we would like to be able to
configure it simply with something like:

```
  tserver:
    requests:
      cpu: 15
    limits:
      cpu: 16
```

This sets the pod to Burstable with 15 guaranteed cores, plus the option to burst to the full
16 when YB is busy but other services are not.

So, with this proposal, users can still achieve the previous result by simply setting
requests = limits, effectively getting Guaranteed QoS for the full --num_cpus, while also
being able to leverage Burstable for other scenarios where it has advantages.
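
One possible way the chart could derive the hint under this proposal (a hypothetical Helm
template sketch with illustrative value paths, not the actual chart code): prefer limits.cpu
when it is set, and fall back to requests.cpu otherwise.

```yaml
# Hypothetical Helm template fragment; .Values paths are illustrative.
# `default A B` yields B when B is set, otherwise A -- so limits.cpu wins
# when present, and requests.cpu is the fallback.
{{- $res := .Values.tserver }}
{{- $cpu := default $res.requests.cpu $res.limits.cpu }}
gflags:
  num_cpus: {{ $cpu }}
```

With requests = limits this degrades to today's behavior, so existing Guaranteed deployments
would be unaffected.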

Signed-off-by: Greg Haskins <greg@manetu.com>
@CLAassistant

CLAassistant commented Mar 14, 2025

CLA assistant check
All committers have signed the CLA.

