
Conversation

@ghaskins


The current code uses resources.requests.cpu for setting the --num_cpus gflag hint.  The problem
is that it leaves us no choice but to reserve the cores for YB, which has a few limitations.

For background: Kubernetes can interact with the Linux QoS features of the task scheduler.  It
does so by interpreting resource declarations in a specific way.

For more information, see:  https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/

At a high level, there are three classes to consider:

- Guaranteed
- Burstable
- BestEffort

'BestEffort', as the name implies, does nothing special: processes get time if time is available.
'Guaranteed' is the exact opposite, where a process has time reserved specifically for it.
'Burstable' is a hybrid of the two, where two thresholds are defined: a lower, guaranteed level
("requests") and a "burstable" upper bound ("limits"), where the difference between the two is
best effort.
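
As a sketch of how these classes are assigned (resource field names per the Kubernetes pod spec;
the specific quantities are illustrative):

```yaml
# Illustrative container resource fragments; Kubernetes derives the QoS
# class for the pod from declarations like these.

# Guaranteed: requests == limits for every resource in every container.
resources:
  requests:
    cpu: "8"
    memory: 16Gi
  limits:
    cpu: "8"
    memory: 16Gi

# Burstable: requests set lower than limits; the gap is best effort.
resources:
  requests:
    cpu: "8"
  limits:
    cpu: "16"

# BestEffort: no requests or limits declared at all.
resources: {}
```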

By using the 'requests' level to set --num_cpus, we create a limitation: it becomes impossible
to leverage any best-effort cycles that might be available on the system.  Consider the following:

```
  tserver:
    requests:
      cpu: 8
    limits:
      cpu: 16
```

In this configuration, Kubernetes would assign the Burstable QOS to the tserver pod and reserve
8 cores, allowing it to burst up to 16.  However, since the gflags would be set to --num_cpus=8,
the tserver will not try to use more than 8, leaving potential headroom on the table.

The limitation makes things more challenging because the user is left to find the exact number
of available millicores that can be dedicated to YB.  While ample dedicated resources are a
good idea for a production deployment, there are a few scenarios where this is undesirable:

1. Development systems
2. Reducing "bin-packing" complexity in production

## Scenario 1 - Development

During development, the user likely doesn't need guaranteed performance but would like to have
YB use as much as is free at a given moment.  For an 8-core development cluster, the user may
want to set requests to 1m and limits to 8, and let it compete for CPU time depending on what
is happening.
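
For example, a development values fragment along these lines (following the same layout as the
examples in this description):

```yaml
tserver:
  requests:
    cpu: 1m    # near-zero guarantee; the scheduler can place the pod anywhere
  limits:
    cpu: 8     # allow bursting up to the full 8-core machine when idle cycles exist
```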

## Scenario 2 - "Bin packing" in production

Say a user has three 16-core nodes dedicated to YB via taints/tolerations.  However, each node
has a few system daemons (CSI, metric scrapers, etc.) that take up 765m of CPU, leaving 15235m
available.

Today, the user would have to figure out that number above and set YB to request 15235m.  If
something changes, the user might leave headroom on the table, or Kubernetes may fail to
schedule the pod depending on how the value shifts.  Instead, we would like to be able to
configure it simply with something like:

```
  tserver:
    requests:
      cpu: 15
    limits:
      cpu: 16
```

This sets the pod to Burstable with 15 guaranteed cores, plus the option to burst to the full
16 when YB is busy but other services are not.

So, with this proposal, users can still achieve the previous result by simply setting
requests = limits, effectively getting Guaranteed QoS for the full --num_cpus, while also
being able to leverage Burstable for other scenarios where it has advantages.
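
One possible way the chart could derive the hint under this proposal (a hypothetical Helm
template sketch with illustrative value paths, not the actual chart code): prefer limits.cpu
when it is set, and fall back to requests.cpu otherwise.

```yaml
# Hypothetical Helm template fragment; .Values paths are illustrative.
# `default A B` yields B when B is set, otherwise A -- so limits.cpu wins
# when present, and requests.cpu is the fallback.
{{- $res := .Values.tserver }}
{{- $cpu := default $res.requests.cpu $res.limits.cpu }}
gflags:
  num_cpus: {{ $cpu }}
```

With requests = limits this degrades to today's behavior, so existing Guaranteed deployments
would be unaffected.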

Signed-off-by: Greg Haskins <greg@manetu.com>
@CLAassistant

CLAassistant commented Mar 14, 2025

CLA assistant check
All committers have signed the CLA.

