Description
Is your feature request related to a problem? Please describe.
Balancing the CPU usage of a store's replicas, rather than QPS, has been shown to improve cluster performance.
This issue is to add CPU-based balancing to the allocator, as a replacement for QPS.
Describe the solution you'd like
Using the sum of replicas' CPU on a store, which is added in #92858, instrument CPU balancing using the same policy structure as QPS.
- Add an additional `kv.allocator.load_based_rebalancing_dimension` that supports CPU.
- Add an additional CPU field to `StoreCapacity` that is used when comparing a store to the cluster for balance.
- Instrument the thresholds, minimums and logic to use CPU rather than QPS when selected.
- Update the storepool `UpdateLocalStoreAfterXXX` to include CPU.
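A minimal sketch of how the steps listed above might fit together, assuming the CPU comparison mirrors a mean-plus-margin policy. Only `StoreCapacity` and the idea of a selectable dimension come from this issue. The field names, `LoadDimension`, `Classify`, and the threshold parameters are hypothetical and are not the allocator's actual API.

```go
package main

// LoadDimension selects which load signal is balanced.
// Hypothetical type; it stands in for the value of the proposed
// kv.allocator.load_based_rebalancing_dimension setting.
type LoadDimension int

const (
	LoadDimensionQPS LoadDimension = iota
	LoadDimensionCPU
)

// StoreCapacity is a hypothetical subset of the real struct.
// CPUPerSecond is assumed to be the sum of replica CPU on the store.
type StoreCapacity struct {
	QueriesPerSecond float64
	CPUPerSecond     float64
}

// Load returns the store's load in the selected dimension.
func (c StoreCapacity) Load(dim LoadDimension) float64 {
	if dim == LoadDimensionCPU {
		return c.CPUPerSecond
	}
	return c.QueriesPerSecond
}

// BalanceStatus describes a store relative to the cluster mean.
type BalanceStatus int

const (
	Underfull BalanceStatus = iota
	Balanced
	Overfull
)

// Classify compares a store to the cluster mean in the chosen dimension.
// The margin is the larger of a fraction of the mean and an absolute
// minimum difference, so small clusters with low load do not thrash.
func Classify(store StoreCapacity, cluster []StoreCapacity, dim LoadDimension, fraction, minDiff float64) BalanceStatus {
	if len(cluster) == 0 {
		return Balanced // nothing to compare against
	}
	var sum float64
	for _, c := range cluster {
		sum += c.Load(dim)
	}
	mean := sum / float64(len(cluster))
	margin := mean * fraction
	if margin < minDiff {
		margin = minDiff
	}
	load := store.Load(dim)
	switch {
	case load > mean+margin:
		return Overfull
	case load < mean-margin:
		return Underfull
	default:
		return Balanced
	}
}
```

Because the dimension is a parameter, the same thresholds and logic serve both QPS and CPU. Only the field read from the capacity and the tuned values of `fraction` and `minDiff` would differ.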
Describe alternatives you've considered
One alternative explored was to balance on the runtime CPU rather than on the sum of replica CPU. Runtime CPU is closer to the value we actually care about. However, it is a less "closed" objective: because runtime CPU cannot be fully attributed to replicas, it requires assumptions about the impact of any action taken.
Additional considerations
One additional consideration is mixed-version clusters. Stores still on the prior version will not populate the new CPU field in their store capacity. This change should be version gated to only activate on v23.1 or later.
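The gate described above could be sketched roughly as follows, assuming a fallback to QPS until the cluster reaches v23.1. The `Version` type, the `"cpu"` and `"qps"` setting values, and `EffectiveDimension` are hypothetical names for illustration, not the real cluster-version API.

```go
package main

// Version is a hypothetical major.minor cluster version.
type Version struct {
	Major, Minor int
}

// AtLeast reports whether v is the same as or newer than other.
func (v Version) AtLeast(other Version) bool {
	if v.Major != other.Major {
		return v.Major > other.Major
	}
	return v.Minor >= other.Minor
}

// cpuBalancingMinVersion is the first version in which every store
// is expected to populate the CPU field of its store capacity.
var cpuBalancingMinVersion = Version{Major: 23, Minor: 1}

// EffectiveDimension returns the dimension the allocator should use.
// A request for "cpu" falls back to "qps" while the cluster version is
// older than v23.1, since older stores would report zero CPU and look
// artificially underfull.
func EffectiveDimension(requested string, clusterVersion Version) string {
	if requested == "cpu" && !clusterVersion.AtLeast(cpuBalancingMinVersion) {
		return "qps"
	}
	return requested
}
```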
Jira issue: CRDB-23493