admission: per tenant WorkQueue latency metrics #134987
Labels
A-admission-control
C-enhancement
T-admission-control
Admission Control
CockroachDB has some support for per-tenant metrics. In multi-tenant environments such as CockroachDB Standard/Basic, a tenant (or a cluster operator) should be able to see the queueing delay in admission control (AC) WorkQueues on a per-tenant basis, as well as aggregated across all tenants. Currently, WorkQueue metrics are segmented only by priority.
This may require additional observability infrastructure: a long-running KV server can cycle through thousands of tenants, and we should not keep exporting expensive histograms for tenants that are no longer active on that server. A common solution to this problem in multi-tenant systems is to export delta metrics instead of cumulative metrics: when a timeseries' delta over an export interval is zero, nothing is exported for it. The number of exported timeseries then becomes proportional to the number of tenants active on the server.
The same approach can then be applied to the replication AC queueing latency metrics.
Jira issue: CRDB-44325