Skip to content

DOC-11499 Docs for obs: Support SQL metrics collected by Application and DB #19619

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
90 changes: 90 additions & 0 deletions src/current/_data/v25.2/metrics/multi-dimensional-metrics.yml
Original file line number Diff line number Diff line change
Expand Up @@ -252,3 +252,93 @@

- multi_dimensional_metric_id: sql.exec.latency.detail
feature: detailed-latency

- multi_dimensional_metric_id: sql.crud_query.count
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.crud_query.count.internal
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.delete.count
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.delete.count.internal
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.distsql.contended_queries.count
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.distsql.cumulative_contention_nanos
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.failure.count
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.failure.count.internal
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.full.scan.count
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.full.scan.count.internal
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.insert.count
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.insert.count.internal
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.select.count
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.select.count.internal
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.service.latency
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.service.latency.internal
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.statements.active
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.statements.active.internal
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.txn.begin.count
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.txn.begin.count.internal
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.txn.commit.count
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.txn.commit.count.internal
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.txn.latency
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.txn.latency.internal
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.txn.rollback.count
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.txn.rollback.count.internal
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.txns.open
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.txns.open.internal
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.update.count
feature: database-and-application_name-labels

- multi_dimensional_metric_id: sql.update.count.internal
feature: database-and-application_name-labels
10 changes: 10 additions & 0 deletions src/current/v25.2/cockroachdb-feature-availability.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,16 @@ Any feature made available in a phase prior to GA is provided without any warran
**The following features are in preview** and are subject to change. To share feedback and/or issues, contact [Support](https://support.cockroachlabs.com/hc).
{{site.data.alerts.end}}


### `database` and `application_name` labels for certain metrics

The following cluster settings enable the [`database` and `application_name` labels for certain metrics]({% link {{ page.version.version }}/multi-dimensional-metrics.md %}#enable-database-and-application_name-labels), along with their internal counterparts if they exist:

- `sql.metrics.database_name.enabled`
- `sql.metrics.application_name.enabled`

By default, these cluster settings are disabled.

### Vector indexes

A [vector index]({% link {{ page.version.version }}/vector-indexes.md %}) enables efficient approximate nearest neighbor (ANN) search on high-dimensional [`VECTOR`]({% link {{ page.version.version }}/vector.md %}) columns. Use vector indexes to improve the performance of similarity searches over large datasets, such as embeddings generated by machine learning models.
Expand Down
181 changes: 176 additions & 5 deletions src/current/v25.2/multi-dimensional-metrics.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
title: Multi-dimensional Metrics
summary: Learn about high-cardinality multi-dimensional metrics enabled by the child metrics and detailed latency cluster settings
summary: Learn about high-cardinality multi-dimensional metrics enabled by the applicable cluster settings.
toc: true
docs_area: reference.metrics
---
Expand All @@ -10,13 +10,15 @@ Multi-dimensional metrics are additional [Prometheus]({% link {{ page.version.ve
The export of multi-dimensional metrics can be enabled by two [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}):

- [`server.child_metrics.enabled`](#enable-child-metrics)
- [`sql.metrics.database_name.enabled`](#enable-database-and-application_name-labels)
- [`sql.metrics.application_name.enabled`](#enable-database-and-application_name-labels)
- [`sql.stats.detailed_latency_metrics.enabled`](#enable-detailed-latency-metrics)

## Enable child metrics

Child metrics are specific, detailed metrics that are usually related to a higher-level (parent or aggregate) metric. They often provide more granular or specific information about a particular aspect of the parent metric. The parent metrics and their potential child metrics are determined by the specific feature the cluster is using.

The [cluster setting `server.child_metrics.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-server-child-metrics-enabled) is disabled by default. To enable it, use the [`SET CLUSTER SETTING`]({% link {{ page.version.version }}/set-cluster-setting.md %}) statement:
The [cluster setting `server.child_metrics.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-server-child-metrics-enabled) is disabled by default. To enable it, use the [`SET CLUSTER SETTING`]({% link {{ page.version.version }}/set-cluster-setting.md %}) statement.

{% include_cached copy-clipboard.html %}
~~~ sql
Expand Down Expand Up @@ -433,12 +435,181 @@ changefeed_flush_hist_nanos_sum{node_id="1",scope="office_dogs"} 9.79696709e+08
changefeed_flush_hist_nanos_count{node_id="1",scope="office_dogs"} 2
~~~

## Enable `database` and `application_name` labels

{{site.data.alerts.callout_info}}
{% include feature-phases/preview.md %}
{{site.data.alerts.end}}

The following cluster settings enable the `database` and `application_name` labels for certain metrics, along with their internal counterparts if they exist:

- [`sql.metrics.database_name.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-metrics-database-name-enabled)
- [`sql.metrics.application_name.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-metrics-application-name-enabled)

By default, these cluster settings are disabled. To enable them, use the [`SET CLUSTER SETTING`]({% link {{ page.version.version }}/set-cluster-setting.md %}) statement. Because these labels use aggregate metrics, you must enable the [`server.child_metrics.enabled`](#enable-child-metrics) cluster setting to use them.

{% include_cached copy-clipboard.html %}
~~~ sql
SET CLUSTER SETTING server.child_metrics.enabled = true;
~~~

{% include_cached copy-clipboard.html %}
~~~ sql
SET CLUSTER SETTING sql.metrics.database_name.enabled = true;
~~~

{% include_cached copy-clipboard.html %}
~~~ sql
SET CLUSTER SETTING sql.metrics.application_name.enabled = true;
~~~

Toggling the `sql.metrics.database_name.enabled` and `sql.metrics.application_name.enabled` cluster settings clears existing metric values for current label combinations and reinitializes the affected metrics to reflect the new label configuration.

When toggling the `sql.metrics.database_name.enabled` and `sql.metrics.application_name.enabled` cluster settings, only the values for existing metric label combinations will be cleared. Aggregated metric values for the affected metrics will not be cleared.

{{site.data.alerts.callout_info}}
Child metrics (metrics with the `database` and `application_name` labels) are independent from the parent (aggregated metric). The child metrics are initialized when the cluster settings are enabled.

For this reason, child `COUNTER` metrics may not always add up to the parent `COUNTER` metric. For an example, refer to [Examples 1 through 6](#1-all-cluster-settings-disabled).

For `GAUGE` metrics, values may be different and potentially unexpected depending on when a setting is enabled. For an example, refer to [7. `GAUGE` metric example](#7-gauge-metric-example).
{{site.data.alerts.end}}

These labels affect only the metrics emitted via [Prometheus export]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint). They are not visible in the [DB Console Metrics dashboards]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#metrics-dashboards).

The system retains up to 5,000 recently used label combinations.

{% assign feature = "database-and-application_name-labels" %}
{% include {{ page.version.version }}/multi-dimensional-metrics-table.md %}

### Examples

This section demonstrates the impact of enabling and disabling the relevant cluster settings.

[Examples 1 through 6](#1-all-cluster-settings-disabled) show the effect on the `COUNTER` metric `sql.select.count`. During these examples, the [`movr` workload]({% link {{ page.version.version }}/cockroach-workload.md %}#run-the-movr-workload) was running on node 1. The aggregated metric consistently increases as the examples progress.

[Example 7](#7-gauge-metric-example) shows a possible effect on the `GAUGE` metric `sql.txn.open`.

#### 1. All cluster settings disabled

{% include_cached copy-clipboard.html %}
~~~ sql
SET CLUSTER SETTING server.child_metrics.enabled = false;
SET CLUSTER SETTING sql.metrics.database_name.enabled = false;
SET CLUSTER SETTING sql.metrics.application_name.enabled = false;
~~~

The Prometheus export only gives the aggregated metric for the node.

~~~
# HELP sql_select_count Number of SQL SELECT statements successfully executed
# TYPE sql_select_count counter
sql_select_count{node_id="1"} 2030
~~~

#### 2. Only child metrics enabled

{% include_cached copy-clipboard.html %}
~~~ sql
SET CLUSTER SETTING server.child_metrics.enabled = true;
SET CLUSTER SETTING sql.metrics.database_name.enabled = false;
SET CLUSTER SETTING sql.metrics.application_name.enabled = false;
~~~

The Prometheus export still only gives the aggregated metric for the node.

~~~
sql_select_count{node_id="1"} 6568
~~~

#### 3. Child metrics and `database_name` label enabled

{% include_cached copy-clipboard.html %}
~~~ sql
SET CLUSTER SETTING server.child_metrics.enabled = true;
SET CLUSTER SETTING sql.metrics.database_name.enabled = true;
SET CLUSTER SETTING sql.metrics.application_name.enabled = false;
~~~

The aggregated metric and a child metric with only the `database` label are emitted.

~~~
sql_select_count{node_id="1"} 10259
sql_select_count{node_id="1",database="movr"} 816
~~~

#### 4. Child metrics and `application_name` label enabled

{% include_cached copy-clipboard.html %}
~~~ sql
SET CLUSTER SETTING server.child_metrics.enabled = true;
SET CLUSTER SETTING sql.metrics.database_name.enabled = false;
SET CLUSTER SETTING sql.metrics.application_name.enabled = true;
~~~

The aggregated metric and a child metric with only the `application_name` label are emitted. Note that even though the aggregated metric has increased, the child metric with `application_name` label has a value less than the child metric with `database` label in the preceding example. This is because the labeled metrics have been reset, while the aggregated metric was not reset.

~~~
sql_select_count{node_id="1"} 14077
sql_select_count{node_id="1",application_name="movr"} 718
~~~

#### 5. Child metrics and both `database` and `application_name` label enabled

{% include_cached copy-clipboard.html %}
~~~ sql
SET CLUSTER SETTING server.child_metrics.enabled = true;
SET CLUSTER SETTING sql.metrics.database_name.enabled = true;
SET CLUSTER SETTING sql.metrics.application_name.enabled = true;
~~~

The aggregated metric and a child metric with both `database` and `application_name` labels are emitted.

~~~
sql_select_count{node_id="1"} 21085
sql_select_count{node_id="1",database="movr",application_name="movr"} 3962
~~~

#### 6. Aggregate metric disabled

The [cluster setting `server.child_metrics.include_aggregate.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-server-child-metrics-include-aggregate-enabled) (default: `true`) reports an aggregate time series for applicable multi-dimensional metrics. When set to `false`, it stops reporting the aggregate time series, preventing double counting when querying those metrics.

{% include_cached copy-clipboard.html %}
~~~ sql
SET CLUSTER SETTING server.child_metrics.enabled = true;
SET CLUSTER SETTING sql.metrics.database_name.enabled = true;
SET CLUSTER SETTING sql.metrics.application_name.enabled = true;
SET CLUSTER SETTING server.child_metrics.include_aggregate.enabled = false;
~~~

No aggregated metric emitted.
Only the child metric with both the `database` and `application_name` labels is emitted.

~~~
sql_select_count{node_id="1",database="movr",application_name="movr"} 8703
~~~

#### 7. `GAUGE` metric example

Changes to cluster settings may take time to reinitialize affected metrics. As a result, some `GAUGE` metrics might briefly show unexpected values.

This example uses the `GAUGE` metric `sql.txn.open`. `GAUGE` values for both aggregated and child metrics increase and decrease as transactions are opened and closed, respectively.

Consider the following scenario:

Time | Action | `sql.txn.open` aggregated metric | `sql.txn.open` child metric
:---:|--------|:--------------------------------:|:---------------------------:
1 | Open a transaction. The value of the `sql.txn.open` aggregated metric is incremented. | 1 | -
2 | Enable `sql.metrics.database_name.enabled` and `sql.metrics.application_name.enabled`.<br>Re-initialize child metrics. | 1 | 0
3 | Close a transaction.<br>The values of both the aggregated and child `sql.txn.open` metrics are decremented. | 0 | -1

To avoid negative values in child metrics, use the [Prometheus `clamp_min` function](https://prometheus.io/docs/prometheus/latest/querying/functions/#clamp_min) to set the metric to zero.

## Enable detailed latency metrics

The [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) `sql.stats.detailed_latency_metrics.enabled`
labels the latency metric `sql.exec.latency.detail` with the [statement fingerprint]({% link {{ page.version.version }}/ui-statements-page.md %}#sql-statement-fingerprints). To estimate the cardinality of the set of all statement fingerprints, use the `sql.query.unique.count` metric. For most workloads, this metric ranges from dozens to hundreds.
The [cluster setting `sql.stats.detailed_latency_metrics.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-stats-detailed-latency-metrics-enabled) labels the latency metric `sql.exec.latency.detail` with the [statement fingerprint]({% link {{ page.version.version }}/ui-statements-page.md %}#sql-statement-fingerprints). To estimate the cardinality of the set of all statement fingerprints, use the `sql.query.unique.count` metric. For most workloads, this metric ranges from dozens to hundreds.

`sql.stats.detailed_latency_metrics.enabled` is disabled by default. To enable it, use the [`SET CLUSTER SETTING`]({% link {{ page.version.version }}/set-cluster-setting.md %}) statement:
`sql.stats.detailed_latency_metrics.enabled` is disabled by default. To enable it, use the [`SET CLUSTER SETTING`]({% link {{ page.version.version }}/set-cluster-setting.md %}) statement.

{% include_cached copy-clipboard.html %}
~~~ sql
Expand Down
Loading