Description
Is your feature request related to a problem? Please describe.
The ruler exports some high-cardinality metrics on its /metrics endpoint. The high cardinality comes from a rule_group label on these metrics:
cortex_prometheus_rule_evaluations_total
cortex_prometheus_rule_evaluation_failures_total
cortex_prometheus_rule_group_interval_seconds
cortex_prometheus_rule_group_iterations_missed_total
cortex_prometheus_rule_group_iterations_total
cortex_prometheus_rule_group_last_duration_seconds
cortex_prometheus_rule_group_last_evaluation_timestamp_seconds
cortex_prometheus_rule_group_rules
cortex_prometheus_last_evaluation_samples
We monitor Cortex via a single-instance Prometheus and alert on metrics like cortex_prometheus_rule_evaluation_failures_total. Unfortunately, these are the highest-cardinality metrics we collect.
Describe the solution you'd like
Introduce a new option in ruler_config:
ruler:
  # Disable the rule_group label on exported metrics.
  # CLI flag: -ruler.disable-rule-group-label
  [disable_rule_group_label: <boolean> | default = false]
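For illustration, enabling the option might look like the fragment below. This is only a sketch: disable_rule_group_label is the name proposed in this issue, not an option that exists in Cortex today.

```yaml
ruler:
  # Proposed in this issue (hypothetical until implemented):
  # export ruler metrics without the rule_group label.
  disable_rule_group_label: true
```

The equivalent command-line form would be the proposed -ruler.disable-rule-group-label flag.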
Describe alternatives you've considered
We initially considered introducing new metrics without the rule_group label, so that the existing high-cardinality metrics could be dropped at scrape time. @bboreham suggested introducing a config option instead. Although a config option adds to Cortex's config surface area, I think it's the better approach, as consumers of the ruler's /metrics endpoint won't have to modify their scrape configs to avoid high-cardinality metrics.
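For comparison, the scrape-time alternative would require every consumer to carry something like the following in their Prometheus config. This is a rough sketch: the job name and target address are placeholders, and the regex simply lists the metrics named above.

```yaml
scrape_configs:
  - job_name: cortex-ruler            # placeholder job name
    static_configs:
      - targets: ["ruler:8080"]       # placeholder address of the ruler
    metric_relabel_configs:
      # Drop the high-cardinality, rule_group-labelled series after the
      # scrape and before ingestion. The regex is fully anchored by Prometheus.
      - source_labels: [__name__]
        regex: "cortex_prometheus_rule_evaluations_total|cortex_prometheus_rule_evaluation_failures_total|cortex_prometheus_rule_group_interval_seconds|cortex_prometheus_rule_group_iterations_missed_total|cortex_prometheus_rule_group_iterations_total|cortex_prometheus_rule_group_last_duration_seconds|cortex_prometheus_rule_group_last_evaluation_timestamp_seconds|cortex_prometheus_rule_group_rules|cortex_prometheus_last_evaluation_samples"
        action: drop
```

With the config option, none of this is needed on the consumer side.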