Introduce ruler option to disable rule_group labels on metrics. #4566

Closed
@siggy

Description

Is your feature request related to a problem? Please describe.

The ruler exports some high-cardinality metrics on its /metrics endpoint. The cardinality comes from the rule_group label on these metrics:

  • cortex_prometheus_rule_evaluations_total
  • cortex_prometheus_rule_evaluation_failures_total
  • cortex_prometheus_rule_group_interval_seconds
  • cortex_prometheus_rule_group_iterations_missed_total
  • cortex_prometheus_rule_group_iterations_total
  • cortex_prometheus_rule_group_last_duration_seconds
  • cortex_prometheus_rule_group_last_evaluation_timestamp_seconds
  • cortex_prometheus_rule_group_rules
  • cortex_prometheus_last_evaluation_samples

We monitor Cortex with a single-instance Prometheus and alert on metrics such as cortex_prometheus_rule_evaluation_failures_total. Unfortunately, these are the highest-cardinality metrics we collect.
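For illustration, an alert of this kind might look like the sketch below. The group name, alert name, window, threshold, and severity are assumptions, not our actual rules. The point is that the alert can aggregate the rule_group label away at query time, yet the Prometheus server still has to ingest and store one series per rule group.

```yaml
# Hypothetical Prometheus alerting rule; names and thresholds are illustrative.
groups:
  - name: cortex-ruler                          # assumed group name
    rules:
      - alert: CortexRulerEvaluationFailures    # assumed alert name
        # Aggregate away rule_group: the alert does not need per-group detail,
        # but every per-group series is still scraped and stored.
        expr: sum without (rule_group) (rate(cortex_prometheus_rule_evaluation_failures_total[5m])) > 0
        for: 10m
        labels:
          severity: warning
```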

Describe the solution you'd like

Introduce a new option in ruler_config:

ruler:
  # Disable the rule_group label on exported metrics.
  # CLI flag: -ruler.disable-rule-group-label
  [disable_rule_group_label: <boolean> | default = false]

Describe alternatives you've considered

We initially considered introducing new metrics without the rule_group label, so that the existing high-cardinality metrics could be dropped at scrape time. @bboreham suggested introducing a config option instead. Although a config option adds to Cortex's config surface area, I think it's the better approach, because consumers of the ruler's /metrics endpoint won't have to modify their scrape configs to avoid high-cardinality metrics.
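For comparison, the scrape-time alternative would have required every consumer to carry something like the following in their Prometheus config. This is a rough sketch: the job name and target are made up, and it assumes the label-free replacement metrics would be exported under different names so they survive the drop.

```yaml
# Hypothetical scrape config; job name and target are placeholders.
scrape_configs:
  - job_name: cortex-ruler
    static_configs:
      - targets: ["ruler.example.internal:8080"]
    metric_relabel_configs:
      # Drop the metrics that carry the rule_group label before ingestion.
      # Relabel regexes are fully anchored, so each name must match exactly.
      - source_labels: [__name__]
        regex: cortex_prometheus_(rule_evaluations_total|rule_evaluation_failures_total|rule_group_interval_seconds|rule_group_iterations_missed_total|rule_group_iterations_total|rule_group_last_duration_seconds|rule_group_last_evaluation_timestamp_seconds|rule_group_rules|last_evaluation_samples)
        action: drop
```

With the proposed option, none of this would be needed on the consumer side.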
