Ruler never unregisters group metrics #2033

Closed
@bboreham

Description

When a new rule group is created (e.g. on resharding when a ruler starts or stops), it registers metrics with Prometheus, but there is no code to unregister them when the group stops being used (e.g. on a later resharding). Over time this builds up a substantial number of useless metrics.

This makes it hard to observe how well the ruler is keeping up, since time() - cortex_prometheus_rule_group_last_evaluation_timestamp_seconds keeps increasing for the left-behind metrics, whose timestamps are never updated again.

The metric registration is done in Prometheus code; the ruler calls Update() with a list of files, but nothing checks which files have disappeared since the last update.
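One possible shape for the missing check is sketched below. This is only an illustration, not code from Cortex or Prometheus: `fileTracker`, `newFileTracker` and `update` are hypothetical names, and the sketch assumes the ruler can see the file list it passes on each update.

```go
package main

import "sort"

// fileTracker (hypothetical) remembers the rule files passed to the
// previous update so that disappeared files can be detected.
type fileTracker struct {
	seen map[string]struct{}
}

func newFileTracker() *fileTracker {
	return &fileTracker{seen: map[string]struct{}{}}
}

// update records the current file list and returns, in sorted order,
// the files that were present on the previous call but are absent now.
func (t *fileTracker) update(files []string) []string {
	current := make(map[string]struct{}, len(files))
	for _, f := range files {
		current[f] = struct{}{}
	}

	var removed []string
	for f := range t.seen {
		if _, ok := current[f]; !ok {
			removed = append(removed, f)
		}
	}
	sort.Strings(removed)

	// The current list becomes the baseline for the next call.
	t.seen = current
	return removed
}
```

The caller would then drop the metrics belonging to each returned file, for example by unregistering the corresponding collectors through the `Unregister` method of client_golang's `prometheus.Registerer`. How collectors map to files or groups is left open here, since the issue does not specify it.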

Labels: component/rules, stale, type/observability
