Closed
Description
When a new rule group is created (e.g. on resharding when a ruler starts or stops), it registers metrics with Prometheus, but there is no code to unregister them when the group stops being used (e.g. on another resharding). Over time this builds up a substantial number of useless metrics.
This makes it hard to observe how well the ruler is keeping up, since time() - cortex_prometheus_rule_group_last_evaluation_timestamp_seconds is ever-increasing for the left-behind metrics.
The metric registration is done in Prometheus code; the ruler calls Update() with a list of files, but nothing checks which files have disappeared since the last update.
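The missing check could be sketched roughly as below. This is only an illustration, not the Cortex or Prometheus implementation: removedFiles and the file names are hypothetical, and the sketch assumes the caller keeps the file list from the previous update so it can be compared with the new one.

```go
package main

import (
	"fmt"
	"sort"
)

// removedFiles is a hypothetical helper: it returns the entries of prev
// that are absent from curr, sorted so the result is deterministic.
func removedFiles(prev, curr []string) []string {
	// Index the current file list for constant-time lookups.
	current := make(map[string]struct{}, len(curr))
	for _, f := range curr {
		current[f] = struct{}{}
	}

	var removed []string
	for _, f := range prev {
		if _, ok := current[f]; !ok {
			removed = append(removed, f)
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	// Made-up file names standing in for the lists passed on two
	// consecutive updates.
	prev := []string{"tenant-a.yml", "tenant-b.yml", "tenant-c.yml"}
	curr := []string{"tenant-b.yml"}

	for _, f := range removedFiles(prev, curr) {
		// This is the point where the metrics of the groups loaded
		// from f would need to be unregistered.
		fmt.Println("stale rule file:", f)
	}
}
```

For each file reported this way, the metrics belonging to its groups would then need to be dropped, for example by deleting the matching label values from the metric vectors or by unregistering the per-group collectors. Which of these applies depends on how the metrics were registered in the first place.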