Ruler does not consistently restore `for` state #6465

**Description**

Currently, the Prometheus rule manager restores the `for` state of rule groups only after a restart. This is fine for Prometheus. In Cortex, however, a rule group can move from one ruler instance (r1) to another (r2) due to resharding. If r2 is already evaluating rule groups for that tenant, its manager does not restore the `for` state of the newly acquired group, so alerts can end up in an incorrect state. For example, an alert can go from `FIRING` back to `PENDING`.

**To Reproduce**

1. Create rules for a tenant with shard size > 1. For ease of testing, all ruler instances were running rules for the tenant.
2. Wait for an alerting rule to go into `FIRING`.
3. Restart the instance that was evaluating the alerting rule. The assumption here is that the ruler takes a while to restart, giving another ruler a chance to evaluate the alerting rule at least once.
4. The alerting rule goes back to `PENDING`.

**Expected behavior**

- The alerting rule should stay in the `FIRING` state.

**Additional Context**

There is a [PR](https://github.com/prometheus/prometheus/pull/15669) open for Prometheus to address this issue. Until that PR is approved, it is difficult to fix this issue.

