feat(dispatch): add alert markers per group #5047

Open
siavashs wants to merge 4 commits into prometheus:main from siavashs:feat/group_markers

@siavashs (Contributor) commented Feb 25, 2026

This change adds alert markers to the aggregation groups in the dispatcher.
Alert markers replace the global marker and are used to track the state of alerts in each aggregation group.
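A minimal sketch of the idea, assuming a simplified fingerprint type and a hypothetical `groupMarker`/`aggrGroup` pair (these are not the actual types in the `marker` or `dispatch` packages): each aggregation group owns its own marker, so the same alert routed to two groups can carry a different state in each.

```go
package main

import (
	"fmt"
	"sync"
)

// Fingerprint is a simplified stand-in for model.Fingerprint.
type Fingerprint uint64

// AlertState is a hypothetical state enum for this sketch.
type AlertState string

const (
	StateUnprocessed AlertState = "unprocessed"
	StateActive      AlertState = "active"
	StateSuppressed  AlertState = "suppressed"
)

// groupMarker tracks alert states for one aggregation group only.
type groupMarker struct {
	mu     sync.RWMutex
	states map[Fingerprint]AlertState
}

func newGroupMarker() *groupMarker {
	return &groupMarker{states: make(map[Fingerprint]AlertState)}
}

// SetState records the state of an alert within this group.
func (m *groupMarker) SetState(fp Fingerprint, s AlertState) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.states[fp] = s
}

// State returns the recorded state, or StateUnprocessed if this group has
// not marked the alert yet.
func (m *groupMarker) State(fp Fingerprint) AlertState {
	m.mu.RLock()
	defer m.mu.RUnlock()
	if s, ok := m.states[fp]; ok {
		return s
	}
	return StateUnprocessed
}

// aggrGroup is a hypothetical stand-in for a dispatcher aggregation group
// that owns its marker instead of sharing a global one.
type aggrGroup struct {
	key    string
	marker *groupMarker
}

func main() {
	a := &aggrGroup{key: "team-a", marker: newGroupMarker()}
	b := &aggrGroup{key: "team-b", marker: newGroupMarker()}

	// The same alert is routed to both groups; only group a marks it suppressed.
	const fp Fingerprint = 42
	a.marker.SetState(fp, StateSuppressed)
	fmt.Println(a.marker.State(fp), b.marker.State(fp)) // suppressed unprocessed
}
```

With a single global marker, the last writer would win for fingerprint 42; keeping one marker per group avoids that cross-group interference.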

This change touches many components of Alertmanager.
Per-group alert markers are passed via context to the notifiers, and from there to the inhibitor and silencer.

The API has no breaking changes:

  • `/alerts` uses a temporary marker to track the state of alerts
  • `/alerts/groups` returns the status from group markers

The metrics are also updated to use group markers.
The `alertmanager_alerts` metric is moved to the dispatcher.
The `alertmanager_marked_alerts` metric is removed.
By default, the metrics behave as before, aggregating alerts across all groups.
Enabling the `group-key-in-metrics` feature flag breaks the metrics down by `group_key`.
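The two metric modes can be sketched as a plain aggregation over per-group counts, without a Prometheus client library. The `groupState` type and `alertCounts` function are hypothetical, and the `state` label is an assumption of this sketch; only the `group_key` breakdown comes from the description above.

```go
package main

import (
	"fmt"
	"sort"
)

// groupState is a hypothetical per-group count of alerts in one state.
type groupState struct {
	groupKey string
	state    string
	count    int
}

// alertCounts turns per-group counts into label-set -> value samples.
// Without the flag, counts for the same state are summed across groups;
// with it, every (group_key, state) pair becomes its own series.
func alertCounts(groups []groupState, groupKeyInMetrics bool) map[string]int {
	out := make(map[string]int)
	for _, g := range groups {
		labels := fmt.Sprintf("state=%q", g.state)
		if groupKeyInMetrics {
			labels = fmt.Sprintf("group_key=%q,state=%q", g.groupKey, g.state)
		}
		out[labels] += g.count
	}
	return out
}

func main() {
	groups := []groupState{
		{"a", "active", 2},
		{"b", "active", 3},
		{"b", "suppressed", 1},
	}
	for _, flag := range []bool{false, true} {
		samples := alertCounts(groups, flag)
		labelSets := make([]string, 0, len(samples))
		for ls := range samples {
			labelSets = append(labelSets, ls)
		}
		sort.Strings(labelSets) // map iteration order is random
		fmt.Printf("group-key-in-metrics=%v\n", flag)
		for _, ls := range labelSets {
			fmt.Printf("  alertmanager_alerts{%s} %d\n", ls, samples[ls])
		}
	}
}
```

Summing by default keeps series cardinality unchanged for existing dashboards; the per-group mode trades higher cardinality for visibility into individual groups.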

Pull Request Checklist

Which user-facing changes does this PR introduce?

[FEATURE] Introduce per-aggregation-group `AlertMarker`s and drop the global alert marker
[CHANGE] Add `group-key-in-metrics` feature flag
[CHANGE] Remove `alertmanager_marked_alerts`
[CHANGE] Remove the following from `types` package: `MemMarker`, `AlertState*`, `AlertStatus`
[CHANGE] Move `AlertMarker`, `GroupMarker` to `marker` package

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
@siavashs changed the title from "feat(dispatch): add group markers" to "feat(dispatch): add alert markers per group" on Feb 25, 2026
Comment on lines +380 to +383
```go
alertGroup.AlertStatuses = make(map[model.Fingerprint]alert.AlertStatus, len(filteredAlerts))
for _, a := range filteredAlerts {
	alertGroup.AlertStatuses[a.Fingerprint()] = ag.marker.Status(a.Fingerprint())
}
```
@siavashs (Contributor, Author) commented Feb 25, 2026

See this analysis https://app.devin.ai/review/prometheus/alertmanager/pull/5047
Basically, `Groups()` now uses the per-group alert marker to report statuses for the `/alerts/groups` API call.
However, `/alerts` uses a `tempMarker` to replicate a global marker for backwards compatibility.
As a result, the two API endpoints do not return exactly the same results:

  • `/alerts` keeps "predicting" the future, as it did before: it reports whether an alert would be muted if it were dispatched now (we probably want to remove statuses from this endpoint in API v3)
  • `/alerts/groups` shows the current, true status without any "prediction"; I think this differs from the previous behavior

We can use a `tempMarker` for `/alerts/groups` as well to keep predicting the future.
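A sketch of how the two endpoints can disagree, under the assumption that a group marker is only updated when its group flushes. `predictStatus` and the `marker` map are hypothetical: the helper evaluates muting now into a throwaway marker, the way `/alerts` does with its temporary marker, while the group marker still holds the state from the last flush.

```go
package main

import "fmt"

type status string

const (
	active     status = "active"
	suppressed status = "suppressed"
)

// marker is a hypothetical fingerprint -> status map.
type marker map[uint64]status

// predictStatus evaluates muting *now* into a throwaway marker, mirroring the
// temporary-marker approach described for /alerts.
func predictStatus(fps []uint64, mutedNow func(uint64) bool) marker {
	tmp := make(marker, len(fps))
	for _, fp := range fps {
		if mutedNow(fp) {
			tmp[fp] = suppressed
		} else {
			tmp[fp] = active
		}
	}
	return tmp
}

func main() {
	// Assumption: the group last flushed before a silence matching alert 7
	// was created, so its marker still records the alert as active.
	groupMarker := marker{7: active}
	silencedNow := func(fp uint64) bool { return fp == 7 }

	predicted := predictStatus([]uint64{7}, silencedNow)
	fmt.Println("/alerts/groups:", groupMarker[7]) // /alerts/groups: active
	fmt.Println("/alerts:", predicted[7])          // /alerts: suppressed
}
```

Using the same throwaway-marker approach for `/alerts/groups` would make both endpoints report the predicted status, at the cost of no longer exposing what the group actually recorded.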

Development

Successfully merging this pull request may close these issues.

Migrated from Global Marker to per Aggregation Group markers
