Skip to content

feat(dispatch): add alert markers per group#5047

Merged
siavashs merged 14 commits into
prometheus:mainfrom
siavashs:feat/group_markers
May 19, 2026
Merged

feat(dispatch): add alert markers per group#5047
siavashs merged 14 commits into
prometheus:mainfrom
siavashs:feat/group_markers

Conversation

@siavashs
Copy link
Copy Markdown
Contributor

@siavashs siavashs commented Feb 25, 2026

This change adds alert markers to the aggregation groups in dispatcher.
Alert markers replace the global marker and are used to track the state of alerts in each aggregation group.

This change touches many components of the alertmanager.
Per Group alert markers are passed to the notifiers and then inhibitor and silencer using context.

The API has no breaking changes:

  • /alerts uses a temporary marker to track the state of alerts
  • /alerts/groups returns the status from group markers

The metrics are also updated to use group markers.
The alertmanager_alerts metric is moved to dispatcher.
The alertmanager_marked_alerts metric is removed.
By default it behaves the same as before, by aggregating all alerts in the groups.
Enabling group-key-in-metrics flag will cause the metrics to be grouped by group_key.

Pull Request Checklist

Please check all the applicable boxes.

Which user-facing changes does this PR introduce?

[FEATURE] Introduce per aggregation group AlertMarkers and drop Global Alert Marker
[CHANGE] Add `group-key-in-metrics` feature flag
[CHANGE] Remove `alertmanager_marked_alerts`
[CHANGE] Remove the following from `types` package: `MemMarker`, `AlertState*`, `AlertStatus`
[CHANGE] Move `AlertMarker`, `GroupMarker` to `marker` package

Summary by CodeRabbit

  • New Features

    • Feature flag to include group-key labeling in dispatcher metrics.
    • New dispatcher metrics and per-state alert counts for improved observability.
    • New runtime markers for per-alert and per-group state tracking, with context propagation.
  • Bug Fixes

    • Alert status now computed at query/time-of-use for more consistent active/suppressed/unprocessed results.
    • Improved silence and inhibition handling so mute/inhibit state is recorded and observed reliably.
  • Refactor

    • Configuration simplified: alert-status callback is no longer required.

Review Change Stack

@siavashs siavashs changed the title feat(dispatch): add group markers feat(dispatch): add alert markers per group Feb 25, 2026
Comment thread dispatch/dispatch.go
@TheMeier
Copy link
Copy Markdown
Contributor

TheMeier commented Mar 1, 2026

Reviewed the new marker package standalone. LGTM

Comment thread marker/util.go Outdated
Comment thread marker/group.go Outdated
Comment thread marker/alert.go
}

// SetInhibited implements AlertMarker.
func (m *alertMarker) SetInhibited(fp model.Fingerprint, inhibitedBy []string) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity the previous SetIhibited used
func (m *MemMarker) SetInhibited(alert model.Fingerprint, ids ...string) {
And we changed it to []string ... Is there a reason we changed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for consistency in the interface API for both SetInhibited and SetSilenced.

Comment thread types/types.go
delete(m.groups, routeID+groupKey)
}

func (m *MemMarker) registerMetrics(r prometheus.Registerer) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should note in the release note that we removed this metric, as it would be too expensive to calculate it across all markers... ? Unless each time we change it a marker we were to add/remove the relevant number from the metric... But yeah, if not worth it/useful definitely we should say it's gone

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it in the PR description:

[CHANGE] Remove `alertmanager_marked_alerts`

Do we need to also update any MD files? cc @SoloJacobs

Comment thread marker/marker.go
Comment thread marker/alert.go Outdated
Comment thread dispatch/metric.go Outdated
Comment thread notify/mute.go
} else {
filtered = append(filtered, a)
}
// TODO(fabxc): increment muted alerts counter if muted.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are this TODO and the one above still relevant?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think he wanted to add/change the metrics here.
I am not sure if we need it.
I can remove these but my changes don't touch these.

Comment thread marker/group.go Outdated
@siavashs siavashs force-pushed the feat/group_markers branch from 724c634 to 9a5c33e Compare March 11, 2026 10:54
@siavashs siavashs requested a review from ultrotter March 11, 2026 16:19
@siavashs
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 16, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Migrates from a global marker to per-aggregation-group markers, removes callback-based alert-status plumbing (Options.AlertStatusFunc), introduces a marker package with context propagation, switches code from types.Alert to alert.Alert, and extracts dispatcher metrics into a dedicated metrics implementation.

Changes

Per-group marker migration

Layer / File(s) Summary
Alert state helper
alert/state.go
Adds AlertState.Compare to order alert states (suppressed > active > unprocessed).
API options & runtime prediction
api/api.go, api/v2/api.go
Remove Options.AlertStatusFunc, compute alert status at query-time using request-local marker.AlertMarker injected into alertFilter and predictAlertStatus; update handlers to use *alert.Alert.
API compat & tests
api/v2/compat.go, api/v2/api_test.go
Convert OpenAPI conversions and tests from types.Alert/types.AlertStatus to alert.Alert/alert.AlertStatus.
Main wiring
cmd/alertmanager/main.go
Replace global marker plumbing with marker.NewGroupMarker(), remove marker constructor args from silencer/inhibitor/mem, and pass feature flags into dispatcher metrics initialization.
Marker package (new)
marker/marker.go, marker/context.go, marker/alert.go, marker/status.go, marker/group.go
Adds AlertMarker/GroupMarker interfaces, context helpers (WithContext/FromContext), NewAlertMarker and NewGroupMarker in-memory implementations, and internal alertStatus logic.
Dispatcher & aggregation groups
dispatch/dispatch.go, dispatch/dispatch_test.go
Dispatcher uses marker.GroupMarker; aggregation groups create per-group marker.NewAlertMarker(), store per-alert AlertStatuses, inject alert marker into notification context, and migrate alert types to alert.Alert.
Dispatcher metrics
dispatch/metric.go
Adds DispatcherMetrics with alertStateCollector, supports optional group_key labeling controlled by feature flag, and provides a no-op fallback when registerer is nil.
Feature flags
featurecontrol/featurecontrol.go
Adds group-key-in-metrics flag and EnableGroupKeyInMetrics() API to enable group_key labeling in metrics.
Notification pipeline & mute stages
notify/notify.go, notify/mute.go, notify/mute_test.go
Propagate *alert.Alert across pipeline, use marker.GroupMarker, update stage signatures, and ensure time stages clear group marker mute on early returns.
Silencer & Inhibitor
silence/silence.go, inhibit/inhibit.go, tests
Remove stored marker fields, defer marker updates by fetching AlertMarker from context, and change callbacks to operate on *alert.Alert. Tests updated to inject markers via context.
Provider mem
provider/mem/mem.go, provider/mem/mem_test.go
Remove marker storage/constructor parameter, stop deleting per-alert markers during GC, and remove alerts-by-state metrics; tests adjusted.
Types cleanup
types/types.go, types/types_test.go
Remove legacy marker-related aliases/interfaces and MemMarker implementation/tests from types package; retain deprecated alert aliases.
Marker tests
marker/alert_test.go, marker/group_test.go
Add unit tests validating AlertMarker and GroupMarker behavior and context propagation.

Sequence Diagram

sequenceDiagram
  participant Client
  participant APIv2
  participant Dispatcher
  participant AggrGroup
  participant markerGroup as GroupMarker
  participant alertMarker as AlertMarker
  participant Context

  Client->>APIv2: GET /api/v2/alerts
  APIv2->>Dispatcher: request groups + alertFilter(ctx)
  Dispatcher->>markerGroup: ensure group-level marker (marker.NewGroupMarker)
  Dispatcher->>AggrGroup: create aggrGroup (allocates alertMarker = NewAlertMarker())
  AggrGroup->>Context: ctx' = marker.WithContext(ctx, alertMarker)
  Dispatcher->>APIv2: return groups
  APIv2->>alertMarker: Status(fp) via marker.FromContext(ctx') for each alert
  alertMarker-->>APIv2: alert.AlertStatus
  APIv2-->>Client: OpenAPI response with per-alert statuses
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • prometheus/alertmanager#5179: Touches dispatcher aggregation-group creation and related dispatcher metrics adjustments; likely overlaps in metrics/dispatch creation logic.

Suggested reviewers

  • TheMeier
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 37.63% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(dispatch): add alert markers per group' clearly summarizes the main change: adding per-group alert markers to the dispatcher.
Description check ✅ Passed The PR description is comprehensive and well-structured, covering the changes, API compatibility, metrics updates, and the linked issue. It follows the template requirements.
Linked Issues check ✅ Passed The PR successfully implements per-aggregation-group alert markers, replaces the global marker, maintains API compatibility, updates metrics with group-key-in-metrics flag, and reorganizes marker packages as required by issue #4953.
Out of Scope Changes check ✅ Passed All changes are within scope of the linked issue #4953: marker refactoring, package reorganization, dispatcher updates, and metrics changes. No unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
inhibit/inhibit_test.go (1)

47-52: ⚠️ Potential issue | 🟡 Minor

Keep the empty InhibitedBy assertion on the active path.

Adding the AlertStateActive check is good, but dropping require.Empty(t, status.InhibitedBy, ...) weakens this helper again. The non-muted branch should still prove both that the marker was written and that no inhibition IDs were recorded.

Suggested tweak
 	} else {
 		require.Equal(t, alert.AlertStateActive, status.State, msgAndArgs...)
+		require.Empty(t, status.InhibitedBy, msgAndArgs...)
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@inhibit/inhibit_test.go` around lines 47 - 52, The non-muted branch dropped
the assertion that no inhibition IDs were recorded; when wantMuted is false you
should also assert status.InhibitedBy is empty. Modify the else branch that
checks require.Equal(t, alert.AlertStateActive, status.State, msgAndArgs...) to
additionally call require.Empty(t, status.InhibitedBy, msgAndArgs...) so the
helper verifies both the active state and that no inhibition IDs were written
(referencing wantMuted, status.State, status.InhibitedBy, and AlertStateActive).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/v2/api.go`:
- Around line 434-442: The code creates a single request-scoped tempMarker via
marker.NewAlertMarker() and passes it into api.alertFilter, which lets
setAlertStatus mutate that shared tempMarker causing status races across groups;
instead, change the logic so /alerts/groups does not reuse a single tempMarker
for the entire request: compute or fetch each group's actual marker/status
(e.g., call the group's marker lookup or instantiate a fresh marker per group)
and pass that per-group marker into setAlertStatus/api.alertFilter rather than
the global tempMarker, ensuring you stop using tempMarker.Status(fp) as a
request-wide cache; update the code paths around api.alertFilter,
setAlertStatus, and any callers that currently rely on the shared tempMarker
(also adjust the same pattern at the other noted spot around lines 469-470).

---

Duplicate comments:
In `@inhibit/inhibit_test.go`:
- Around line 47-52: The non-muted branch dropped the assertion that no
inhibition IDs were recorded; when wantMuted is false you should also assert
status.InhibitedBy is empty. Modify the else branch that checks require.Equal(t,
alert.AlertStateActive, status.State, msgAndArgs...) to additionally call
require.Empty(t, status.InhibitedBy, msgAndArgs...) so the helper verifies both
the active state and that no inhibition IDs were written (referencing wantMuted,
status.State, status.InhibitedBy, and AlertStateActive).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 7fbf439a-752e-4283-977c-8c9f86a7d849

📥 Commits

Reviewing files that changed from the base of the PR and between 512409b and a28f0b4.

📒 Files selected for processing (2)
  • api/v2/api.go
  • inhibit/inhibit_test.go

Comment thread api/v2/api.go Outdated
Comment thread dispatch/metric.go Outdated
Comment thread dispatch/metric.go
c.emit(ch, float64(suppressed), labelValues...)
labelValues[0] = string(alert.AlertStateUnprocessed)
c.emit(ch, float64(unprocessed), labelValues...)
return true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here do we want to exclude groups that have 0 in each state? or not?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess not, because these are gauge metrics and can drop to zero, so I assume having zero values in between changes is desired compared to missing metrics.
What do you think?

Comment thread api/v2/api.go Outdated
// Set alert's current status based on its label set.
api.setAlertStatus(ctx, a.Labels)
// The inhibitor and silencer write to m via the context.
ctx = marker.WithContext(ctx, m)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to create this context outside the function itself? This way we would avoid creating a new context each time the returned func is called

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we need to keep it as is since each call to the returned function is a new operation that we want to trace as a child span.
This specific call returns a derived context which is pointing tot he parent.

Comment thread marker/group.go
status = &groupStatus{}
m.groups[key] = status
}
status.mutedBy = timeIntervalNames
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If timeIntervalNames is nil or empty, shoudl we actually delete from the map, here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, but this is just moving existing logic and is not new code.

Comment thread silence/silence.go
"pending", len(allIDs)-len(activeIDs),
)
// TODO: remove this sort once the marker is removed.
sort.Strings(activeIDs)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we shouldn't still sort for the API to be consistent and the results comparable? not necessarily here, but somewhere...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, but does it matter if the IDs are in a specific order or do we guarantee this?

Comment thread marker/group.go
status = &groupStatus{}
m.groups[key] = status
}
status.mutedBy = timeIntervalNames
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other places you clone the slice, but not here... Any particular reason?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just moving existing logic and is not new code.
The PR is not focused on Group Markers used by time intervals so I would leave any changes related to those for later.

@siavashs siavashs force-pushed the feat/group_markers branch from a28f0b4 to db964e5 Compare April 20, 2026 18:50
siavashs added 12 commits May 18, 2026 14:18
This change adds alert markers to the aggregation groups in dispatcher.
Alert markers replace the global marker and are used to track
the state of alerts in each aggregation group.

This change touches many components of the alertmanager.
Per Group alert markers are passed to the notifiers and then inhibitor
and silencer using context.

The API has no breaking changes:
- /alerts uses a temporary marker to track the state of alerts
- /alerts/groups returns the group markers

Update metrics to use group markers.
The `alertmanager_alerts` metric is moved to dispatcher.
The `alertmanager_marked_alerts` metric is removed.
By default it behaves the same as before, by aggregating
all alerts in the groups.
Enabling `group-key-in-metrics` flag will cause the metrics
to be grouped by `group_key`.

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
@siavashs siavashs force-pushed the feat/group_markers branch from db964e5 to c146730 Compare May 18, 2026 12:19
@SoloJacobs SoloJacobs removed their request for review May 18, 2026 15:21
@siavashs siavashs requested review from ultrotter and removed request for grobinson-grafana May 18, 2026 15:25
Compute each alert's status with a fresh AlertMarker at query time
instead of reading the group's actual marker or sharing a single
request-scoped marker across all groups.

This addresses two issues:

1. A shared tempMarker across groups let setAlertStatus writes from
   one group overwrite another, since the same fingerprint can appear
   in multiple aggregation groups.

2. Reading the group's actual marker (alertGroup.AlertStatuses) makes
   the response depend on whether the notification pipeline has run
   between calls. The endpoint should return the same output when no
   new alerts, silences, or inhibits have been added.

The filter callback also uses a fresh per-alert marker when none is
provided, so prediction for one alert does not leak into another.

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
@siavashs
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 19, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
marker/marker.go (1)

26-33: ⚡ Quick win

Define slice ownership/mutation semantics in the interface contract.

[]string is accepted and returned on goroutine-safe interfaces, but ownership rules are not specified. Please document whether implementations copy inputs and whether returned slices are immutable/copies, so callers cannot accidentally introduce aliasing races.

💡 Suggested doc update
 type AlertMarker interface {
 	// SetInhibited sets the inhibitedBy for the given fingerprint.
-	// If inhibitedBy is empty, it clears the inhibitedBy.
+	// Implementations must not retain caller-owned slice backing arrays.
+	// If inhibitedBy is empty, it clears the inhibitedBy.
 	SetInhibited(fingerprint model.Fingerprint, inhibitedBy []string)

 	// SetSilenced sets the silencedBy for the given fingerprint.
-	// If silencedBy is empty, it clears the silencedBy.
+	// Implementations must not retain caller-owned slice backing arrays.
+	// If silencedBy is empty, it clears the silencedBy.
 	SetSilenced(fingerprint model.Fingerprint, silencedBy []string)
@@
 type GroupMarker interface {
 	// Muted returns true if the group is muted, otherwise false. If the group
 	// is muted then it also returns the names of the time intervals that muted
-	// it.
+	// it. Callers must treat the returned slice as immutable.
 	Muted(routeID, groupKey string) ([]string, bool)

Also applies to: 49-57

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@marker/marker.go` around lines 26 - 33, Update the interface documentation to
define slice ownership/mutation semantics: for SetInhibited and SetSilenced (and
any other methods in the same interface around lines 49-57), state explicitly
that implementations MUST copy the input []string before storing it (so callers
retain ownership) and that any method returning []string MUST return a newly
allocated copy (so callers may mutate without causing aliasing/races); mention
this contract in the comments for SetInhibited and SetSilenced and in the
corresponding getter methods to make the goroutine-safety guarantee explicit.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dispatch/metric.go`:
- Around line 40-45: newNoopDispatcherMetrics currently constructs a
DispatcherMetrics with several Prometheus metrics but omits initializing
aggrGroupCreationRetries and aggrGroupCreationGivenUp, which can cause
nil-interface panics when .Inc() is later called; update
newNoopDispatcherMetrics to initialize those two fields
(aggrGroupCreationRetries and aggrGroupCreationGivenUp) with non-nil
prometheus.Counter values (e.g., prometheus.NewCounter with empty CounterOpts,
matching how aggrGroupLimitReached is created) so all metric fields on
DispatcherMetrics are always non-nil.

---

Nitpick comments:
In `@marker/marker.go`:
- Around line 26-33: Update the interface documentation to define slice
ownership/mutation semantics: for SetInhibited and SetSilenced (and any other
methods in the same interface around lines 49-57), state explicitly that
implementations MUST copy the input []string before storing it (so callers
retain ownership) and that any method returning []string MUST return a newly
allocated copy (so callers may mutate without causing aliasing/races); mention
this contract in the comments for SetInhibited and SetSilenced and in the
corresponding getter methods to make the goroutine-safety guarantee explicit.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a1e23e96-57fb-4c00-b1f1-72beead7e651

📥 Commits

Reviewing files that changed from the base of the PR and between a28f0b4 and 63c3d21.

📒 Files selected for processing (31)
  • alert/state.go
  • api/api.go
  • api/v2/api.go
  • api/v2/api_test.go
  • api/v2/compat.go
  • cmd/alertmanager/main.go
  • dispatch/dispatch.go
  • dispatch/dispatch_bench_test.go
  • dispatch/dispatch_test.go
  • dispatch/metric.go
  • featurecontrol/featurecontrol.go
  • inhibit/inhibit.go
  • inhibit/inhibit_bench_test.go
  • inhibit/inhibit_test.go
  • marker/alert.go
  • marker/alert_test.go
  • marker/context.go
  • marker/group.go
  • marker/group_test.go
  • marker/marker.go
  • marker/status.go
  • notify/mute.go
  • notify/mute_test.go
  • notify/notify.go
  • provider/mem/mem.go
  • provider/mem/mem_test.go
  • silence/silence.go
  • silence/silence_bench_test.go
  • silence/silence_test.go
  • types/types.go
  • types/types_test.go
💤 Files with no reviewable changes (4)
  • api/api.go
  • types/types_test.go
  • types/types.go
  • provider/mem/mem.go
🚧 Files skipped from review as they are similar to previous changes (23)
  • marker/group.go
  • marker/group_test.go
  • alert/state.go
  • inhibit/inhibit_bench_test.go
  • featurecontrol/featurecontrol.go
  • marker/context.go
  • dispatch/dispatch_bench_test.go
  • api/v2/compat.go
  • cmd/alertmanager/main.go
  • silence/silence_test.go
  • inhibit/inhibit.go
  • marker/alert_test.go
  • notify/mute.go
  • silence/silence.go
  • api/v2/api_test.go
  • marker/alert.go
  • notify/mute_test.go
  • silence/silence_bench_test.go
  • inhibit/inhibit_test.go
  • notify/notify.go
  • api/v2/api.go
  • dispatch/dispatch.go
  • dispatch/dispatch_test.go

Comment thread dispatch/metric.go
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
@siavashs siavashs merged commit 9aae092 into prometheus:main May 19, 2026
7 checks passed
@siavashs siavashs deleted the feat/group_markers branch May 19, 2026 18:29
ashleyprimo pushed a commit to ashleyprimo/alertmanager that referenced this pull request May 20, 2026
Replace the global alert marker with per-aggregation-group alert markers
in the dispatcher. Per-group markers track alert state in each group and
are passed to the notifier, inhibitor, and silencer through context.

A new marker package centralises AlertMarker and GroupMarker, replacing
the marker types previously living in the types package (MemMarker,
AlertState*, AlertStatus, AlertMarker, GroupMarker). The silencer and
inhibitor no longer hold a marker reference; they read it from the
context and defer writes until the call returns.

The API has no breaking changes:
- /alerts predicts each alert's status at query time using a temporary
  marker.
- /alerts/groups predicts each alert's status per (alert, group) with a
  fresh marker, so the response is stable across calls regardless of
  whether the notification pipeline has run in between (assuming no new
  alerts, silences, or inhibits) and prediction for one alert does not
  leak into another.

Metrics are updated to use the group markers:
- `alertmanager_alerts` moves to the dispatcher and is emitted by a
  custom collector that walks the per-group markers.
- `alertmanager_marked_alerts` is removed (it would be too expensive to
  compute across all per-group markers).
- A new `group-key-in-metrics` feature flag adds a `group_key` label to
  `alertmanager_alerts`; by default the metric still aggregates across
  all groups.
- A no-op DispatcherMetrics is used when no registerer is provided so
  the dispatcher never dereferences nil counters.

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Migrated from Global Marker to per Aggregation Group markers

5 participants