Aggregation Groups result in too many goroutines

Each aggregation group runs in its own dedicated goroutine. Depending on the number of unique alerts it receives and on the grouping config, Alertmanager can therefore create thousands of goroutines: https://github.com/prometheus/alertmanager/blob/f6b942cf9b3a503d59192eada300d2ad97cba82f/dispatch/dispatch.go#L350

These goroutines spend most of their time blocked in a `select` statement waiting for the next tick, which seems very wasteful: https://github.com/prometheus/alertmanager/blob/f6b942cf9b3a503d59192eada300d2ad97cba82f/dispatch/dispatch.go#L440-L442

Some graphs from our deployment:

<img width="911" height="272" alt="Image" src="https://github.com/user-attachments/assets/f65a63d8-cb3c-41e9-a80e-9525ca90ad59" />

<img width="911" height="272" alt="Image" src="https://github.com/user-attachments/assets/92541c70-7eaf-4c7c-a67d-bc9c85b73c1f" />

A better approach could be to create one goroutine per `receiver` instead, since many aggregation groups share the same receiver. This should reduce the number of goroutines significantly.
Aggregation groups could then rely on a receiver-level trigger to flush their notifications.

---

Aggregation groups also support a global limit: https://github.com/prometheus/alertmanager/blob/f6b942cf9b3a503d59192eada300d2ad97cba82f/dispatch/dispatch.go#L333-L338
but it is currently unused because a `nil` limiter is passed in: https://github.com/prometheus/alertmanager/blob/f6b942cf9b3a503d59192eada300d2ad97cba82f/cmd/alertmanager/main.go#L495

To protect Alertmanager under load, I think we should implement a `per-receiver limiter`.
If alerts sent to a receiver are not being acted upon, it seems reasonable for Alertmanager to limit the number of aggregation groups created for that receiver.
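One possible shape for such a limiter, as a sketch only: `receiverLimiter`, `tryAcquire`, and `release` are hypothetical names, not part of the existing limits interface, and the dispatcher wiring is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// receiverLimiter (hypothetical) caps the number of live aggregation groups
// per receiver instead of globally.
type receiverLimiter struct {
	mu     sync.Mutex
	limit  int            // max groups per receiver; 0 disables the limit
	counts map[string]int // receiver name -> live aggregation groups
}

func newReceiverLimiter(limit int) *receiverLimiter {
	return &receiverLimiter{limit: limit, counts: map[string]int{}}
}

// tryAcquire reserves a slot for a new group. A false result means the caller
// should skip creating the group (and, presumably, increment a metric).
func (l *receiverLimiter) tryAcquire(receiver string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.limit > 0 && l.counts[receiver] >= l.limit {
		return false
	}
	l.counts[receiver]++
	return true
}

// release frees a slot once a group has been removed.
func (l *receiverLimiter) release(receiver string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.counts[receiver] > 0 {
		l.counts[receiver]--
	}
}

func main() {
	l := newReceiverLimiter(2)
	fmt.Println(l.tryAcquire("pager"), l.tryAcquire("pager"), l.tryAcquire("pager"))
	fmt.Println(l.tryAcquire("email")) // other receivers are unaffected
	l.release("pager")
	fmt.Println(l.tryAcquire("pager")) // a freed slot can be reused
}
```

With a per-receiver count, one noisy receiver hitting its cap would not block group creation for the others, which a single global counter cannot guarantee.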


	// If the group does not exist, create it. But check the limit first.
	if limit := d.limits.MaxNumberOfAggregationGroups(); limit > 0 && d.aggrGroupsNum >= limit {
		d.metrics.aggrGroupLimitReached.Inc()
		d.logger.Error("Too many aggregation groups, cannot create new group for alert", "groups", d.aggrGroupsNum, "limit", limit, "alert", alert.Name())
		return
	}


Aggregation Groups result in too many go routines #4503


	for {
		select {
		case now := <-ag.next.C:
