Description
Aggregation groups run in dedicated goroutines, which means Alertmanager can create thousands of goroutines, depending on the number of unique alerts it receives and on the grouping configuration:
`alertmanager/dispatch/dispatch.go`, line 350 in f6b942c:

    go ag.run(func(ctx context.Context, alerts ...*types.Alert) bool {

These goroutines are usually blocked in a `select` statement waiting for the next tick, which seems very wasteful:
`alertmanager/dispatch/dispatch.go`, lines 440 to 442 in f6b942c:

    for {
    	select {
    	case now := <-ag.next.C:

Some graphs from our deployment:
A better approach could be to create one goroutine per receiver instead, since aggregation groups share receivers. This should reduce the number of goroutines significantly.
Aggregation groups could then rely on a receiver-level trigger to flush their notifications.
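
To make the idea concrete, here is a minimal sketch of what a receiver-level trigger could look like. It is not taken from the Alertmanager code base: `group`, `groupHeap`, `receiverScheduler`, `flushDue` and `runDemo` are all hypothetical names, and time is simulated so the example stays deterministic. The assumption is that one goroutine per receiver would keep its groups in a min-heap ordered by next flush time and wake up only for the earliest deadline.

```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

// group is a hypothetical stand-in for an aggregation group. It only tracks
// when it is next due to flush and how long to wait between flushes.
type group struct {
	key      string
	interval time.Duration // assumed to be positive
	next     time.Time
}

// groupHeap orders groups by next flush time, earliest first.
type groupHeap []*group

func (h groupHeap) Len() int           { return len(h) }
func (h groupHeap) Less(i, j int) bool { return h[i].next.Before(h[j].next) }
func (h groupHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *groupHeap) Push(x any)        { *h = append(*h, x.(*group)) }
func (h *groupHeap) Pop() any {
	old := *h
	n := len(old)
	g := old[n-1]
	*h = old[:n-1]
	return g
}

// receiverScheduler owns every group routed to a single receiver.
type receiverScheduler struct {
	groups groupHeap
}

// add registers a group whose first flush is due one interval after now.
func (s *receiverScheduler) add(key string, interval time.Duration, now time.Time) {
	heap.Push(&s.groups, &group{key: key, interval: interval, next: now.Add(interval)})
}

// flushDue flushes every group whose deadline has passed and reschedules it
// one interval after now. It returns the number of groups flushed.
func (s *receiverScheduler) flushDue(now time.Time, flush func(key string)) int {
	n := 0
	for s.groups.Len() > 0 && !s.groups[0].next.After(now) {
		g := s.groups[0]
		flush(g.key)
		g.next = now.Add(g.interval)
		heap.Fix(&s.groups, 0) // restore heap order after changing the deadline
		n++
	}
	return n
}

// runDemo simulates three wake-ups of the single per-receiver goroutine and
// reports how many groups were flushed at each one.
func runDemo() []int {
	start := time.Unix(0, 0)
	s := &receiverScheduler{}
	s.add("alertname=A", 10*time.Second, start)
	s.add("alertname=B", 30*time.Second, start)
	s.add("alertname=C", 10*time.Second, start)

	var counts []int
	for _, sec := range []int{5, 10, 30} {
		now := start.Add(time.Duration(sec) * time.Second)
		counts = append(counts, s.flushDue(now, func(string) {}))
	}
	return counts
}

func main() {
	fmt.Println(runDemo())
}
```

In a real implementation the per-receiver goroutine would presumably sleep on a single timer set to `groups[0].next` and guard the heap with a mutex, so the goroutine count would scale with the number of receivers rather than the number of groups.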
Aggregation groups also support a global limit:
`alertmanager/dispatch/dispatch.go`, lines 333 to 338 in f6b942c:

    // If the group does not exist, create it. But check the limit first.
    if limit := d.limits.MaxNumberOfAggregationGroups(); limit > 0 && d.aggrGroupsNum >= limit {
    	d.metrics.aggrGroupLimitReached.Inc()
    	d.logger.Error("Too many aggregation groups, cannot create new group for alert", "groups", d.aggrGroupsNum, "limit", limit, "alert", alert.Name())
    	return
    }

However, this limit is currently not used, because a `nil` limiter is passed to the dispatcher:
`alertmanager/cmd/alertmanager/main.go`, line 495 in f6b942c:

    disp = dispatch.NewDispatcher(alerts, routes, pipeline, marker, timeoutFunc, nil, logger, dispMetrics)

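
For illustration, a value satisfying the limiter could be as small as the sketch below. The `limits` interface is declared locally and only mirrors the one method the quoted check calls (`MaxNumberOfAggregationGroups`). `staticLimits` and `canCreateGroup` are hypothetical names, and treating a `nil` limiter as "unlimited" is an assumption of this sketch, not a statement about how the dispatcher handles it.

```go
package main

import "fmt"

// limits mirrors the single method that the quoted dispatcher check calls
// on d.limits. It is declared locally to keep the sketch self-contained.
type limits interface {
	MaxNumberOfAggregationGroups() int
}

// staticLimits is a hypothetical limiter backed by a fixed number.
type staticLimits struct {
	maxGroups int
}

func (l staticLimits) MaxNumberOfAggregationGroups() int { return l.maxGroups }

// canCreateGroup applies the same condition as the quoted check: a new group
// is refused only when a positive limit exists and has been reached.
// Assumption: a nil limiter means no limit is enforced.
func canCreateGroup(l limits, current int) bool {
	if l == nil {
		return true
	}
	limit := l.MaxNumberOfAggregationGroups()
	return !(limit > 0 && current >= limit)
}

func main() {
	lim := staticLimits{maxGroups: 2}
	for current := 0; current <= 2; current++ {
		fmt.Printf("groups=%d canCreate=%v\n", current, canCreateGroup(lim, current))
	}
}
```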
To protect Alertmanager under load, I think we should implement a per-receiver limiter.
If alerts sent to a receiver are not being acted upon, it seems reasonable for Alertmanager to limit the number of aggregation groups created for that receiver.
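
A rough sketch of such a per-receiver limiter follows. Nothing here exists in Alertmanager today: `receiverLimiter`, `tryAcquire` and `release` are hypothetical names, and the sketch assumes a single shared maximum for all receivers, with zero or a negative value meaning unlimited (the same convention as the `limit > 0` check quoted above).

```go
package main

import (
	"fmt"
	"sync"
)

// receiverLimiter is a hypothetical limiter that caps the number of
// aggregation groups per receiver instead of globally.
type receiverLimiter struct {
	mu     sync.Mutex
	max    int            // per-receiver cap; 0 or negative means unlimited
	counts map[string]int // live aggregation groups per receiver name
}

func newReceiverLimiter(max int) *receiverLimiter {
	return &receiverLimiter{max: max, counts: make(map[string]int)}
}

// tryAcquire reserves a slot for a new aggregation group on the receiver.
// It returns false when the receiver has already reached its cap.
func (l *receiverLimiter) tryAcquire(receiver string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.max > 0 && l.counts[receiver] >= l.max {
		return false
	}
	l.counts[receiver]++
	return true
}

// release frees a slot when an aggregation group is garbage collected.
func (l *receiverLimiter) release(receiver string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.counts[receiver] > 0 {
		l.counts[receiver]--
	}
}

func main() {
	l := newReceiverLimiter(2)
	for _, r := range []string{"pagerduty", "pagerduty", "pagerduty", "slack"} {
		fmt.Printf("%s: created=%v\n", r, l.tryAcquire(r))
	}
}
```

With a limiter like this, one noisy receiver would hit its own cap without preventing other receivers from creating groups, which a single global limit cannot guarantee.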