
@willianpaixao

This PR implements notification logging for Alertmanager.

  • Add --notification-log.file CLI flag to enable JSON-formatted notification logging
  • Log successful notifications with rich metadata (receiver, integration, group key, alert counts, group labels)
  • Follow the pattern established by Prometheus query logging

Motivation

Users need visibility into which notifications were successfully sent for auditing and analytics purposes. The current workaround of using a webhook receiver with continue: true has limitations around destination visibility and routing behavior.

Implementation Details

New Files

  • notify/notification_log.go - Notification logger implementation
  • notify/notification_log_test.go - Unit tests

Modified Files

  • notify/notify.go - Integrate NotificationLogger into the pipeline
  • cmd/alertmanager/main.go - Add CLI flag and initialization
  • notify/notify_test.go - Update existing tests for new function signature

Log Entry Format

Each line in the notification log file is a JSON object:

{
  "timestamp": "2024-01-15T10:30:00.123Z",
  "integration": "slack",
  "integrationIdx": 0,
  "receiver": "team-alerts",
  "groupKey": "{}:{alertname=\"HighMemory\"}",
  "alertsCount": 3,
  "firingCount": 2,
  "resolvedCount": 1,
  "groupLabels": {
    "alertname": "HighMemory"
  }
}
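A minimal sketch of a Go struct that would produce the entry above. The field names match those used in the PR's tests and the JSON keys come from the example; the Go types and the helpers sampleEntry, encodeLine, and decodeLine are assumptions for illustration, not the PR's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// NotificationLogEntry mirrors the JSON keys in the example above.
// Field names follow the PR's tests; the Go types are assumptions.
type NotificationLogEntry struct {
	Timestamp      time.Time         `json:"timestamp"`
	Integration    string            `json:"integration"`
	IntegrationIdx int               `json:"integrationIdx"`
	Receiver       string            `json:"receiver"`
	GroupKey       string            `json:"groupKey"`
	AlertsCount    int               `json:"alertsCount"`
	FiringCount    int               `json:"firingCount"`
	ResolvedCount  int               `json:"resolvedCount"`
	GroupLabels    map[string]string `json:"groupLabels"`
}

// sampleEntry rebuilds the example entry (illustrative helper).
func sampleEntry() NotificationLogEntry {
	return NotificationLogEntry{
		Timestamp:      time.Date(2024, 1, 15, 10, 30, 0, 123_000_000, time.UTC),
		Integration:    "slack",
		IntegrationIdx: 0,
		Receiver:       "team-alerts",
		GroupKey:       `{}:{alertname="HighMemory"}`,
		AlertsCount:    3,
		FiringCount:    2,
		ResolvedCount:  1,
		GroupLabels:    map[string]string{"alertname": "HighMemory"},
	}
}

// encodeLine renders one entry as a single newline-terminated JSON line.
func encodeLine(e NotificationLogEntry) (string, error) {
	data, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	return string(data) + "\n", nil
}

// decodeLine parses one JSON line back into an entry.
func decodeLine(line string) (NotificationLogEntry, error) {
	var e NotificationLogEntry
	err := json.Unmarshal([]byte(line), &e)
	return e, err
}

func main() {
	line, err := encodeLine(sampleEntry())
	if err != nil {
		panic(err)
	}
	fmt.Print(line)
}
```

On disk each entry occupies one line; the example above is pretty-printed only for readability.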

Design Decisions

  1. Metadata only: Log receiver, integration, group key, counts, and group labels. Avoids sensitive alert data and keeps log files manageable.
  2. Success only: Only log successful notifications. Failures are already visible in standard logs and metrics.
  3. JSON lines format: Easy to parse, grep, and integrate with log aggregation systems.
  4. No built-in rotation: Users can use logrotate or similar tools, following Prometheus conventions.
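To illustrate decision 3, here is a rough sketch of a consumer that reads a JSON lines log and summarizes it per receiver. It relies only on the receiver and alertsCount keys from the format above; the receiverStats type and the summarize helpers are hypothetical names, not part of the PR.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// logLine picks out only the keys this summary needs.
type logLine struct {
	Receiver    string `json:"receiver"`
	AlertsCount int    `json:"alertsCount"`
}

// receiverStats is a hypothetical aggregate per receiver.
type receiverStats struct {
	Notifications int
	Alerts        int
}

// summarize reads JSON lines from r and aggregates them by receiver.
// Note that bufio.Scanner rejects lines longer than 64 KiB by default.
func summarize(r io.Reader) (map[string]receiverStats, error) {
	stats := make(map[string]receiverStats)
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue // tolerate blank lines
		}
		var e logLine
		if err := json.Unmarshal([]byte(line), &e); err != nil {
			return nil, fmt.Errorf("parse notification log line: %w", err)
		}
		s := stats[e.Receiver]
		s.Notifications++
		s.Alerts += e.AlertsCount
		stats[e.Receiver] = s
	}
	return stats, sc.Err()
}

// summarizeString is a convenience wrapper for in-memory input.
func summarizeString(s string) (map[string]receiverStats, error) {
	return summarize(strings.NewReader(s))
}

func main() {
	sample := `{"receiver":"team-alerts","alertsCount":3}
{"receiver":"team-alerts","alertsCount":1}
{"receiver":"oncall","alertsCount":2}
`
	stats, err := summarizeString(sample)
	if err != nil {
		panic(err)
	}
	for name, s := range stats {
		fmt.Printf("%s: %d notifications, %d alerts\n", name, s.Notifications, s.Alerts)
	}
}
```

In practice the reader would be an os.File opened on the path passed to --notification-log.file.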

Usage

alertmanager --notification-log.file=/var/log/alertmanager/notifications.log

When the flag is not set or empty, notification logging is disabled (default behavior).
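The enabled/disabled behavior could be wired roughly as follows. NotificationLogger is a name from the PR, but the method set shown here, the noopLogger and fileLogger types, and the newNotificationLogger constructor are assumptions made for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sync"
)

// NotificationLogger is named in the PR; this method set is an assumption.
type NotificationLogger interface {
	Log(entry any) error
	Close() error
}

// noopLogger discards every entry; it stands in when no file is configured.
type noopLogger struct{}

func (noopLogger) Log(any) error { return nil }
func (noopLogger) Close() error  { return nil }

// fileLogger appends one JSON line per entry to a file.
type fileLogger struct {
	mu sync.Mutex
	f  *os.File
}

func (l *fileLogger) Log(entry any) error {
	data, err := json.Marshal(entry)
	if err != nil {
		return err
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	_, err = l.f.Write(append(data, '\n'))
	return err
}

func (l *fileLogger) Close() error { return l.f.Close() }

// newNotificationLogger is a hypothetical constructor: an empty path
// (flag unset) yields the no-op logger, otherwise the file is opened
// in append mode so restarts do not truncate earlier entries.
func newNotificationLogger(path string) (NotificationLogger, error) {
	if path == "" {
		return noopLogger{}, nil
	}
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return nil, fmt.Errorf("open notification log: %w", err)
	}
	return &fileLogger{f: f}, nil
}

func main() {
	logger, err := newNotificationLogger("") // flag not set
	if err != nil {
		panic(err)
	}
	defer logger.Close()
	fmt.Printf("%T\n", logger)
}
```

Returning a no-op implementation lets the pipeline call Log unconditionally instead of checking for nil at every call site.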

Test Plan

  • Unit tests for JSON serialization
  • Unit tests for file writing
  • Unit tests for concurrent write safety
  • Unit tests for noop logger
  • Existing notify package tests pass
  • Build succeeds
  • Manual integration test: start Alertmanager with flag, send alerts, verify JSON entries in log file

Future Considerations (Out of Scope)

  • Log rotation (users can use logrotate)
  • Logging failures (could be added with status: "failure" in a future PR)
  • Configurable fields to include/exclude
  • Metrics for notification logging

Fixes #2304

…tions

This adds a new --notification-log.file flag that enables logging of
successfully sent notifications to a JSON lines file. This feature
addresses the need for auditing and analytics of notification delivery.

When enabled, each successful notification is logged with metadata
including timestamp, integration type, receiver name, group key, alert
counts, and group labels. The implementation follows the pattern of
Prometheus query logging as suggested by maintainers.

The log format is JSON lines, making it easy to parse and integrate
with log aggregation systems. Log rotation is left to external tools
like logrotate, following Prometheus conventions.

Signed-off-by: Willian Paixao <willian@ufpa.br>
Copilot AI review requested due to automatic review settings January 21, 2026 23:56
@willianpaixao changed the title from "Log successfully sent notification" to "feat(notify): add notification logging for successfully sent notifications" on Jan 21, 2026

Copilot AI left a comment

Pull request overview

This PR implements notification logging for Alertmanager, enabling users to audit and analyze successfully sent notifications. The implementation adds a new CLI flag --notification-log.file that, when set, writes JSON-formatted log entries containing notification metadata (receiver, integration, group key, alert counts, and group labels) to a file.

Changes:

  • Added notification logging infrastructure with a file-based logger implementation that supports concurrent writes
  • Integrated notification logger into the RetryStage of the notification pipeline to log successful notifications
  • Added CLI flag for enabling notification logging with file path configuration

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Summary per file

  • notify/notification_log.go - New file implementing NotificationLogger interface with FileNotificationLogger and NoopNotificationLogger
  • notify/notification_log_test.go - Comprehensive test suite covering JSON serialization, file operations, concurrency, and edge cases
  • notify/notify.go - Integration of NotificationLogger into PipelineBuilder and RetryStage with helper function labelSetToMap
  • notify/notify_test.go - Updated existing tests to accommodate new NotificationLogger parameter in NewRetryStage function signature
  • cmd/alertmanager/main.go - Added CLI flag and initialization logic for notification logger

Comment on lines +58 to +65
require.Equal(t, entry.Integration, decoded.Integration)
require.Equal(t, entry.IntegrationIdx, decoded.IntegrationIdx)
require.Equal(t, entry.Receiver, decoded.Receiver)
require.Equal(t, entry.GroupKey, decoded.GroupKey)
require.Equal(t, entry.AlertsCount, decoded.AlertsCount)
require.Equal(t, entry.FiringCount, decoded.FiringCount)
require.Equal(t, entry.ResolvedCount, decoded.ResolvedCount)
require.Equal(t, entry.GroupLabels, decoded.GroupLabels)

Copilot AI Jan 22, 2026

The timestamp field is not verified in the test assertion. While the other fields are checked for equality after JSON serialization/deserialization, the Timestamp field should also be verified to ensure proper JSON marshaling and unmarshaling of the time.Time type.
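One way the missing check could look, sketched outside the PR's test file and without testify. The entry type and the roundTrip and sampleTime helpers are hypothetical. The comparison uses time.Time.Equal because == also compares the monotonic clock reading and location, which a JSON round trip does not preserve for a value obtained from time.Now().

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// entry is a reduced stand-in for the PR's log entry type.
type entry struct {
	Timestamp time.Time `json:"timestamp"`
}

// sampleTime returns a fixed instant with millisecond precision.
func sampleTime() time.Time {
	return time.Date(2024, 1, 15, 10, 30, 0, 123_000_000, time.UTC)
}

// roundTrip marshals ts inside an entry and decodes it again.
func roundTrip(ts time.Time) (time.Time, error) {
	data, err := json.Marshal(entry{Timestamp: ts})
	if err != nil {
		return time.Time{}, err
	}
	var decoded entry
	if err := json.Unmarshal(data, &decoded); err != nil {
		return time.Time{}, err
	}
	return decoded.Timestamp, nil
}

func main() {
	now := time.Now()
	got, err := roundTrip(now)
	if err != nil {
		panic(err)
	}
	// Compare instants with Equal rather than ==.
	fmt.Println("same instant:", now.Equal(got))
	fmt.Println("identical struct:", now == got)
}
```

In a testify-based test, the equivalent assertion would be along the lines of require.True(t, entry.Timestamp.Equal(decoded.Timestamp)).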

Comment on lines +97 to +114
// Log writes a notification entry to the log file as a JSON line.
// The data is buffered by the OS and synced to disk when Close is called.
// Returns ErrNotificationLogClosed if the logger has been closed.
func (l *FileNotificationLogger) Log(entry *NotificationLogEntry) error {
	l.mu.Lock()
	defer l.mu.Unlock()

	if l.closed {
		return ErrNotificationLogClosed
	}

	data, err := json.Marshal(entry)
	if err != nil {
		return err
	}
	data = append(data, '\n')
	_, err = l.file.Write(data)
	return err

Copilot AI Jan 22, 2026

The Log method writes to the file without calling Sync. If the application crashes or the system loses power before the OS flushes the buffered data, notification logs could be lost. Consider adding a configuration option to sync after each write for users who need guaranteed durability, or document this behavior in the function comment since it currently states "The data is buffered by the OS and synced to disk when Close is called."
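A sketch of the first suggestion, assuming a hypothetical syncOnWrite option that is not part of the PR. The fileLogger type is a simplified stand-in for FileNotificationLogger, and writeAndReadBack exists only to exercise it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sync"
)

// fileLogger is a simplified stand-in for the PR's FileNotificationLogger.
type fileLogger struct {
	mu          sync.Mutex
	file        *os.File
	syncOnWrite bool // hypothetical option, not part of the PR
}

// Log appends one JSON line. With syncOnWrite set, it also asks the OS
// to flush the file to stable storage before returning.
func (l *fileLogger) Log(entry any) error {
	data, err := json.Marshal(entry)
	if err != nil {
		return err
	}
	data = append(data, '\n')

	l.mu.Lock()
	defer l.mu.Unlock()
	if _, err := l.file.Write(data); err != nil {
		return err
	}
	if l.syncOnWrite {
		return l.file.Sync()
	}
	return nil
}

// writeAndReadBack logs one entry to a temporary file and returns the
// file contents (illustrative helper).
func writeAndReadBack(syncOnWrite bool) (string, error) {
	f, err := os.CreateTemp("", "notifications-*.log")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	l := &fileLogger{file: f, syncOnWrite: syncOnWrite}
	if err := l.Log(map[string]string{"receiver": "team-alerts"}); err != nil {
		return "", err
	}
	data, err := os.ReadFile(f.Name())
	return string(data), err
}

func main() {
	out, err := writeAndReadBack(true)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Syncing on every write trades throughput for durability, since each notification then waits on a disk flush while the lock is held, which is presumably why it would be opt-in.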

// Log writes a notification entry to the log file as a JSON line.
// The data is buffered by the OS and synced to disk when Close is called.
// Returns ErrNotificationLogClosed if the logger has been closed.
func (l *FileNotificationLogger) Log(entry *NotificationLogEntry) error {
Contributor

I am wondering if we should add new locks for this, or a custom notification format...
Would it make sense to use slog, for example, and standardize on the standard library?
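For comparison, a rough sketch of what the slog route might look like (log/slog, Go 1.21+). The attribute keys mirror the PR's format; the function names are hypothetical. slog's JSON handler adds its own time, level, and msg keys, so the output would not be identical to the format proposed in the PR.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"log/slog"
)

// newNotificationSlog returns a logger that writes one JSON object per
// record to w. The built-in handler serializes its writes, so no extra
// mutex is needed in the caller.
func newNotificationSlog(w io.Writer) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, nil))
}

// logNotification emits one record with metadata attributes.
func logNotification(l *slog.Logger, receiver, integration string, alerts int, groupLabels map[string]string) {
	l.LogAttrs(context.Background(), slog.LevelInfo, "notification sent",
		slog.String("receiver", receiver),
		slog.String("integration", integration),
		slog.Int("alertsCount", alerts),
		slog.Any("groupLabels", groupLabels),
	)
}

// renderOne logs a single record to memory and returns the resulting line.
func renderOne(receiver string) string {
	var buf bytes.Buffer
	logNotification(newNotificationSlog(&buf), receiver, "slack", 3,
		map[string]string{"alertname": "HighMemory"})
	return buf.String()
}

// receiverOf extracts the receiver attribute from one JSON line.
func receiverOf(line string) (string, error) {
	var rec struct {
		Receiver string `json:"receiver"`
	}
	err := json.Unmarshal([]byte(line), &rec)
	return rec.Receiver, err
}

func main() {
	fmt.Print(renderOne("team-alerts"))
}
```

In a real setup the writer would be the opened log file rather than a bytes.Buffer.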

Contributor

Also, implementation-wise, would it be better to send the entries to a channel and have a goroutine do the marshalling and logging, so that locks are held for less time? I am a bit worried about calling json.Marshal(...) while holding a lock.
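The channel-based alternative might be sketched like this. All names are hypothetical, and the choice to drop entries when the buffer is full is an assumption of this sketch, not something proposed in the thread.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// asyncLogger hands entries to a single writer goroutine, so callers
// never marshal JSON or touch the file while holding a lock.
type asyncLogger struct {
	entries chan any
	done    chan struct{}
}

// newAsyncLogger starts the writer goroutine with a bounded queue.
func newAsyncLogger(w io.Writer, buffer int) *asyncLogger {
	l := &asyncLogger{
		entries: make(chan any, buffer),
		done:    make(chan struct{}),
	}
	go func() {
		defer close(l.done)
		enc := json.NewEncoder(w) // Encode appends a newline after each value
		for e := range l.entries {
			_ = enc.Encode(e) // a real implementation would surface this error
		}
	}()
	return l
}

// Log enqueues an entry without blocking. It reports false when the
// queue is full and the entry was dropped.
func (l *asyncLogger) Log(entry any) bool {
	select {
	case l.entries <- entry:
		return true
	default:
		return false
	}
}

// Close stops accepting entries and waits for queued ones to be
// written. Log must not be called after Close.
func (l *asyncLogger) Close() {
	close(l.entries)
	<-l.done
}

// writeN logs n entries and returns how many lines reached the writer
// (illustrative helper).
func writeN(n int) int {
	var buf bytes.Buffer
	l := newAsyncLogger(&buf, n)
	for i := 0; i < n; i++ {
		l.Log(map[string]int{"alertsCount": i})
	}
	l.Close()
	return bytes.Count(buf.Bytes(), []byte("\n"))
}

func main() {
	fmt.Println(writeN(5)) // prints 5
}
```

The trade-off is that queued entries can be lost on a crash, and write errors no longer reach the caller of Log.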


Development

Successfully merging this pull request may close these issues.

[feature request] Log each successfully sent notification with metadata

2 participants