No Notification sent after flapping phase #10845

@carraroj

Description:

When a service starts flapping and later stabilizes in a problem state (e.g. CRITICAL), no Problem notification is ever delivered during or after the flapping phase. The hard state transition to CRITICAL occurs while `is_flapping` is still true, so the Problem notification is blocked by the `!is_flapping` guard and not stashed for later delivery. When flapping ends, the service is already in hard CRITICAL; no new hard state change occurs, so no Problem notification is triggered either. As a result, `notifiedProblemUsers` remains empty throughout the entire cycle. When the service later recovers, the Recovery notification is processed but every user is skipped because they were never recorded as having been notified about a problem. Icinga Web shows the Recovery notification in the history with "Notified Contacts: None".

Scenario:

1. Service is in OK state (hard).
2. Service oscillates between OK and CRITICAL. All state changes remain soft (`max_check_attempts` > 1), no Problem or Recovery notifications are sent, and `notifiedProblemUsers` stays empty.
3. Flapping threshold is exceeded: the FlappingStart notification is sent.
4. While flapping, the service delivers enough consecutive CRITICAL results to reach hard CRITICAL state. The resulting Problem notification is blocked by the `!is_flapping` guard and not stashed, so `notifiedProblemUsers` remains empty.
5. Flapping value drops below threshold and the FlappingEnd notification is sent. The service is already in hard CRITICAL, so no new `hardChange` occurs and no Problem notification is triggered.
6. Service recovers (CRITICAL → OK). The Recovery notification is triggered, but every user is skipped because `notifiedProblemUsers` is empty.

The notification history entry shows "Type: Recovery, State: Ok, Notified Contacts: None".
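The sequence above can be replayed with a small stand-alone model. This is only a sketch under assumptions: `CheckEvent`, `EmittedNotifications` and `ScenarioFromIssue` are hypothetical names, and the code mirrors the behaviour described in this issue rather than the actual logic in `checkable-check.cpp`.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model of one processed check result.
// It is not taken from the Icinga 2 source.
struct CheckEvent {
    bool hardChange;      // did this result cause a hard state change?
    bool isProblem;       // is the resulting state a problem state (e.g. CRITICAL)?
    bool flappingBefore;  // flapping status before this result
    bool flappingAfter;   // flapping status after this result
};

// Returns the notification types emitted for a sequence of events.
std::vector<std::string> EmittedNotifications(const std::vector<CheckEvent>& events)
{
    std::vector<std::string> sent;

    for (const CheckEvent& ev : events) {
        if (!ev.flappingBefore && ev.flappingAfter)
            sent.push_back("FlappingStart");
        else if (ev.flappingBefore && !ev.flappingAfter)
            sent.push_back("FlappingEnd");

        // State notifications are gated on "not flapping". A hard change
        // that happens while flapping is dropped here, not stashed.
        if (ev.hardChange && !ev.flappingAfter)
            sent.push_back(ev.isProblem ? "Problem" : "Recovery");
    }

    return sent;
}

// The scenario from this issue, starting at step 2 (step 1 is the
// initial hard OK state and produces no event).
std::vector<CheckEvent> ScenarioFromIssue()
{
    return {
        {false, false, false, false}, // step 2: soft OK/CRITICAL oscillation
        {false, false, false, true},  // step 3: flapping threshold exceeded
        {true,  true,  true,  true},  // step 4: hard CRITICAL while flapping
        {false, true,  true,  false}, // step 5: flapping ends, still CRITICAL
        {true,  false, false, false}, // step 6: hard recovery to OK
    };
}
```

Calling `EmittedNotifications(ScenarioFromIssue())` in this model yields FlappingStart, FlappingEnd and Recovery, with no Problem entry anywhere in the sequence, which matches the observed history.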

Root Cause:

State notifications are only sent when the service is not flapping. The Problem notification for the hard state transition to CRITICAL that occurs during flapping is suppressed and not stashed for later delivery:

FlappingStart is sent here:

https://github.com/Icinga/icinga2/blob/v2.16.0/lib/icinga/checkable-check.cpp#L431-L440

State notifications require `!is_flapping`. During flapping this block is never entered, so the Problem notification is neither sent nor stashed into `suppressed_types`:

https://github.com/Icinga/icinga2/blob/v2.16.0/lib/icinga/checkable-check.cpp#L463-L476

Since notifiedProblemUsers is populated exclusively by NotificationProblem notifications, it remains empty:

https://github.com/Icinga/icinga2/blob/v2.16.0/lib/icinga/notification.cpp#L517-L518

When Recovery is later triggered, the filter skips every user whose type filter includes Problem but who is not listed in notifiedProblemUsers:

https://github.com/Icinga/icinga2/blob/v2.16.0/lib/icinga/notification.cpp#L461-L471

OnNotificationSentToAllUsers is then called with an empty user set, creating the history entry with zero contacts:

https://github.com/Icinga/icinga2/blob/v2.16.0/lib/icinga/notification.cpp#L526
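The bookkeeping on the notification side can be illustrated the same way. The following is a minimal sketch, assuming every user is subscribed to Recovery; `NotificationModel`, `User`, `TypeFilter`, `RecordProblemSent` and `RecoveryRecipients` are hypothetical names, and only `notifiedProblemUsers` comes from the linked code.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical type filter bits; the real Icinga 2 filter has more types.
enum TypeFilter : int {
    FilterProblem = 1 << 0,
    FilterRecovery = 1 << 1,
};

struct User {
    std::string name;
    int typeFilter; // bitmask of TypeFilter values
};

// Simplified model of the per-notification bookkeeping described above.
// It is not the Icinga 2 implementation.
struct NotificationModel {
    std::set<std::string> notifiedProblemUsers;

    // Only a delivered Problem notification records its recipients.
    void RecordProblemSent(const std::vector<User>& recipients)
    {
        for (const User& user : recipients)
            notifiedProblemUsers.insert(user.name);
    }

    // Recipients of a Recovery notification: a user whose filter includes
    // Problem is skipped unless a Problem notification reached them before.
    std::vector<std::string> RecoveryRecipients(const std::vector<User>& users) const
    {
        std::vector<std::string> result;

        for (const User& user : users) {
            bool wantsProblem = (user.typeFilter & FilterProblem) != 0;

            if (wantsProblem && notifiedProblemUsers.count(user.name) == 0)
                continue;

            result.push_back(user.name);
        }

        return result;
    }
};
```

In this model, a freshly constructed `NotificationModel` returns an empty recipient list for users filtering on both Problem and Recovery, which corresponds to the "Notified Contacts: None" history entry. Once `RecordProblemSent` has been called for those users, the same call returns all of them.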

Relation to PR #10361:

PR #10361 handles the case where flapping ends and the service has already recovered (FlappingEnd + OK state) via IsRecoveryOrFlappingEndAndCheckableIsOK():

https://github.com/Icinga/icinga2/blob/v2.16.0/lib/icinga/notification.cpp#L242-L250

Our scenario is the complementary case: FlappingEnd while the service is still in a problem state. IsRecoveryOrFlappingEndAndCheckableIsOK() returns false here, so no internal state reset occurs and notifiedProblemUsers is never repopulated.

Expected Behavior:

After flapping ends with the service in a problem state, a subsequent Recovery notification should be delivered to the configured contacts.

Environment:

Icinga 2: 2.16.0
