
Icinga2 loses state history and notifications during restart #10179

@w1ll-i-code


Describe the bug

If an object has a state change during an Icinga 2 restart (e.g. during a deploy), the state change is sometimes not written to the database and does not trigger notifications.

To Reproduce

  1. Import the basket with icingacli director basket restore < icinga-lost-statechange-basket.json
    1. icinga-lost-statechange-basket.json
    2. This basket contains:
      1. A check command that randomly goes into warning to generate the state changes.
      2. A service template that runs the check command
      3. A service group to quickly create lots of services, making the occurrence more likely.
      4. A host template as the target for the apply rule of the service group.
  2. Create the hosts (100 in this example):
    1. for i in $(seq --equal-width 1 100); do
          icingacli director host create "host-icinga-lost-statehistory-${i}" --imports 'ht-icinga-lost-statechange'
       done
  3. Deploy the config
    1. icingacli director config deploy

With that configuration running, redeploy the Icinga 2 configuration a few times: icingacli director config deploy --force --wait

Soon, the state history will contain state changes that should be impossible:
Screenshot from 2024-09-30 14-38-26

In this case, the service went from a hard WARNING state to a soft WARNING state. The soft WARNING history entry reports that the previous state was OK, but no OK state change was ever written to the history.

To find lost state history entries more quickly, I used the following script:
dropped_state_query.tar.gz

It takes the endpoint, user, and password as parameters. If the database is PostgreSQL, run it with the --postgres flag.
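The check such a script performs can be sketched as follows. This is not the attached dropped_state_query script; it is a hypothetical illustration that assumes history rows shaped like the IDO icinga_statehistory columns (object_id, state_time, state, last_state). The class and function names are invented for this example. The idea: for each object, every history entry's last_state should equal the state of the entry before it; a mismatch means an intermediate state change was never written.

```python
# Hypothetical sketch: detect state-history gaps like the one described above.
# Assumes rows resemble IDO icinga_statehistory (object_id, state_time, state,
# last_state); the real schema and the attached script may differ.
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import groupby
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class StateHistoryRow:
    object_id: int
    state_time: datetime
    state: int       # 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
    last_state: int  # state the entry claims was the previous one


def find_missing_transitions(
    rows: Iterable[StateHistoryRow],
) -> List[Tuple[StateHistoryRow, StateHistoryRow]]:
    """Return (previous, current) pairs where current.last_state differs
    from previous.state, i.e. a state change between them is missing."""
    gaps = []
    # Order by object, then time, so consecutive rows are real neighbours.
    ordered = sorted(rows, key=lambda r: (r.object_id, r.state_time))
    for _, group in groupby(ordered, key=lambda r: r.object_id):
        history = list(group)
        for prev, cur in zip(history, history[1:]):
            if cur.last_state != prev.state:
                gaps.append((prev, cur))
    return gaps


if __name__ == "__main__":
    t = datetime(2024, 9, 30, 14, 0)
    # WARNING, then WARNING again claiming the previous state was OK:
    # the OK entry in between was never written.
    rows = [
        StateHistoryRow(42, t, state=1, last_state=0),
        StateHistoryRow(42, t + timedelta(minutes=5), state=1, last_state=0),
    ]
    for prev, cur in find_missing_transitions(rows):
        print(f"object {cur.object_id}: gap between {prev.state_time} and {cur.state_time}")
```

Grouping per object before comparing neighbours keeps histories of different services from being compared with each other.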

Expected behavior

I expect Icinga 2 not to lose state changes like that.

Your Environment


  • Version used (icinga2 --version):
icinga2 - The Icinga 2 network monitoring daemon (version: r2.14.2-1)

Copyright (c) 2012-2024 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <https://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Build information:
  Compiler: GNU 8.5.0
  Build host: staging5591master
  OpenSSL version: OpenSSL 1.1.1k  FIPS 25 Mar 2021
  • Operating System and version:
System information:
  Platform: Red Hat Enterprise Linux
  Platform version: 8.10 (Ootpa)
  Kernel: Linux
  Kernel version: 4.18.0-553.el8_10.x86_64
  Architecture: x86_64
  • Enabled features (icinga2 feature list):
Disabled features: command compatlog debuglog elasticsearch gelf graphite icingadb influxdb2 journald opentsdb perfdata syslog
Enabled features: api checker ido-mysql influxdb livestatus mainlog notification
  • Icinga Web 2 version and modules (System - About):
Icinga Web 2  NetEye release 4.39 (Traditional bock)

PHP Version   7.4.33
MODULE                  VERSION
analytics               1.58.0
auditlog                1.15.1
cube                    1.1.0
customproblemview       0.0.0
director                1.11.1
geomap                  1.22.0
grafana                 1.4.2
neteye                  1.155.0-1
host2servicedetailview  1.4.0
idoreports              0.10.1
incubator               0.22.0
ipl                     v0.5.0
lampo                   1.2.2
leafletjs               1.9.4
loginaudit              0.0.1
mapDatatype             0.1.0
monitoring              2.10.5
monitoringview          1.7.0
nagvis                  1.1.1
pdfexport               0.10.2
reactbundle             0.9.0
reporting               1.0.0
shutdownmanager         0.0.0
srwebbackend            0.0.0
tornado                 2.19.2
update                  1.44.1-2
  • Config validation (icinga2 daemon -C):
[2024-09-30 14:45:34 +0200] information/cli: Icinga application loader (version: r2.14.2-1)
[2024-09-30 14:45:34 +0200] information/cli: Loading configuration file(s).
[2024-09-30 14:45:34 +0200] information/ConfigItem: Committing config item(s).
[2024-09-30 14:45:34 +0200] information/ApiListener: My API identity: localhost.localdomain
[2024-09-30 14:45:34 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2024-09-30 14:45:34 +0200] information/ConfigItem: Instantiated 1 LivestatusListener.
[2024-09-30 14:45:34 +0200] information/ConfigItem: Instantiated 1 InfluxdbWriter.
[2024-09-30 14:45:34 +0200] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2024-09-30 14:45:34 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2024-09-30 14:45:34 +0200] information/ConfigItem: Instantiated 1 ServiceGroup.
[2024-09-30 14:45:34 +0200] information/ConfigItem: Instantiated 902 Services.
[2024-09-30 14:45:34 +0200] information/ConfigItem: Instantiated 3 Zones.
[2024-09-30 14:45:34 +0200] information/ConfigItem: Instantiated 2 NotificationCommands.
[2024-09-30 14:45:34 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2024-09-30 14:45:34 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2024-09-30 14:45:34 +0200] information/ConfigItem: Instantiated 101 Hosts.
[2024-09-30 14:45:34 +0200] information/ConfigItem: Instantiated 1 Endpoint.
[2024-09-30 14:45:34 +0200] information/ConfigItem: Instantiated 6 ApiUsers.
[2024-09-30 14:45:34 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2024-09-30 14:45:34 +0200] information/ConfigItem: Instantiated 251 CheckCommands.
[2024-09-30 14:45:34 +0200] information/ScriptGlobal: Dumping variables to file '/neteye/shared/icinga2/data/cache/icinga2/icinga2.vars'
[2024-09-30 14:45:34 +0200] information/cli: Finished validating the configuration file(s).

Additional context

I have observed the loss of notifications in production, but have not yet reproduced that behavior locally. However, I suspect that the two behaviors are linked.

We have also observed the same behavior when creating objects via the Icinga 2 API and then immediately submitting a check result. Again, I have not yet reproduced this locally, but I suspect the root cause is the same in all of these cases.
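A possible way to script the API scenario is sketched below: create a service, then immediately submit a check result for it. It uses the Icinga 2 REST API endpoints PUT /v1/objects/services/<host>!<service> and POST /v1/actions/process-check-result. The API URL, the use of the ITL "dummy" check command, and the helper names are assumptions for illustration. The ApiUser needs permission to create services and process check results, and the host must already exist.

```python
# Hypothetical sketch of the "create object, then immediately send a check
# result" scenario via the Icinga 2 REST API. API URL and names are assumptions.
import base64
import json
import ssl
import urllib.request

API = "https://localhost:5665/v1"  # assumed API endpoint


def build_requests(host: str, service: str, user: str, password: str):
    """Build (create_service, process_check_result) requests without sending."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",
    }
    create = urllib.request.Request(
        f"{API}/objects/services/{host}!{service}",
        data=json.dumps({"attrs": {"check_command": "dummy"}}).encode(),
        headers=headers,
        method="PUT",
    )
    result = urllib.request.Request(
        f"{API}/actions/process-check-result",
        data=json.dumps({
            "type": "Service",
            "filter": f'host.name=="{host}" && service.name=="{service}"',
            "exit_status": 1,  # WARNING
            "plugin_output": "WARNING - submitted right after creation",
        }).encode(),
        headers=headers,
        method="POST",
    )
    return create, result


def send_back_to_back(requests, cafile: str) -> None:
    """Send the requests with no delay; cafile is the Icinga CA certificate."""
    ctx = ssl.create_default_context(cafile=cafile)
    for req in requests:
        with urllib.request.urlopen(req, context=ctx) as resp:
            print(resp.status, resp.read().decode())
```

Afterwards, the state history of the new service would be checked for the WARNING entry; sending both requests back to back is what is suspected to trigger the loss.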

Labels: bug (Something isn't working), ref/IP