-
Notifications
You must be signed in to change notification settings - Fork 595
Fix scheduling of downtimes for all services on child hosts #8990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Will cancelling the parent downtime now also cancel the service downtimes of children? |
lib/icinga/apiactions.cpp
Outdated
Downtime::Ptr serviceDowntime = Downtime::AddDowntime(hostService, author, comment, startTime, endTime, | ||
Downtime::Ptr serviceDowntime = Downtime::AddDowntime(childService, author, comment, startTime, endTime, | ||
fixed, triggerName, duration); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will cancelling the parent downtime now also cancel the service downtimes of children?
Looks like it won't as #8913 didn't add the information about that relation to this call. But I think it should, so I'll change that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code, it's not only missing in that place but also in line 442. @N-o-X was this a deliberate decision to only set the parent relation for those downtimes created by all_services
but not child_options
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it, I think I can remember the discussion we had about that (looks like we haven't written down the result over there in the PR).
We decided on not propagating the deletion down to dependencies as it's not clear what should happen if a downtime somewhere in the middle of a dependency chain is deleted. Intuitively, one could say that this should delete all the downtimes in that sub-tree, however dependencies are a graph, not a tree, so a dependant child could be in there due to multiple dependencies.
What's definitely an oversight in the current PR that service downtimes on child hosts don't have a relation to the downtime on the host. This should be addressed in any case. I'm no longer sure about dependencies. If we want to also delete then, this should be handled separately as this needs larger changes to what we store and how we handle deletes (as described in the second paragraph) and also a downtime can only one parent right now, so it can't have a relation to both the downtime on the child host and the root downtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aye fine, but then should the documentation get updated (https://icinga.com/docs/icinga-2/latest/doc/16-upgrading-icinga-2/#deletion-of-child-downtimes-on-services) before this fix is propagated. Otherwise the expected result is false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see anything wrong with the documentation. It says all_services
flag and nothing about child_options
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But with this change the all_services flag applies also to childs. If that's the case, and I re-read the documentation with this in mind, I expect that these service downtimes of children also get removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleting a downtime of a child host will now also delete downtimes for service of that particular child host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The loop iterated over the services of the wrong host resulting in duplicate downtimes scheduled for services of the parent host instead of downtimes for services of the child host.
b762af7
to
bb0dcdf
Compare
I've pushed a new version that handles the deletion of child downtimes for the @nilmerg Please ack if that behavior is ok for you. Otherwise please create an issue for also handling the propagation of deletion for |
I just want that creating and removing downtimes behaves the same regarding the all_services flag. If a child's services get a downtime due to that flag and the documentation tells me that service downtimes created with this flag get automatically removed, and if this is in fact the case.., then it's fine. If that's not the case the documentation has to reflect that. |
Does the current behavior (i.e. downtime removal on any hosts propagates to the services of that very host) match the documentation? The only problem I could see is that the term "child" is sometimes used for the relation between a service and the host it belongs to and sometimes for the relation given by an explicit |
From: https://icinga.com/docs/icinga-2/latest/doc/12-icinga2-api/#schedule-downtime
The latter only speaks about the parent host's services. It needs to clarify what happens to child hosts's services. Btw: I miss that part here: https://icinga.com/docs/icinga-2/latest/doc/12-icinga2-api/#remove-downtime |
@nilmerg Can you provide a short update when this will be moving forward? Thanks in advance. |
I've added another commit that documents how downtime removals using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs are fine now, thanks @julianbrost!
Can anyone tell me when the new release will be out? @lippserd told me he thinks it will be out in late November. |
Fundamentally, scheduling downtimes for all services on child hosts was implemented, it just iterated over the services of the wrong host resulting in duplicate downtimes being scheduled for services of the parent host. This PR fixes this loop and in addition prevents the creation of duplicate downtimes for individual services in case that both the service and its host are children of the parent checkable.
Tests
Test config: 3 hosts, each with two services,
service
on on hostschild1
andchild2
depends on hostparent
, additionally, hostchild1
depends on hostparent
.Please ignore the service
keep-state
in the following output, that's just an additional service I apply to every host in my test cluster.Before
Note how the
service_downtimes
of downtimegithub-8989-child1!65a4a10e-473c-4ae1-9e46-96cf4ce7e463
lists downtime forgithub-8989-parent
.After
Now
service_downtimes
forgithub-8989-child1!82279e74-7225-477b-bfe0-b150c637afaa
indeed contains downtimes for all services of hostgithub-8989-child1
. Also note thatgithub-8989-child2!service1
part of that list and there is no duplicate downtime for it inchild_downtimes
.Deleting the downtime for
parent
also deletes the downtimes created for services of that host but leaves downtimes of other hosts and their services unaffected. Deleting the downtime forchild1
also deletes the downtimes for all its serivces (includingservice1
which is a child itself).Special thanks to @NicolasGoeddel for pointing out the cause of this bug.
fixes #8989