Description
How to categorize this issue?
/area usability
/area control-plane
/kind enhancement
/priority 3
What would you like to be added:
An annotation, similar to the already existing node.machine.sapcloud.io/trigger-deletion-by-mcm, that, if set on a node, disables the machineHealthTimeout for the specific machine until the annotation is removed or set to false.
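A rough sketch of the check this implies, under stated assumptions: the annotation key below is hypothetical (this issue does not fix a name), and healthTimeoutDisabled is an illustrative helper, not an existing MCM function. It only shows the intended semantics: the timeout is skipped while the annotation is present and parses as true, and applies again once it is removed or set to false.

```go
package main

import "strconv"

// Hypothetical annotation key; the issue does not name the new annotation.
const disableHealthTimeoutAnnotation = "node.machine.sapcloud.io/disable-health-timeout"

// healthTimeoutDisabled reports whether the (hypothetical) annotation is
// present on the node and parses as a true boolean. A missing annotation,
// "false", or an unparsable value all leave the health timeout enabled.
func healthTimeoutDisabled(nodeAnnotations map[string]string) bool {
	value, ok := nodeAnnotations[disableHealthTimeoutAnnotation]
	if !ok {
		return false
	}
	disabled, err := strconv.ParseBool(value)
	return err == nil && disabled
}
```

Treating unparsable values as "not disabled" is a design choice in this sketch, so that a typo in the annotation value does not silently keep a broken machine around.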
Why is this needed:
We have a use case where some VMs are powered off for maintenance that is unrelated to any Gardener-related maintenance (OS update, k8s update, ...).
An external, Gardener-independent component (e.g., a DaemonSet) can check for such maintenance and set the annotation accordingly.
After the maintenance is over and the VM is up and running again, the component would remove the annotation.
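The set/remove behaviour of such an external component could look roughly like the following. This is only a sketch: the annotation key is hypothetical, desiredAnnotations is an illustrative helper, and how the component detects a maintenance and patches the node is left out entirely.

```go
package main

// Hypothetical annotation key; the issue does not name the new annotation.
const disableHealthTimeoutAnnotation = "node.machine.sapcloud.io/disable-health-timeout"

// desiredAnnotations returns a copy of the node's annotations with the
// (hypothetical) annotation set to "true" while a maintenance is active and
// removed otherwise. The second return value tells the caller whether the
// node actually needs to be updated.
func desiredAnnotations(current map[string]string, maintenanceActive bool) (map[string]string, bool) {
	updated := make(map[string]string, len(current)+1)
	for key, value := range current {
		updated[key] = value
	}

	value, present := current[disableHealthTimeoutAnnotation]
	switch {
	case maintenanceActive && value != "true":
		// Maintenance started (or the value drifted): set the annotation.
		updated[disableHealthTimeoutAnnotation] = "true"
		return updated, true
	case !maintenanceActive && present:
		// Maintenance is over: remove the annotation again.
		delete(updated, disableHealthTimeoutAnnotation)
		return updated, true
	}
	return updated, false
}
```

The component would call this on every check and only patch the node when the second return value is true.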
Using the in-place upgrade strategy is not suitable for us, as we still want to update nodes by rolling replacement. Additionally, the in-place upgrade strategy would cause all nodes of the worker pool to have disableHealthTimeout always set to true (not just while an upgrade is in progress), but we still want to automatically replace machines that unexpectedly went offline.
Additional context/small discussion can be found here: https://gardener-cloud.slack.com/archives/C045DSWJZB9/p1747128961067609
CC: @acumino @unmarshall
If you agree with the proposed solution, we will gladly implement it 😄