
Support overriding/setting disableHealthTimeout via annotation on node #996

@Kumm-Kai


How to categorize this issue?

/area usability
/area control-plane
/kind enhancement
/priority 3

What would you like to be added:
An annotation, similar to the existing node.machine.sapcloud.io/trigger-deletion-by-mcm, that disables the machineHealthTimeout for the corresponding machine while it is set on a node, i.e. until the annotation is removed or set to false.
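To make the proposed semantics concrete, here is a minimal sketch of how a controller might interpret such an annotation. The key node.machine.sapcloud.io/disable-health-timeout and the helper annotationDisablesHealthTimeout are hypothetical names chosen for illustration; the issue does not fix a name, and this is not existing machine-controller-manager code.

```go
package main

import (
	"fmt"
	"strconv"
)

// HYPOTHETICAL annotation key, used only for illustration; the issue does not
// specify the final name.
const disableHealthTimeoutAnnotation = "node.machine.sapcloud.io/disable-health-timeout"

// annotationDisablesHealthTimeout reports whether a node's annotations ask the
// controller to skip the health timeout. A missing annotation, "false", or a
// value that is not a valid boolean all leave the timeout active.
func annotationDisablesHealthTimeout(annotations map[string]string) bool {
	value, ok := annotations[disableHealthTimeoutAnnotation] // safe on a nil map
	if !ok {
		return false
	}
	disabled, err := strconv.ParseBool(value)
	if err != nil {
		// Treat unparseable values as "not set" so that a typo cannot
		// silently keep a broken machine from being replaced.
		return false
	}
	return disabled
}

func main() {
	fmt.Println(annotationDisablesHealthTimeout(map[string]string{disableHealthTimeoutAnnotation: "true"}))  // true
	fmt.Println(annotationDisablesHealthTimeout(map[string]string{disableHealthTimeoutAnnotation: "false"})) // false
	fmt.Println(annotationDisablesHealthTimeout(nil))                                                        // false
}
```

Falling back to "timeout active" on anything but an explicit true value is a design choice in this sketch: the safe default is that unhealthy machines keep getting replaced.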

Why is this needed:
We have a use case where some VMs are powered off for maintenance that is unrelated to any Gardener-related maintenance (OS updates, Kubernetes updates, ...).

An external, Gardener-independent component (e.g., a DaemonSet) can detect such maintenance and set the annotation accordingly.
Once the maintenance is over and the VM is up and running again, the component would remove the annotation.
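One way such an external component could set and later remove the annotation is a JSON merge patch (RFC 7386) on the Node object, where mapping a key to null deletes it. The sketch below only builds the patch bytes with the standard library; the annotation key and the helper annotationPatch are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HYPOTHETICAL annotation key; the issue does not fix a name.
const disableHealthTimeoutAnnotation = "node.machine.sapcloud.io/disable-health-timeout"

// annotationPatch builds a JSON merge patch for a Node. With set == true the
// annotation is added with the string value "true" (annotation values are
// strings); otherwise the key is mapped to null, which a merge patch
// interprets as "remove this key".
func annotationPatch(set bool) ([]byte, error) {
	var value any // a nil interface marshals to JSON null
	if set {
		value = "true"
	}
	patch := map[string]any{
		"metadata": map[string]any{
			"annotations": map[string]any{
				disableHealthTimeoutAnnotation: value,
			},
		},
	}
	return json.Marshal(patch)
}

func main() {
	start, _ := annotationPatch(true)  // when maintenance begins
	end, _ := annotationPatch(false)   // once the VM is back
	fmt.Println(string(start))
	fmt.Println(string(end))
}
```

With client-go, the bytes could be applied via clientset.CoreV1().Nodes().Patch(ctx, nodeName, types.MergePatchType, patch, metav1.PatchOptions{}); for manual testing, kubectl annotate node <name> <key>=true and kubectl annotate node <name> <key>- achieve the same.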

The in-place upgrade strategy is not suitable for us, as we still want to update nodes by rolling replacement. Additionally, the in-place upgrade strategy would set disableHealthTimeout to true for all nodes of the worker pool permanently (not just while an upgrade is in progress), whereas we still want machines that unexpectedly go offline to be replaced automatically.

Additional context and a short discussion can be found here: https://gardener-cloud.slack.com/archives/C045DSWJZB9/p1747128961067609

CC: @acumino @unmarshall

If you agree with the proposed solution, we will gladly implement it 😄

Metadata


Labels: area/control-plane, area/usability, kind/enhancement, priority/3
