Add a way to temporarily prevent node deletion a.k.a Freeze machine #818

Open

Description

How to categorize this issue?

/area quality robustness
/kind enhancement
/priority 3

What would you like to be added:
A way to temporarily prevent a node from being deleted. For example, when we cordon/drain a node to investigate it, it is sometimes deleted automatically because it is not healthy. It would be really useful to be able to keep a node alive so that we can investigate it and find the root cause of a given problem.

It could be something like an annotation added to the node resource (ideally not the machine, since shoot owners might also find this useful). I also think there should be a second annotation with something like a timeout threshold (which can be increased if need be) so that people do not forget a node in that state.

**Update: 2 Aug meeting with Etienne**

Investigation would be needed in the following phases:

  • Pending (cases where the machine is not joining)
  • Unknown machine (pods not working, so cordon/drain the node and then inspect it)
  • Running machine (pods not working but the machine is Running, probably because the issue couldn't be tracked through a node condition)

Terminating WON'T need any investigation, as the resources are in the deletion phase and could already have been partly deleted by the time the machine is marked to be ignored for deletion.
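The phase rules above can be sketched as a small eligibility check. This is only an illustration: the `machinePhase` type, its string values, and the `freezeAllowed` helper are assumptions for this example and are not taken from the MCM code base.

```go
package main

import "fmt"

// machinePhase is a hypothetical type for this sketch; the values mirror the
// phase names used in the discussion above.
type machinePhase string

const (
	phasePending     machinePhase = "Pending"
	phaseUnknown     machinePhase = "Unknown"
	phaseRunning     machinePhase = "Running"
	phaseTerminating machinePhase = "Terminating"
)

// freezeAllowed returns true only for the phases where investigation was
// said to be needed. Terminating is excluded because its resources may
// already be partly deleted; any other phase is rejected by default.
func freezeAllowed(p machinePhase) bool {
	switch p {
	case phasePending, phaseUnknown, phaseRunning:
		return true
	default:
		return false
	}
}

func main() {
	for _, p := range []machinePhase{phasePending, phaseUnknown, phaseRunning, phaseTerminating} {
		fmt.Printf("%s: freeze allowed = %v\n", p, freezeAllowed(p))
	}
}
```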

Why is this needed:
This would be useful for troubleshooting nodes that suddenly stop working as expected (for RCA purposes).

Labels: area/quality, area/robustness, kind/enhancement, lifecycle/stale, needs/planning, priority/3
