Improve calculation of NodeDrainTimeout & NodeVolumeDetachTimeout exceeded #11126
/triage accepted
+1
/assign
One small note: wondering if we should call it status.deletion instead of termination. I think in CAPI we usually talk about "deletion". Also, NodeVolumeDetachStartTime should be WaitForNodeVolumeDetachStartTime.
No strong opinions about the name.
Yup, we just use "deletion" in a lot of places and we don't have things like terminationGracePeriod for Machines (but we have a deletionTimestamp). We're also always talking about Machine deletion; I've never really used/heard "Machine termination".
Today, we check whether the NodeDrainTimeout & NodeVolumeDetachTimeout are exceeded by comparing time.Now() with the LastTransitionTime of the corresponding conditions.
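For illustration, a minimal sketch of this style of check, assuming the v1beta1 Machine API and the util/conditions package (the helper name `nodeDrainTimeoutExceeded` is illustrative, not necessarily the controller's actual code):

```go
package example

import (
	"time"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/cluster-api/util/conditions"
)

// nodeDrainTimeoutExceeded checks the drain timeout against the
// LastTransitionTime of the DrainingSucceeded condition.
func nodeDrainTimeoutExceeded(machine *clusterv1.Machine) bool {
	// No timeout configured means we wait indefinitely.
	if machine.Spec.NodeDrainTimeout == nil || machine.Spec.NodeDrainTimeout.Duration <= 0 {
		return false
	}

	// Fragile: LastTransitionTime is rewritten whenever the condition changes
	// status, so it does not reliably record when draining actually started.
	condition := conditions.Get(machine, clusterv1.DrainingSucceededCondition)
	if condition == nil {
		return false
	}

	return time.Since(condition.LastTransitionTime.Time) >= machine.Spec.NodeDrainTimeout.Duration
}
```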
This is fragile for multiple reasons:
- cluster-api/util/conditions/setter.go, line 54 (at 4f1637e)
- cluster-api/util/conditions/setter.go, line 199 (at 4f1637e)
It would already be a good improvement if the following were implemented:
But I think that instead of relying on the LastTransitionTime, we should have additional status fields tracking when we started to drain and when we started to wait for volume detach, and then calculate whether the timeouts are exceeded based on those (see the sketch below).
(Note: this is not relevant for NodeDeletionTimeout, because for that one we use the deletionTimestamp field.)
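A rough sketch of what the proposed fields and check could look like, using the field names discussed in the comments above (the names and where they live, e.g. under status.deletion, are still open; nothing here is final API):

```go
package example

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Hypothetical new status fields (names taken from the discussion above; the
// final naming and placement are still undecided).
type MachineDeletionStatus struct {
	// NodeDrainStartTime is when the controller started draining the Node.
	NodeDrainStartTime *metav1.Time `json:"nodeDrainStartTime,omitempty"`
	// WaitForNodeVolumeDetachStartTime is when the controller started waiting
	// for volumes to be detached from the Node.
	WaitForNodeVolumeDetachStartTime *metav1.Time `json:"waitForNodeVolumeDetachStartTime,omitempty"`
}

// nodeDrainTimeoutExceeded computes the timeout from the recorded start time
// instead of a condition's LastTransitionTime.
func nodeDrainTimeoutExceeded(nodeDrainTimeout *metav1.Duration, deletion *MachineDeletionStatus) bool {
	if nodeDrainTimeout == nil || nodeDrainTimeout.Duration <= 0 {
		return false
	}
	if deletion == nil || deletion.NodeDrainStartTime == nil {
		// Draining has not started yet, so the timeout cannot be exceeded.
		return false
	}
	return time.Since(deletion.NodeDrainStartTime.Time) >= nodeDrainTimeout.Duration
}
```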