Provide a way to surface arbitrary node conditions at machine level #11826

Open
@sm4ll-3gg

What would you like to be added (User Story)?

As an operator, I'd like to reflect a workload cluster's Node status on the relevant Machine resources without triggering any remediation.

Detailed Description

We're looking for a way to gain more control over the MD's (and, in the future, KCP's) rolling update process. First of all, we want to reflect the custom conditions set by NPD in the Machine's Ready condition, so that a rolling update pauses until all checks have succeeded. This will be possible out of the box with MachineHealthCheck once v1beta2 is released, but MHC is tightly coupled with the remediation feature, which we don't want to use.

In this case, we want custom health checks from the workload cluster to be reflected in its lifecycle management, which is controlled by CAPI.

Since the resource is named MachineHealthCheck, not MachineSelfHealing, I suppose it would be OK to make remediation optional. It could be implemented as a single optional boolean field, remediationDisabled, which would be backward compatible.
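
A rough sketch of how such a field could stay backward compatible, written in Go. `HealthCheckSpec` and `remediationEnabled` are hypothetical names for illustration and do not mirror the real MachineHealthCheck types; the only assumption is that an omitted field must behave exactly like today.

```go
package main

import "fmt"

// HealthCheckSpec is a heavily simplified, hypothetical stand-in for the
// MachineHealthCheck spec.
type HealthCheckSpec struct {
	// RemediationDisabled is the optional field proposed in this issue.
	// A nil pointer means the field was omitted, which keeps current behavior.
	RemediationDisabled *bool
}

// remediationEnabled reports whether unhealthy Machines should be remediated.
// Existing objects that do not set the field keep remediating as before.
func remediationEnabled(spec HealthCheckSpec) bool {
	return spec.RemediationDisabled == nil || !*spec.RemediationDisabled
}

func main() {
	disabled := true
	fmt.Println(remediationEnabled(HealthCheckSpec{}))                               // field omitted
	fmt.Println(remediationEnabled(HealthCheckSpec{RemediationDisabled: &disabled})) // explicit opt-out
}
```

Using a pointer rather than a plain bool makes "not set" distinguishable from "set to false", so the default never changes for existing resources.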

Anything else you would like to add?

Considered alternative solutions:

  • Use the cluster.x-k8s.io/skip-remediation annotation on Machine resources: it's implicit and error-prone, since the annotation can easily be forgotten when creating a new MD;
  • Use maxUnhealthy: 0: it is deprecated and has no alternative for now (Deprecate MachineHealthCheck MaxUnhealthy and UnhealthyRange #10722);
  • Implement a custom no-op remediation template: this looks more like a workaround than a solution;
  • Implement a custom controller that does the same things as MHC, but without remediation: but why not just patch the existing feature instead of duplicating it?

Label(s) to be applied

/kind feature
/area machinehealthcheck

Labels: area/machinehealthcheck, kind/feature, needs-priority, needs-triage
