clarify control plane downgrade and/or rollback during HA upgrade #12327

@liggitt

Follow up from #11060, tracked in #12329

User-facing documentation does not make clear what is tested/supported for control plane component downgrade, or for safe rollback during an HA control plane upgrade.

Relevant comments are copied here:

#11060 (comment):

@yastij:
Do we support downgrades? We should clarify this, as @tpepper said.

@bgrant0607:
The open-source project currently doesn't support control-plane downgrades, but we are working on it. Replacing kubelets with older versions within permitted skew should be fine. I don't see any documentation on kubernetes.io about downgrades, either. So far, it's been provider-specific. Issues include storage version downgrades, resource orphaning / leaking, component and add-on downgrade order, and extension management. There's some discussion here: kubernetes/kubernetes#4855 (comment)

#11060 (comment):

@tpepper:
I'm curious about the comments around downgrade. My impression today is that we do not actually have anybody giving meaningful support for downgrade. There's a very narrow use case where we have some test coverage. It breaks regularly and there isn't an owner for the test. I can't find people who actually use or genuinely want it. At best we keep saying "that's Google" in SIG Release, the release teams, and SIG Cluster Lifecycle when we bump into downgrade issues. But at KubeCon last week, in discussion with Chao Xu @caesarxuchao, I got the distinct impression that this is actually not something Google does today, and he was talking about looking in 2019 at making it functional, i.e. adding meaningful support for downgrade.

As it stands the PR mentions upgrade, the skew document starts out somewhat generic in terms of skew direction, and then focuses on upgrade.

Is downgrade supported today? If so, the document should cover it explicitly.

That said, I really prefer the engineering simplifications that come with saying "no" to allowing downgrade. But even if we say we only support forward moves, we also don't have sufficient tooling to make it easy for operators to validate forward moves ahead of attempting them, or to make it safe and easy to discard such attempts (i.e. without manually editing etcd content to amend the mistakes). In my experience it's dramatically easier to implement those improvements in a forward-only scheme, versus also trying to support downgrades.

@kubernetes/sig-testing @kubernetes/sig-release @kubernetes/sig-cluster-lifecycle @kubernetes/sig-architecture-feature-requests

Page to Update:
https://kubernetes.io/docs/setup/version-skew-policy/ (or update to link to relevant docs)

Labels: kind/feature, language/en, lifecycle/rotten, needs-triage, priority/backlog, sig/architecture, sig/cluster-lifecycle, wg/lts
