Description
Follow up from #11060, tracked in #12329
User-facing documentation does not make clear what is tested and supported for control plane component downgrade, or for safe rollback during an HA control plane upgrade.
Relevant comments are copied here:
@yastij:
Do we support downgrades? We should clarify this as @tpepper said.
@bgrant0607:
The open-source project currently doesn't support control-plane downgrades, but we are working on it. Replacing kubelets with older versions within permitted skew should be fine. I don't see any documentation on kubernetes.io about downgrades, either. So far, it's been provider-specific. Issues include storage version downgrades, resource orphaning / leaking, component and add-on downgrade order, and extension management. There's some discussion here: kubernetes/kubernetes#4855 (comment)
@tpepper:
I'm curious about the comments around downgrade. My impression today is that we do not actually have anybody giving meaningful support for downgrade. There's a very narrow use case where we have some test coverage. It breaks regularly and there isn't an owner for the test. I can't find people who actually use or genuinely want it. At best we keep saying "that's Google" in SIG Release, the release teams, and SIG Cluster Lifecycle when we bump into downgrade issues, but at KubeCon last week in discussion with Chao Xu @caesarxuchao I got the distinct impression that actually this is not something Google does today and he was talking about looking in 2019 at making it functional...ie: adding meaningful support for downgrade.
As it stands the PR mentions upgrade, the skew document starts out somewhat generic in terms of skew direction, and then focuses on upgrade.
Is downgrade supported today? If so, the document should cover it explicitly.
That said, I really prefer the engineering simplifications that come with saying "no" to allowing downgrade, but even if we say we only support forward moves, we also don't have sufficient tooling to make it easy for operators to validate forward moves ahead of attempting them or make it safe and easy to discard such attempts (ie: no manually editing etcd content to amend the mistakes). In my experience it's dramatically easier to implement those improvements in a forward-only scheme, versus also trying to support downgrades.
@kubernetes/sig-testing @kubernetes/sig-release @kubernetes/sig-cluster-lifecycle @kubernetes/sig-architecture-feature-requests
Page to Update:
https://kubernetes.io/docs/setup/version-skew-policy/ (or update to link to relevant docs)