-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-4330: Change API availability to forward compatibility #5038
base: master
Are you sure you want to change the base?
KEP-4330: Change API availability to forward compatibility #5038
Conversation
siyuanfoundation
commented
Jan 14, 2025
•
edited
Loading
edited
- One-line PR description: change api availability to forward compatible to make it easier to handle API graduation in compatibility version.
- Issue link: Introduce compatibility version for Kubernetes control-plane upgrades #4330
- Other comments: more discussions can be found in [Compatibility Version] Guidance on API version promotion kubernetes#127791
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: siyuanfoundation The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
6abb021
to
fbbc5f7
Compare
Instead, we are proposing to keep forward compatibility of all AIPs in compatibility version mode: | ||
All APIs group-versions declarations would be modified | ||
to track which Kubernetes version the API group-versions are removed at. APIs introduced between the emulation version and | ||
the binary version are still available if the stability level is no less than the APIs introduced before or at the emulation version and enabled at the emulation version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of this is referring to API enablement/disablement on a group-version level. However, API lifecycles are generally individual on the group-version-resource level. I think it would make more sense to discuss the individual beta->GA resources rather than the entire group version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
|
||
Alpha APIs may not be enabled in conjunction with emulation version. | ||
|
||
The forward compatibility of APIs should not affect data compatibility because storage version is still controlled by the `MinCompatibilityVersion` regardless of whether the data are created through future versions of the API endpoints. Webhooks should also work fine if the matching policy is `Equivalent`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add a section on how MinCompatibilityVersion
will change storage version since the policy has always been n-1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in #5006
to track which Kubernetes version the API group-versions are introduced (or | ||
removed) at. | ||
removed) at. But in practice, that would make code changes of API graduation intractable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intractable only for withholding API introductions. API removals don't need special handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the whole section. Please review again see if it is more clear.
@@ -610,21 +623,25 @@ The storage version of each group-resource is the newest | |||
|
|||
### API availability | |||
|
|||
Similar to feature flags, all APIs group-versions declarations will be modified | |||
Ideally, similar to feature flags, all APIs group-versions declarations should be modified |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this referring to the prerelease lifecycle files or a different mechanism? prerelease lifecycle operates on the types instead of the group version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is referring to the the prerelease lifecycle.
fbbc5f7
to
ebd2f55
Compare
removed) at: if an API is introduced after (or removed before) the emulation version, it should not be available at the emulation version. But in practice, that would make code changes of API graduation that wants to use newer APIs intractable. | ||
For example, if a controller is switching from Beta to GA API at the binary version, a full emulation would mean the whole controller code needs to be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that we're doing this specifically for controllers. Are there other use cases or should we simplify the wording here and just state up front that we're doing this to support controllers specifically?
the emulation version is set to. I.e. `--runtime-config=api/<version>` needs | ||
to be able to turn on APIs exactly like it did for each Kubernetes version that | ||
emulation version can be set to. | ||
If an API has GAed and has not been removed at the emulated version, it would be enabled by default at the emulation version. If a newer stable version of the GA API has been introduced between the emulation version and the binary version, the new GA API would also be enabled at the emulation version along with the old GA API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comparison (maybe in a table), or how emulation works for normal APIs vs how it works for un-emulatable (non-emulatable?) features might make this information easier to digest.
|
||
GA APIs should match the exact set of APIs enabled in GA for the Kubernetes version | ||
the emulation version is set to. | ||
Instead, we are proposing to keep forward compatibility of all APIs in compatibility version mode: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're proposing that non-emulatable feature controllers to update to use the newest available API, ignoring emulation API enablement, right? Maybe say this more directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, mainly for controllers for now. May expand to kublet if we want to add compat version in node. Updated the section.
f8bce11
to
961b0c9
Compare
961b0c9
to
9d67760
Compare
|
||
With the `NonEmulatableFeatures` list, the server would fail to start if | ||
1. the `EmulationVersion != BinaryVersion` and | ||
1. any feature in the `NonEmulatableFeatures` list is disabled by default at the emulation version and explicitly enabled by the `--feature-gates` flag. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What alternatives were considered here? It's not clear to me WHY we're deciding to refuse to start apiservers that have a non-emulatable beta on vs. allowing the beta to be turned on but not guaranteeing emulation for the feature.
Having the apiserver refuse to start is a potential footgun for anyone with automation that uses emulation version, and also offers the capability to turn on beta features/APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Giving more thoughts to it, I think the alternative is the better way to go. Updated the section with pros and cons.
69475d9
to
32e11d2
Compare
32e11d2
to
0cdaece
Compare
the emulation version is set to. I.e. `--runtime-config=api/<version>` needs | ||
to be able to turn on APIs exactly like it did for each Kubernetes version that | ||
emulation version can be set to. | ||
For the use case of upgrading control plane binary version without changing the emulation version, this would mean api servers should be upgraded first before any other components, because an api server of the old binary version would not be able to serve a controller of the new binary version even though the emulation version is the same as the old binary version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is not the case today? apiserver should be always be upgraded first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
today although apiserver should be upgraded first according to the doc, you can still upgrade all the control plane components in one node first - apiserver1, cm1, scheduler1, before moving on to the next node. That is not a problem today because apis support N-1, N, N+1. But with emulation version, this might not work anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, this is what the coordinated leader election will solve #5088
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coordinated Leader Election is trying to solve this problem but it is preexisting. Given the situation Siyuan described, a controller that graduated to GA could still talk to a GA API unavailable on other apiservers that haven't upgraded.
|
||
Alpha APIs may not be enabled in conjunction with emulation version. | ||
#### Alternatives to API forward compatibility | ||
To make API graduation workable for controller code change under compatibility version, we have considered and rejected the following alternative options: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also always assume that the logic in the controllers is the same between versions, but that may not be the case, if the controller and the API change between versions, or also when an existing GA API add a new field and the controller depends on that new field, you'll never being able to emulate the previous behavior ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar to #5038 (comment)
``` | ||
To fully emulate the controller for an older version, anywhere v1 api/type is referenced, it would need to switch to the v1beta version if the emulation version is older than the binary version. This would mean a lot of extra work, complicated testing rules, and high maintenance cost even for simple API graduations, while the emulation fidelity is unreliable with the extra complexity. | ||
|
||
So instead of truly emulating the feature controllers and API availability at the emulation version, we are proposing to keep forward compatibility of all APIs in compatibility version mode and have the non-emulatable feature controllers to update to use the newest available API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if a 1.32 controller does thing/A. A 1.33 controller does thing/A and also does wrongThing/B. Does this meant that switching to an emulation version for 1.32 does not stop the 1.33 controller from doing wrongThing/B?
`v1alpha1: introduced=1.30, removed=1.31`<br>`v1beta1: introduced=1.31, removed=1.32`<br>`v1: introduced=1.32` | 1.30 | NA | ||
`v1alpha1: introduced=1.30, removed=1.31`<br>`v1beta1: introduced=1.31, removed=1.32`<br>`v1: introduced=1.32` | 1.31 | `api/v1beta1` and `api/v1` available only when `--runtime-config=api/v1beta1=true` | ||
`v1alpha1: introduced=1.30, removed=1.31`<br>`v1beta1: introduced=1.31, removed=1.32`<br>`v1: introduced=1.33` | 1.33 | `api/v1` | ||
`v1beta1: introduced=1.31, removed=1.32`<br>`v1beta2: introduced=1.32` | 1.31 | `api/v1beta1` and `api/v1beta2` available only when `--runtime-config=api/v1beta1=true` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is v1beta2 on in this scenario since it wasn't even introduced in 1.31?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v1beta2 and v1beta1 have the same stability level, so the controller may have moved on to v1beta2 in the binary. By forward compatibility, we are exposing apis that are introduced later than the emulation version that are the same / more stable then the apis enabled at the emulation version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Newer APIs can have new behavior that may be problematic and be the reason someone is trying to rollback. For instance, what if v1beta2 broke cost estimation for VAP and someone wants to close the hole. They rollback to 1.31 behavior, but because this API is being enabled when only v1beta1 is wanted, they're stuck with the improper version enabled. That seems like the sort of scenario we want to avoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I bet this is stemming from different modes of upgrade. Some systems upgrade all the kube binaries as a unit (I think GKE does this) and while they may rollout in an HA manner, each unit is either 1.31 or 1.32. Other system upgrade binaries across the cluster, so each instance of the kube-apiserver upgrades, then the kube-controller-manager, etc. OpenShift does this.
This leads to two different perspectives of emulation version.
- I want
kube-apsierver --emulation-version=1.31
to be explicitly different than a 1.31 kube-apiserver. Perhaps in a manner that makes it impractical to stop taking broken paths (v1beta2 is broken case). This might be preferred (sometimes) by upgrade everything as a unit upgrades. But leaves us in a state where if v1beta2 is broken, there doesn't appear to be a way to actually disable it and function with the v1beta1 API on and the v1beta2 API off. - I want
kube-apiserver --emulation-version=1.31
to be behave like a 1.31 kube-apiserver. This is probably preferred by those who upgrade binaries independent of each other.
The current proposal appears to make #2 impossible. What if instead we made --emulation-version=1.31
emulate the 1.31 version of the kube-apiserver and required requesting serving the non-included APIs? In this example --runtime-config=api/v1beta1=true,api/v1beta2=true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid the confusion about what is expected from emulation version, I have added a --api-forward-compatible
, so that kube-apiserver --emulation-version=1.31
would behave like a 1.31 kube-apiserver by default, and only include the future api if explicitly set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this situation they need the v1 API enabled also.
This is exactly the type of scenario I'd like to make sure we support correctly. This scenario works when -emulate-version-forward-compatible=1.33
since the binary version is 1.33 and the 1.33 provides v1 of the API, which is what the controller expects. I wouldn't know what --emulate-version-forward-compatible=1.32
would mean for 1.31 emulation with a 1.33 binary. Does this suggest that --emulate-version-forward-compatible
should always be set to the binary version?
If we had customer/C, who ran 1.31 without v1beta1=true. They end up at 1.33. They want to go back to
--emulation-version=1.31 --emulate-version-forward-compatible=1.33
. I would expect this to not enable the v1 API and I would expect controller/x to not run. Is this what would happen?
That's my expectation as well. I'm expecting the apiserver to only enable forward compatible APIs for the APIs that are actually enabled, and that we've flagged as needing forward compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly the type of scenario I'd like to make sure we support correctly. This scenario works when -emulate-version-forward-compatible=1.33 since the binary version is 1.33 and the 1.33 provides v1 of the API, which is what the controller expects.
I think we're missing a case in the KEP of this scenario with the chain of v1beta1->v1beta2->v1
. Logically (at--emulated-version=1.31 --forward-compatible=1.33
bv: 1.33) enabling v1beta1 would only require v1 to be enabled (for controller compatibility), but what about other APIs in the chain? Would v1beta2 be need to be enabled as well? Is it weird that upgrading from --emulated-version=1.31 --forward-compatible=1.32
(bv=1.32) to --emulated-version=1.31 --forward-compatible=1.33
(bv=1.33) disables the v1beta2 API?
I wouldn't know what --emulate-version-forward-compatible=1.32 would mean for 1.31 emulation with a 1.33 binary. Does this suggest that --emulate-version-forward-compatible should always be set to the binary version?
I would think so, we cannot guarantee forward compatibility with another version other than the binary version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suppose the controller uses v1beta1 (bv: 1.31)
-> v1beta2 (bv: 1.32)
-> v1 (bv: 1.33)
.
If the user is using --emulated-version=1.31 (bv: 1.32)
to make the controller work, we have to enable v1beta2
when --runtime-config=v1beta1=true
. If the user is using --emulated-version=1.31 (bv: 1.33)
to make the controller work, we have to enable v1
when --runtime-config=v1beta1=true
, in this case setting --emulate-version-forward-compatible=1.32
does not make the controller work with v1beta2
api because the controller is already using the lastest API. So we have to enable the latest api.
--emulate-version-forward-compatible=1.32
will only make controllers with APIs that have not changed from 1.32 to 1.33 work. The users would have to track when the versions of all beta api in use are changed, or most likely they would just use --emulate-version-forward-compatible=1.33
to cover all cases. If they want finer control, they can use --runtime-config
without --emulate-version-forward-compatible
. So I think setting the specific version in emulate-version-forward-compatible
does not give us much benefit.
As for whether v1beta2
should be enabled: from the api lifecycle, we know that v1beta2
is introduced at 1.32 and v1
at 1.33. But we don't know when the controller is changed from v1beta2
to v1
, because sometimes an api is introduced before it is actually used. From the perspective of the api server, I think we have enable both v1beta2
and v1
when --emulate-version-forward-compatible
is requested.
If we had customer/C, who ran 1.31 without v1beta1=true. They end up at 1.33. They want to go back to --emulation-version=1.31 --emulate-version-forward-compatible=1.33. I would expect this to not enable the v1 API and I would expect controller/x to not run. Is this what would happen?
Yes, they would need to enable v1beta1
for v1
to be enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like we've mostly converged on the problem and what the general shape of a solution should be? How about a KEP update that incorporates all of this? We could then iterate from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. PTAL
a101848
to
32a06b7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this feature described as Removed: 1.31
in the 1.30 release below?
enhancements/keps/sig-architecture/4330-compatibility-versions/README.md
Lines 438 to 449 in 41a49ff
The steps to remove the Beta feature would be: | |
| Release | Stage | Feature tracking information | | |
| ------- | ----- | ------------------------------------------------- | | |
| 1.26 | beta | Beta: 1.26 | | |
| 1.27 | beta | Beta: 1.26, Deprecated: 1.27 | | |
| 1.28 | beta | Beta: 1.26, Deprecated: 1.27 | | |
| 1.29 | beta | Beta: 1.26, Deprecated: 1.27 | | |
| 1.30 | - | Beta: 1.26, Deprecated: 1.27, Removed: 1.31 | | |
| 1.31 | - | Beta: 1.26, Deprecated: 1.27, Removed: 1.31 | | |
| 1.32 | - | Beta: 1.26, Deprecated: 1.27, Removed: 1.31 | | |
| 1.33 | - | **`if featureGate enabled { // implement feature }` code may be removed at this step** | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sometimes if you know when the api would be removed you can just add the removal before the actual release.
Beta API graduated to GA|off-by-default|on|API available only when `--runtime-config=api/v1beta1=true`|API `api/v1` available | ||
Beta API removed|off-by-default|-|API available only when `--runtime-config=api/v1beta1=true`|API `api/v1beta1` does not exist | ||
on-by-default Beta API removed|on-by-default|-|API available unless `--runtime-config=api/v1beta1=false`|API `api/v1beta1` does not exist | ||
Beta API graduated to GA|off-by-default|on|API `api/v1beta1` and `api/v1` available only when `--runtime-config=api/v1beta1=true`|API `api/v1` available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does "graduated to GA" imply that the Beta API is removed (from N)?
Should --runtime-config=api/v1beta1=true
have some effect when emulation-version=N?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"graduated to GA" does not imply that the Beta API is removed (from N) automatically.
--runtime-config=api/v1beta1=true
would have both api/v1beta1
and api/v1
both available. This is the same as before, so we are not listing it out explicitly.
on-by-default Beta API removed|on-by-default|-|API available unless `--runtime-config=api/v1beta1=false`|API `api/v1beta1` does not exist | ||
Beta API graduated to GA|off-by-default|on|API `api/v1beta1` and `api/v1` available only when `--runtime-config=api/v1beta1=true`|API `api/v1` available | ||
Beta API removed|off-by-default|-|API `api/v1beta1` available only when `--runtime-config=api/v1beta1=true`|API `api/v1beta1` does not exist | ||
on-by-default Beta API removed|on-by-default|-|API `api/v1beta1` available unless `--runtime-config=api/v1beta1=false`|API `api/v1beta1` does not exist | ||
API Storage version changed|v1beta1|v1|Resources stored as v1beta1|Resources stored as v1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the storage version depend on the compatibility version? If so, should it be mentioned here?
What happens in this case for Story 2, administrator needs to roll back?
- 1pm: emulation=N-1, everything is stored v1beta1
- 2pm: emulation=N, so some items are stored as v1
- 3pm: emulation=N-1, so...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The storage version is determined by the minCompatibilityVersion
, and I am adding some clarification in #5006.
In a 2 step upgrade scenario, only the 1st step is considered rollback safe. So if the emulation version is increased, it is no longer rollbacksafe, unless you have set the minCompatibilityVersion
N-1 in all steps in your example.
An alternative would be to keep a exception list of `NonEmulatableFeatures`, and make the server fail to start if any of the `NonEmulatableFeatures` are enabled in compatibility mode. This approach has the benefits of predictable outcomes. The downsides are: | ||
1. whether a feature should be added to the list is subjective. It is hard to define a threshold to decide if the difficulty to preserve the implementation history is above this threshold, we could add the feature to the list. | ||
1. this could break automated upgrade processes using the emulation version, depending what features a cluster opts in. | ||
1. there needs proper cleaning up rules to ensure the list would not just keep growing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI Jordan and David did something similar for avoiding beta apis to be enabled by default kubernetes/kubernetes@af99d19
LGTM IIUIC the proposal builds on the API compatibility guarantees we have to avoid complicating the consumers of the APIs in the emulation versions, if the API stored version is the same the emulated version. This allows any code in the in-tree kubernetes codebase that consumes new APIs to be perfectly emulated without having to complicate the code with conditionals based on emulated / binary version and API available. |
61e35f0
to
58938bb
Compare
58938bb
to
4f322b7
Compare
9e935f8
to
cfafd42
Compare
Signed-off-by: Siyuan Zhang <sizhang@google.com>
Signed-off-by: Siyuan Zhang <sizhang@google.com>
cfafd42
to
588c1ec
Compare