Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-4004: Deprecate the kubeProxyVersion field of v1.Node #4005

Merged
merged 1 commit into from
Sep 5, 2023

Conversation

HirazawaUi
Copy link
Contributor

One-line PR description: This KEP proposes for deprecate the status.nodeInfo.kubeProxyVersion field of v1.Node
Issue link: #4004
Other comments: kubernetes/kubernetes#117756

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 15, 2023
@k8s-ci-robot
Copy link
Contributor

Welcome @HirazawaUi!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 15, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @HirazawaUi. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 15, 2023
@danwinship
Copy link
Contributor

/retitle KEP-4004: Deprecate the kubeProxyVersion field of v1.Node

@k8s-ci-robot k8s-ci-robot changed the title Add KEP for Deprecate the kubeProxyVersion field of v1.Node KEP-4004: Deprecate the kubeProxyVersion field of v1.Node May 15, 2023
Copy link
Contributor

@danwinship danwinship left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. A few procedural notes, and some suggestions...

Comment on lines 5 to 6
owning-sig: sig-node
participating-sigs: []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
owning-sig: sig-node
participating-sigs: []
owning-sig: sig-network
participating-sigs:
- sig-node

which implies moving the files to keps/sig-network too.

Even though this is about a change to Node, if sig-network is going to be driving this, then we should be the owning-sig.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
- name: kubeProxyVersion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feature gates start with a capital letter, and should be more verbose than that. Maybe DisableNodeKubeProxyVersion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DisableNodeKubeProxyVersion used to be one of the options I was thinking about, and it would really be more relevant (although I discarded it at the last minute....)


## Motivation

This field is not accurate, the field is set by kubelet, which does not actually know the kube-proxy version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This field is not accurate, the field is set by kubelet, which does not actually know the kube-proxy version
This field is not accurate, the field is set by kubelet, which does not actually know the kube-proxy version,
or even if kube-proxy is running.


### Goals

Deprecate the `status.nodeInfo.kubeProxyVersion` field and then do the hard work of finding all the places using it and getting them off in time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'd phrase that as:

- Mark `status.nodeInfo.kubeProxyVersion` deprecated.
- Find any places use it and get them to stop.
- Make kubelet stop setting the field.

## Proposal

- Deprecate the `status.nodeInfo.kubeProxyVersion` field of v1.Node
- Remove the gcp cloud provider check for this field
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, yeah, I feel like this is more "Goals" than "Proposal"...

Also, wherever it is that you mention the GCP cloud provider, you need to be clear that that's just one specific example, and that we may find others.

(Also, maybe link to the PRs I already filed: kubernetes/cloud-provider-gcp#533, kubernetes/kubernetes#117806.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, yeah, I feel like this is more "Goals" than "Proposal"...

Also, wherever it is that you mention the GCP cloud provider, you need to be clear that that's just one specific example, and that we may find others.

(Also, maybe link to the PRs I already filed: kubernetes/cloud-provider-gcp#533, kubernetes/kubernetes#117806.)

I have made some changes, please see if it is more appropriate

- Else:
- `status.nodeInfo.kubeProxyVersion` will be not empty

### Graduation Criteria
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this section filled in.

For Alpha, we should have created the feature gate, disabled by default, and started looking for components that might be using the deprecated field.

For Beta, we want to make sure that we fixed the GCP cloud provider and any other components we found that use the deprecated field, and we need to make sure that upgrades to the Beta version will work across supported levels of version skew. eg, if there are components in 1.28 that depend on kubeProxyVersion being set, and it's allowable for a cluster to have a 1.29 kubelet and a 1.28 version of the component that depends on kubeProxyVersion, then we can't safely go to beta in 1.29.

Comment on lines 139 to 154
### Upgrade / Downgrade Strategy

### Version Skew Strategy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upgrade/downgrade is basically "N/A". For skew, see what I said above.


###### How can this feature be enabled / disabled in a live cluster?

- [ ] Feature gate (also fill in values in `kep.yaml`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[X] this, and delete the [ ] Other section

keps/sig-node/4004-deprecate-kube-proxy-version/README.md Outdated Show resolved Hide resolved

###### Are there any tests for feature enablement/disablement?

No, these will be introduced in the Alpha phase.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be "Yes", not "No" because you're supposed to include the work being done for Alpha in the answers here.

But I think this should be "No" anyway; "tests for feature enablement/disablement" means "tests where you have a running cluster with the feature disabled and then you enable it", or vice versa. And there's no reason to do that here; we need to make sure no one is looking at the deprecated field ever, so we don't need to test what happens when the value of the field changes, since if anyone was actually looking at it all, then we're already failing.

@bart0sh
Copy link
Contributor

bart0sh commented May 17, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 17, 2023
@HirazawaUi HirazawaUi force-pushed the master branch 3 times, most recently from 069f5dc to 8975c42 Compare May 17, 2023 16:38
@HirazawaUi
Copy link
Contributor Author

@danwinship I made some changes to the suggestion, can I bother you to continue reviewing this PR?


## Proposal

- The proposal is to deprecate the kubeProxyVersion field on NodeStatus, which would involve modifying the Kubernetes API and would affect GCP cloud providers (e.g. https://github.com/kubernetes/cloud-provider-gcp/pull/533, https://github.com/kubernetes/kubernetes/pull/117806) or other components, so we will use a feature gates to protect it until we fix the GCP cloud provider and any other components we find using the deprecated fields
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The proposal is to deprecate the kubeProxyVersion field on NodeStatus, which would involve modifying the Kubernetes API and would affect GCP cloud providers (e.g. https://github.com/kubernetes/cloud-provider-gcp/pull/533, https://github.com/kubernetes/kubernetes/pull/117806) or other components, so we will use a feature gates to protect it until we fix the GCP cloud provider and any other components we find using the deprecated fields
- The proposal is to deprecate the kubeProxyVersion field on NodeStatus, which would involve modifying the Kubernetes API and would affect GCP cloud providers (e.g. https://github.com/kubernetes/cloud-provider-gcp/pull/533, https://github.com/kubernetes/kubernetes/pull/117806) or other components, so we will use a feature gates to protect it until we fix the GCP cloud provider and any other components we find using the deprecated fields

can we be more explicit? what do we mean by "modifying the Kubernetes API "
I thought that we were going to stop setting the field only and set as "deprecated" on the generated API docs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we be more explicit? what do we mean by "modifying the Kubernetes API " I thought that we were going to stop setting the field only and set as "deprecated" on the generated API docs

Sorry, setting the field to deprecated shouldn't count as an API change, I'll change the wording

@pacoxu
Copy link
Member

pacoxu commented May 24, 2023

/sig network

@k8s-ci-robot k8s-ci-robot added the sig/network Categorizes an issue or PR as relevant to SIG Network. label May 24, 2023

```
// Deprecated: Kubelet Version reported by the node.
KubeletVersion string `json:"kubeletVersion" protobuf:"bytes,7,opt,name=kubeletVersion"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this say Kubelet ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, that was a stupid mistake

- "@danwinship"
- "@thockin"
approvers:
- "@sig-node-leads"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danwinship I think this is you, yeah?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure


1. Alpha: v1.28, Created the feature gate, disabled by default, and started looking for components that might be using the deprecated field.
2. Beta: v1.29, Make sure that we fixed the GCP cloud provider and any other components we found that use the deprecated field, and we need to make sure that upgrades to the Beta version will work across supported levels of [version skew](https://kubernetes.io/releases/version-skew-policy/).
3. GA: v1.30
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be separate sections; see the example in the template README

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be separate sections; see the example in the template README

all right

1. Alpha: v1.28, Created the feature gate, disabled by default, and started looking for components that might be using the deprecated field.
2. Beta: v1.29, Make sure that we fixed the GCP cloud provider and any other components we found that use the deprecated field, and we need to make sure that upgrades to the Beta version will work across supported levels of [version skew](https://kubernetes.io/releases/version-skew-policy/).
3. GA: v1.30
keps/sig-node/4004-deprecate-kube-proxy-version/kep.yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line got accidentally pasted here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes...


###### Are there any tests for feature enablement/disablement?

No, we don't need to test what happens when the value of the field changes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People will push back against that claim; you need to explain why. We don't need to test what happens when the value changes, because our goal is to make sure no one is looking at the value at all, so no components should notice if the value changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People will push back against that claim; you need to explain why. We don't need to test what happens when the value changes, because our goal is to make sure no one is looking at the value at all, so no components should notice if the value changes.

thanks for reminder

- "@danwinship"
- "@thockin"
approvers:
- "@sig-node-leads"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@HirazawaUi HirazawaUi force-pushed the master branch 2 times, most recently from 47553ed to d1a9416 Compare May 31, 2023 16:04
Copy link
Contributor

@danwinship danwinship left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's get this in for 1.29...


## Proposal

- The proposal is to deprecate the kubeProxyVersion field on NodeStatus, which would affect GCP cloud providers (e.g. https://github.com/kubernetes/cloud-provider-gcp/pull/533, https://github.com/kubernetes/kubernetes/pull/117806) or other components, so we will use a feature gates to protect it until we fix the GCP cloud provider and any other components we find using the deprecated fields
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, both of those PRs have merged; the cloud-provider-gcp fix will be in 1.29. So...

Suggested change
- The proposal is to deprecate the kubeProxyVersion field on NodeStatus, which would affect GCP cloud providers (e.g. https://github.com/kubernetes/cloud-provider-gcp/pull/533, https://github.com/kubernetes/kubernetes/pull/117806) or other components, so we will use a feature gates to protect it until we fix the GCP cloud provider and any other components we find using the deprecated fields
The proposal is to deprecate the `kubeProxyVersion` field of `NodeStatus`, and to stop setting it in the future.
This field was used by the GCP cloud provider up until 1.28 for the legacy built-in cloud provider ([kubernetes #117806], and up until 1.29 for the external cloud-provider ([cloud-provider-gcp #533]). It may also be used by other components. Thus, we will use a feature gate to protect it until all components are fixed.
[kubernetes #117806]: https://github.com/kubernetes/kubernetes/pull/117806
[cloud-provider-gcp #533]: https://github.com/kubernetes/cloud-provider-gcp/pull/533

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, those versions are off by one; the bug was fixed in cloud-provider-gcp as of 1.29, so that means it used the field up until 1.28. And likewise 1.27 for the legacy provider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right

Comment on lines 122 to 129
##### Integration tests


- N/A
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
##### Integration tests
- N/A
##### Integration tests
- N/A


#### Alpha

- [ ] Created the feature gate, disabled by default.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, we don't normally bother with checkboxes here...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


#### Beta

- [ ] Make sure that we fixed the GCP cloud provider and any other components we found that use the deprecated field.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GCP cloud provider is already done at this point so that doesn't need to be mentioned here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

### Rollout, Upgrade and Rollback Planning

###### How can a rollout or rollback fail? Can it impact already running workloads?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a component that we didn't know about using kubeProxyVersion, then it may fail in some unknown way when the administrator upgrades to a version with that field disabled. It should be possible to just roll back in this case.


###### What specific metrics should inform a rollback?

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

answer these questions with "N/A" or "No", as appropriate; when they're empty it looks like you just haven't gotten around to filling in this section of the KEP yet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

### Monitoring Requirements
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just put "N/A" here and delete the sub-sections of this section

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.28"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v1.29 now

Comment on lines 28 to 31
milestone:
alpha: "v1.28"
beta: "v1.29"
stable: "v1.30"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should be able to do alpha in 1.29, beta in 1.31 (3 releases after 1.28, to avoid version skew issues with cloud-provider-gcp), and stable in 1.33 (your graduation criteria say "no issues reported in 2 releases")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@HirazawaUi HirazawaUi force-pushed the master branch 3 times, most recently from 248d582 to a668f4b Compare August 29, 2023 16:30

###### What specific metrics should inform a rollback?

No
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
No
N/A

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

No
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
No
N/A

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@danwinship
Copy link
Contributor

oh, sorry, just noticed that this is still in keps/sig-node/ rather than keps/sig-network/. You need to move it to sig-network so it has the right approvers.

@HirazawaUi
Copy link
Contributor Author

oh, sorry, just noticed that this is still in keps/sig-node/ rather than keps/sig-network/. You need to move it to sig-network so it has the right approvers.

Okay, it's moved to sig-network.

@bart0sh
Copy link
Contributor

bart0sh commented Sep 2, 2023

/remove-sig node

@k8s-ci-robot k8s-ci-robot removed the sig/node Categorizes an issue or PR as relevant to SIG Node. label Sep 2, 2023
@danwinship
Copy link
Contributor

/lgtm
/assign @shaneutt
for approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 4, 2023
@shaneutt
Copy link
Member

shaneutt commented Sep 5, 2023

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, HirazawaUi, shaneutt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 5, 2023
@k8s-ci-robot k8s-ci-robot merged commit f6871ab into kubernetes:master Sep 5, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Sep 5, 2023
@HirazawaUi
Copy link
Contributor Author

@danwinship After the PR merger, may I start working for it now?

@danwinship
Copy link
Contributor

yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/network Categorizes an issue or PR as relevant to SIG Network. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Development

Successfully merging this pull request may close these issues.

8 participants