
Mark KEP-2568 as implementable. #2654

Merged: 1 commit into kubernetes:master on May 6, 2021

Conversation

vinayakankugoyal
Contributor

No description provided.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 28, 2021
@k8s-ci-robot
Contributor

Hi @vinayakankugoyal. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 28, 2021
@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 28, 2021
@neolit123 (Member) left a comment

/cc @randomvariable @fabriziopandini
do we have more comments on this KEP before we mark it as "implementable"?

/cc @PushkarJ
any more comments from you as a SIG Security representative?

i will review this myself again, tomorrow.

@k8s-ci-robot
Contributor

@neolit123: GitHub didn't allow me to request PR reviews from the following users: pushkarj.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @randomvariable @fabriziopandini
do we have more comments on this KEP before we mark it as "implementable".

/cc @PushkarJ
any more comments from you as a SIG Security representative?

i will review this myself again, tomorrow.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@neolit123
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 28, 2021
@neolit123
Member

/milestone v1.22

@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Apr 28, 2021
@neolit123
Member

looks like the PRR file is mandatory as part of CI now?

time="2021-04-28T22:13:28Z" level=info msg="PRR file: /home/prow/go/src/github.com/kubernetes/enhancements/keps/prod-readiness/sig-cluster-lifecycle/2568.yaml"
--- FAIL: TestValidation (0.46s)
panic: assignment to entry in nil map [recovered]
	panic: assignment to entry in nil map

so this is something that has to be resolved with the PRR owners - e.g. whether we are going to skip it for some KEPs.
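For context, that panic is the standard Go failure mode of writing into a map that was never initialized; a minimal, self-contained illustration (not the actual enhancements verification code) of the bug and the fix:

package main

import "fmt"

func main() {
	// A declared-but-uninitialized map is nil; assigning to it panics with
	// "assignment to entry in nil map", which is what the CI job hit.
	var broken map[string]string
	_ = broken // broken["kep"] = "2568" would panic here

	// Initializing the map with make (or a literal) before assigning fixes it.
	fixed := make(map[string]string)
	fixed["kep"] = "2568"
	fmt.Println(fixed)
}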

@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 29, 2021
@neolit123
Member

neolit123 commented Apr 29, 2021

at this point github makes this difficult to review so i will just go through the file and copy/comment.

  1. Set the seccomp profile to runtime/default, this can be done by setting the seccompProfile field in the securityContext.

going back to this. do you see a risk of breakage for existing users due to this change, depending on their seccomp settings?
cc @PushkarJ
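For reference, a minimal sketch of what that setting looks like when a control-plane static Pod spec is built with the core/v1 Go types; the pod name and image below are placeholders, not values from the KEP:

package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// apiServerPod returns a Pod whose pod-level securityContext requests the
// runtime/default seccomp profile, as discussed above.
func apiServerPod() *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "kube-apiserver"},
		Spec: corev1.PodSpec{
			SecurityContext: &corev1.PodSecurityContext{
				SeccompProfile: &corev1.SeccompProfile{
					Type: corev1.SeccompProfileTypeRuntimeDefault,
				},
			},
			Containers: []corev1.Container{{
				Name:  "kube-apiserver",
				Image: "k8s.gcr.io/kube-apiserver:v1.22.0", // placeholder image tag
			}},
		},
	}
}

func main() { _ = apiServerPod() }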

As a security conscious user I would like to run the kubernetes control-plane as non-root.

i would extend the sentence with "to avoid vulnerabilities."

Create system users and let users override the defaults: Use adduser --system or equivalent to create UIDs in the SYS_UID_MIN - SYS_UID_MAX range and groupadd --system or equivalent to create GIDs in the SYS_GID_MIN - SYS_GID_MAX range. Additionally if users want to specify their own UIDs or GIDs we will support that through kubeadm patching.

given this is the best option that we picked (3), how are we going to handle running kubeadm init/join/upgrade in case the set of kubeadm-* users/groups already exists? i think the best option is to re-use them, but error if they are in an unexpected range or have unexpected permissions?

with respect to cleanup, do we expect kubeadm reset to remove these users/groups? i'd say yes?
kubeadm reset tries to remove everything created by the kubeadm binary on the node.

cleanup on "reset" would work for "init/join" but "upgrade" does not have a command that can reverse its state, it's idempotent.
so if we upgrade a node from non-root to root but the process fails the user will be left with a number of users/groups on the node. subsequent upgrades must not fail because of that and just error if the existing names have unexpected IDs or permissions.

i think we might want to have a section in the doc that covers what we do when a set of users/groups already exists on the node - how we re-use them and clean them up. you could call this "Reusing users/groups and cleanup" under "Design details". but first, tell me what you think about this.
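to make the reuse-or-error behavior concrete, here is a rough sketch of what kubeadm could do per user; the range bounds, adduser flags, and function name are assumptions for illustration, not the KEP's exact design:

package users

import (
	"fmt"
	"os/exec"
	"os/user"
	"strconv"
)

// Assumed system-user bounds; real values would be read from /etc/login.defs
// (SYS_UID_MIN / SYS_UID_MAX).
const (
	sysUIDMin = 100
	sysUIDMax = 999
)

// EnsureSystemUser reuses an existing kubeadm-* user if its UID falls in the
// expected system range, creates it as a system user if it is missing, and
// errors out otherwise, along the lines discussed above.
func EnsureSystemUser(name string) error {
	u, err := user.Lookup(name)
	if err == nil {
		uid, convErr := strconv.Atoi(u.Uid)
		if convErr != nil {
			return convErr
		}
		if uid < sysUIDMin || uid > sysUIDMax {
			return fmt.Errorf("user %q exists with UID %d outside the expected range %d-%d", name, uid, sysUIDMin, sysUIDMax)
		}
		return nil // safe to re-use the existing user
	}
	if _, unknown := err.(user.UnknownUserError); !unknown {
		return err
	}
	// The user is missing: create a system user with no home directory and no login shell.
	return exec.Command("adduser", "--system", "--no-create-home", "--shell", "/usr/sbin/nologin", name).Run()
}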

The author(s) believes that starting out with a safe default of option 3 and allowing the user to set the UID and GID through the kubeadm patching mechanism is more user-friendly both for users who just want to quickly bootstrap and for users who care about which UIDs and GIDs they want to run the control-plane as. Further, this feature will be opt-in and will be hidden behind a feature-gate.

i would add "hidden behind a feature-gate, until it graduates to GA".

it will be feature-gated only until GA. once the FG is dropped it becomes locked to true.
for beta it will be defaulted to true; at that point we might want to consider exposing the API for users to configure the UID/GID via the main kubeadm API. but it depends on the feedback too; we could leave it to the GA graduation or leave this to patches entirely.
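a rough sketch of that gate lifecycle, purely for illustration; the gate name and stage defaults below are placeholders, not the KEP's final values:

package features

// stageDefaults captures the lifecycle described above for a hypothetical
// "NonRootControlPlane" gate: opt-in at alpha, on by default at beta,
// locked on at GA (when the gate is removed).
var stageDefaults = map[string]bool{
	"alpha": false,
	"beta":  true,
	"ga":    true,
}

// Enabled returns the effective value of the gate: an explicit user setting
// wins until GA, where the value is locked to true.
func Enabled(stage string, userSet *bool) bool {
	if stage == "ga" || userSet == nil {
		return stageDefaults[stage]
	}
	return *userSet
}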

Each of the components will run with a unique UID and GID. For each of the components we will create a unique user. For the shared files/resources we will create groups. The naming convention of these groups is tabulated below.

we should indicate that kubeadm would take exclusive ownership of these user/group names and error if it encounters existing names that are not in the expected ID range or that have unexpected permissions.
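as a purely hypothetical illustration of a per-component user plus a shared group wired into a static Pod's securityContext (the numeric IDs are placeholders, not the reserved ranges or naming convention from the KEP):

package main

import corev1 "k8s.io/api/core/v1"

// apiServerSecurityContext sketches a pod-level securityContext for one
// control-plane component: a unique user/group pair plus one shared group
// for files shared between components, as discussed above.
func apiServerSecurityContext() *corev1.PodSecurityContext {
	uid := int64(201)    // placeholder UID for a dedicated kube-apiserver user
	gid := int64(201)    // placeholder primary GID for that user
	shared := int64(250) // placeholder GID of a shared group
	nonRoot := true
	return &corev1.PodSecurityContext{
		RunAsUser:          &uid,
		RunAsGroup:         &gid,
		RunAsNonRoot:       &nonRoot,
		SupplementalGroups: []int64{shared},
	}
}

func main() { _ = apiServerSecurityContext() }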

These tests will be added using the kinder tooling.

we normally add e2e tests during alpha, not sure i've mentioned that on the other PRs.
we can rephrase:
"These tests will be added using the kinder tooling during the Alpha stage."

e2e tests created and running periodically, but may not be completely green.

we can rephrase this to:
"e2e tests are running periodically and are green with respect to testing this functionality."

@randomvariable
Member

given this is the best option that we picked (3), how are we going to handle running kubeadm init/join/upgrade in case the set of kubeadm-* users/groups already exists? i think the best option is to re-use them, but error if they are in an unexpected range or have unexpected permissions?

Yes, definitely we should reuse them.

with respect to cleanup, do we expect kubeadm reset to remove these users/groups? i'd say yes? kubeadm reset tries to remove everything created by the kubeadm binary on the node.

Agnostic on this one. I don't think it's a deal breaker if the users/groups hang around, which probably answers the upgrade question.

@vinayakankugoyal
Contributor Author

vinayakankugoyal commented Apr 29, 2021

at this point github makes this difficult to review so i will just go through the file and copy/comment.

  1. Set the seccomp profile to runtime/default, this can be done by setting the seccompProfile field in the securityContext.

going back to this. do you see a risk of breakage for existing users due to this change, depending on their seccomp settings?
cc @PushkarJ

No, there isn't. The control-plane has had that setting in kube-up for a long time and we have not seen any issues. I'd be extremely surprised if the control-plane components started failing because of a seccompProfile of runtime/default.

@fabriziopandini
Member

fabriziopandini commented Apr 29, 2021

I'm supportive of marking this implementable in 1.22 as soon as the pending comments are addressed.
If it helps, I also don't think it is strictly required to support upgrading existing clusters to non-root in the first iteration.

@neolit123
Member

with respect to cleanup, do we expect kubeadm reset to remove these users/groups? i'd say yes? kubeadm reset tries to remove everything created by the kubeadm binary on the node.

Agnostic on this one. I don't think it's a deal breaker if the users/groups hang around, which probably answers the upgrade question.

as pointed out, reset does clean up all artifacts that kubeadm init/join left on the machine. so we might want to add that for consistency.

@vinayakankugoyal
Contributor Author

As a security conscious user I would like to run the kubernetes control-plane as non-root.

i would extend the sentence with "to avoid vulnerabilities."

Updated, thanks!

The author(s) believes that starting out with a safe default of option 3 and allowing the user to set the UID and GID through the kubeadm patching mechanism is more user-friendly both for users who just want to quickly bootstrap and for users who care about which UIDs and GIDs they want to run the control-plane as. Further, this feature will be opt-in and will be hidden behind a feature-gate.

i would add "hidden behind a feature-gate, until it graduates to GA".

it will be feature-gated only until GA. once the FG is dropped it becomes locked to true.
for beta it will be defaulted to true; at that point we might want to consider exposing the API for users to configure the UID/GID via the main kubeadm API. but it depends on the feedback too; we could leave it to the GA graduation or leave this to patches entirely.

Updated, thanks!

Each of the components will run with a unique UID and GID. For each of the components we will create a unique user. For the shared files/resources we will create groups. The naming convention of these groups is tabulated below.

we should indicate that kubeadm would take exclusive ownership of these user/group names and error if it encounters existing names that are not in the expected ID range or that have unexpected permissions.

Updated thanks!

These tests will be added using the kinder tooling.

we normally add e2e tests during alpha, not sure i've mentioned that on the other PRs.
we can rephrase:
"These tests will be added using the kinder tooling during the Alpha stage."

Updated, thanks!

e2e tests created and running periodically, but may not be completely green.

we can rephrase this to:
"e2e tests are running periodically and are green with respect to testing this functionality."

Updated, thanks!

@vinayakankugoyal
Contributor Author

Create system users and let users override the defaults: Use adduser --system or equivalent to create UIDs in the SYS_UID_MIN - SYS_UID_MAX range and groupadd --system or equivalent to create GIDs in the SYS_GID_MIN - SYS_GID_MAX range. Additionally if users want to specify their own UIDs or GIDs we will support that through kubeadm patching.

given this is the best option that we picked (3), how are we going to handle running kubeadm init/join/upgrade in case the set of kubeadm-* users/groups already exists? i think the best option is to re-use them, but error if they are in an unexpected range or have unexpected permissions?

with respect to cleanup, do we expect kubeadm reset to remove these users/groups? i'd say yes?
kubeadm reset tries to remove everything created by the kubeadm binary on the node.

cleanup on "reset" would work for "init/join" but "upgrade" does not have a command that can reverse its state, it's idempotent.
so if we upgrade a node from non-root to root but the process fails the user will be left with a number of users/groups on the node. subsequent upgrades must not fail because of that and just error if the existing names have unexpected IDs or permissions.

i think we might want to have a section in the doc that covers what we do when a set of users/groups already exists on the node - how we re-use them and clean them up. you could call this "Reusing users/groups and cleanup" under "Design details". but first, tell me what you think about this.

I added 2 sections: one for reuse and one for cleanup. Let me know how those look. Thanks!

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 29, 2021
@neolit123
Member

thank you for the updates. found one minor typo; the rest seems good to me.

once this gets approved and LGTM-ed, we will let it sit for a few more days and apply lazy consensus on it - i.e. if nobody else has comments we will merge it as is.

i'm proposing the lazy consensus to apply after EOD Tuesday next week.

/approve
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 29, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neolit123, vinayakankugoyal

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 29, 2021
@vinayakankugoyal
Contributor Author

i'm proposing the lazy consensus to apply after EOD Tuesday next week.

Yeah that seems reasonable to me. What time zone will we follow?

@neolit123
Member

please squash the commits to 1; you can drop my authorship too.

Yeah that seems reasonable to me. What time zone will we follow?

we usually follow PST, as that is what k8s does.

i've notified the SIG mailing list if more people want to have a look:
https://groups.google.com/g/kubernetes-sig-cluster-lifecycle/c/4Ex7kPaL7tc

@vinayakankugoyal
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 5, 2021
@vinayakankugoyal
Contributor Author

We are now past the deadline. Anyone with the ability to LGTM please do so, so that we can merge. Thanks in advance!

@vinayakankugoyal
Contributor Author

@fabriziopandini - since you are the other approver of the KEP do you want to LGTM?

@ehashman (Member) left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 6, 2021
@k8s-ci-robot k8s-ci-robot merged commit 812b69a into kubernetes:master May 6, 2021
@fabriziopandini
Member

Sorry, I'm late to this one...

@vinayakankugoyal
Contributor Author

Sorry, I'm late to this one...

No worries! Thank you for taking a look.
