
Add skip permission change kep #1478

Merged

Conversation

gnufied
Member

@gnufied gnufied commented Jan 21, 2020

Design proposal for #695

/sig storage
/assign @liggitt @msau42 @jingxu97

@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Jan 21, 2020
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 21, 2020
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Jan 21, 2020
@gnufied gnufied force-pushed the add-skip-permission-change-kep branch from ad8aaa2 to edf2842 Compare January 21, 2020 17:30
@gnufied gnufied force-pushed the add-skip-permission-change-kep branch from edf2842 to 6036dbd Compare January 21, 2020 17:33
keps/sig-storage/20200120-skip-permission-change.md (outdated; resolved)

### Risks and Mitigations

- One risk: if a volume's permissions were previously changed using the old algorithm (which changes the permissions of the top-level directory first) and the user then opts in to the `OnDemand` `VolumePermissionChangePolicy`, we cannot tell whether the volume was previously only partially recursively chown'd.
Member

Users would still have to opt in to the non-default behavior, though, so hopefully they have verified that all the permissions are correct.

keps/sig-storage/20200120-skip-permission-change.md (outdated; resolved)
keps/sig-storage/20200120-skip-permission-change.md (outdated; resolved)
@msau42
Member

msau42 commented Jan 22, 2020

/assign @tallclair

@msau42
Member

msau42 commented Jan 22, 2020

@kubernetes/sig-storage-proposals

@k8s-ci-robot k8s-ci-robot added the kind/design Categorizes issue or PR as related to design. label Jan 22, 2020
@msau42
Member

msau42 commented Jan 22, 2020

@kubernetes/sig-auth-proposals

@k8s-ci-robot k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label Jan 22, 2020
@gnufied gnufied force-pushed the add-skip-permission-change-kep branch from 49ce074 to faf63eb Compare January 23, 2020 20:59
@gnufied
Member Author

gnufied commented Jan 24, 2020

/unassign @liggitt
/assign @thockin

@k8s-ci-robot k8s-ci-robot assigned thockin and unassigned liggitt Jan 24, 2020
@gnufied
Member Author

gnufied commented Jan 24, 2020

/label api/review

@k8s-ci-robot
Contributor

@gnufied: The label(s) /label api/review cannot be applied. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to this:

/label api/review

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@gnufied
Member Author

gnufied commented Jan 24, 2020

/label api-review

@k8s-ci-robot k8s-ci-robot added the api-review Categorizes an issue or PR as actively needing an API review. label Jan 24, 2020
keps/sig-storage/20200120-skip-permission-change.md (outdated; resolved)
keps/sig-storage/20200120-skip-permission-change.md (outdated; resolved)
When creating a pod, we propose that the `pod.Spec.SecurityContext` field be expanded to include a new field called `FSGroupPermissionChangePolicy`, which can have the following values:

- `Always` --> Always change the permissions and ownership to match the fsGroup. This is the current behavior, and it will remain the default once this proposal is implemented. However, the algorithm that performs the permission change will be updated to change the permissions of the top-level directory last.
- `OnDemand` --> Only perform the permission and ownership change if the permissions of the top-level directory do not match the expected permissions.
Member

The "top level directory" is the directory that gets bind-mounted, right? So its metadata is not actually visible from within the pod? (that's a good thing)

Member Author

In general yeah.

When creating a pod, we propose that the `pod.Spec.SecurityContext` field be expanded to include a new field called `FSGroupPermissionChangePolicy`, which can have the following values:

- `Always` --> Always change the permissions and ownership to match the fsGroup. This is the current behavior, and it will remain the default once this proposal is implemented. However, the algorithm that performs the permission change will be updated to change the permissions of the top-level directory last.
- `OnDemand` --> Only perform the permission and ownership change if the permissions of the top-level directory do not match the expected permissions.
Member

How does this interact with atomic writer volumes?

Member Author

For atomic writer volumes, the permission change is performed at the end. The only difference is that a read-only mask is applied for emptydir/secrets, so the permission bits are `roMask = os.FileMode(0440)`.

So what will happen is: if the root of the secret dir (for example) already has the correct permissions/ownership, the permission change will be skipped. IMO we should be fine.

Member

Atomic volumes can have new files added. Will the top level directory still have the old permissions?

Member Author

The permissions on the top-level directory may actually change on the remount call that adds new files to the secret directories: https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/emptydir/empty_dir.go#L352

One way to look at this is that the OnRootDirPermissionMismatch setting may not work for ephemeral volumes, but technically it is still doing its job. Another thing we could do is fix it: if fsGroup is available in the pod's spec, then the volume's root directory permissions need not be 777. But this is probably a longer discussion. For the alpha feature, we could document this as a limitation.

Member Author

Related issue - kubernetes/kubernetes#76158

Member

Atomic volumes are at most 2MB. I don't think they're going to have the same performance issue that is motivating this change.

I think we actually do want to set permissions every time for atomic volumes, since new files can be added during a remount.

type PodFSGroupPermissionChangePolicy string

const (
	OnDemandChangeVolumePermission PodFSGroupPermissionChangePolicy = "OnDemand"
)
Member

(optional) I'd prefer a more specific name than OnDemand, one that conveys that the decision is based on the top-level directory (in case we want to introduce a different heuristic later). I don't have a good suggestion though.

Member

"optimistic"? We optimistically assume that if a directory has the right owner, all its content has it too.

Member Author

I have renamed this to OnRootDirPermissionMismatch. It is a bit of a mouthful, but hopefully we should be fine.

keps/sig-storage/20200120-skip-permission-change.md (outdated; resolved)

### Risks and Mitigations

- One risk: if a volume's permissions were previously changed using the old algorithm (which changes the permissions of the top-level directory first) and the user then opts in to the `OnDemand` `FSGroupPermissionChangePolicy`, we cannot tell whether the volume was previously only partially recursively chown'd.
Member

Is there a mitigation? How about remediation? Do they just need to redeploy with Always?

Member Author

This is a corner case. One easy mitigation is obviously to redeploy with the Always policy, but we are also considering failing pod start when the permission change fails. For CSI drivers that is already the case (i.e., if the SetVolumeOwnership call fails, the pod can't start).

keps/sig-storage/20200120-skip-permission-change.md (outdated; resolved)
@thockin thockin left a comment

Mostly this seems fine. Sad that we need yet more knobs.

keps/sig-storage/20200120-skip-permission-change.md (outdated; resolved)
keps/sig-storage/20200120-skip-permission-change.md (outdated; resolved)
@tallclair
Member

/assign @jingxu97

@msau42
Member

msau42 commented Jan 27, 2020

This mostly lgtm. Just clarify the atomic volumes behavior. We can nitpick on naming during the API review :-)

@gnufied
Member Author

gnufied commented Jan 27, 2020

@msau42 added a note about ephemeral volume types.

Comment on lines +103 to +104
// It will have no effect on ephemeral volume types such as: secret, configmaps
// and emptydir.
Member

nit: How is it going to interact with ephemeral CSI volumes?

Member Author

If the heuristics defined for detecting fsGroup support for CSI volumes detect that changing volume ownership is supported, then the proposed policies will work for ephemeral CSI volumes. Also, these policies take effect only when an fsGroup is present in the pod's spec, so we should be fine.

@jsafrane
Member

I have one small question above, but overall the KEP is a good design. The implementation details (API naming) belong to the implementation phase.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 28, 2020
@saad-ali saad-ali left a comment

/approve

Currently, before a volume is bind-mounted inside a container, the permissions on
that volume are changed recursively to the provided fsGroup value. This change
in ownership can take an excessively long time to complete, especially for very
large volumes (>=1TB), as well as for a few other reasons detailed in [Motivation](#motivation).
Member

nit: Is the issue large volumes or the number of files/directories in the volume? Let's be very clear about that.

Member Author

I will fix this in a follow-up.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gnufied, jsafrane, saad-ali

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 28, 2020
@k8s-ci-robot k8s-ci-robot merged commit 97cd79b into kubernetes:master Jan 28, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Jan 28, 2020