-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add skip permission change kep #1478
Add skip permission change kep #1478
Conversation
ad8aaa2
to
edf2842
Compare
edf2842
to
6036dbd
Compare
|
||
### Risks and Mitigations | ||
|
||
- One of the risks is if user volume's permission was previously changed using old algorithm(which changes permission of top level directory first) and user opts in for `OnDemand` `VolumePermissionChangePolicy` then we can't distinguish if the volume was previously only partially recursively chown'd. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User would still have to opt-in to change to the non-default behavior though, so hopefully they have verified that all the permissions are correct.
/assign @tallclair |
@kubernetes/sig-storage-proposals |
@kubernetes/sig-auth-proposals |
49ce074
to
faf63eb
Compare
/label api/review |
@gnufied: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/label api-review |
When creating a pod, we propose that `pod.Spec.SecurityContext` field expanded to include a new field called `FSGroupPermissionChangePolicy` which can have following possible values: | ||
|
||
- `Always` --> Always change the permissions and ownership to match fsGroup. This is the current behavior and it will be the default one when this proposal is implemented. Algorithm that performs permission change however will be changed to perform permission change of top level directory last. | ||
- `OnDemand` --> Only perform permission and ownership change if permissions of top level directory does not match with expected permissions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "top level directory" is the directory that gets bind-mounted, right? So its metadata is not actually visible from within the pod? (that's a good thing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general yeah.
When creating a pod, we propose that `pod.Spec.SecurityContext` field expanded to include a new field called `FSGroupPermissionChangePolicy` which can have following possible values: | ||
|
||
- `Always` --> Always change the permissions and ownership to match fsGroup. This is the current behavior and it will be the default one when this proposal is implemented. Algorithm that performs permission change however will be changed to perform permission change of top level directory last. | ||
- `OnDemand` --> Only perform permission and ownership change if permissions of top level directory does not match with expected permissions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this interact with atomic writer volumes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For atomic writer volumes - the permission change is performed at the end. The only difference is, ReadOnly
mask gets applied for emptydir/secrets. So permission bits is - roMask = os.FileMode(0440)
.
So what will happen is - if root of secret dir(for example) has the correct permissions/ownership already, the permission change will be skipped. IMO - we should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Atomic volumes can have new files added. Will the top level directory still have the old permissions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The permission on the top level directory may actually change on remount call that adds new files to the secret directories - https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/emptydir/empty_dir.go#L352
One way to look at this is - OnRootDirPermissionMismatch
setting may not work for ephemeral volumes, but technically it is still doing its job. Another thing we could do is to fix it. If fsGroup
is available in pod's spec then volume's root directory permission need not be 777. But this is probably a longer discussion . For alpha feature - we could document this as a limitation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related issue - kubernetes/kubernetes#76158
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Atomic volumes are max 2MB. I don't think they're going to have the same performance issue that is motivating this change.
I think we actually do want to set permissions every time for atomic volumes since you can have new files added during the remount.
type PodFSGroupPermissionChangePolicy string | ||
|
||
const( | ||
OnDemandChangeVolumePermission PodFSGroupPermissionChangePolicy = "OnDemand" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(optional) I'd prefer to have a more specific name to OnDemand which conveys that it is based off the top-level directory (in case we want to introduce a different heuristic later). I don't have a good suggestion though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"optimistic"? We optimistically assume that if a directory has the right owner, all its content has it too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have renamed this to OnRootDirPermissionMismatch
. It is bit mouthful but hopefully we should be fine.
|
||
### Risks and Mitigations | ||
|
||
- One of the risks is if user volume's permission was previously changed using old algorithm(which changes permission of top level directory first) and user opts in for `OnDemand` `FSGroupPermissionChangePolicy` then we can't distinguish if the volume was previously only partially recursively chown'd. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a mitigation? How about remediation? Do they just need to redeploy with Always
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a corner case. One easy mitigation is obviously to redeploy with Always
policy but we are also considering failing pod start when permission change fails. For CSI drivers that is already the case (ie if SetVolumeOwnership
call failed, pod can't start).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly this seems fine. Sad that we need yet more knobs.
/assign @jingxu97 |
This mostly lgtm. Just clarify the atomic volumes behavior. We can nitpick on naming during the API review :-) |
@msau42 added a note about ephemeral volume types. |
// It will have no effect on ephemeral volume types such as: secret, configmaps | ||
// and emptydir. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: How is it going to interact with ephemeral CSI volumes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the heuristics defined for detecting fsGroup support for CSI volumes - detects that changing volume ownership is supported , then proposed policies will work for ephemeral CSI volumes. Also these policies have any effect only when a fsGroup
is present in pod's spec and hence we should be fine.
I have one small question above, still the KEP is in a good design. The implementation details (API naming) belong to implementation phase. /lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
Currently before a volume is bind-mounted inside a container the permissions on | ||
that volume are changed recursively to the provided fsGroup value. This change | ||
in ownership can take an excessively long time to complete, especially for very | ||
large volumes (>=1TB) as well as a few other reasons detailed in [Motivation](#motivation). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Is the issue large volumes or the number of files/directories in the volume? Let's be very clear about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will fix this in a follow up
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: gnufied, jsafrane, saad-ali The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Design proposal for #695
/sig storage
/assign @liggitt @msau42 @jingxu97