-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add skip permission change kep #1478
Changes from 6 commits
1b66072
6036dbd
faf63eb
ac5efed
ea07da2
68751b9
0e1e6c6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,166 @@ | ||
--- | ||
title: Skip Volume Ownership Change | ||
authors: | ||
- "@gnuified" | ||
owning-sig: sig-storage | ||
participating-sigs: | ||
- sig-auth | ||
reviewers: | ||
- "@msau42" | ||
- "@liggit" | ||
- "@tallclair" | ||
approvers: | ||
- "@saad-ali" | ||
editor: TBD | ||
creation-date: 2020-01-20 | ||
last-updated: 2020-01-20 | ||
status: implementable | ||
see-also: | ||
replaces: | ||
superseded-by: | ||
|
||
--- | ||
|
||
# Skip Volume Ownership Change | ||
|
||
## Table of Contents | ||
|
||
<!-- toc --> | ||
- [Summary](#summary) | ||
- [Motivation](#motivation) | ||
- [Goals](#goals) | ||
- [Non-Goals](#non-goals) | ||
- [Proposal](#proposal) | ||
- [Implementation Details/Notes/Constraints [optional]](#implementation-detailsnotesconstraints-optional) | ||
- [Risks and Mitigations](#risks-and-mitigations) | ||
- [Graduation Criteria](#graduation-criteria) | ||
- [Test Plan](#test-plan) | ||
- [Monitoring](#monitoring) | ||
- [Implementation History](#implementation-history) | ||
- [Drawbacks [optional]](#drawbacks-optional) | ||
- [Alternatives [optional]](#alternatives-optional) | ||
- [Infrastructure Needed [optional]](#infrastructure-needed-optional) | ||
<!-- /toc --> | ||
|
||
## Summary | ||
|
||
Currently before a volume is bind-mounted inside a container the permissions on | ||
that volume are changed recursively to the provided fsGroup value. This change | ||
in ownership can take an excessively long time to complete, especially for very | ||
large volumes (>=1TB) as well as a few other reasons detailed in [Motivation](#motivation). | ||
To solve this issue we will add a new field called `pod.Spec.Volumes[].FSGroupPermissionChangePolicy` and | ||
allow the user to specify whether they want the permission and ownership change to occur for that volume. | ||
|
||
## Motivation | ||
|
||
When a volume is mounted on the node, we recursively change permissions of volume | ||
before bind mounting the volume inside container. The reason of doing this is to ensure | ||
that volumes are readable/writable by provided fsGroup. | ||
|
||
But this presents following problems: | ||
- An application(many popular databases) which is sensitive to permission bits changing | ||
underneath may refuse to start whenever volume being used inside pod gets mounted on | ||
different node. | ||
gnufied marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- If volume has a large number of files, performing recursive `chown` and `chmod` | ||
could be slow and could cause timeout while starting the pod. | ||
|
||
### Goals | ||
|
||
- Allow volume ownership and permission change to be skipped during mount | ||
|
||
### Non-Goals | ||
|
||
- In some cases if user brings in a large enough volume from outside, the first time ownership and permission change still could take lot of time. | ||
- On SELinux enabled distributions we will still do recursive chcon whenever applicable and handling that is outside the scope. | ||
gnufied marked this conversation as resolved.
Show resolved
Hide resolved
gnufied marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- This proposal does not attempt to fix two pods using same volume with conflicting fsgroup. `FSGroupPermissionChangePolicy` also will be only applicable to volume types which support setting fsgroup. | ||
gnufied marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
## Proposal | ||
|
||
We propose that an user can optionally opt-in to skip recursive ownership(and permission) change on a volume if volume already has right permissions. | ||
|
||
### Implementation Details/Notes/Constraints [optional] | ||
|
||
Currently Volume permission and ownership change is done using a breadth-first algorithm starting from root directory of the volume. As part of this proposal we will change the algorithm to modify permissions/ownership of the root directory after all directories/files underneath have their permissions changed. | ||
|
||
When creating a pod, we propose that `pod.Spec.Volumes[]` field expanded to include a new field called `FSGroupPermissionChangePolicy` which can have following possible values: | ||
|
||
- `Always` --> Always change the permissions and ownership to match fsGroup. This is the current behavior and it will be the default one when this proposal is implemented. Algorithm that performs permission change however will be changed to perform permission change of top level directory last. | ||
- `OnRootDirPermissionMismatch` --> Only perform permission and ownership change if permissions of top level directory does not match with expected permissions and ownership. | ||
saad-ali marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
```go | ||
type PodFSGroupPermissionChangePolicy string | ||
|
||
const( | ||
ChangePermissionOnRootDirmismatch PodFSGroupPermissionChangePolicy = "OnRootDirPermissionMismatch" | ||
AlwaysChangeVolumePermission PodFSGroupPermissionChangePolicy = "Always" | ||
) | ||
|
||
// Volume represents a named volume in a pod that may be accessed by any containers in the pod. | ||
type Volume struct { | ||
gnufied marked this conversation as resolved.
Show resolved
Hide resolved
|
||
// Required: This must be a DNS_LABEL. Each volume in a pod must have | ||
// a unique name. | ||
Name string | ||
// The VolumeSource represents the location and type of a volume to mount. | ||
// This is optional for now. If not specified, the Volume is implied to be an EmptyDir. | ||
// This implied behavior is deprecated and will be removed in a future version. | ||
// +optional | ||
VolumeSource | ||
// FSGroupPermissionChangePolicy ← new field | ||
// Defines behavior of changing ownership and permission of the volume | ||
// before being exposed inside Pod. This field will only apply to | ||
// volume types which support fsGroup based ownership(and permissions). | ||
// It will have no effect on ephemeral volume types such as: secret, configmaps | ||
// and emptydir. | ||
Comment on lines
+103
to
+104
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: How is it going to interact with ephemeral CSI volumes? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If the heuristics defined for detecting fsGroup support for CSI volumes - detects that changing volume ownership is supported , then proposed policies will work for ephemeral CSI volumes. Also these policies have any effect only when a |
||
// + optional | ||
FSGroupPermissionChangePolicy *PodFSGroupPermissionChangePolicy | ||
gnufied marked this conversation as resolved.
Show resolved
Hide resolved
|
||
} | ||
``` | ||
|
||
### Risks and Mitigations | ||
|
||
- One of the risks is if user volume's permission was previously changed using old algorithm(which changes permission of top level directory first) and user opts in for `ChangePermissionOnRootDirmismatch` `FSGroupPermissionChangePolicy` then we can't distinguish if the volume was previously only partially recursively chown'd. | ||
|
||
|
||
## Graduation Criteria | ||
|
||
* Alpha in 1.18 provided all tests are passing and gated by the feature Gate | ||
`ConfigurableFSGroupPermissions` and set to a default of `False` | ||
|
||
* Beta in 1.19 with design validated by at least two customer deployments | ||
(non-production), with discussions in SIG-Storage regarding success of | ||
deployments. A metric will be added to report time taken to perform a | ||
volume ownership change. Also e2e tests that verify volume permissions with various `FSGroupPermissionChangePolicy`. | ||
* GA in 1.20, with Node E2E tests in place tagged with feature Storage | ||
gnufied marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
|
||
[umbrella issues]: https://github.com/kubernetes/kubernetes/issues/69699 | ||
|
||
### Test Plan | ||
|
||
A test plan will consist of the following tests | ||
|
||
* Basic tests including a permutation of the following values | ||
- pod.Spec.Volumes[].FSGroupPermissionChangePolicy (ChangePermissionOnRootDirmismatch/Always) | ||
- PersistentVolumeClaimStatus.FSGroup (matching, non-matching) | ||
- Volume Filesystem existing permissions (none, matching, non-matching, partial-matching?) | ||
* E2E tests | ||
|
||
|
||
### Monitoring | ||
|
||
We will add a metric that measures the volume ownership change times. | ||
|
||
## Implementation History | ||
|
||
- 2020-01-20 Initial KEP pull request submitted | ||
|
||
## Drawbacks [optional] | ||
|
||
|
||
## Alternatives [optional] | ||
|
||
gnufied marked this conversation as resolved.
Show resolved
Hide resolved
|
||
We considered various alternatives before proposing changes mentioned in this proposal. | ||
- We considered using a shiftfs(https://github.com/linuxkit/linuxkit/tree/master/projects/shiftfs) like solution for mounting volumes inside containers without changing permissions on the host. But any such solution is technically not feasible until support in Linux kernel improves. | ||
- We also considered redesigning volume permission API to better support different volume types and different operating systems because `fsGroup` is somewhat Linux specific. But any such redesign has to be backward compatible and given lack of clarity about how the new API should look like, we aren't quite ready to do that yet. | ||
|
||
## Infrastructure Needed [optional] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Is the issue large volumes or the number of files/directories in the volume? Let's be very clear about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will fix this in a follow up