-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sig-node/1029-ephemeral-storage-quotas/*: #4699
Conversation
|
Welcome @PannagaRao! |
Hi @PannagaRao. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test |
per our slack convo /cc @soltysh for PRR review |
@haircommander: GitHub didn't allow me to request PR reviews from the following users: for, PRR, review. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
29c8007
to
fc6e21e
Compare
@@ -576,6 +579,59 @@ required elsewhere: | |||
future allow adding additional data without having to change code | |||
other than that which uses the new information. | |||
|
|||
#### Implementation details of using User Namespace to Toggle Ephemeral-Storage-Quotas |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to a reviewer: the approach described here means the kubelet may enforce du
quotas and xfs
quotas simultaneously, if pods without a user namespace have a limit
disk usage monitoring method. | ||
|
||
* During container creation, there must be a check to verify | ||
if xfs-quota in Kubelet is enabled. Additionally, verify if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what we mean by xfs-quota in Kubelet check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is to check whether or not LocalStorageCapacityIsolationFSQuotaMonitoring flag is enabled. Modified the description to add the same.
@@ -576,6 +579,71 @@ required elsewhere: | |||
future allow adding additional data without having to change code | |||
other than that which uses the new information. | |||
|
|||
#### Implementation details of using User Namespace to Toggle Ephemeral-Storage-Quotas | |||
|
|||
* The implementation centers around a feature toggle which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we describe in more detail why we need to enable user namespaces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the description. Please let me know if it sounds good
|
||
* The SIG raised the possibility of a container being unable to exit | ||
should we enforce quotas, and the quota interferes with writing the | ||
log. This can be mitigated by either not applying a quota to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logs are written by a container monitor process to a directory which should't be under a quota so we need to clarify here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI this piece was just copy pasted from below. we can iterate here but just noting this isn't new
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enforcement of quota on log directory is the existing behavior
https://discuss.kubernetes.io/t/why-stdout-logs-are-accounted-in-ephemeral-storage-usage/27815
controls the integration of xfs-quota with user namespaces for | ||
managing filesystem quotas within containers. The toggle allows | ||
to switch between a quota management system and a traditional | ||
disk usage monitoring method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you describe where the toggle is and the entity responsible for setting/unsetting it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have missed, but I'm missing an explanation how a user can be notified if the functionality is not working for them, ideally that will be part of ###### How can someone using this feature know that it is working for their instance?
question that's missing. I'm mostly interested in how you report back to a user that the requested quota won't work, due to unsupported filesystem, or others like that.
@@ -24,21 +24,23 @@ | |||
- [Future](#future) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're re-doing the beta PRR review, I'd update https://github.com/kubernetes/enhancements/blob/master/keps/prod-readiness/sig-node/1029.yaml#L5 as well with my name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The template has changed significantly between then and now can you update to match https://github.com/kubernetes/enhancements/blob/master/keps/NNNN-kep-template/README.md It looks like some of the current headers will need to be changed, some examples are:
### Implementation Details/Notes/Constraints [optional]
->### Notes/Constraints/Caveats (Optional)
- the section
## Design Details
seems empty, I'd probably move the bits from under#### Notes on Implementation
which has all those details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
|
||
## Design Details | ||
|
||
### Test Plan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth updating the unit test coverage, they are 2 years old.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason why we no integration tests are planned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kubelet doesn't really define many integration tests, going for unit and then node e2e instead
@@ -767,7 +801,7 @@ and are not reported here. | |||
| du after umount/mount | 8.1 | 10.2 | 3.9 | 3.7 | | |||
| Remove Files | 4.3 | 4.1 | 4.2 | 4.3 | | |||
|
|||
### System CPU Time | |||
#### System CPU Time | |||
|
|||
| *Operation* | *XFS+Quota* | *XFS* | *Ext4fs+Quota* | *Ext4fs* | | |||
|--------------------------|-----------|-------|--------------|--------| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before the PRR section you're missing these 2:
### Upgrade / Downgrade Strategy
### Version Skew Strategy
Also in the ### Monitoring Requirements
section can you update the questions to match the new template? Mostly it's just changing from * **<question...
to ##### <question...
and you're missing the following question:
###### How can someone using this feature know that it is working for their instance?
Same problem with ### Scalability
section, and missing question:
###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
Missing answer for ###### How does this feature react if the API server and/or etcd is unavailable?
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more nits.
XFS quotas within user namespaces will not be supported | ||
|
||
|
||
### Performance Benchmarks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this entire section seems best suited in the testing section, I'd just move all of that after ##### e2e tests
.
@@ -16,8 +16,8 @@ editor: TBD | |||
creation-date: 2018-09-06 | |||
last-updated: 2023-08-31 | |||
status: implementable | |||
latest-milestone: "1.29" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This document also needs update with the latest template https://github.com/kubernetes/enhancements/blob/master/keps/NNNN-kep-template/kep.yaml it's way out of date 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -840,65 +890,78 @@ If LocalStorageCapacityIsolationFSQuotaMonitoring is turned on but LocalStorageC | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In question ###### How can this feature be enabled / disabled in a live cluster?
you only mentioned kubelet with LocalStorageCapacityIsolationFSQuotaMonitoring
flag. But earlier in the #### Kubelet and API Server Skew:
section you're talking about both kubelet and apiserver. So I'd expect this question to explicitly point out both feature gates. The same applies to the kep.yaml
, I've pointed out it's out of date, and there's a separate section describing feature gates and components.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ###### How can a rollout or rollback fail? Can it impact already running workloads?
you're saying that it won't affect already running workloads. What about the ones which enabled this functionality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
you probably want to just say No
, since you're not removing any fields, features, flags, or anything like that.
|
||
###### How can someone using this feature know that it is working for their instance? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, this question should be a bit higher in the doc, right after ###### How can an operator determine if the feature is in use by workloads?
😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
- Yes, the feature depneds on project quotas. Once quotas are enabled, the xfs_quota tool can be used to | ||
set limits and report on disk usage. | ||
Yes, the feature depneds on project quotas. Once quotas are enabled, the xfs_quota tool can be used to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the feature depneds on project quotas. Once quotas are enabled, the xfs_quota tool can be used to | |
Yes, the feature depends on project quotas. Once quotas are enabled, the xfs_quota tool can be used to |
beta: "1.31" | ||
|
||
feature-gates: | ||
- name: LocalStorageCapacityIsolationFSQuotaMonitoring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@soltysh : @PannagaRao and I chatted about this and it feels more right just to include this one as it's net new for this feature. above in the enablement section we do mention UserNamespacesSupport as it's required, but I think it's more correct to just have this feature gate here. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, let's keep it that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
for PRR
/lgtm |
Modifications to push the changes to beta 1.31 Signed-off-by: PannagaRamamanohara <pbhojara@redhat.com>
9af70d2
to
2e6b66b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mrunalp, PannagaRao, soltysh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Modifications to push the changes to beta 1.31
KEP-1029: adding beta graduation criteria