
adding docs for node allocatable #2649

Merged: 3 commits into release-1.6 from node-allocatable, Mar 15, 2017

Conversation

Contributor

@vishh vishh commented Mar 1, 2017

Signed-off-by: Vishnu kannan <vishnuk@google.com>
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 1, 2017
@vishh vishh added this to the 1.6 milestone Mar 1, 2017
@chenopis chenopis changed the base branch from master to release-1.6 March 2, 2017 17:50
@chenopis chenopis requested a review from derekwaynecarr March 3, 2017 08:23

### Kube Reserved

**Kubelet Flag**: `--kube-reserved=[cpu=100mi][,][memory=100Mi]`
Member

100m,100Mi
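
For readers following the `100m,100Mi` correction, here is a minimal Python sketch of how a reserved-resources value might be parsed. The helper names (`parse_cpu_millicores`, `parse_memory_bytes`, `parse_reserved`) are hypothetical and are not part of the kubelet. The sketch assumes only the `m` CPU suffix and a few binary memory suffixes, purely for brevity. Under those assumptions it rejects `cpu=100mi`, since `mi` is not a CPU suffix it recognizes.

```python
# Hypothetical helpers for illustration only; not the kubelet's parser.

# Assumed subset of binary memory suffixes (not exhaustive).
_MEMORY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu_millicores(value: str) -> int:
    """Return CPU in millicores: '100m' -> 100, '2' -> 2000."""
    if value.endswith("m"):
        return int(value[:-1])
    # Plain numbers are whole cores; float() raises ValueError on '100mi'.
    return round(float(value) * 1000)


def parse_memory_bytes(value: str) -> int:
    """Return memory in bytes: '100Mi' -> 104857600."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)


def parse_reserved(flag_value: str) -> dict:
    """Parse a value such as 'cpu=100m,memory=100Mi' into numeric amounts."""
    result = {}
    for pair in flag_value.split(","):
        name, _, quantity = pair.partition("=")
        if name == "cpu":
            result["cpu"] = parse_cpu_millicores(quantity)
        elif name == "memory":
            result["memory"] = parse_memory_bytes(quantity)
        else:
            raise ValueError(f"unsupported resource: {name!r}")
    return result


print(parse_reserved("cpu=100m,memory=100Mi"))
# {'cpu': 100, 'memory': 104857600}
```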

### Kube Reserved

**Kubelet Flag**: `--kube-reserved=[cpu=100mi][,][memory=100Mi]`
**Kubelet Flag**: `--kube-reserved-cgroup=/runtime.slice`
Member

is this the name you are using in your images?

Member

we should make clear that /runtime.slice is not the kubelet default value.

Contributor Author

Good point on the defaults. I hope I have made the defaults clear this time around. PTAL

[This performance dashboard](http://node-perf-dash.k8s.io/#/builds) exposes `cpu` and `memory` usage profiles of `kubelet` and `docker engine` at multiple levels of pod density.
[This blog post](http://blog.kubernetes.io/2016/11/visualize-kubelet-performance-with-node-dashboard.html) explains how the dashboard can be interpreted to come up with a suitable `kube-reserved` reservation.

It is recommended that the kubernetes system daemons are placed under a top level control group (`system.slice` on systemd machines for example).
Member

i think this text should be in system reserved section.

you should have text specific to kube daemons here..

### System Reserved

**Kubelet Flag**: `--system-reserved=[cpu=100mi][,][memory=100Mi]`
**Kubelet Flag**: `--system-reserved-cgroup=/system.slice`
Member

make clear this flag has no default.

Member

we need to make clear that the kubelet doesnt create either of these two cgroups.

Evictions are supported for `memory` and `storage` only.
By reserving some memory via `--eviction-hard` flag, the `kubelet` attempts to `evict` pods whenever memory availability on the node drops below the reserved value.
Hypothetically, if system daemons did not exist on a node, pods cannot use more than `capacity - eviction-hard`.
For this reason, resources reserved for evictions will not be available for pods.
Member

to schedule against?

Contributor Author

Scheduling is meant to be implicit since pods can be placed directly on nodes bypassing the scheduler
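
The arithmetic in the eviction hunk above can be sketched in a few lines of Python. The hunk itself only states the `capacity - eviction-hard` bound for a node without system daemons. Subtracting `kube-reserved` and `system-reserved` as well is an assumption based on how the rest of the proposed doc describes reservations, and the function name `node_allocatable` is hypothetical.

```python
# Illustrative sketch; node_allocatable is a hypothetical helper, and the
# additive treatment of the two reservations is an assumption.

def node_allocatable(capacity, kube_reserved=0, system_reserved=0, eviction_hard=0):
    """Return the amount left for pods, in the same unit as the inputs."""
    allocatable = capacity - kube_reserved - system_reserved - eviction_hard
    # A misconfigured node could reserve more than it has; clamp at zero.
    return max(allocatable, 0)


# Memory figures in Mi for an example 8 Gi node.
capacity_mi = 8192

# No system daemons accounted for: only the eviction threshold is held back.
print(node_allocatable(capacity_mi, eviction_hard=100))  # 8092

# With both reservations in place.
print(node_allocatable(capacity_mi, kube_reserved=512,
                       system_reserved=512, eviction_hard=100))  # 7068
```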


**Kubelet Flag**: `--enforce-node-allocatable=[pods][,][system-reserved][,][kube-reserved]`

The scheduler will treat `Allocatable` as the available `capacity` for pods.
Member

i would remove the use of will style phrasing in the document as we are describing the present in this doc.

The scheduler treats 'Allocatable'...

Contributor Author

ack

Member

@derekwaynecarr derekwaynecarr left a comment

comments throughout.

The scheduler will treat `Allocatable` as the available `capacity` for pods.

`kubelet` will enforce `Allocatable` across pods by default.
This enforcement is controlled by specifying `pods` value to the kubelet flag `--enforce-node-allocatable`.
Member

note that this is the default value.

Contributor Author

Ack
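
Since the review points out that the cgroup flags have no default and that the kubelet does not create those cgroups, a pre-flight check along the following lines may help when assembling kubelet flags. This is a hedged sketch: `check_enforcement` and the mapping of enforcement targets to cgroup flags are hypothetical, and the allowed values are only the three listed in the `--enforce-node-allocatable` hunk.

```python
# Hypothetical pre-flight check; not kubelet code.

# Values listed for --enforce-node-allocatable in the hunk under review.
ALLOWED_TARGETS = ("pods", "system-reserved", "kube-reserved")

# Assumption: enforcing a reservation only makes sense if its cgroup flag is set.
REQUIRED_CGROUP_FLAG = {
    "kube-reserved": "--kube-reserved-cgroup",
    "system-reserved": "--system-reserved-cgroup",
}


def check_enforcement(enforce, flags):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for target in enforce:
        if target not in ALLOWED_TARGETS:
            problems.append(f"unknown enforcement target: {target}")
            continue
        needed = REQUIRED_CGROUP_FLAG.get(target)
        if needed and not flags.get(needed):
            problems.append(f"{target} is enforced but {needed} is not set")
    return problems


# Enforcing only pods needs no cgroup flag in this sketch.
print(check_enforcement(["pods"], {}))  # []

# Enforcing kube-reserved without naming its cgroup is reported.
print(check_enforcement(["pods", "kube-reserved"], {}))
```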

However, Kubelet cannot burst and use up all available Node resources if `kube-reserved` is enforced.

Be extra careful while enforcing `system-reserved` reservation since it can lead to critical system services being CPU starved or OOM killed on the node.
The recommendation is to enforce `system-reserved` only if a user has profiled their nodes exhaustively to come up with precise estimates.
Member

and is confident in their ability to recover if any item in that group is oom_killed.


* To begin with enforce `Allocatable` on `pods`.
* Once adequate monitoring and alerting is in place to track kube system daemons, attempt to enforce `kube-reserved` based on usage heuristics.
* If aboslutely necessary, enforce `system-reserved` over time.
Member

typo on absolutely
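
The three-step rollout in the hunk above could be expressed as successive values of `--enforce-node-allocatable`. The sketch below is illustrative only: `ROLLOUT_STAGES` and `enforce_flag` are hypothetical names, and the stage numbering is an assumption layered on the bullet order.

```python
# Hypothetical staging helper; stage indices follow the bullet order above.

ROLLOUT_STAGES = [
    ["pods"],                                      # start: pods only
    ["pods", "kube-reserved"],                     # once kube daemons are monitored
    ["pods", "kube-reserved", "system-reserved"],  # only if absolutely necessary
]


def enforce_flag(stage: int) -> str:
    """Return the kubelet flag string for the given rollout stage."""
    return "--enforce-node-allocatable=" + ",".join(ROLLOUT_STAGES[stage])


for stage in range(len(ROLLOUT_STAGES)):
    print(enforce_flag(stage))
```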

Contributor Author

@vishh vishh left a comment

@vishh vishh force-pushed the node-allocatable branch 2 times, most recently from 94c3ef3 to a617a42 Compare March 9, 2017 19:44
Signed-off-by: Vishnu kannan <vishnuk@google.com>
@vishh vishh force-pushed the node-allocatable branch from a617a42 to bcd5e12 Compare March 9, 2017 19:45
@vishh
Contributor Author

vishh commented Mar 9, 2017

@kubernetes/sig-docs-maintainers this PR is meant for v1.6. Can I get a docs review?

Member

@derekwaynecarr derekwaynecarr left a comment

one typo, and maybe some more clarifying text.

while not needed on this pr, i feel like this should be linked to from somewhere centrally on how to administer a kubernetes node. maybe a doc team member can assist there.


Memory pressure at the node level leads to System OOMs which affects the entire node and all pods running on it.
Nodes can go offline temporarily until memory has been reclaimed.
To avoid (or reduce the probabilty) system OOMs kubelet provides [`Out of Resource`](./out-of-resource.md) management.
Member

typo: probability

The scheduler treats `Allocatable` as the available `capacity` for pods.

`kubelet` enforces `Allocatable` across pods by default.
This enforcement is controlled by specifying `pods` value to the kubelet flag `--enforce-node-allocatable`.
Member

maybe explain what enforcement means? for example, by enforcing at this level, we ensure pods cannot consume more memory and cpu time than allocated?

Signed-off-by: Vishnu kannan <vishnuk@google.com>
@vishh
Contributor Author

vishh commented Mar 14, 2017

@derekwaynecarr PTAL

@derekwaynecarr
Member

/lgtm

@chenopis chenopis merged commit d4383a4 into kubernetes:release-1.6 Mar 15, 2017
4 participants