chore/deployment: add resources requests and limits for helm and Kustomize #1631

Merged
7 changes: 7 additions & 0 deletions deployment/base/gc/gc.yaml
@@ -19,6 +19,13 @@ spec:
- name: nfd-gc
image: gcr.io/k8s-staging-nfd/node-feature-discovery:master
imagePullPolicy: Always
resources:
limits:
cpu: 20m
memory: 1Gi
requests:
cpu: 10m
memory: 128Mi
command:
- "nfd-gc"
ports:
7 changes: 7 additions & 0 deletions deployment/base/master/master-deployment.yaml
@@ -21,6 +21,13 @@ spec:
- name: nfd-master
image: gcr.io/k8s-staging-nfd/node-feature-discovery:master
imagePullPolicy: Always
resources:
limits:
cpu: 300m
memory: 4Gi
requests:
cpu: 100m
memory: 128Mi
Contributor

Setting limits.memory: 4Gi means you're OK with an OOM kill at that number; the expectation, then, is that the master pod may actually use that much memory.
Setting a lower requests.memory doesn't help at all. Say you get an OOM kill: the scheduler spawns a new pod on a node with enough memory for the requests, but not enough for the limits. That's perfectly fine for the scheduler, but your pod will die again.
Because of the above, and because memory cannot be returned to the pool (after GC, for example), the recommendation is to set limits equal to requests for memory. This way you actually help the scheduler.
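
A minimal sketch of what that recommendation would look like as a Helm values override (the 2Gi figure is purely illustrative, not a value proposed in this PR; the `master.resources` keys are the ones added by this change):

```yaml
# values-override.yaml (hypothetical): requests == limits for memory, so the
# scheduler only places nfd-master on nodes that can hold the full limit.
master:
  resources:
    limits:
      cpu: 300m
      memory: 2Gi
    requests:
      cpu: 100m
      memory: 2Gi   # equal to limits.memory, per the recommendation above
```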

Contributor

Thanks @cmontemuino for the feedback. I think one reason to set limits is to really OOM-kill the pod if it gets crazy (with memory leaks like #1614). It might be better to set it to 8Gi than 4Gi, though. The memory consumption depends very heavily on the number of nodes in the cluster (in the ballpark of ~1.5MB per node, it seems). We probably want the default deployment to work in a wide variety of clusters, so we cannot set requests to 4Gi or 8Gi as that is a total waste (or cannot be satisfied) in smaller clusters, and lowering limits to 128Mi would cause problems in bigger clusters. Thoughts?
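
As a rough back-of-the-envelope check of that estimate: at ~1.5MB per node, a 100-node cluster sits around 150Mi and a 1000-node cluster around 1.5Gi, so a 4Gi limit leaves headroom for a few thousand nodes while a 128Mi request still fits small clusters (a rough extrapolation from the per-node figure above, not a measured number).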

Contributor

I believe that setting proper resource constraints should be up to the administrator of the specific cluster. Our example deployment manifests should only provide some guidance. Some platform operators would prefer to OOM nfd-master sooner so it doesn't take resources away from other components; others would prefer to do it later, and so on. There is no golden rule.

Contributor

In my clusters I set limits 25% above actual usage, as I prefer to be alerted about OOMs in my control plane sooner rather than later: it makes it easier to triage issues related to resource starvation. But I understand that others can disagree with this approach 😄

Contributor

@marquiz -- I'm not suggesting to leave limits unset. What I'm saying is to use the same value for both resources.limits.memory and resources.requests.memory.
Having them set to different values won't help, because memory is not an "elastic" resource.

Contributor

@cmontemuino as I explained, I don't think we can do that. We don't want to set requests=limits=4Gi, as that will hurt/block small clusters, and similarly we don't want to set requests=limits=128Mi (or something in that range), because we'll get OOM'ed in bigger clusters.

I think we should document the resource requests/limits better, though.
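
For illustration, a sketch of the kind of per-cluster override an administrator could layer on top of these defaults (the numbers are hypothetical, derived only from the ~1.5MB/node estimate above):

```yaml
# Hypothetical values-override.yaml for a small (~50-node) cluster:
master:
  resources:
    requests: {cpu: 50m, memory: 128Mi}
    limits:   {cpu: 200m, memory: 256Mi}

# A large (~2000-node) cluster would instead need requests/limits on the order
# of a few Gi of memory (roughly 3Gi at ~1.5MB/node), which is why a single
# static default cannot fit both ends of the range.
```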

Member Author

> I think we should document the resource requests/limits better, though.

Any suggestions?

Contributor

> @cmontemuino as I explained, I don't think we can do that. We don't want to set requests=limits=4Gi, as that will hurt/block small clusters, and similarly we don't want to set requests=limits=128Mi (or something in that range), because we'll get OOM'ed in bigger clusters.
>
> I think we should document the resource requests/limits better, though.

@marquiz -- got it 👍

Thus it becomes a documentation issue. It must be super clear what the implications are of having different limits and requests for memory. Otherwise OOM kills will start happening in smaller clusters, and likely non-availability in clusters under memory pressure.

Would you like me to try documenting it in a PR?

Contributor

> Would you like me to try documenting it in a PR?

@cmontemuino please do, thanks for volunteering, really appreciated 👍

livenessProbe:
grpc:
port: 8082
@@ -22,6 +22,13 @@ spec:
command:
- "nfd-topology-updater"
args: []
resources:
limits:
cpu: 100m
memory: 60Mi
requests:
cpu: 50m
memory: 40Mi
ports:
- name: metrics
containerPort: 8081
7 changes: 7 additions & 0 deletions deployment/base/worker-daemonset/worker-daemonset.yaml
@@ -21,6 +21,13 @@ spec:
imagePullPolicy: Always
command:
- "nfd-worker"
resources:
limits:
cpu: 200m
memory: 512Mi
requests:
cpu: 5m
memory: 64Mi
args:
- "-server=nfd-master:8080"
ports:
72 changes: 28 additions & 44 deletions deployment/helm/node-feature-discovery/values.yaml
@@ -94,17 +94,13 @@ master:
type: ClusterIP
port: 8080

resources: {}
# We usually recommend not to specify default resources and to leave this as a conscious
# choice for the user. This also increases chances charts run on environments with little
# resources, such as Minikube. If you do want to specify resources, uncomment the following
# lines, adjust them as necessary, and remove the curly braces after 'resources:'.
# limits:
# cpu: 100m
# memory: 128Mi
# requests:
# cpu: 100m
# memory: 128Mi
resources:
limits:
cpu: 300m
memory: 4Gi
requests:
cpu: 100m
memory: 128Mi

nodeSelector: {}

@@ -411,17 +407,13 @@ worker:
# Does not work on systems without /usr/src AND a read-only /usr, such as Talos
mountUsrSrc: false

resources: {}
# We usually recommend not to specify default resources and to leave this as a conscious
# choice for the user. This also increases chances charts run on environments with little
# resources, such as Minikube. If you do want to specify resources, uncomment the following
# lines, adjust them as necessary, and remove the curly braces after 'resources:'.
# limits:
# cpu: 100m
# memory: 128Mi
# requests:
# cpu: 100m
# memory: 128Mi
resources:
limits:
cpu: 200m
memory: 512Mi
requests:
cpu: 5m
memory: 64Mi

nodeSelector: {}

@@ -469,17 +461,13 @@ topologyUpdater:
readOnlyRootFilesystem: true
runAsUser: 0

resources: {}
# We usually recommend not to specify default resources and to leave this as a conscious
# choice for the user. This also increases chances charts run on environments with little
# resources, such as Minikube. If you do want to specify resources, uncomment the following
# lines, adjust them as necessary, and remove the curly braces after 'resources:'.
# limits:
# cpu: 100m
# memory: 128Mi
# requests:
# cpu: 100m
# memory: 128Mi
resources:
limits:
cpu: 100m
memory: 60Mi
requests:
cpu: 50m
memory: 40Mi

nodeSelector: {}
tolerations: []
@@ -503,17 +491,13 @@ gc:

podSecurityContext: {}

resources: {}
# We usually recommend not to specify default resources and to leave this as a conscious
# choice for the user. This also increases chances charts run on environments with little
# resources, such as Minikube. If you do want to specify resources, uncomment the following
# lines, adjust them as necessary, and remove the curly braces after 'resources:'.
# limits:
# cpu: 100m
# memory: 128Mi
# requests:
# cpu: 100m
# memory: 128Mi
resources:
limits:
cpu: 20m
memory: 1Gi
requests:
cpu: 10m
memory: 128Mi

metricsPort: 8081

13 changes: 8 additions & 5 deletions docs/deployment/helm.md
@@ -131,8 +131,8 @@ API's you need to install the prometheus operator in your cluster.
| `master.rbac.create` | bool | true | Specifies whether to create [RBAC][rbac] configuration for nfd-master |
| `master.service.type` | string | ClusterIP | NFD master service type. **NOTE**: this parameter is related to the deprecated gRPC API and will be removed with it in a future release |
| `master.service.port` | integer | 8080 | NFD master service port. **NOTE**: this parameter is related to the deprecated gRPC API and will be removed with it in a future release |
| `master.resources` | dict | {} | NFD master pod [resources management](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) |
| `master.nodeSelector` | dict | {} | NFD master pod [node selector](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector) |
| `master.resources.limits` | dict | {cpu: 300m, memory: 4Gi} | NFD master pod [resources limits](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits) |
| `master.resources.requests`| dict | {cpu: 100m, memory: 128Mi} | NFD master pod [resources requests](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits) |
| `master.tolerations` | dict | _Scheduling to master node is disabled_ | NFD master pod [tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) |
| `master.annotations` | dict | {} | NFD master pod [annotations](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/) |
| `master.affinity` | dict | | NFD master pod required [node affinity](https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/) |
@@ -155,7 +155,8 @@ API's you need to install the prometheus operator in your cluster.
| `worker.serviceAccount.name` | string | | The name of the service account to use for nfd-worker. If not set and create is true, a name is generated using the fullname template (suffixed with `-worker`) |
| `worker.rbac.create` | bool | true | Specifies whether to create [RBAC][rbac] configuration for nfd-worker |
| `worker.mountUsrSrc` | bool | false | Specifies whether to allow users to mount the hostpath /user/src. Does not work on systems without /usr/src AND a read-only /usr |
| `worker.resources` | dict | {} | NFD worker pod [resources management](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) |
| `worker.resources.limits` | dict | {cpu: 200m, memory: 512Mi} | NFD worker pod [resources limits](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits) |
| `worker.resources.requests` | dict | {cpu: 5m, memory: 64Mi} | NFD worker pod [resources requests](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits) |
| `worker.nodeSelector` | dict | {} | NFD worker pod [node selector](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector) |
| `worker.tolerations` | dict | {} | NFD worker pod [node tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) |
| `worker.priorityClassName` | string | | NFD worker pod [priority class](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/) |
@@ -180,7 +181,8 @@ API's you need to install the prometheus operator in your cluster.
| `topologyUpdater.watchNamespace` | string | `*` | Namespace to watch pods, `*` for all namespaces |
| `topologyUpdater.podSecurityContext` | dict | {} | [PodSecurityContext](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod) holds pod-level security attributes and common container settings |
| `topologyUpdater.securityContext` | dict | {} | Container [security settings](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-container) |
| `topologyUpdater.resources` | dict | {} | Topology updater pod [resources management](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) |
| `topologyUpdater.resources.limits` | dict | {cpu: 100m, memory: 60Mi} | NFD Topology Updater pod [resources limits](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits) |
| `topologyUpdater.resources.requests` | dict | {cpu: 50m, memory: 40Mi} | NFD Topology Updater pod [resources requests](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits) |
| `topologyUpdater.nodeSelector` | dict | {} | Topology updater pod [node selector](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector) |
| `topologyUpdater.tolerations` | dict | {} | Topology updater pod [node tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) |
| `topologyUpdater.annotations` | dict | {} | Topology updater pod [annotations](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/) |
@@ -202,7 +204,8 @@ API's you need to install the prometheus operator in your cluster.
| `gc.rbac.create` | bool | true | Specifies whether to create [RBAC][rbac] configuration for garbage collector |
| `gc.interval` | string | 1h | Time between periodic garbage collector runs |
| `gc.podSecurityContext` | dict | {} | [PodSecurityContext](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod) holds pod-level security attributes and common container settings |
| `gc.resources` | dict | {} | Garbage collector pod [resources management](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) |
| `gc.resources.limits` | dict | {cpu: 20m, memory: 1Gi} | NFD Garbage Collector pod [resources limits](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits) |
| `gc.resources.requests` | dict | {cpu: 10m, memory: 128Mi} | NFD Garbage Collector pod [resources requests](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits) |
| `gc.metricsPort` | integer | 8081 | Port on which to serve Prometheus metrics |
| `gc.nodeSelector` | dict | {} | Garbage collector pod [node selector](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector) |
| `gc.tolerations` | dict | {} | Garbage collector pod [node tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) |