Description
Setup
- Rancher version: tested on 2.8-head, 2.8.3
- Rancher UI Extensions: n/a
- Browser type & version: tested on Chrome
Describe the bug
After provisioning an EKS cluster containing a Windows worker node, the node summary within Cluster Explorer does not show the current CPU or memory usage percentage; the pod usage percentage, however, is shown correctly. This is not an issue for Linux nodes, and it only reproduces on EKS.
It should be noted that, by default, EKS does not install a metrics server onto the cluster; however, this issue also reproduces when the metrics server is installed manually. For Windows nodes managed by Rancher, the rancher-monitoring chart is expected to be installed for all metrics to be represented accurately. To test this, monitoring chart version 103.1.0 must be used, as that is the first version which supports EKS.
To Reproduce
- Provision an EKS cluster and add a Windows worker node group.
- Install monitoring chart 103.1.0, which configures the Windows nodes.
- Install the metrics server using this guide: https://docs.aws.amazon.com/eks/latest/userguide/metrics-server.html
- View the Cluster Explorer node summary page.
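Whether the metrics server is actually serving data for the Windows node can be checked independently of the Rancher UI by querying the raw metrics API. A minimal sketch, assuming kubectl is configured against the downstream EKS cluster; the node name is a placeholder:

```python
import json
import subprocess


def metrics_path(node_name: str) -> str:
    """Raw API path for a single node's NodeMetrics object."""
    return f"/apis/metrics.k8s.io/v1beta1/nodes/{node_name}"


def fetch_node_metrics(node_name: str) -> dict:
    """Fetch and decode NodeMetrics via kubectl (requires cluster access)."""
    raw = subprocess.check_output(["kubectl", "get", "--raw", metrics_path(node_name)])
    return json.loads(raw)


# Example (against a live cluster):
#   fetch_node_metrics("<windows-node-name>")["usage"]
#   On a healthy node this is a dict with "cpu" and "memory" keys.
```

A response containing both usage.cpu and usage.memory means the data the UI needs is available from the cluster.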
Result
n/a is reported for the CPU and memory values.
Expected Result
Usage percentages for CPU and memory are shown in the UI when the monitoring chart and metrics server are installed.
Additional context
After installing the metrics server manually, I was able to confirm that the Windows node is returned by the metrics API and that the object is not malformed.
NodeMetrics API Output
{
  "kind": "NodeMetrics",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "name": "xxx",
    "creationTimestamp": "2024-05-15T15:16:29Z",
    "labels": {
      "beta.kubernetes.io/arch": "amd64",
      "beta.kubernetes.io/instance-type": "m5.large",
      "beta.kubernetes.io/os": "windows",
      "eks.amazonaws.com/capacityType": "ON_DEMAND",
      "eks.amazonaws.com/nodegroup": "xxx",
      "eks.amazonaws.com/nodegroup-image": "ami-0dbcbd978ec35ebea",
      "failure-domain.beta.kubernetes.io/region": "xxx",
      "failure-domain.beta.kubernetes.io/zone": "xxx",
      "k8s.io/cloud-provider-aws": "xxx",
      "kubernetes.io/arch": "amd64",
      "kubernetes.io/hostname": "xxx",
      "kubernetes.io/os": "windows",
      "node-role.kubernetes.io/worker": "",
      "node.kubernetes.io/instance-type": "m5.large",
      "node.kubernetes.io/windows-build": "10.0.20348",
      "topology.kubernetes.io/region": "xxx",
      "topology.kubernetes.io/zone": "xxx"
    }
  },
  "timestamp": "2024-05-15T15:16:26Z",
  "window": "15.173s",
  "usage": {
    "cpu": "94905424n",
    "memory": "384904Ki"
  }
}
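Given this object, the percentages the UI should display can be reproduced by hand by dividing usage by node capacity. A minimal sketch; the capacity values (2 vCPU, 8 GiB for an m5.large) are assumptions based on the instance type, not values taken from this API output:

```python
def parse_quantity(q: str) -> float:
    """Convert a Kubernetes resource quantity string to a base-unit float."""
    suffixes = {
        "n": 1e-9, "u": 1e-6, "m": 1e-3,        # decimal fractions (cores)
        "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,  # binary multiples (bytes)
    }
    for suffix, factor in suffixes.items():
        if q.endswith(suffix):
            return float(q[: -len(suffix)]) * factor
    return float(q)


usage = {"cpu": "94905424n", "memory": "384904Ki"}  # from the NodeMetrics output above
capacity = {"cpu": "2", "memory": "8Gi"}            # assumed m5.large capacity

cpu_pct = 100 * parse_quantity(usage["cpu"]) / parse_quantity(capacity["cpu"])
mem_pct = 100 * parse_quantity(usage["memory"]) / parse_quantity(capacity["memory"])
print(f"cpu {cpu_pct:.1f}%, memory {mem_pct:.1f}%")
```

Since both quantities parse cleanly and capacity is present on the node object, the expected percentages are derivable from the data the cluster already exposes.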
I was able to confirm that the management cluster object for this node has the expected status.capacity field that the Linux nodes also have.
I was able to confirm that the rest of the monitoring stack functions as expected and that all Prometheus targets are healthy.
By default, EKS does not set a role label on nodes. Even after manually providing the role label, the UI did not populate these values.