[EKS] Node metrics do not display for Windows Nodes  #11035

@HarrisonWAffel

Setup

  • Rancher version: tested on 2.8-head, 2.8.3
  • Rancher UI Extensions: n/a
  • Browser type & version: tested on chrome

Describe the bug
After provisioning an EKS cluster containing a Windows worker node, the node summary within the cluster explorer does not show the current CPU or memory usage percentage. The pod usage percentage, however, is shown properly. This is not an issue for Linux nodes, and it only reproduces on EKS.

Note that EKS does not install a metrics server onto the cluster by default. However, this issue also reproduces when the metrics server is installed manually. For Windows nodes managed by Rancher, the rancher-monitoring chart is expected to be installed for all metrics to be represented accurately. To test this, rancher-monitoring 103.1.0 must be used, as that is the first version which supports EKS.

To Reproduce

Result
n/a is reported for the CPU and memory values.

Expected Result
Usage percentages for CPU and memory are shown in the UI when the monitoring chart and metrics server are installed.

Screenshots

Screen Shot 2024-05-15 at 11 14 23 AM

Additional context

After installing the metrics server manually, I was able to confirm that the Windows node is returned by the metrics API and that the object is not malformed.

NodeMetrics API Output
{
  "kind": "NodeMetrics",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "name": "xxx",
    "creationTimestamp": "2024-05-15T15:16:29Z",
    "labels": {
      "beta.kubernetes.io/arch": "amd64",
      "beta.kubernetes.io/instance-type": "m5.large",
      "beta.kubernetes.io/os": "windows",
      "eks.amazonaws.com/capacityType": "ON_DEMAND",
      "eks.amazonaws.com/nodegroup": "xxx",
      "eks.amazonaws.com/nodegroup-image": "ami-0dbcbd978ec35ebea",
      "failure-domain.beta.kubernetes.io/region": "xxx",
      "failure-domain.beta.kubernetes.io/zone": "xxx",
      "k8s.io/cloud-provider-aws": "xxx",
      "kubernetes.io/arch": "amd64",
      "kubernetes.io/hostname": "xxx",
      "kubernetes.io/os": "windows",
      "node-role.kubernetes.io/worker": "",
      "node.kubernetes.io/instance-type": "m5.large",
      "node.kubernetes.io/windows-build": "10.0.20348",
      "topology.kubernetes.io/region": "xxx",
      "topology.kubernetes.io/zone": "xxx"
    }
  },
  "timestamp": "2024-05-15T15:16:26Z",
  "window": "15.173s",
  "usage": {
    "cpu": "94905424n",
    "memory": "384904Ki"
  }
}
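
For reference, a minimal Python sketch of how a usage percentage can be derived from the `usage` values above together with a node's `status.capacity`. This is not taken from the Rancher UI code; `parse_quantity` and `usage_percent` are hypothetical helper names, only a subset of Kubernetes quantity suffixes is handled, and the capacity values `"2"` and `"8Gi"` are assumed for illustration (the real values are in the capacity screenshot, not in this text).

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes; enough for the values in this issue.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {
    "n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
    "k": Decimal("1e3"), "M": Decimal("1e6"), "G": Decimal("1e9"),
}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity such as '94905424n' or '384904Ki' to a plain number."""
    # Two-character binary suffixes must be checked before one-character ones.
    for suffix, factor in _BINARY.items():
        if q.endswith(suffix):
            return Decimal(q[:-2]) * factor
    if q and q[-1] in _DECIMAL:
        return Decimal(q[:-1]) * _DECIMAL[q[-1]]
    return Decimal(q)


def usage_percent(usage: str, capacity: str) -> float:
    """Return usage as a percentage of capacity."""
    cap = parse_quantity(capacity)
    if cap == 0:
        raise ValueError("capacity must be non-zero")
    return float(parse_quantity(usage) / cap * 100)


if __name__ == "__main__":
    # Usage values from the NodeMetrics object above; capacities are ASSUMED.
    print(round(usage_percent("94905424n", "2"), 1))   # CPU, prints 4.7
    print(round(usage_percent("384904Ki", "8Gi"), 1))  # memory, prints 4.6
```

With those assumed capacities, both values parse cleanly and yield ordinary percentages, which is consistent with the observation that the metrics object itself is not malformed.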

I was able to confirm that the management cluster object for this node has the expected status.capacity field, the same as the Linux nodes.

Status Capacity Screenshot

Screen Shot 2024-05-15 at 11 19 14 AM

I was able to confirm that the rest of the monitoring stack functions as expected and that all Prometheus targets are healthy.

Prometheus targets

Screen Shot 2024-05-15 at 11 28 14 AM

By default, EKS does not set a role label on nodes. Even after manually adding the role label, the UI did not populate these values.
