[metricbeat/kubernetes] Remove fallback to the Node's limit for the pod 'usage.limit.pct' metrics #40029

tetianakravchenko · 2024-06-26T14:29:56Z

Proposed commit message

WHAT: Remove fallback to the Node's limit for the pod 'usage.limit.pct' metrics: kubernetes.pod.cpu.usage.limit.pct and kubernetes.pod.memory.usage.limit.pct
If any container of the Pod does not have cpu limit defined: kubernetes.pod.cpu.usage.limit.pct will not be reported, similar for the memory: kubernetes.pod.memory.usage.limit.pct and memory.working_set.limit.pct will not be reported
This implementation is similar to https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.103.0/receiver/kubeletstatsreceiver/internal/kubelet/metadata.go#L113-L116
WHY:

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding change to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

Author's Checklist

[ ]

How to test this PR locally

Related issues

Closes [metricbeat/kubernetes] Do not fallback to the Node's limit for the pod 'usage.limit.pct' metrics #39854

Use cases

Screenshots

Note: memory.node.pct is 0 only on the table view, in document itself:

"memory": {
          ....
            "node": {
              "pct": 0.006660828976437959
            },

this value is > 0

Namespace                   Name                                          CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                          ------------  ----------  ---------------  -------------  ---
  default                     demo-b66f56cf5-thbl8                          0 (0%)        0 (0%)      0 (0%)           0 (0%)         6h50m
  ingress-nginx               ingress-nginx-controller-7cd47c5db6-4mp25     100m (0%)     0 (0%)      90Mi (0%)        0 (0%)         4d5h
  kube-system                 coredns-76f75df574-t74xk                      100m (0%)     0 (0%)      70Mi (0%)        170Mi (1%)     4d12h
  kube-system                 coredns-76f75df574-wxd4z                      100m (0%)     0 (0%)      70Mi (0%)        170Mi (1%)     4d12h
  kube-system                 elastic-agent-g2vqk                           100m (0%)     0 (0%)      400Mi (2%)       700Mi (4%)     4d11h
  kube-system                 etcd-kind-control-plane                       100m (0%)     0 (0%)      100Mi (0%)       0 (0%)         4d12h
  kube-system                 kindnet-4cnsz                                 100m (0%)     100m (0%)   50Mi (0%)        50Mi (0%)      4d12h
  kube-system                 kube-apiserver-kind-control-plane             250m (1%)     0 (0%)      0 (0%)           0 (0%)         4d12h
  kube-system                 kube-controller-manager-kind-control-plane    200m (1%)     0 (0%)      0 (0%)           0 (0%)         4d12h
  kube-system                 kube-proxy-hqsxh                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d12h
  kube-system                 kube-scheduler-kind-control-plane             100m (0%)     0 (0%)      0 (0%)           0 (0%)         4d12h
  kube-system                 metricbeat-dqfw7                              100m (0%)     0 (0%)      100Mi (0%)       200Mi (1%)     7h4m
  local-path-storage          local-path-provisioner-7577fdbbfb-c6vh6       0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d12h

Logs

…culation Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>

Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>

mergify · 2024-06-26T14:30:37Z

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @tetianakravchenko? 🙏.
For such, you'll need to label your PR with:

The upcoming major version of the Elastic Stack
The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

gizas · 2024-06-27T06:54:12Z

CHANGELOG.next.asciidoc

@@ -52,6 +52,7 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]

 - Setting period for counter cache for Prometheus remote_write at least to 60sec {pull}38553[38553]
 - Add support of Graphite series 1.1.0+ tagging extension for statsd module. {pull}39619[39619]
+- Remove fallback to the node limit for the `kubernetes.pod.cpu.usage.limit.pct` and `kubernetes.pod.memory.usage.limit.pct` metrics calculation


I would say that This should me moved under the Breaking Changes section

It is already under Breaking changes > Metricbeat https://github.com/elastic/beats/blob/873aebc393d753d1ccd9cea9cca141067b368a3a/CHANGELOG.next.asciidoc#breaking-changes
am I missing smth?

I see this, so maybe it is a rendering issue of the page. I though is only under Metricbeat

MichaelKatsoulis · 2024-06-27T06:55:12Z

The fields definition must be updated as well in

beats/metricbeat/module/kubernetes/pod/_meta/fields.yml

Line 59 in c2b2983

    
                           CPU usage as a percentage of the defined limit for the pod containers (or total node CPU if one or more containers of the pod are unlimited). If one or more containers of the pod is unlimited and the `node` and `state_node` metricsets are both disabled on that node, this metric will be missing entirely.

. And also make update needs to be run to update the docs.

gizas · 2024-06-27T07:08:25Z

metricbeat/module/kubernetes/pod/data.go

@@ -167,6 +150,7 @@ func eventMapping(content []byte, metricsRepo *util.MetricsRepo, logger *logp.Lo
 		}

 		if usageMem > 0 {


Here we dont check if podMemLimit > 0, so this means that memory.usage.node.pct will always be reported for a pod right?

But what we need is to report it only in case that podMemLimit == 0

Otherwise this Kibana UI will always use the node limits

Here we dont check if podMemLimit > 0, so this means that memory.usage.node.pct will always be reported for a pod right?

no, memory.usage.node.pct will be reported only in case nodeMem > 0 (nodeMem can be 0 if state_node and/or node metricsets are disabled)

But what we need is to report it only in case that podMemLimit == 0

why? the point is to split memory.usage.node.pct and memory.usage.limit.pct, so if usageMem and nodeMem are available - memory.usage.node.pct can be calculated
similarly: if all containers have memory limit defined, memory.usage.limit.pct can be calculated as usageMem/sum(containers_mem_limit)

Otherwise this Kibana UI will always use the node limits

I think Kibana issue should be covered separately by elastic/kibana#184984

Ok, I had a second look here. So this change wont break the existing UI, because kibana checks kubernetes.pod.memory.usage.limit.pct > 0.0. As long as this metric is populated then we dont care for the kubernetes.pod.memory.usage.node.pct.

hm, it seems to not work this way

usage.node.pct is > 1% for the selected pod:

seems this script doesn't work as expected

Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>

metricbeat/docs/fields.asciidoc

Co-authored-by: Andrew Gizas <andreas.gkizas@elastic.co>

Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>

tetianakravchenko added 2 commits June 26, 2024 16:18

Remove fallback to the node limit for the usage.limit.pct metrics cal…

220ae6d

…culation Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>

remove debug files

873aebc

Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>

tetianakravchenko added breaking change Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team labels Jun 26, 2024

tetianakravchenko requested a review from a team as a code owner June 26, 2024 14:29

tetianakravchenko requested review from gizas and MichaelKatsoulis June 26, 2024 14:29

botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jun 26, 2024

mergify bot assigned tetianakravchenko Jun 26, 2024

gizas reviewed Jun 27, 2024

View reviewed changes

fix the fields description

dcc356d

Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>

gizas reviewed Jun 28, 2024

View reviewed changes

metricbeat/docs/fields.asciidoc Outdated Show resolved Hide resolved

gizas reviewed Jun 28, 2024

View reviewed changes

metricbeat/docs/fields.asciidoc Outdated Show resolved Hide resolved

gizas reviewed Jun 28, 2024

View reviewed changes

metricbeat/docs/fields.asciidoc Outdated Show resolved Hide resolved

gizas approved these changes Jun 28, 2024

View reviewed changes

tetianakravchenko and others added 3 commits July 1, 2024 09:48

Update metricbeat/docs/fields.asciidoc

59388cf

Co-authored-by: Andrew Gizas <andreas.gkizas@elastic.co>

Update metricbeat/docs/fields.asciidoc

f32ad29

Co-authored-by: Andrew Gizas <andreas.gkizas@elastic.co>

Update metricbeat/docs/fields.asciidoc

8da750c

Co-authored-by: Andrew Gizas <andreas.gkizas@elastic.co>

MichaelKatsoulis approved these changes Jul 1, 2024

View reviewed changes

update fields description

3916619

Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>

tetianakravchenko merged commit 53bf67c into elastic:main Jul 1, 2024
20 checks passed

tetianakravchenko mentioned this pull request Jul 22, 2024

[Infra UI/Kubernetes] Use without_limit values, if with_limit does not exist elastic/kibana#188839

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[metricbeat/kubernetes] Remove fallback to the Node's limit for the pod 'usage.limit.pct' metrics #40029

[metricbeat/kubernetes] Remove fallback to the Node's limit for the pod 'usage.limit.pct' metrics #40029

tetianakravchenko commented Jun 26, 2024 •

edited

Loading

mergify bot commented Jun 26, 2024

gizas Jun 27, 2024

tetianakravchenko Jun 27, 2024

gizas Jun 27, 2024

MichaelKatsoulis commented Jun 27, 2024

gizas Jun 27, 2024

tetianakravchenko Jun 27, 2024

gizas Jun 28, 2024 •

edited

Loading

tetianakravchenko Jul 1, 2024

		@@ -167,6 +150,7 @@ func eventMapping(content []byte, metricsRepo util.MetricsRepo, logger logp.Lo
		}

		if usageMem > 0 {

[metricbeat/kubernetes] Remove fallback to the Node's limit for the pod 'usage.limit.pct' metrics #40029

[metricbeat/kubernetes] Remove fallback to the Node's limit for the pod 'usage.limit.pct' metrics #40029

Conversation

tetianakravchenko commented Jun 26, 2024 • edited Loading

Proposed commit message

Checklist

Disruptive User Impact

Author's Checklist

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

mergify bot commented Jun 26, 2024

gizas Jun 27, 2024

Choose a reason for hiding this comment

tetianakravchenko Jun 27, 2024

Choose a reason for hiding this comment

gizas Jun 27, 2024

Choose a reason for hiding this comment

MichaelKatsoulis commented Jun 27, 2024

gizas Jun 27, 2024

Choose a reason for hiding this comment

tetianakravchenko Jun 27, 2024

Choose a reason for hiding this comment

gizas Jun 28, 2024 • edited Loading

Choose a reason for hiding this comment

tetianakravchenko Jul 1, 2024

Choose a reason for hiding this comment

tetianakravchenko commented Jun 26, 2024 •

edited

Loading

gizas Jun 28, 2024 •

edited

Loading