Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[metricbeat/kubernetes] Remove fallback to the Node's limit for the pod 'usage.limit.pct' metrics #40029

Conversation

tetianakravchenko
Copy link
Contributor

@tetianakravchenko tetianakravchenko commented Jun 26, 2024

Proposed commit message

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Screenshot 2024-07-02 at 00 03 33

Note: memory.node.pct is 0 only on the table view, in document itself:

"memory": {
          ....
            "node": {
              "pct": 0.006660828976437959
            },

this value is > 0

Namespace                   Name                                          CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                          ------------  ----------  ---------------  -------------  ---
  default                     demo-b66f56cf5-thbl8                          0 (0%)        0 (0%)      0 (0%)           0 (0%)         6h50m
  ingress-nginx               ingress-nginx-controller-7cd47c5db6-4mp25     100m (0%)     0 (0%)      90Mi (0%)        0 (0%)         4d5h
  kube-system                 coredns-76f75df574-t74xk                      100m (0%)     0 (0%)      70Mi (0%)        170Mi (1%)     4d12h
  kube-system                 coredns-76f75df574-wxd4z                      100m (0%)     0 (0%)      70Mi (0%)        170Mi (1%)     4d12h
  kube-system                 elastic-agent-g2vqk                           100m (0%)     0 (0%)      400Mi (2%)       700Mi (4%)     4d11h
  kube-system                 etcd-kind-control-plane                       100m (0%)     0 (0%)      100Mi (0%)       0 (0%)         4d12h
  kube-system                 kindnet-4cnsz                                 100m (0%)     100m (0%)   50Mi (0%)        50Mi (0%)      4d12h
  kube-system                 kube-apiserver-kind-control-plane             250m (1%)     0 (0%)      0 (0%)           0 (0%)         4d12h
  kube-system                 kube-controller-manager-kind-control-plane    200m (1%)     0 (0%)      0 (0%)           0 (0%)         4d12h
  kube-system                 kube-proxy-hqsxh                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d12h
  kube-system                 kube-scheduler-kind-control-plane             100m (0%)     0 (0%)      0 (0%)           0 (0%)         4d12h
  kube-system                 metricbeat-dqfw7                              100m (0%)     0 (0%)      100Mi (0%)       200Mi (1%)     7h4m
  local-path-storage          local-path-provisioner-7577fdbbfb-c6vh6       0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d12h

Logs

…culation

Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>
Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>
@tetianakravchenko tetianakravchenko added breaking change Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team labels Jun 26, 2024
@tetianakravchenko tetianakravchenko requested a review from a team as a code owner June 26, 2024 14:29
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jun 26, 2024
Copy link
Contributor

mergify bot commented Jun 26, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @tetianakravchenko? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@@ -52,6 +52,7 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]

- Setting period for counter cache for Prometheus remote_write at least to 60sec {pull}38553[38553]
- Add support of Graphite series 1.1.0+ tagging extension for statsd module. {pull}39619[39619]
- Remove fallback to the node limit for the `kubernetes.pod.cpu.usage.limit.pct` and `kubernetes.pod.memory.usage.limit.pct` metrics calculation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that This should me moved under the Breaking Changes section

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this, so maybe it is a rendering issue of the page. I though is only under Metricbeat
Screenshot 2024-06-27 at 4 38 35 PM

@MichaelKatsoulis
Copy link
Contributor

The fields definition must be updated as well in

CPU usage as a percentage of the defined limit for the pod containers (or total node CPU if one or more containers of the pod are unlimited). If one or more containers of the pod is unlimited and the `node` and `state_node` metricsets are both disabled on that node, this metric will be missing entirely.
. And also make update needs to be run to update the docs.

@@ -167,6 +150,7 @@ func eventMapping(content []byte, metricsRepo *util.MetricsRepo, logger *logp.Lo
}

if usageMem > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we dont check if podMemLimit > 0, so this means that memory.usage.node.pct will always be reported for a pod right?

But what we need is to report it only in case that podMemLimit == 0

Otherwise this Kibana UI will always use the node limits

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we dont check if podMemLimit > 0, so this means that memory.usage.node.pct will always be reported for a pod right?

no, memory.usage.node.pct will be reported only in case nodeMem > 0 (nodeMem can be 0 if state_node and/or node metricsets are disabled)

But what we need is to report it only in case that podMemLimit == 0

why? the point is to split memory.usage.node.pct and memory.usage.limit.pct, so if usageMem and nodeMem are available - memory.usage.node.pct can be calculated
similarly: if all containers have memory limit defined, memory.usage.limit.pct can be calculated as usageMem/sum(containers_mem_limit)

Otherwise this Kibana UI will always use the node limits

I think Kibana issue should be covered separately by elastic/kibana#184984

Copy link
Contributor

@gizas gizas Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I had a second look here. So this change wont break the existing UI, because kibana checks kubernetes.pod.memory.usage.limit.pct > 0.0. As long as this metric is populated then we dont care for the kubernetes.pod.memory.usage.node.pct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, it seems to not work this way
Screenshot 2024-07-02 at 00 09 54

usage.node.pct is > 1% for the selected pod:
Screenshot 2024-07-02 at 00 11 07

seems this script doesn't work as expected

Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>
tetianakravchenko and others added 3 commits July 1, 2024 09:48
Co-authored-by: Andrew Gizas <andreas.gkizas@elastic.co>
Co-authored-by: Andrew Gizas <andreas.gkizas@elastic.co>
Co-authored-by: Andrew Gizas <andreas.gkizas@elastic.co>
Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>
@tetianakravchenko tetianakravchenko merged commit 53bf67c into elastic:main Jul 1, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[metricbeat/kubernetes] Do not fallback to the Node's limit for the pod 'usage.limit.pct' metrics
3 participants