[metricbeat/kubernetes] Remove fallback to the Node's limit for the pod 'usage.limit.pct' metrics #40029
Conversation
Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>
This pull request does not have a backport label.
To fix up this pull request, you need to add the backport labels for the needed branches.
@@ -52,6 +52,7 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]
 - Setting period for counter cache for Prometheus remote_write at least to 60sec {pull}38553[38553]
 - Add support of Graphite series 1.1.0+ tagging extension for statsd module. {pull}39619[39619]
+- Remove fallback to the node limit for the `kubernetes.pod.cpu.usage.limit.pct` and `kubernetes.pod.memory.usage.limit.pct` metrics calculation
I would say that this should be moved under the Breaking changes section.
It is already under Breaking changes
> Metricbeat

https://github.com/elastic/beats/blob/873aebc393d753d1ccd9cea9cca141067b368a3a/CHANGELOG.next.asciidoc#breaking-changes

Am I missing something?
The fields definition must be updated as well; `make update` needs to be run to update the docs.
@@ -167,6 +150,7 @@ func eventMapping(content []byte, metricsRepo *util.MetricsRepo, logger *logp.Lo
 	}

 	if usageMem > 0 {
Here we don't check if `podMemLimit > 0`, so this means that `memory.usage.node.pct` will always be reported for a pod, right?
But what we need is to report it only in the case that `podMemLimit == 0`.
Otherwise the Kibana UI will always use the node limits.
> Here we don't check if `podMemLimit > 0`, so this means that `memory.usage.node.pct` will always be reported for a pod, right?

No, `memory.usage.node.pct` will be reported only in case `nodeMem > 0` (`nodeMem` can be 0 if the `state_node` and/or `node` metricsets are disabled).

> But what we need is to report it only in the case that `podMemLimit == 0`

Why? The point is to split `memory.usage.node.pct` and `memory.usage.limit.pct`, so if `usageMem` and `nodeMem` are available, `memory.usage.node.pct` can be calculated.
Similarly, if all containers have a memory limit defined, `memory.usage.limit.pct` can be calculated as `usageMem / sum(containers_mem_limit)`.

> Otherwise this Kibana UI will always use the node limits

I think the Kibana issue should be covered separately by elastic/kibana#184984.
Ok, I had a second look here. So this change won't break the existing UI, because Kibana checks `kubernetes.pod.memory.usage.limit.pct > 0.0`. As long as this metric is populated, we don't care about `kubernetes.pod.memory.usage.node.pct`.
Hm, it seems to not work this way: `usage.node.pct` is > 1% for the selected pod. It seems this script doesn't work as expected.
Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>
Co-authored-by: Andrew Gizas <andreas.gkizas@elastic.co>
Proposed commit message

Remove fallback to the node limit for the `kubernetes.pod.cpu.usage.limit.pct` and `kubernetes.pod.memory.usage.limit.pct` metrics calculation.

If any container of the Pod does not have a cpu limit defined, `kubernetes.pod.cpu.usage.limit.pct` will not be reported. Similarly for memory: `kubernetes.pod.memory.usage.limit.pct` and `memory.working_set.limit.pct` will not be reported.

This implementation is similar to https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.103.0/receiver/kubeletstatsreceiver/internal/kubelet/metadata.go#L113-L116
Checklist

- I have added an entry in `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.

Disruptive User Impact
Author's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Note: `memory.node.pct` is 0 only in the table view; in the document itself this value is > 0.
Logs