-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/remove k8s cache #32539
Feature/remove k8s cache #32539
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes! I left some thoughts already about how to struct the "repo"/"storage" to tackle some of the open questions. Not sure if my suggestion is sane but maybe it worths checking it.
Assuming the comments on the containersStore and podStore were actually swapped. Should be correct in the above code. Thanks for the suggestion, I have a couple of counter points:
|
Yeap, my bad for the swapped comments 🤦🏼♂️ .
Hmm it depends on how you store the metrics. When you store the container level metrics do you know the NodeName? I guess yes. In that case can't you attach those container metrics under a container key which is under a node key? For example: {
nodeA: {
nodeMetrics: {...},
containersStore: {
containerA: {...},
containerB: {...},
......
},
podStore: {
podA: {...},
podB: {...},
},
}, nodeB: {...}
} Would sth like the above work? Actually having a 3-level nesting could also work. It should be faster in case of Pod deletion where you would skip looping over the containers and you would directly delete the attached containers' metrics for the given Pod, right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please as well update documentation:
https://www.elastic.co/guide/en/beats/metricbeat/current/exported-fields-kubernetes.html#_cpu_7
with info that node.pct might be missed in some cases
Btw we should make sure that this feature supports cases like the following:
This is what Elastic Agent will produce. See the description of #25640 for more details and #25640 (comment). |
This pull request is now in conflicts. Could you fix it? 🙏
|
/package |
* replaced internal expiring cache with non expiring dictionary in memory * fixed a bug that prevented to export pod/container metrics when node/state_node metricsets were disabled (cherry picked from commit 5503761) # Conflicts: # metricbeat/module/kubernetes/container/container.go # metricbeat/module/kubernetes/container/container_test.go # metricbeat/module/kubernetes/container/data.go # metricbeat/module/kubernetes/pod/data.go # metricbeat/module/kubernetes/pod/pod.go # metricbeat/module/kubernetes/pod/pod_test.go # metricbeat/module/kubernetes/state_cronjob/state_cronjob.go # metricbeat/module/kubernetes/state_daemonset/state_daemonset.go # metricbeat/module/kubernetes/state_persistentvolume/state_persistentvolume.go # metricbeat/module/kubernetes/state_persistentvolumeclaim/state_persistentvolumeclaim.go # metricbeat/module/kubernetes/util/kubernetes.go
* replaced internal expiring cache with non expiring dictionary in memory * fixed a bug that prevented to export pod/container metrics when node/state_node metricsets were disabled (cherry picked from commit 5503761) # Conflicts: # metricbeat/module/kubernetes/container/container.go # metricbeat/module/kubernetes/container/container_test.go # metricbeat/module/kubernetes/container/data.go # metricbeat/module/kubernetes/pod/data.go # metricbeat/module/kubernetes/pod/pod.go # metricbeat/module/kubernetes/pod/pod_test.go # metricbeat/module/kubernetes/state_persistentvolume/state_persistentvolume.go # metricbeat/module/kubernetes/state_persistentvolumeclaim/state_persistentvolumeclaim.go
* replaced internal expiring cache with non expiring dictionary in memory * fixed a bug that prevented to export pod/container metrics when node/state_node metricsets were disabled (cherry picked from commit 5503761) # Conflicts: # metricbeat/module/kubernetes/container/data.go # metricbeat/module/kubernetes/pod/data.go
* replaced internal expiring cache with non expiring dictionary in memory * fixed a bug that prevented to export pod/container metrics when node/state_node metricsets were disabled (cherry picked from commit 5503761)
@Mergifyio backport 8.1.0 |
❌ No backport have been created
|
@Mergifyio backport 8.1 |
@Mergifyio backport 8.0 |
✅ Backports have been created
|
* replaced internal expiring cache with non expiring dictionary in memory * fixed a bug that prevented to export pod/container metrics when node/state_node metricsets were disabled (cherry picked from commit 5503761) # Conflicts: # metricbeat/module/kubernetes/container/container.go # metricbeat/module/kubernetes/container/container_test.go # metricbeat/module/kubernetes/container/data.go # metricbeat/module/kubernetes/pod/data.go # metricbeat/module/kubernetes/pod/pod.go # metricbeat/module/kubernetes/pod/pod_test.go # metricbeat/module/kubernetes/state_cronjob/state_cronjob.go # metricbeat/module/kubernetes/state_persistentvolume/state_persistentvolume.go # metricbeat/module/kubernetes/state_persistentvolumeclaim/state_persistentvolumeclaim.go # metricbeat/module/kubernetes/util/kubernetes.go
* replaced internal expiring cache with non expiring dictionary in memory * fixed a bug that prevented to export pod/container metrics when node/state_node metricsets were disabled (cherry picked from commit 5503761) # Conflicts: # metricbeat/module/kubernetes/container/container.go # metricbeat/module/kubernetes/container/container_test.go # metricbeat/module/kubernetes/container/data.go # metricbeat/module/kubernetes/pod/data.go # metricbeat/module/kubernetes/pod/pod.go # metricbeat/module/kubernetes/pod/pod_test.go # metricbeat/module/kubernetes/state_cronjob/state_cronjob.go # metricbeat/module/kubernetes/state_persistentvolume/state_persistentvolume.go # metricbeat/module/kubernetes/state_persistentvolumeclaim/state_persistentvolumeclaim.go # metricbeat/module/kubernetes/util/kubernetes.go
✅ Backports have been created
|
@Mergifyio backport 8.1 |
✅ Backports have been created
|
* Feature/remove k8s cache (elastic#32539) Co-authored-by: Giuseppe Santoro <giuseppe.santoro@elastic.co>
* replaced internal expiring cache with non expiring dictionary in memory * fixed a bug that prevented to export pod/container metrics when node/state_node metricsets were disabled (cherry picked from commit 5503761) Co-authored-by: Giuseppe Santoro <giuseppe.santoro@elastic.co>
* replaced internal expiring cache with non expiring dictionary in memory * fixed a bug that prevented to export pod/container metrics when node/state_node metricsets were disabled
What does this PR do?
Replaced time expiring cache in the Kubernetes module with a dictionary that never expires.
Since some metrics in this internal cache were only set once at startup and not regularly updated, we needed to replace the internal cache with a dictionary that never expires.
Why is it important?
It fixes an issue that caused some metrics
kubernetes.container.cpu.usage.limit.pct
andkubernetes.container.memory.usage.limit.pct
,kubernetes.pod.memory.usage.limit.pct
andkubernetes.pod.cpu.usage.limit.pct
to be missing at times:kubernetes.container.*
metrics are presentkubernetes.container.*
metrics are missing. This is CORRECT since limits are not set on containers and we cannot use node metrics as backup optionkubernetes.pod.*
metrics are missing (Bug to be fixed here)kubernetes.pod.*
metrics are missing. This is CORRECT since limits are not set on containers and we cannot use node metrics as backup optionChecklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.Author's Checklist
How to test this PR locally
nginx-daemons.yaml.zip
author's checklist
Related issues
Use cases
Screenshots
When a pod is deleted (like in the following screenshot at 11:09:37) you might see the container still reporting a cpu/memory usage but not the limit.pct. When the pod is added back (like in the following screenshot at 11:12:31) you might see the container with cpu usage = 0 and limit.pct = 0 for a few seconds until the cpu is reported again and limit.pct to be different from 0.
Logs