Improve scalability of Kubernetes module in metricbeat #32662

Description

Context
During my investigation for PR #32539, I noticed that there might be room for performance improvements in the Kubernetes module of metricbeat.

Each metricbeat instance stores metrics about all the nodes in the Kubernetes cluster, but only metrics about the pods and containers on the node where that instance of metricbeat is running. This replicates how the previous expiring cache worked, but it is now more evident and can have a detrimental effect in clusters with many nodes: with lots of nodes we might end up wasting significant memory on unused metrics from other nodes. This behaviour is due to how the watcher notifies events from Kubernetes and was not modified by the aforementioned PR.

A possible solution is for each metricbeat instance to filter out events generated by nodes other than the one it is running on. This should also simplify the MetricRepo API, since we would no longer need to handle the deletion of nodes, only events from Pods and Containers.

During the same investigation, I noticed that when a Pod is deleted, the update function is first called (adding its metrics again), and the Pod is then deleted a few seconds later. I am not sure if this is intended, since the status of the Pod is already Terminating. I also noticed that the call to deletePod executes twice. This might be because there is more than one watcher, or because the code is shared between multiple metricsets.

