
Metrics never being cleaned up generates Memory and CPU performance issues #44

Open
mrzacarias opened this issue Aug 6, 2020 · 0 comments


PAG doesn't keep track of every metric in memory, as the normal Prometheus gateway does; it keeps only the latest version of each merged metric. That reduces memory usage and lets it handle heavy metric input loads, such as those generated by browser-side apps.

Even with that "merge and keep the last value" optimization, the metrics are never cleaned up, so given enough time PAG will eventually exhaust its memory and CPU resources and crash, as has happened a couple of times at my company. Since we have Cortex keeping track of the metrics, PAG being restarted every now and then is not a huge problem, but before it crashes we see an increase in the number of "bad requests", which makes us lose some good metrics until it restarts.


There's no need to keep metrics on PAG indefinitely, since they are constantly scraped and stored in Prometheus or Cortex long-term storage, so we should have a way to detect and remove old metrics from memory.
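
To make the proposal concrete, here is a rough sketch of one possible approach: record when each merged series was last updated and periodically drop the ones that have not been touched within a TTL. It is written in Python only for brevity and is not PAG's actual code; `ExpiringMetricStore`, `merge`, `evict_stale`, and the `ttl_seconds` parameter are all hypothetical names, and summing values on merge is an assumption about how series are combined.

```python
import time
from typing import Callable, Dict, Tuple


class ExpiringMetricStore:
    """Hypothetical in-memory store that forgets series not updated within a TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behaviour can be tested without sleeping
        # series key -> (merged value, time of last update)
        self._series: Dict[str, Tuple[float, float]] = {}

    def merge(self, key: str, value: float) -> None:
        """Add value to the stored series (assumed sum-style merge) and refresh its timestamp."""
        current, _ = self._series.get(key, (0.0, 0.0))
        self._series[key] = (current + value, self._clock())

    def evict_stale(self) -> int:
        """Remove every series last updated more than ttl_seconds ago; return how many were removed."""
        cutoff = self._clock() - self._ttl
        stale = [key for key, (_, seen) in self._series.items() if seen < cutoff]
        for key in stale:
            del self._series[key]
        return len(stale)

    def snapshot(self) -> Dict[str, float]:
        """Return the current merged values, as a scrape would see them."""
        return {key: value for key, (value, _) in self._series.items()}
```

A cleanup like `evict_stale` could run on a timer or right after each scrape. One trade-off to keep in mind: if an evicted counter later reappears, it starts again from zero, which Prometheus treats as a counter reset.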
