Skip to content

Feature: user defined pidprovider #441

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

AMekss
Copy link

@AMekss AMekss commented Jul 18, 2019

Similarly as in #430 we are hitting problem when monitoring RQ workers. By design RQ worker forks new process for every new job, while it always execute one job at a time. The churn of process pids is insane and our worker pods are running out of memory in a few days of running because of this.

Would be cool to have a possibility to override PID provider for such edge cases. This PR is an attempt to do so in less obtrusive way I managed to find. I'm not seasoned Python developer nor very familiar with all the ins and outs of this project, so feel free to give me hints on how proposed solution can be improved.

This PR makes possible the following:

import uwsgi
from prometheus_client.pidprovider import Pidprovider
Pidprovider.source = uwsgi.worker_id

# or in case of RQ
Pidprovider.source = os.getppid

AMekss added 2 commits July 18, 2019 14:17
Signed-off-by: Artūrs Mekšs <arturs.mekss@gmail.com>
Signed-off-by: Artūrs Mekšs <arturs.mekss@gmail.com>
@AMekss
Copy link
Author

AMekss commented Jul 18, 2019

Looking to fix broken build..

@brian-brazil
Copy link
Contributor

I don't see how this works, any code like this would need to be called before anything that might instantiate a metric is.

Signed-off-by: Artūrs Mekšs <arturs.mekss@gmail.com>
@AMekss AMekss force-pushed the feature-pidprovider branch from fd7f10d to f18de67 Compare July 18, 2019 17:52
@AMekss
Copy link
Author

AMekss commented Jul 18, 2019

Correct, I think the best place for it would be before other imports while bootstrapping workers. While I also think that it should work alright even if changed later as it will be handled as PID change. Please correct me if I'm wrong

@brian-brazil
Copy link
Contributor

brian-brazil commented Jul 18, 2019 via email

@AMekss
Copy link
Author

AMekss commented Jul 18, 2019

Sure, the correct way of setting this up is before importing any application logic and metrication code. I'm trying to figure out the worst case scenario.. where even if that would mistakenly happen afterwards it would not mess the metrics and be handled by pid change logic.

@AMekss
Copy link
Author

AMekss commented Jul 22, 2019

After playing around with python I figured out a way of how to achieve the same within a current implementation, leaving it here for anyone else tackling the same issue. Before importing any application code this snippet can customise PID provider function:

import os
if os.getenv('prometheus_multiproc_dir'):
    from prometheus_client import values
    values.ValueClass = values.MultiProcessValue(os.getppid)

It works for us while it feels a bit of a hack to me so would be cool have some official and better way of dealing with this edge case. Anyways closing this PR for now as it doesn't bring any value.

Update After almost a week of running it I can confirm it fixed our memory problems. I slightly updated code snippet

@AMekss AMekss closed this Jul 22, 2019
@cai-personal
Copy link

After playing around with python I figured out a way of how to achieve the same within a current implementation, leaving it here for anyone else tackling the same issue. Before importing any application code this snippet can customise PID provider function:

import os
from prometheus_client import values
values.ValueClass = values.MultiProcessValue(os.getppid)

It works for us while it feels a bit of a hack to me so would be cool have some official and better way of dealing with this edge case. Anyways closing this PR for now as it doesn't bring any value.

can we aggregate the metrics of non-current pid files into a specific file(count_0.db/histogram_0.db), so that the total number of pid files can be controlled without increasing.

@cai-personal
Copy link

After playing around with python I figured out a way of how to achieve the same within a current implementation, leaving it here for anyone else tackling the same issue. Before importing any application code this snippet can customise PID provider function:

import os
from prometheus_client import values
values.ValueClass = values.MultiProcessValue(os.getppid)

It works for us while it feels a bit of a hack to me so would be cool have some official and better way of dealing with this edge case. Anyways closing this PR for now as it doesn't bring any value.

The purpose of this is to control the number of pid files and not let it continue to grow. So, I think it's a viable way to aggregate all the db files from a period of time ago into a specific db file.

@AMekss
Copy link
Author

AMekss commented Jul 24, 2019

can we aggregate the metrics of non-current pid files into a specific file(count_0.db/histogram_0.db), so that the total number of pid files can be controlled without increasing.

This could be the way, however I have no idea how efficient would be to merge files once pid got changed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants