Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Architecture suggestion for publishing metrics to Prometheus from many ephemeral workers #60

Open
raghavan20 opened this issue Jul 25, 2022 · 0 comments

Comments

@raghavan20
Copy link

Hello. We have an execution model which is typical of a producer-consumer pattern with a queue/topic in the middle. Currently, the queue holds work of same type from multiple customers/tenants. The consumers/workers are non-http based applications that pick/pop a message from the queue and execute work. These consumers are Kubernetes pods that are spun up using a Deployment. They are configured to autoscale based on the work available on the queue. We would like to know at least two metrics about the performance of the workers/backlog burn up

  • number of executions processed for each customer/tenant
  • number of success or failures for each customer/tenant

What is the best way to publish metrics from these ephemeral workers? We were trying to send through Prometheus gateway but looks it has design philosophy which for us
a) can either result in metric overwrites from multiple workers/pods if they try to use same job/group name
b) can result in garbage build up if job/group name is based on instance/pod name as pods come and go over longer periods of time

We could additionally introduce a mini HTTP web server for each consumer and expose a scrape metric endpoint. It is possibly a bit overkill but it would work. Please suggest.

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant