-
-
Notifications
You must be signed in to change notification settings - Fork 290
export RQ statistics as prometheus metrics #666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
288bbf5
to
30e3eb4
Compare
79f6394
to
70f07f8
Compare
de7de29
to
84c84bf
Compare
django_rq/metrics_collector.py
Outdated
|
||
with self.summary.time(): | ||
rq_workers = GaugeMetricFamily('rq_workers', 'RQ workers', labels=['name', 'state', 'queues']) | ||
rq_workers_success = CounterMetricFamily('rq_workers_success', 'RQ workers success count', labels=['name', 'queues']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename rq_workers_success
to successful_job_count
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the names should all be prefixed by "rq_", but I agree this is about the jobs not the workers so I renamed it to rq_job_successful_total
. I used that prefix to keep both success/failure counts to be prefixed by "rq_job_" and used "total" instead of "count" because Prometheus' naming conventions suggest that the unit should be a suffix and "total" is preferred over count:
See: https://prometheus.io/docs/practices/naming/ & prometheus/docs#2465 (comment)
django_rq/metrics_collector.py
Outdated
rq_workers = GaugeMetricFamily('rq_workers', 'RQ workers', labels=['name', 'state', 'queues']) | ||
rq_workers_success = CounterMetricFamily('rq_workers_success', 'RQ workers success count', labels=['name', 'queues']) | ||
rq_workers_failed = CounterMetricFamily('rq_workers_failed', 'RQ workers fail count', labels=['name', 'queues']) | ||
rq_workers_working_time = CounterMetricFamily('rq_workers_working_time', 'RQ workers spent seconds', labels=['name', 'queues']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
total_working_time
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the same reasons in the thread #666 (comment) I went with rq_working_seconds_total
I don't mind merging this in, but could you document this feature? A simple unit test to show that this view works would also be great. |
84c84bf
to
8066be6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind merging this in, but could you document this feature? A simple unit test to show that this view works would also be great.
Sorry for the delay getting this updated, but here it is. The tests took a bit of finagling to get working, but I hope this is acceptable (running tests twice).
You can let me know if you have a strong preference about the metric names, but I reviewed what I put and adjusted them to match the Prometheus naming conventions (they were close but could use some updating), and I factored in the comments you made.
django_rq/metrics_collector.py
Outdated
|
||
with self.summary.time(): | ||
rq_workers = GaugeMetricFamily('rq_workers', 'RQ workers', labels=['name', 'state', 'queues']) | ||
rq_workers_success = CounterMetricFamily('rq_workers_success', 'RQ workers success count', labels=['name', 'queues']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the names should all be prefixed by "rq_", but I agree this is about the jobs not the workers so I renamed it to rq_job_successful_total
. I used that prefix to keep both success/failure counts to be prefixed by "rq_job_" and used "total" instead of "count" because Prometheus' naming conventions suggest that the unit should be a suffix and "total" is preferred over count:
See: https://prometheus.io/docs/practices/naming/ & prometheus/docs#2465 (comment)
django_rq/metrics_collector.py
Outdated
rq_workers = GaugeMetricFamily('rq_workers', 'RQ workers', labels=['name', 'state', 'queues']) | ||
rq_workers_success = CounterMetricFamily('rq_workers_success', 'RQ workers success count', labels=['name', 'queues']) | ||
rq_workers_failed = CounterMetricFamily('rq_workers_failed', 'RQ workers fail count', labels=['name', 'queues']) | ||
rq_workers_working_time = CounterMetricFamily('rq_workers_working_time', 'RQ workers spent seconds', labels=['name', 'queues']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the same reasons in the thread #666 (comment) I went with rq_working_seconds_total
8066be6
to
3dda38a
Compare
3dda38a
to
0463893
Compare
It looks like the test failures are related to the fix I added in #700 |
This change follows the implementation of
rq_exporter
and the prometheus client to export the RQ statistics as a Django view.fixes: #503