
Add enhanced queue metrics and monitoring #9

@mre


Summary

Add comprehensive queue metrics to provide better operational visibility into the job processing system.

Motivation

Currently we only track the failed job count. For production deployments, operators need more detailed metrics to understand system health and performance.

Proposed Metrics

Queue Metrics

  • Queue depth per job type
  • Average/median processing time per job type
  • Job throughput (jobs/sec, jobs/min)
  • Job success/failure rates
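To make the queue metrics above concrete, here is a minimal in-memory sketch that derives them from job records. The `JobRecord` fields, the status values (`pending`, `running`, `done`, `failed`) and the `queue_metrics` helper are all hypothetical, not the project's schema or API. Timing, throughput and rate figures only count jobs that finished inside a sliding window ending at `now`.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, median
from typing import Dict, Iterable, Optional

@dataclass
class JobRecord:
    # Hypothetical job row; field names and status values are illustrative only.
    job_type: str
    status: str                          # "pending", "running", "done", or "failed"
    started_at: Optional[float] = None   # Unix timestamps in seconds
    finished_at: Optional[float] = None

def queue_metrics(jobs: Iterable[JobRecord], now: float,
                  window_seconds: float = 60.0) -> Dict[str, dict]:
    """Per-job-type metrics; timing and rate stats cover jobs finished in the window."""
    since = now - window_seconds
    by_type = defaultdict(list)
    for job in jobs:
        by_type[job.job_type].append(job)

    result = {}
    for job_type, rows in by_type.items():
        # Queue depth: jobs still waiting to be picked up.
        depth = sum(1 for j in rows if j.status == "pending")
        # Jobs that completed (successfully or not) within [since, now].
        finished = [
            j for j in rows
            if j.status in ("done", "failed")
            and j.started_at is not None
            and j.finished_at is not None
            and since <= j.finished_at <= now
        ]
        durations = [j.finished_at - j.started_at for j in finished]
        succeeded = sum(1 for j in finished if j.status == "done")
        n = len(finished)
        result[job_type] = {
            "depth": depth,
            "avg_seconds": mean(durations) if durations else None,
            "median_seconds": median(durations) if durations else None,
            "jobs_per_sec": n / window_seconds,
            "jobs_per_min": n * 60 / window_seconds,
            "success_rate": succeeded / n if n else None,
            "failure_rate": (n - succeeded) / n if n else None,
        }
    return result
```

In a PostgreSQL-backed queue the same numbers would more likely come from a single `GROUP BY job_type` query (using `percentile_cont(0.5)` for the median) than from iterating rows in Python; the sketch only pins down the definitions.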

Worker Metrics

  • Worker utilization (active/idle workers)
  • Worker pool sizes per queue
  • Average time workers spend polling vs processing
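One way to collect the worker metrics is for each worker to record state transitions and accumulate time per state. The sketch below assumes a worker is either `idle` (polling for work) or `busy` (processing a job); `WorkerTimer` and `worker_metrics` are hypothetical names, and the clock is injectable so the bookkeeping can be tested deterministically.

```python
import time
from collections import Counter

class WorkerTimer:
    """Tracks how long one worker spends polling (idle) vs processing (busy)."""

    def __init__(self, queue, clock=time.monotonic):
        self.queue = queue
        self.clock = clock
        self.state = "idle"                  # workers start out polling for work
        self.seconds = {"idle": 0.0, "busy": 0.0}
        self._since = clock()

    def set_state(self, new_state):
        # Charge the time elapsed since the last transition to the previous state.
        # Note: the interval currently in progress is not counted until the next transition.
        now = self.clock()
        self.seconds[self.state] += now - self._since
        self.state, self._since = new_state, now

def worker_metrics(workers):
    """Snapshot of utilization, pool sizes and average polling/processing time."""
    count = len(workers)
    active = sum(1 for w in workers if w.state == "busy")
    busy = sum(w.seconds["busy"] for w in workers)
    idle = sum(w.seconds["idle"] for w in workers)
    return {
        "active": active,
        "idle": count - active,
        "utilization": active / count if count else 0.0,
        "pool_sizes": dict(Counter(w.queue for w in workers)),
        "avg_poll_seconds": idle / count if count else 0.0,
        "avg_process_seconds": busy / count if count else 0.0,
    }
```

Utilization here is an instantaneous ratio of busy workers to all workers; a time-weighted figure could instead be derived from the accumulated `busy` and `idle` seconds.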

System Metrics

  • Database connection pool usage
  • Queue polling frequency and efficiency (e.g., how often a poll actually returns a job)
  • Retry attempt distributions
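Two of the system metrics reduce to simple counters. The sketch below assumes polling efficiency means the share of polls that return a job, and represents the retry distribution as a histogram of attempt counts per job. `PollStats` and `retry_distribution` are hypothetical helpers; connection pool usage depends on the database driver and is not covered here.

```python
from collections import Counter

class PollStats:
    """Counts polls and how many of them returned a job (hypothetical helper)."""

    def __init__(self):
        self.polls = 0
        self.hits = 0

    def record(self, got_job):
        self.polls += 1
        if got_job:
            self.hits += 1

    def summary(self, window_seconds):
        return {
            "polls_per_sec": self.polls / window_seconds,
            # Assumed definition of efficiency: fraction of polls that found work.
            "hit_rate": self.hits / self.polls if self.polls else 0.0,
        }

def retry_distribution(attempts):
    """Map attempt count -> number of jobs that needed that many attempts."""
    return dict(sorted(Counter(attempts).items()))
```

A low hit rate combined with a high poll frequency would suggest the polling interval is shorter than the workload needs.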

Implementation Ideas

  • Add a get_queue_metrics() function that returns structured metrics
  • Consider integration with popular metrics systems (Prometheus, StatsD)
  • Add optional metrics collection configuration
  • Include metrics in the archive functionality for historical analysis

Inspired By

A Hacker News discussion on PostgreSQL job queues that emphasized the importance of monitoring queue length, processing time, and worker utilization in production systems.
