
Aggregated queue_messages_published_total metric violates Prometheus expectations about counters #2783

Closed
@michaelklishin


See #2781 for the background.

Some metrics, e.g. queue_messages_published_total, are computed as a plain sum of the samples in a node-local ETS table, e.g. channel_queue_exchange_metrics. When a channel is closed, its samples are removed from the table, which decreases the sum. This violates the Prometheus expectation for counters: a counter may only increase, stay flat, or reset to 0. It must never decrease.
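To make the failure mode concrete, here is a small Python model of it. The plugin itself is Erlang, so this is only an illustration. A dict stands in for the channel_queue_exchange_metrics ETS table, and the channel, queue and exchange names are hypothetical. The increase() helper is a simplified assumption of how Prometheus reads a drop in a counter: as a reset to 0, with no extrapolation.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the channel_queue_exchange_metrics ETS table:
# one row per (channel, queue, exchange) holding that row's publish count.
Table = Dict[Tuple[str, str, str], int]


def aggregated_published_total(table: Table) -> int:
    """Naive aggregation: sum whatever rows are currently in the table."""
    return sum(table.values())


def close_channel(table: Table, channel: str) -> None:
    """Closing a channel deletes all of its rows, taking its counts with it."""
    for key in [k for k in table if k[0] == channel]:
        del table[key]


def increase(samples: List[int]) -> int:
    """Simplified Prometheus-style increase() (no extrapolation): a drop
    between consecutive samples is read as a counter reset to 0, so the
    post-drop value is counted as new growth."""
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        total += cur if cur < prev else cur - prev
    return total


table: Table = {("ch1", "q1", "ex1"): 60, ("ch2", "q1", "ex1"): 40}
scrapes = [aggregated_published_total(table)]      # 100

close_channel(table, "ch2")
scrapes.append(aggregated_published_total(table))  # 60: the "counter" went down

table[("ch1", "q1", "ex1")] += 5                   # 5 more messages published
scrapes.append(aggregated_published_total(table))  # 65

print(scrapes, increase(scrapes))  # [100, 60, 65] 65
```

In this model, closing ch2 makes the aggregated value drop from 100 to 60, and the reset-aware increase() then reports 65 new publishes where only 5 actually happened.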

The issue does not occur when per-object metrics are used: when a channel is closed, all of its metrics go away, which is what users expect to happen.

This is a side effect of our quick-and-dirty switch to aggregated metrics. We need to either retain the historical total or delegate to the Prometheus client library, which would do most of the aggregation work and handle resets.
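A minimal sketch of the first option, retaining the historical total, again as a Python model rather than plugin code. The PublishedTotal class and its methods are hypothetical: when a channel closes, its final counts are folded into a retired accumulator instead of being discarded, so the exported value never decreases.

```python
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # hypothetical (channel, queue, exchange) row key


class PublishedTotal:
    """Hypothetical node-local aggregator that keeps the exported sum monotonic.

    Live channels keep their rows in `live`; when a channel closes, its
    final counts are added to `retired` instead of being lost."""

    def __init__(self) -> None:
        self.live: Dict[Key, int] = {}
        self.retired = 0

    def record(self, key: Key, n: int = 1) -> None:
        # Counters only move up, so reject negative increments outright.
        if n < 0:
            raise ValueError("counters can only increase")
        self.live[key] = self.live.get(key, 0) + n

    def close_channel(self, channel: str) -> None:
        # Move the closing channel's counts into the retained total.
        for key in [k for k in self.live if k[0] == channel]:
            self.retired += self.live.pop(key)

    def value(self) -> int:
        # Exported value: everything ever counted on this node since start.
        return self.retired + sum(self.live.values())


agg = PublishedTotal()
agg.record(("ch1", "q1", "ex1"), 60)
agg.record(("ch2", "q1", "ex1"), 40)
scrapes = [agg.value()]          # 100
agg.close_channel("ch2")
scrapes.append(agg.value())      # still 100
agg.record(("ch1", "q1", "ex1"), 5)
scrapes.append(agg.value())      # 105
print(scrapes)  # [100, 100, 105]
```

The accumulator is in-memory, node-local state, so it starts from 0 again when the node restarts; Prometheus treats that as an ordinary counter reset, which is allowed. Client library counters typically expose only increments, so the second option would enforce the same rule without this hand-rolled bookkeeping.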

This is node-local state, so we can address it even with a significant rework of the Prometheus plugin and still ship the fix in a 3.8.x release.

Per discussion with @dcorbacho, @gerhard and @kjnilsson.
