Description
See #2781 for the background.
Some metrics, e.g. `queue_messages_published_total`, are computed as a basic sum aggregation of samples from a local ETS table, e.g. `channel_queue_exchange_metrics`. When a channel is closed, its samples are removed from the table, decreasing the sum. This violates an expectation for counter metrics in Prometheus: they may only increase, stay flat, or reset to 0, but never decrease.
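To make the failure mode concrete, here is a minimal Python model of the current behaviour. It is not the plugin's Erlang code. The class and method names are hypothetical, and a dict stands in for the ETS table, with rows keyed by (channel, queue, exchange). The aggregated "counter" is the sum of the rows that are currently present, so deleting a closed channel's rows makes it go down.

```python
from collections import defaultdict


class AggregatedCounterTable:
    """Hypothetical model of a node-local table keyed by (channel, queue, exchange)."""

    def __init__(self):
        # (channel, queue, exchange) -> messages published on that channel
        self.rows = defaultdict(int)

    def publish(self, channel, queue, exchange, n=1):
        self.rows[(channel, queue, exchange)] += n

    def close_channel(self, channel):
        # Mirrors the behaviour described above: the closed channel's rows are deleted.
        for key in [k for k in self.rows if k[0] == channel]:
            del self.rows[key]

    def published_total(self):
        # Basic sum aggregation over whatever rows are currently present.
        return sum(self.rows.values())


table = AggregatedCounterTable()
table.publish("ch1", "q1", "amq.direct", 5)
table.publish("ch2", "q1", "amq.direct", 3)
before = table.published_total()
table.close_channel("ch1")
after = table.published_total()
print(before, after)  # prints "8 3": the "counter" went down
```

Prometheus would see the drop from 8 to 3 as a counter reset. A `rate()` over that window would then treat 3 as fresh increments rather than reporting zero new publishes.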
The issue is not present when per-object metrics are used: when a channel is closed, all of its metrics go away, which is what the user expects to happen.
This is a side effect of our quick-and-dirty switch to aggregated metrics. We need to either retain the historical total ourselves or delegate to the Prometheus client library, which would do most of the aggregation work and handle resets.
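The first option, retaining the historical total, could look roughly like the sketch below. It is again a Python model under stated assumptions and not a proposed Erlang implementation. The names (`RetainingCounterTable`, `closed_totals`) are hypothetical. When a channel closes, its counts are folded into a per-(queue, exchange) accumulator instead of being discarded. The aggregated sum therefore never decreases while the node is running.

```python
from collections import defaultdict


class RetainingCounterTable:
    """Hypothetical model: live per-channel rows plus totals retained from closed channels."""

    def __init__(self):
        # (channel, queue, exchange) -> messages published on a live channel
        self.rows = defaultdict(int)
        # (queue, exchange) -> messages published by channels that have since closed.
        # Dropping the channel dimension keeps cardinality bounded.
        self.closed_totals = defaultdict(int)

    def publish(self, channel, queue, exchange, n=1):
        self.rows[(channel, queue, exchange)] += n

    def close_channel(self, channel):
        # Fold the closed channel's counts into the retained totals before removing its rows.
        for key in [k for k in self.rows if k[0] == channel]:
            _, queue, exchange = key
            self.closed_totals[(queue, exchange)] += self.rows.pop(key)

    def published_total(self):
        return sum(self.rows.values()) + sum(self.closed_totals.values())


table = RetainingCounterTable()
table.publish("ch1", "q1", "amq.direct", 5)
table.publish("ch2", "q1", "amq.direct", 3)
before = table.published_total()
table.close_channel("ch1")
after_close = table.published_total()
table.publish("ch2", "q1", "amq.direct", 2)
after_publish = table.published_total()
print(before, after_close, after_publish)  # prints "8 8 10"
```

Because the retained totals live only in node-local memory, a node restart still drops the value to 0. Prometheus handles that case as an ordinary counter reset.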
This is node-local state, so we can address it even with a significant rework of the Prometheus plugin and still ship the fix in a 3.8.x release.
Per discussion with @dcorbacho @gerhard @kjnilsson.