Update Prometheus Sums for handling delta counter case #2570

7 commits merged on May 26, 2022
Changes from 3 commits
11 changes: 8 additions & 3 deletions specification/metrics/datamodel.md
@@ -1149,7 +1149,7 @@ The algorithm is scheduled out as follows:
When the next delta sum reported for a given metric stream does not align with
where we expect it, one of several things could have occurred:

-- the process reporting metrics was rebooted, leading to a new reporting
+- The process reporting metrics was rebooted, leading to a new reporting
interval for the metric.
- A Single-Writer principle violation where multiple processes are reporting the
same metric stream.
@@ -1161,7 +1161,10 @@ that some data was lost, and reset the counter.
We detect alignment via two mechanisms:

- If the incoming delta time interval has significant overlap with the previous
-  time interval, we assume a violation of the single-writer principle.
+  time interval, we assume a violation of the single-writer principle and can be handled with one of the following options:
+  - Simply report the inconsistencies in time intervals, as the error condition could be caused by a misconfiguration.
+  - Eliminate the overlap / deduplicate on the receiver side.
+  - Correct the inconsistent time intervals by differentiating the given `Resource` and `Attribute` set used from overlapping time.
- If the incoming delta time interval has a significant gap from the last seen
time, we assume some kind of reboot/restart and reset the cumulative counter.
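
The two detection mechanisms above can be sketched in Python. This is only an illustration under stated assumptions, not the specification's algorithm: `DeltaPoint`, `Accumulator`, `classify`, and the `tolerance` parameter (standing in for what counts as "significant") are hypothetical names, timestamps are plain integers, and overlapping points are handled with the first listed option (report them without merging).

```python
from dataclasses import dataclass
from enum import Enum


class Alignment(Enum):
    ALIGNED = "aligned"
    OVERLAP = "overlap"  # treated as a possible single-writer violation
    GAP = "gap"          # treated as a restart; the counter is reset


@dataclass
class DeltaPoint:
    start: int    # interval start (any monotonic integer clock)
    end: int      # interval end
    value: float  # delta observed over [start, end)


def classify(prev_end: int, point: DeltaPoint, tolerance: int = 0) -> Alignment:
    """Compare the incoming start time with the last seen end time."""
    if point.start < prev_end - tolerance:
        return Alignment.OVERLAP
    if point.start > prev_end + tolerance:
        return Alignment.GAP
    return Alignment.ALIGNED


class Accumulator:
    """Accumulates delta points of one metric stream into a cumulative sum."""

    def __init__(self, tolerance: int = 0):
        self.tolerance = tolerance  # hypothetical "significant" threshold
        self.start = None
        self.end = None
        self.total = 0.0
        self.overlaps = []  # overlapping points are reported, not merged

    def add(self, point: DeltaPoint) -> Alignment:
        if self.end is None:
            # First point of the stream starts the cumulative interval.
            self.start, self.end, self.total = point.start, point.end, point.value
            return Alignment.ALIGNED
        status = classify(self.end, point, self.tolerance)
        if status is Alignment.OVERLAP:
            self.overlaps.append(point)
            return status
        if status is Alignment.GAP:
            # Assume a restart: begin a new cumulative interval from zero.
            self.start, self.total = point.start, 0.0
        self.total += point.value
        self.end = point.end
        return status
```

Deduplicating on the receiver side or splitting the stream by `Resource` and `Attribute` set would replace the `OVERLAP` branch; the sketch leaves those options out.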

@@ -1329,7 +1332,9 @@ An [OpenTelemetry Gauge](#gauge) MUST be converted to a Prometheus Gauge.

- If the aggregation temporality is cumulative and the sum is monotonic, it MUST be converted to a Prometheus Counter.
- If the aggregation temporality is cumulative and the sum is non-monotonic, it MUST be converted to a Prometheus Gauge.
-- If the aggregation temporality is delta and the sum is monotonic, it SHOULD be converted to a cumulative temporality and become a Prometheus Sum
+- If the aggregation temporality is delta and the sum is monotonic, it SHOULD be converted to a cumulative temporality and become a Prometheus Sum. The following behaviors are expected:
+  - The new data point type must be the same as the accumulated data point type.
+  - The new data point's start time must match the time of the accumulated data point. If not, see [detecting alignment issues].(#sums-detecting-alignment-issues)
- Otherwise, it MUST be dropped.

Sum metric points MUST have `_total` added as a suffix to the metric name.
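
As a rough illustration of the Sum conversion rules in this hunk, the decision can be written as a small lookup. The function names and the returned labels are hypothetical; `"sum_from_delta"` simply mirrors the hunk's wording ("converted to a cumulative temporality and become a Prometheus Sum") and is not a Prometheus type name. Skipping the suffix when the name already ends in `_total` is an assumption, not something the text states.

```python
from typing import Optional


def convert_sum(temporality: str, monotonic: bool) -> Optional[str]:
    """Return a label for the conversion outcome, or None if the point is dropped."""
    if temporality == "cumulative":
        return "counter" if monotonic else "gauge"
    if temporality == "delta" and monotonic:
        # SHOULD be converted to cumulative temporality first.
        return "sum_from_delta"
    return None  # otherwise the point MUST be dropped


def with_total_suffix(name: str) -> str:
    """Append `_total` to a Sum metric name.

    Assumption: a name that already ends in `_total` is left unchanged.
    """
    return name if name.endswith("_total") else name + "_total"
```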