Cumulative data points should set StartTimeUnixNano per timeseries #4184

@ArthurSens

Current behavior

As mentioned in the Metrics Data Model, the goal of StartTimeUnixNano is to help measure the rate of increase over an unbroken sequence of data points and to identify resets and gaps.

The current implementation of most OTel SDKs sets StartTimeUnixNano to the timestamp at which the instrumented process started. This works well when every time series starts being pushed/exposed right at the beginning of the process lifetime, but not when a previously unseen time series first appears some time after the process starts.

Problem statement

Let me try to explain with an example:

Let's say an application starts at time T = 0. Some HTTP requests are happening, and they are all successful (i.e. status code 200). After 30s (T+30), the first request with status code 404 happens, and one more follows every second after that (so the rate of increase is 1 request per second).

The increase rate can be measured with a formula like this:

$rate = \frac{\text{Cumulative Value}}{\text{Current Time} - \text{StartTimeUnixNano}}$

1 minute after T (at T+60), 30 requests with status code 404 would have happened. Let's see how different the measurement is if we use T or T+30 as the start time.

  • If StartTimeUnixNano = T, $rate = \frac{30} {60 - 0} = 0.5$ req/sec
  • If StartTimeUnixNano = T+30, $rate = \frac{30} {60-30} = 1$ req/sec
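
The same arithmetic as a minimal Python sketch. The function name `average_rate` and its parameters are made up for illustration and are not part of any OTel SDK; times are in seconds relative to T.

```python
def average_rate(cumulative_value: float, now_s: float, start_s: float) -> float:
    """Average increase per second between start_s and now_s."""
    elapsed = now_s - start_s
    if elapsed <= 0:
        raise ValueError("current time must be after the start time")
    return cumulative_value / elapsed


# 30 requests with status code 404 have been counted by T+60.
rate_process_start = average_rate(30, now_s=60, start_s=0)   # start time = T
rate_series_start = average_rate(30, now_s=60, start_s=30)   # start time = T+30

print(rate_process_start)  # 0.5
print(rate_series_start)   # 1.0
```

Only the second result matches the actual 1 req/sec at which the 404 series has been growing since it first appeared.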

As mentioned in Current behavior, the measurement works well when the series is initialized alongside the process, but not when the series is initialized later.

Requested change

The requested change is that StartTimeUnixNano is set separately per time series.

But of course, this comes with performance drawbacks. For that to be possible, SDKs will need to keep in memory some sort of time series ID plus its StartTimeUnixNano for every time series ever created. This can be a significant cost for processes that expose high-cardinality metrics.
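
To make the memory cost concrete, here is a rough sketch of what per-series tracking could look like. It is not taken from any existing SDK: `CumulativeCounter`, its methods, and the injectable `clock_ns` are hypothetical names, and the start time is assumed to be the moment a series is first observed.

```python
import time
from typing import Callable, Dict, List, Tuple

# A time series is identified here by its sorted attribute pairs (an assumption).
AttributeSet = Tuple[Tuple[str, str], ...]


class CumulativeCounter:
    """Hypothetical counter that remembers a start time per time series."""

    def __init__(self, clock_ns: Callable[[], int] = time.time_ns) -> None:
        self._clock_ns = clock_ns
        self._values: Dict[AttributeSet, float] = {}
        # One extra entry per series ever created: this is the memory cost.
        self._start_times_ns: Dict[AttributeSet, int] = {}

    def add(self, amount: float, attributes: Dict[str, str]) -> None:
        key = tuple(sorted(attributes.items()))
        if key not in self._start_times_ns:
            # First time this series is seen: record its own start time.
            self._start_times_ns[key] = self._clock_ns()
            self._values[key] = 0.0
        self._values[key] += amount

    def collect(self) -> List[dict]:
        now = self._clock_ns()
        return [
            {
                "attributes": dict(key),
                "start_time_unix_nano": self._start_times_ns[key],
                "time_unix_nano": now,
                "value": value,
            }
            for key, value in self._values.items()
        ]


# Usage with a fake clock so the timestamps are easy to read.
fake_now = [0]
counter = CumulativeCounter(clock_ns=lambda: fake_now[0])
counter.add(1, {"http.status_code": "200"})   # series created at T
fake_now[0] = 30 * 10**9
counter.add(1, {"http.status_code": "404"})   # series created at T+30
fake_now[0] = 60 * 10**9
points = counter.collect()
```

Both dictionaries grow with the number of distinct attribute sets, which is why high-cardinality metrics make this approach more expensive than a single process-wide start time.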

I believe the appropriate approach is to make this configurable in the SDK, so that users can choose between the current behavior and the behavior I'm suggesting here.
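
One possible shape for such a switch, purely as an illustration. `StartTimeMode` and `start_time_for_series` are invented names; no SDK is known to expose this option under these names.

```python
from enum import Enum


class StartTimeMode(Enum):
    """Hypothetical SDK setting for how StartTimeUnixNano is chosen."""

    PROCESS_START = "process_start"  # current behavior
    PER_SERIES = "per_series"        # behavior requested in this issue


def start_time_for_series(
    mode: StartTimeMode, process_start_ns: int, first_seen_ns: int
) -> int:
    """Pick the start timestamp to report for one cumulative time series."""
    if mode is StartTimeMode.PER_SERIES:
        return first_seen_ns
    return process_start_ns


# Process started at T = 0; the 404 series first appeared at T+30.
current = start_time_for_series(StartTimeMode.PROCESS_START, 0, 30 * 10**9)
requested = start_time_for_series(StartTimeMode.PER_SERIES, 0, 30 * 10**9)
```

With `PROCESS_START` the SDK would not need to record `first_seen_ns` at all, so users who keep the current behavior would not pay the extra memory cost.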

Additional context

The discussion comes from an issue in OpenTelemetry-Collector-Contrib, where we're implementing support for Prometheus/OpenMetrics's created timestamp: open-telemetry/opentelemetry-collector-contrib#32521
