Metrics: Add "Instantaneous" Temporality (Gauge Histogram) #274

Closed
jmacd opened this issue Mar 4, 2021 · 4 comments

Comments

@jmacd
Contributor

jmacd commented Mar 4, 2021

PR #236 calls for a GaugeHistogram, which is a form of instantaneous histogram. This histogram can logically be calculated in OTel's metric API model through the use of ValueObserver instruments and a label that is removed by the SDK. The result is a histogram of gauge values.
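As a rough illustration of the paragraph above (a sketch, not how any SDK actually implements it): suppose a ValueObserver-style callback reports one gauge value per label set, and one label is dropped before export. The values that collapse onto the same remaining label set can then be bucketed into a histogram. The function name `gauge_histogram`, the `item_id` label, and the bucket boundaries are all assumptions made for this example.

```python
from bisect import bisect_left
from collections import defaultdict


def gauge_histogram(observations, drop_label, boundaries):
    """Bucket gauge observations after removing one label.

    observations: list of (labels_dict, value) pairs from one collection pass.
    boundaries: sorted inclusive upper bounds; one overflow bucket is appended.
    Returns {remaining_labels_as_sorted_tuple: [bucket counts]}.
    """
    result = defaultdict(lambda: [0] * (len(boundaries) + 1))
    for labels, value in observations:
        remaining = tuple(
            sorted((k, v) for k, v in labels.items() if k != drop_label)
        )
        # bisect_left places value == boundary into that boundary's bucket.
        result[remaining][bisect_left(boundaries, value)] += 1
    return dict(result)


# Hypothetical observations: one gauge value per item currently in a queue.
observations = [
    ({"queue": "a", "item_id": "1"}, 0.5),
    ({"queue": "a", "item_id": "2"}, 3.0),
    ({"queue": "a", "item_id": "3"}, 12.0),
    ({"queue": "b", "item_id": "4"}, 1.0),
]
hist = gauge_histogram(observations, "item_id", boundaries=[1.0, 5.0, 10.0])
```

Each call describes a single collection pass, so the resulting bucket counts are a snapshot of the current distribution rather than a count of events over an interval.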

The concept of Temporality can be extended to include a third kind of temporality called Instantaneous, which will allow us to encode a series of GaugeHistogram values from Prometheus or to aggregate OTel Gauge points into histograms in an export pipeline.

Both OTLP Sum and Histogram points include an aggregation_temporality field, so adding a new value of Temporality means defining what it means in both cases. The meaning for Histogram points of Instantaneous temporality is precisely what GaugeHistogram means (see #236 (comment)).

The meaning for Sum points with Instantaneous temporality is not a requirement for OTel Metrics. However, it is easy to define and reasonable to consider, because an "Instant Sum" is not different than a Sum with Delta temporality and an infinitesimally small window of time. We can simply define Instant Sum points to be deltas with a zero-width time range.
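A minimal sketch of that definition, assuming simplified stand-ins for the OTLP types: the first three enum values mirror OTLP's aggregation temporality enum, while `INSTANTANEOUS = 3` is hypothetical and exists only in this proposal. `SumPoint`, `instant_sum`, and `as_delta` are illustrative names, not part of any OTel library.

```python
from dataclasses import dataclass
from enum import IntEnum


class Temporality(IntEnum):
    UNSPECIFIED = 0
    DELTA = 1
    CUMULATIVE = 2
    INSTANTANEOUS = 3  # hypothetical: proposed in this issue, not part of OTLP


@dataclass(frozen=True)
class SumPoint:
    start_time_unix_nano: int
    time_unix_nano: int
    value: float
    temporality: Temporality


def instant_sum(time_unix_nano, value):
    """Build an Instant Sum: start and end timestamps are identical."""
    return SumPoint(time_unix_nano, time_unix_nano, value,
                    Temporality.INSTANTANEOUS)


def as_delta(point):
    """Reinterpret an Instant Sum as a Delta point over a zero-width range."""
    if point.temporality is not Temporality.INSTANTANEOUS:
        raise ValueError("expected an Instantaneous point")
    return SumPoint(point.start_time_unix_nano, point.time_unix_nano,
                    point.value, Temporality.DELTA)


p = instant_sum(1_614_850_000_000_000_000, 5.0)
d = as_delta(p)
```

Under this reading the conversion changes only the temporality tag; the timestamps and the value carry over unchanged.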

The only problem left by the definitions above is that we now have a way to encode a raw, scalar-valued point corresponding to the API-level metric event that generates a Counter or a Gauge point. A raw Counter event translates into an Instant Sum. A raw Gauge event translates into a Gauge point (sort of), but we would have to use Histogram exemplars to encode a raw histogram value--at the very least this feels asymmetric. See #188.

@jmacd
Contributor Author

jmacd commented Mar 4, 2021

In a picture:

[Image: Screen Shot 2021-03-04 at 1 16 44 AM]

@jmacd
Contributor Author

jmacd commented Mar 4, 2021

It is worth making a detailed comparison to the Stackdriver protocol, which is both similar and different:

https://cloud.google.com/monitoring/api/v3/kinds-and-types

The table looks like:

[Image: Screen Shot 2021-03-04 at 2 04 30 PM]

At a high level, I would explain the differences between these protocols as follows. There are two coordinates in these tables, describing information the OTel group has referred to as "structure" and "temporality".

The OTel protocol has arranged this table so that all first-class semantic information is carried in the data point kind. Temporality is strictly second-class information about how data was collected or encoded, and does not change interpretation. The use of temporality is to allow for flexibility in collection, not to describe semantics, in the OpenTelemetry model.

The Stackdriver protocol has arranged this table in a more compact form, allowing Gauges and Counters to share a Value Type but be distinguished by their conceptual temporality ("Metric Kind"). The Stackdriver model therefore includes semantic information in both dimensions of this table, whereas OTel has created more rows in the table in order to have a single semantic dimension.

@jsuereth
Contributor

jsuereth commented May 8, 2021

I've had a chance to think this over, and I think it's showing some oddities in OTel's fragmentation of the world.

Specifically, let's look at the goal of having "natural aggregation" methods for the data types.

  1. What is the meaning of "Instantaneous Sum" that is not also Gauge? Would we define a natural aggregation that isn't just what's done for Gauge?
  2. "Instantaneous Histogram" (assuming this represents a Gauge histogram) shares a similar aggregation story w/ Gauge. Is it odd that aggregation temporality would shift the ENTIRE mechanism of doing aggregation?

To the extent that "Gauge is already instantaneous temporality" I think it makes more sense to fit a histogram as a point type within Gauge.

If I read the OpenMetrics Definition, it feels more clear that there are two different things going on here:

  • Histogram records discrete events, like http request latency.
  • Gauge-Histogram records a current distribution, like the amount of time items have spent in a queue.

This leads me to a lot of questions:

  1. For item time spent in queue, I could record the latency of items flowing through a queue, but that doesn't give me a notion for how "big my backup" is, that I get with GaugeHistogram.
  2. Can I assume anything about aggregation of a time-waiting-in-queue histogram? Not really. On the next sample, some items may still be in the queue and some might not be. If I were to join queue lengths in any way I'd actually be destroying the usefulness of the measurement. I really need to treat it like a gauge, not like a histogram.
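To make the contrast concrete, here is a small sketch under assumptions: bucket counts are plain lists, event histograms with Delta temporality merge across time by adding buckets, and a gauge-style histogram is aggregated by keeping the latest sample. "Last value" is only one plausible gauge aggregation, chosen for illustration; the function names and sample data are hypothetical.

```python
def merge_delta_histograms(points):
    """Event histograms with Delta temporality: bucket counts add across time."""
    return [sum(counts) for counts in zip(*points)]


def merge_gauge_histograms(points):
    """Gauge-style histograms: keep only the most recent sample, as for a gauge."""
    return list(points[-1])


# Two consecutive samples of "time spent waiting" for items currently queued.
# The same item can appear in both samples, so adding buckets double-counts it.
sample_t1 = [2, 1, 0]
sample_t2 = [0, 2, 1]

summed = merge_delta_histograms([sample_t1, sample_t2])
latest = merge_gauge_histograms([sample_t1, sample_t2])
```

Here `summed` totals six items even though only three were in the queue at either sample time, which is the loss of meaning described in item 2 above; `latest` still describes the queue as it currently stands.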

LONG winded response later, I don't think Instantaneous Temporality solves the semantics here, and it just makes dealing with Histograms a bit more odd. It's possible there's a lot I'm not seeing, so wanted to kick off the discussion.

@jmacd
Contributor Author

jmacd commented May 24, 2021

I've reconsidered how to represent GaugeHistogram-type metric data. Instantaneous is not a good concept for OTel metrics.
