Prometheus Unknown MUST preserve its notion #3058

Closed
khanhntd opened this issue Dec 21, 2022 · 21 comments · Fixed by #3868
Labels
spec:metrics Related to the specification/metrics directory

Comments

@khanhntd
Contributor

What are you trying to achieve?

Preserve the notion of the Prometheus Unknown type, since it is currently converted to Gauge, as discussed in the following two issues: open-telemetry/wg-prometheus#69 and open-telemetry/opentelemetry-collector-contrib#16768

Additional context.

khanhntd added the spec:metrics (Related to the specification/metrics directory) label Dec 21, 2022
@damemi
Contributor

damemi commented Jun 15, 2023

Bump, was there any progress on this?

@dashpole
Contributor

dashpole commented Jul 1, 2023

Copying over the feedback from the last spec meeting in which this was discussed (2/7/2023):

  • Consider adding an Unknown metric type, potentially as a "subtype" of gauge. It should be collector/protocol/transport-only, and should not impact the API/SDK.
  • Please look at other common metric types which may have a notion of unknown metrics (statsd? other protocols supported by the collector?). What values can other unknown metrics have? String, or all float?

@dashpole
Contributor

dashpole commented Jul 1, 2023

Looking at collector receivers for other protocols, and the types they support:

| Protocol | Collector receiver | Metric types |
|---|---|---|
| OTLP MetricsData | otlp receiver | gauge, sum, histogram, exponentialhistogram, summary |
| OpenMetrics protocol | prometheus receiver | **unknown**, gauge, counter, **stateset**, **info**, histogram, **gaugehistogram** (no mapping to OTel), summary |
| statsd protocol | statsd receiver | counter, **sampled counter**, timer (histogram), gauge, **set** |
| influxdb line protocol | influxdb receiver | gauge, sum, histogram, summary |
| carbon plaintext protocol | carbon receiver | no types (becomes gauge or counter in receiver config) |
| collectd data types | collectd receiver | derive (sum), gauge, **counter** (overflow), **absolute** |
| signalfx proto format | signalfx receiver | gauge, (delta) counter, **enum**, cumulative counter |

Bolded types are types which I believe don't have a 1:1 mapping to OTel types. For example, prometheus info metrics can be mapped to gauge, but the mapping isn't 1:1, as prometheus already has a gauge type. Those types would benefit from a notion of a "subtype".

@dashpole
Contributor

dashpole commented Jul 1, 2023

Potential subtypes:

Gauge: Prometheus unknown, collectd absolute, signalfx enum
Sum: sampled counter, overflow counter, Prometheus stateset, Prometheus info, statsd set

The only issue with statsd sampled and collectd "overflow" counters is that we would need a way to represent additional numerical information to be useful, or distinguishable from a regular counter/sum. Sampled counters would need to record the sampling rate (e.g. 0.1 for a 10% sampling rate). Overflow counters would need to record the max possible value.

Because of that, I would propose that the "subtype" field be only present on gauges to start. I would structure it similarly to datapoint flags, so that it only allows a fixed set of types, rather than allowing a free-form set of values.
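
To make the fixed-set idea concrete, here is a hypothetical sketch of what such a gauge-only subtype enum could look like, written as Go constants in the style protoc generates. The names are illustrative only and are not part of the actual OTLP proto; the real proposal lives in open-telemetry/opentelemetry-proto#496.

```go
// Hypothetical sketch only: illustrative names, not the real OTLP proto.
package otlpsketch

// GaugeSubtype is a closed set of "subtypes" a gauge could carry, analogous to
// how DataPointFlags is a fixed enum rather than a free-form string.
type GaugeSubtype int32

const (
	// GaugeSubtypeUnspecified is the default: an ordinary OTLP gauge.
	GaugeSubtypeUnspecified GaugeSubtype = 0
	// GaugeSubtypePrometheusUnknown marks a gauge that came from a
	// Prometheus/OpenMetrics series with no TYPE information.
	GaugeSubtypePrometheusUnknown GaugeSubtype = 1
	// GaugeSubtypeCollectdAbsolute marks a collectd "absolute" value.
	GaugeSubtypeCollectdAbsolute GaugeSubtype = 2
	// GaugeSubtypeSignalfxEnum marks a signalfx "enum" value.
	GaugeSubtypeSignalfxEnum GaugeSubtype = 3
)
```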

@dashpole
Contributor

dashpole commented Jul 5, 2023

One issue I've just discovered is that we currently specify that Info and StateSet metrics should become non-monotonic sums, rather than gauges. In this case we would need to make the subtype apply to at least gauges and sums.

@dashpole
Contributor

dashpole commented Jul 6, 2023

Prior design document

Motivation and Goals

Prometheus and other protocols have metric types which are not explicitly supported in OTLP as their own type. The impact of this is that we either have to choose to entirely drop some metric types (e.g. gaugehistogram), or represent multiple types in the protocol as the same type in OTLP. The implication of representing multiple types as a single type in OTLP is that any distinction between the "colliding" types is lost, and isn't possible to reconstruct later if/when the distinction is important.

The motivating use-case is Prometheus' unknown metric type. An unknown-typed metric is a metric which has a value, but no type information. It could be a gauge, a sum, a bucket in a histogram, or the sum of a summary -- you can't tell at scrape time. Right now, both Prometheus gauge and unknown metrics types are converted to OTLP gauges. However, it isn't possible to later determine if an OTLP gauge started out as a Prometheus gauge or unknown. This matters when trying to convert back to Prometheus, or if you are sending metrics to a backend that handles gauges differently from other metric types (e.g. doesn't allow rates on gauges).
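
To illustrate the information loss described above, here is a minimal sketch using the collector's pdata API (assuming a recent go.opentelemetry.io/collector/pdata version; the metric names are made up): a series scraped with "# TYPE ... gauge" and a series scraped with no TYPE line both become plain OTLP gauges, and nothing is left to tell them apart.

```go
package main

import (
	"fmt"

	"go.opentelemetry.io/collector/pdata/pmetric"
)

// appendGauge builds an OTLP gauge the way the prometheus receiver conceptually
// does today for both gauge-typed and unknown-typed series.
func appendGauge(ms pmetric.MetricSlice, name string, value float64) pmetric.Metric {
	m := ms.AppendEmpty()
	m.SetName(name)
	m.SetEmptyGauge().DataPoints().AppendEmpty().SetDoubleValue(value)
	return m
}

func main() {
	ms := pmetric.NewMetricSlice()
	fromGauge := appendGauge(ms, "node_temperature_celsius", 21.5) // had TYPE gauge upstream
	fromUnknown := appendGauge(ms, "legacy_app_value", 21.5)       // had no TYPE upstream

	// Both print "Gauge": the unknown origin of the second metric is unrecoverable.
	fmt.Println(fromGauge.Type(), fromUnknown.Type())
}
```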

Goal: Preserve type information for prometheus unknown, info, and stateset metric types

Requirements

  • Backwards-compatible changes to OTLP
  • No changes to OTel API or SDK. Only changes to the protocol and collector.
  • Collector receivers should not be required to set any new fields when constructing OTel metrics.
  • Collector exporters should not be required to handle the new fields to have correct behavior.

Proposal: Subtype Enums for Gauge and Sum

Prototype: open-telemetry/opentelemetry-proto#496

These additional "types" from other protocols can be thought of as "subtypes" of existing OTLP types, since they generally meet the requirements of our types, but carry additional information or stricter requirements. Since all unsupported metric types from other collector-supported protocols are gauges or sums, we only need to support gauge and sum subtypes to begin with.

Since this is a new field, using it would be opt-in. Existing collector exporters would not see a change in behavior unless they wanted to specifically handle these types differently from their OTel type. Using separate enums for gauges and sums ensures that subtypes are always properly associated with the correct OTel type (e.g. unknown with gauges). Enums also ensure that there are no naming conflicts between different possible subtypes, as the list of subtypes is part of the proto.

Alternative: Prometheus type as an additional metric attribute

Attempted in #3056

The prometheus receiver would add a prometheus.type=unknown attribute to metrics with unknown types. This would meet the goal of preserving additional type information, but it adds extra labels to metrics: downstream consumers would see "garbage" labels on their metrics unless they filter them out with collector processors.
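
As a rough sketch of what this alternative would mean in practice (again assuming the collector's pdata API; the metric name is made up), the receiver would stamp a prometheus.type attribute onto each data point, and that attribute then surfaces as an ordinary label downstream:

```go
package main

import (
	"fmt"

	"go.opentelemetry.io/collector/pdata/pmetric"
)

func main() {
	m := pmetric.NewMetricSlice().AppendEmpty()
	m.SetName("legacy_app_value")

	dp := m.SetEmptyGauge().DataPoints().AppendEmpty()
	dp.SetDoubleValue(42)
	// The attribute-based alternative: encode the original Prometheus type as a
	// data point attribute, i.e. an extra label that every consumer will see.
	dp.Attributes().PutStr("prometheus.type", "unknown")

	fmt.Println(dp.Attributes().AsRaw()) // map[prometheus.type:unknown]
}
```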

Alternative: Subtype String for Gauge and Sum

Instead of an Enum, we could use a string. That would allow any additional subtypes to be defined without changes to OTLP. However, that would possibly result in collisions, and we would want to define semantic conventions for the subtypes. That is likely more work, and is not worth the added flexibility.

@dashpole
Contributor

dashpole commented Jul 6, 2023

See #3582 for how these subtypes would be used in the prometheus specification.

@dashpole
Contributor

dashpole commented Jul 6, 2023

cc @open-telemetry/wg-prometheus @jmacd @bogdandrutu

I've written up a proposal for how to support "subtypes" for metrics to solve this case. Feedback on the above is appreciated.

@dashpole
Contributor

dashpole commented Jul 6, 2023

cc @jsuereth as well

@jsuereth
Contributor

jsuereth commented Jul 7, 2023

I'm not sure I like representing Info + Stateset metrics as Sums. Sums imply the ability to add, either spatially or temporally, and that this is the default aggregation.

The definition you list in the PROTO file shows Stateset/Info metrics requiring the SUM to have a value of 1. Is that a temporal-only restriction? Do you still spatially aggregate by default?

My "litmus test" for OTLP metric types is whether or not the default aggregation makes sense. Unknown + Gauge, I feel, are in pretty similar boats. It seems to me, though, that Info + Stateset may actually be closer to Gauge than Sum, but I'd like to better understand what I'm seeing first.

@dashpole
Contributor

dashpole commented Jul 7, 2023

That decision was originally made in #2380 (comment). This preserves that decision, although placing it in the proto would make it permanent.

I do think you would want to sum by default when aggregating info and stateset metrics. They both act as a "count of things with properties" or a "count of things in a state". Since the definition is in the data model, you are right that we can't require the SUM to have a value of 1, since it could have been aggregated at that point. We would have to define them as a count of things in a state instead. If we ever added an API/SDK for these metrics, we would want to require that the measurements are always 1, but we can't require that in the data model.

@dashpole
Contributor

dashpole commented Jul 11, 2023

Notes from the Spec meeting today:

  • @tigrannajaryan: Makes sense to use this method to preserve prometheus types. Left feedback on the OTLP proto changes. @dashpole needs to consider and document how to handle version skew between old and new versions. Done here
  • @jsuereth : Make sure the default aggregation matches what we would expect for info and stateset.
  • @jmacd: Will review, and will reconsider whether info and statesets should be sums.

@dashpole
Contributor

I've removed stateset and info from the proto PR. For now, let's just address unknown-typed metrics, as that is not contentious, and is the most important of the types to support.

@pirgeo
Member

pirgeo commented Jul 21, 2023

Excuse my ignorance, but what is the point of this Prometheus unknown in the first place? Even in the OpenMetrics spec it says

Unknown SHOULD NOT be used.

It states that it should be used when it's impossible to determine what type the data is because it comes from some 3rd party integration. That means to me that we simply don't know what it is, and that is true for at least two hops (third party -> Prometheus -> OTLP). If we haven't managed (or cared) to fix the data at that point, does it really make sense to complicate the OTLP proto just to forward the information that we actually don't know what we are dealing with? Is it not reasonable to ask whoever is sending this data to make a decision before sending it off? I don't want to seem negative here, I would simply like to understand the importance/use-cases of this change.

@dashpole
Contributor

There are many preexisting Prometheus applications that don't send type information. This is primarily because prometheus' data model and the promql query language don't require type information to function. Today, it is strongly recommended to include type information in Prometheus endpoints, and most client libraries will include it automatically. But it is still useful to be able to support older exporters that don't include it.

That means to me that we simply don't know what it is, and that is true for at least two hops (third party -> Prometheus -> OTLP)

I think it would just be a single hop: Older Prometheus application -> OTel collector Prometheus receiver.

If we haven't managed (or cared) to fix the data at that point, does it really make sense to complicate the OTLP proto just to forward the information that we actually don't know what we are dealing with?

Fixing types isn't straightforward today, at least in the collector. We don't have processors for changing gauges to counters, or vice-versa. Additionally, we would still need this change to make it possible for a processor to identify metrics that were previously unknown. Users also generally expect to be able to treat prometheus unknown-typed metrics as any type of metric when querying with promql, so backends that want to offer promql querying may need to treat them in special ways.

@pirgeo
Member

pirgeo commented Jul 21, 2023

I see, thanks for clarifying. I guess there are many legacy things in Prometheus land that I don't know about. If we had these counter-to-gauge or gauge-to-counter converters we would not really need this notion anymore though, right? Instead of doing the special handling at query time, it would encourage users to do the special handling earlier in the pipeline. At the same time, the OTLP format would not have to include everything that Prometheus ever did/will do, including things that are mostly that way for legacy reasons.
As an example, when metrics from an app with Prometheus get scraped by the collector, one could convert all metrics that start with count (because all counters in the app are already named count_xyz) to counters and leave everything else as gauges.

@dashpole
Contributor

If we had these counter to gauge or gauge to counter converters we would not really need this notion anymore though, right? ... one could convert all metrics that start with count (because all counters in the app are already named count_xyz) to counters and leave everything else as gauges.

It would need to exist upstream of OTel (e.g. in the Prometheus server, or in the prometheus receiver). The collector's data model is essentially OTLP, so we would need this change if we wanted to offer that functionality in processors. It could exist in the receiver, but that potentially requires per-metric configuration at that point.

It is fairly common with prometheus to use a single configuration to scrape everything in a kubernetes cluster (example), with pod annotations used to opt into scraping. Being able to still collect all of the metrics, even if I don't know the types upfront, and potentially add them later, is still quite useful.

@pirgeo
Member

pirgeo commented Jul 22, 2023

I am just wondering: If you have unknowns today, would you do something like "transform everything that has datatype unknown to count" when you read it with PromQL? Or would you do "give me the metric xyz, and interpret it as count"? If that is the case, why not move that step to before transforming into OTLP? I get that it might be helpful to simply get all metrics and then deal with the problems on read, but users already have to configure their collectors one way or another. Why not add the conversion where prometheus is transformed to OTLP instead of putting the additional information on the OTLP messages?

I don't want to seem stubborn here. I probably just haven't spent enough time in big Prometheus deployments to see the impossibility of what I am proposing, but from the "outside" it seems a bit weird to extend OTLP (with all its well-defined data types) to say "Well, actually let's add this flag to say we don't know what that is. Let's just transport it and deal with that later.".

I also wonder if this could be a point that would encourage instrumentation authors to move away from unknown. If users were forced to set it up manually, they would put pressure on the authors of the libraries, and in turn everybody's data quality goes up (although this may be wishful thinking).

@dashpole
Contributor

would you do something like "transform everything that has datatype unknown to count" when you read it with PromQL? Or would you do "give me the metric xyz, and interpret it as count"?

With PromQL today, Prometheus users can say, "give me the metric xyz, and interpret it as count" (e.g. perform a rate on it).

users already have to configure their collectors one way or another. Why not add the conversion where prometheus is transformed to OTLP instead of putting the additional information on the OTLP messages?

It is common for a single, static prometheus scrape configuration to be used for the entire cluster, with limited, per-application configuration done using pod annotations or similar. It is also common for the team that manages ops workloads to be different from those managing the application being scraped, and they may not know anything about the metrics they are collecting at scrape time.

@pirgeo
Member

pirgeo commented Jul 25, 2023

With PromQL today, Prometheus users can say, "give me the metric xyz, and interpret it as count" (e.g. perform a rate on it).

I assume users with dashboards already set up can do that independent of how the data was ingested (i.e., it does not matter if unknown was propagated all the way to the backend). Is that correct?

It is also common for the team that manages ops workloads to be different from those managing the application being scraped, and they may not know anything about the metrics they are collecting at scrape time.

Gotcha. Would that not be an even stronger case for making the person who creates and actually knows which metrics they produce say "This is a gauge" or "This is a counter"?

I do think that this issue is a prototype of future OTel <> Prometheus interoperability questions. I actually don't know what the plan for that is. Will OTel support everything that has ever come or will ever come out of Prometheus land? Is there a cutoff point where OTel says "We will not implement a legacy Prometheus feature in the name of OTLP data quality"? This was the point I was trying to make here. If we go the "we will do anything for Prom compatibility" route, what happens if Prometheus introduces a feature that is incompatible with OTel? Will we break OTLP and release a 2.0 to deal with that? How will that affect users?

I don't think this is for us to decide here, but something for the TC (@open-telemetry/technical-committee) to discuss and decide. Personally, I think it would be fair to ask users who move their data from Prometheus to OTLP to fix their integrations on the client side (or put pressure on the authors of instrumentations to update, or even switch to OTel SDKs). Otherwise, it's a "garbage in, garbage out" situation in my eyes. I understand that it would probably mean a lot of work in a lot of places in the short term, but ultimately striving for high data quality should be a goal for all telemetry producers and should provide value for everyone.

carlosalberto pushed a commit that referenced this issue Apr 8, 2024
…nown-typed metrics in OTLP (#3868)

Fixes #3058
Fixes #1712
Fixes #2409

## Changes

Use metric.metadata (added in open-telemetry/opentelemetry-proto#514) to support representing prometheus metric types in OTLP.
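
For reference, a minimal sketch of what the adopted approach looks like from the collector side, assuming a pdata version that exposes the Metric.Metadata() map backing the new proto field (the metric name is made up): the original Prometheus type rides along as metric-level metadata rather than as a data point attribute, so it does not add labels to the series.

```go
package main

import (
	"fmt"

	"go.opentelemetry.io/collector/pdata/pmetric"
)

func main() {
	m := pmetric.NewMetricSlice().AppendEmpty()
	m.SetName("legacy_app_value")
	m.SetEmptyGauge().DataPoints().AppendEmpty().SetDoubleValue(42)

	// Record the original Prometheus type as metric-level metadata
	// (backed by metric.metadata in the proto) instead of as a label.
	m.Metadata().PutStr("prometheus.type", "unknown")

	// A consumer that cares (e.g. a Prometheus exporter) can read it back and
	// re-emit the series as unknown-typed; everyone else can ignore it.
	if v, ok := m.Metadata().Get("prometheus.type"); ok {
		fmt.Println("original type:", v.Str())
	}
}
```
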
@mx-psi
Member

mx-psi commented Aug 20, 2024

Should metric metadata be described in the Metrics data model? The situation seems similar to #4053

carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this issue Oct 31, 2024
…nown-typed metrics in OTLP (open-telemetry#3868)

Fixes open-telemetry#3058
Fixes open-telemetry#1712
Fixes open-telemetry#2409

## Changes

Use metric.metadata (added in open-telemetry/opentelemetry-proto#514) to support representing prometheus metric types in OTLP.