OM 2.0: OM protobuf future

## Problem Statement

[OM proto](https://github.com/prometheus/OpenMetrics/blob/3bb328ab04d26b25ac548d851619f90d15090e5d/proto/openmetrics_data_model.proto) is not currently adopted: neither the Prometheus libraries nor the main binary are aware of it.

The Prometheus ecosystem still uses and invests in [Prometheus Proto](https://github.com/prometheus/client_model/blob/master/io/prometheus/client/metrics.proto) ([proto3 version](https://github.com/prometheus/prometheus/blob/main/prompb/io/prometheus/client/metrics.proto)), although there was an attempt to [deprecate](https://github.com/prometheus/client_model) it in the past. It is currently on its way to being used by default for scraping (it's already the default for native histograms and a bunch of other feature flags).

Given that, it's not clear whether, as part of the [OM 2.0 WG](https://docs.google.com/document/d/1FCD-38Xz1-9b3ExgHOeDTQUKUatzgj5KbCND9t-abZY/edit?tab=t.lvx6fags1fga#heading=h.3hwe5brp1hff), we should continue with OM proto, improve it, or remove it from OM completely and recommend the existing Prometheus proto. Note that this is a separate topic from the OM text format, which is the main focus of OM 2.0.

### OM Proto vs Prometheus Proto

The protocols are pretty similar: both use a similar MetricFamily abstraction and have similar gauge, counter, histogram and summary structures. They do differ a little, though:

[OM proto](https://github.com/prometheus/OpenMetrics/blob/3bb328ab04d26b25ac548d851619f90d15090e5d/proto/openmetrics_data_model.proto):
* Uses a repeated [proto oneof for `Metric.MetricPoint`](https://github.com/prometheus/OpenMetrics/blob/3bb328ab04d26b25ac548d851619f90d15090e5d/proto/openmetrics_data_model.proto#L77) on the already repeated `Metric`. The `repeated` part is interesting because it potentially encourages sending multiple points (e.g. historical ones too), not only current values. Not sure if that was intended.
* Every value can be either double or int.
* Defines `MetricSet`, which blocks [major optimizations possible with the PrometheusProto delimited format](https://github.com/prometheus/prometheus/pull/15731).
* Uses a non-recommended package name format (nit).
* Lacks native histogram support.
* Is versioned.

[Prometheus Proto](https://github.com/prometheus/client_model/blob/master/io/prometheus/client/metrics.proto) ([proto3](https://github.com/prometheus/prometheus/blob/main/prompb/io/prometheus/client/metrics.proto)):
* Uses a non-repeated "implicit oneof" defined directly in [`Metric` for each metric value](https://github.com/prometheus/prometheus/blob/main/prompb/io/prometheus/client/metrics.proto#L144).
* Every value has to be double.
* Supports native histograms.
* Uses the delimited format, which allows sending each metric family in a separate message, enabling [streaming parsers](https://github.com/prometheus/prometheus/pull/15731).
* Lacks `Info` and `StateSet` MetricTypes (both are interpreted as gauges in Prometheus as of now).
* Has inconsistent timestamps. Some fields use `google.protobuf.Timestamp timestamp = 3; // OpenMetrics-style.`, some use `int64 timestamp_ms = 6;`. The latter is easier (and faster) to use, but `0` means not set, which blocks the use of an exact 0 millisecond timestamp (implicitly accepted in many places in Prometheus, e.g. Remote Write).
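
To make the timestamp point concrete, here is a minimal sketch in plain Python (no protobuf library). It assumes, as described above, that the `timestamp_ms` field has no explicit presence tracking, so `0` doubles as "not set". The class names `ImplicitSample` and `ExplicitSample` are hypothetical and only illustrate the two conventions; they are not part of either proto.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImplicitSample:
    """Hypothetical sample where 0 doubles as "timestamp not set"."""
    value: float
    timestamp_ms: int = 0

    def has_timestamp(self) -> bool:
        # An explicit timestamp of exactly 0 ms cannot be told apart
        # from a sample that carries no timestamp at all.
        return self.timestamp_ms != 0


@dataclass
class ExplicitSample:
    """Hypothetical sample with explicit presence (None means not set)."""
    value: float
    timestamp_ms: Optional[int] = None

    def has_timestamp(self) -> bool:
        return self.timestamp_ms is not None


# A sample deliberately stamped at the Unix epoch (0 ms).
implicit_epoch = ImplicitSample(value=1.0, timestamp_ms=0)
explicit_epoch = ExplicitSample(value=1.0, timestamp_ms=0)

print(implicit_epoch.has_timestamp())  # the epoch timestamp is lost
print(explicit_epoch.has_timestamp())  # the epoch timestamp is preserved
```

Whether explicit presence is worth its cost is exactly the kind of "decision around timestamp 0s" a versioned spec would have to make.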

To sum up, PrometheusProto is closer to what Prometheus implements now, including native histograms. It also unblocks somewhat more efficient parsing. On the other hand, OM Proto is consistent with OM 1.0 types and makes it a bit easier (?) to send historical samples for the same series. OM proto is also strictly versioned (see below for why that's important).
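
For readers unfamiliar with the delimited format mentioned above, here is a rough sketch of the idea: each message is prefixed with its length as a base-128 varint, so a reader can decode one message at a time instead of buffering a whole enclosing message. This is plain Python with opaque byte payloads standing in for serialized `MetricFamily` messages; the function names are made up for illustration and this is not the Prometheus parser.

```python
import io


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf-style base-128 varint."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def write_delimited(stream, payload: bytes) -> None:
    """Write one length-prefixed message to the stream."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_delimited(stream):
    """Yield payloads one at a time, reading only what each message needs."""
    while True:
        length = 0
        shift = 0
        first = True
        while True:
            b = stream.read(1)
            if not b:
                if first:
                    return  # clean end of stream between messages
                raise EOFError("truncated length prefix")
            first = False
            length |= (b[0] & 0x7F) << shift
            if not b[0] & 0x80:
                break
            shift += 7
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError("truncated message")
        yield payload


# Stand-ins for three serialized metric families of different sizes.
families = [b"family-one", b"", b"x" * 300]
buf = io.BytesIO()
for family in families:
    write_delimited(buf, family)
buf.seek(0)

for payload in read_delimited(buf):
    print(len(payload))
```

A single top-level wrapper such as `MetricSet` has no such per-family boundary on the wire, which is why it gets in the way of this style of streaming.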

### Protobuf versioning

During WG discussions a point was made about protobuf versioning: that it does not need strict minor/patch versioning, because we can make a lot of changes without breaking users or user interaction.

I would argue that in the world of data-heavy network protocols like OM or Remote Write, that's not **practically** true. Generally, we need to use the same versioning structure as for the text format.

Examples:
* We add a `schemaURL` attribute to MetricFamily one day. Adding a field with this new information is not a breaking change. However, without a concrete minor version bump this change won't be well announced. The same applies if our text format puts a MUST on skipping unknown lines.
* The addition of Info and StateSet metric types to Prometheus Proto. One could say it's not a breaking change: normally, adding fields to protobuf is not breaking, and in terms of protocol correctness it's true that it will not *crash* encoding/decoding. However, such a change is **practically** semantically breaking, because when an SDK/client upgrades and starts to generate a MetricFamily for e.g. the `Info` type, it has to decide where to put it: (a) as the new `Info` type, (b) as the old `Gauge` type, now deprecated for info metrics, or (c) both. To not break users it would need to be (c), but that's not practical for complexity and efficiency reasons (duplicated data sent over the network that does not compress easily, detecting duplicates on parse).

To sum up, some versioning and content negotiation might be needed for protobuf protocols as well.
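
As a rough illustration of what versioned content negotiation could look like for the `Info` example, here is a sketch in plain Python. The `version` parameter on the protobuf media type and the version numbers `1.0.0`/`1.1.0` are hypothetical; nothing here is an existing Prometheus API. The idea is that the producer picks (a) or (b) per request based on what the scraper advertises, instead of being forced into (c).

```python
# Hypothetical: versions this producer can emit, per media type.
SUPPORTED = {
    "application/vnd.google.protobuf": {"1.0.0", "1.1.0"},
    "application/openmetrics-text": {"1.0.0"},
}
# Hypothetical: what an unversioned request is assumed to mean.
DEFAULT_VERSION = "1.0.0"


def parse_accept(header: str):
    """Parse an Accept header into (media_type, params), highest q first."""
    offers = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        params = {}
        for field in fields[1:]:
            key, sep, value = field.partition("=")
            if sep:
                params[key.strip().lower()] = value.strip().strip('"')
        try:
            q = float(params.pop("q", "1"))
        except ValueError:
            q = 0.0  # treat a malformed weight as least preferred
        offers.append((q, media_type, params))
    offers.sort(key=lambda offer: offer[0], reverse=True)  # stable sort
    return [(media_type, params) for _, media_type, params in offers]


def negotiate(header: str):
    """Return (media_type, version) for the best supported offer, or None."""
    for media_type, params in parse_accept(header):
        version = params.get("version", DEFAULT_VERSION)
        if version in SUPPORTED.get(media_type, set()):
            return media_type, version
    return None


def info_wire_type(version: str) -> str:
    """Pick how to encode an info metric for the negotiated version."""
    as_tuple = tuple(int(x) for x in version.split("."))
    # Assumption: a native Info type first appears in hypothetical 1.1.0.
    return "INFO" if as_tuple >= (1, 1, 0) else "GAUGE"


header = (
    "application/vnd.google.protobuf;"
    "proto=io.prometheus.client.MetricFamily;encoding=delimited;"
    "version=1.1.0;q=0.6,"
    "application/openmetrics-text;version=1.0.0;q=0.5"
)
media_type, version = negotiate(header)
print(media_type, version, info_wire_type(version))
```

An older scraper that sends no `version` parameter falls back to the baseline and keeps receiving info metrics as gauges, so neither side needs the duplicated encoding of option (c).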

## Proposed solution

Implementing Protobuf support efficiently was a big task, and PrometheusProto unblocks streaming and is already adopted. There are also not many differences vs OM Proto that would motivate the ecosystem to adopt OM proto.

Perhaps the best course of action would be:

1. Deprecate the OM 1.0 Proto.
2. Release the OM 2.0 without Protobuf schema.
3. Release an official versioned spec (1.0/0.1?) document for PrometheusProto (on prometheus.io docs) and iterate on it (e.g. 1.0/1.1/2.0, with OM types at some point and a decision around timestamp 0s). Put the proto in one official place (prometheus/prometheus and the buf registry) and remove the gogo parts (doable with the new custom parser now).

Pros:
* Allows separate versioning/lifetime for text vs proto (also a downside; maybe consistency is useful).
* Iterates on the adopted protocol instead of on an unused one, which would risk less adoption in the future.
* No need to reimplement parsers.
* The most efficient option, and we know even the existing proto parsing has a lot of overhead (until we fix magic suffixes).
* Clear state of PrometheusProto.
* Less work?

Cons:
* Losing the "OM" badge for the protobuf protocol, although OM has been part of Prometheus since last year.
* Inconsistency between OM 1.0 and 2.0.
* Impacting existing OM 1.0 Proto users (we don't know of any, but there might be some).

## Alternatives considered

* Iterate on OM Proto 1.0 in OM 2.0, deprecate PrometheusProto.

We could add native histograms in OM 2.0. For efficiency we could introduce the delimited format. But then we would essentially be reimplementing PrometheusProto under the OM umbrella, which is the Prometheus umbrella now. Perhaps not worth it?

Iterating on the adopted protocol feels better for the ecosystem too.

* Develop a completely new OM Proto 2.0 in OM 2.0, deprecate PrometheusProto.

Interesting, but do we have the resources for this? The only benefit I see is the opportunity to rethink the "MetricFamily" concept, which does not exist (and does not make sense) in Prometheus. That would only be a readability improvement, nothing more 🤔

* Deprecate all proto protocols

At some point that was the intention. However, protobuf has been useful for experiments (it has been the only protocol with practical native histogram support for the last few years), and it's likely to become more efficient once Prometheus switches to complex types and we finalize the gogo/custom generator aspect.

