Add processed/exported Span metrics. #184

Conversation

carlosalberto
Contributor

Initial stab at adding metrics for processed/exported/dropped Spans. Made these metrics Span-specific, instead of generic across signal types (metrics, logs), as not all signals may have the same semantics.

Any feedback will be greatly appreciated - although I'm especially interested in:

  • `dropped` label - I think we could massage it to make it a common one, maybe `spans.dropped`? @jsuereth may have an opinion here.
  • `exporter.type` and `processor.type` - is there any value in sharing these as well? Maybe `otel.component`?

Fixes part of #83
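For illustration, here's a rough sketch (using the Go metrics API) of how an SDK span exporter could record the proposed `otel.exporter.spans` counter; the scope name and the `exporter.type` value below are just placeholders:

```go
import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// reportExportedSpans sketches how an SDK span exporter could record the
// proposed otel.exporter.spans counter. In practice the instrument would be
// created once and reused; the scope name and exporter.type value here are
// illustrative placeholders.
func reportExportedSpans(ctx context.Context, count int64, dropped bool) error {
	meter := otel.Meter("io.opentelemetry.sdk.trace") // hypothetical scope name
	counter, err := meter.Int64Counter("otel.exporter.spans",
		metric.WithDescription("Number of spans handled by the span exporter."),
	)
	if err != nil {
		return err
	}
	counter.Add(ctx, count,
		metric.WithAttributes(
			attribute.Bool("exporter.dropped", dropped),
			attribute.String("exporter.type", "OtlpGrpcSpanExporter"), // example value
		),
	)
	return nil
}
```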

@carlosalberto requested review from a team July 10, 2023 15:21
<!-- semconv metric.otel.exporter.spans(full) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `exporter.dropped` | boolean | Whether the Span was dropped or not. [1] | | Required |
Member

There are 3 states from the exporter's view (a rough sketch follows the list):

  1. delivered (the exporter has received the confirmation from the ingestion)
  2. dropped (the exporter decided to drop the data knowing that the ingestion didn't accept it)
  3. unknown (the exporter has sent the data, but it has no idea whether the data is accepted or not by the ingestion)
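
As a rough sketch, these could be modelled as values of a single string-valued attribute instead of a boolean (the attribute name and values below are hypothetical):

```go
// Hypothetical values for a string-typed attribute (e.g. "exporter.outcome")
// covering the three states from the exporter's point of view.
const (
	outcomeDelivered = "delivered" // confirmation received from the ingestion
	outcomeDropped   = "dropped"   // exporter gave up; ingestion did not accept the data
	outcomeUnknown   = "unknown"   // data was sent, but acceptance was never confirmed
)
```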

Contributor Author

Oh, good one, yes. So I guess my initial approach took this for granted:

  • `dropped=false`: Delivered, with confirmation from the ingestion step.
  • `dropped=true`: Either consecutive attempts were exhausted, OR it was not confirmed that ingestion was successful (including an error on the server side).

Do you think it would make sense to have three values instead of two here (as suggested by your comment)?

Contributor Author

(On a related note: I will follow up on this PR, once it's merged, with some additional metrics, such as retry counts at the export level, etc.)

Contributor

@jsuereth left a comment

I do think `processor.dropped` and `exporter.dropped` are quite generic in name.

  1. Are we trying to generalize a pipeline semantic where we can generically track items through a pipeline (like an ESB, something like Apache Camel, etc.), or are these meant to be OTel-specific?
  2. Will these be the same between the OTel Collector and the SDK? If not, I think you may want a prefix denoting that.

<!-- semconv metric.otel.processor.spans(full) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `processor.dropped` | boolean | Whether the Span was dropped or not. [1] | | Required |
Contributor

This does not include un-sampled spans, right?

Contributor Author

Right, this doesn't include un-sampled Spans. Will clarify that in a comment.

Member

<!-- semconv metric.otel.exporter.spans(full) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `exporter.dropped` | boolean | Whether the Span was dropped or not. [1] | | Required |
Contributor

I'm trying to understand how to use this counter and what "dropped" actually means.

Does dropped mean you never tried to export, or that you did try and gave up?

Contributor Author

Oh yes. It means that either the spans were dropped because of an unrecoverable error (e.g. 4xx), or the max retry count was reached. Are you looking for this clarification, or are you thinking about some other scenarios as well? (I'll clarify this in the PR in a bit.)

@carlosalberto
Contributor Author

Hey @jsuereth thanks for the questions.

or are these meant to be OTel-specific?

I'd personally go with OTel-specific for now. It feels to me that defining semantic conventions that cover general values outside of OTel would require a lot of prior conversation.

Will these be the same between the OTel Collector and the SDK? If not, I think you may want a prefix denoting that.

So I was thinking they might be shared, but after briefly checking, I'm leaning a little more towards making this SDK-specific - the Collector may have slightly different needs (labels/dimensions).


**Status**: [Experimental][DocumentStatus]

This document defines semantic conventions for OTel components (such as processors, exporters, etc).
Member

OTel SDK components?

@yurishkuro
Member

yurishkuro commented Jul 11, 2023

+1 for this work.

I am a bit troubled by this being defined as a semantic convention. There's certainly an aspect of that in terms of how the metrics are named, but I also think this should be part of the SDK spec, as in all SDKs should be required to provide these metrics about themselves.

Also, there is more to SDK observability than just the exporter metrics. Here, for example, are the metrics that Jaeger SDKs used to expose: https://github.com/jaegertracing/jaeger-client-go/blob/master/metrics.go#L22-L106

@tigrannajaryan
Member

It would be nice to have these metrics shared between SDKs and Collector.

@yurishkuro
Member

It would be nice to have these metrics shared between SDKs and Collector.

There needs to be a clear distinction between SDK-produced metrics and Collector-produced, because they can co-exist in the same process.

I am not fond of the naming scheme being proposed here. I would rather see a hierarchy (example instrument names are sketched after the list):

  • otel
    • sdk
      • trace
        • tracer
          • stats on # of traces / spans started, finished, etc.
        • processor
          • batch
            • queue_length
            • dropped_spans
        • exporter
          • otlp
            • batches
            • spans
    • collector
      • potentially similar sub-hierarchy
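
For illustration, instrument names following that hierarchy might look like the constants below (the exact segments are made up for this example, not a concrete proposal):

```go
// Illustrative instrument names derived from the hierarchy above.
const (
	sdkBatchProcessorQueueLength  = "otel.sdk.trace.processor.batch.queue_length"
	sdkBatchProcessorDroppedSpans = "otel.sdk.trace.processor.batch.dropped_spans"
	sdkOtlpExporterBatches        = "otel.sdk.trace.exporter.otlp.batches"
	sdkOtlpExporterSpans          = "otel.sdk.trace.exporter.otlp.spans"
	collectorProcessorSpans       = "otel.collector.trace.processor.spans" // similar sub-hierarchy
)
```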

@tigrannajaryan
Member

There needs to be a clear distinction between SDK-produced metrics and Collector-produced, because they can co-exist in the same process.

Are you envisioning the Collector using the OTel SDK, and thus the Collector processor needing to emit both sets of metrics? That makes sense; however, do we need the metric names to be different? They could differ by an attribute.

@yurishkuro
Member

yurishkuro commented Jul 11, 2023

Are you envisioning the Collector using the OTel SDK, and thus the Collector processor needing to emit both sets of metrics?

The use case we had in Jaeger land was that Jaeger backend components themselves were instrumented for tracing and would emit SDK-scoped metrics, which would then conflict with collector-specific metrics.

however, do we need the metric names to be different? They could differ by an attribute.

I don't know if we have this in the guidelines somewhere, but I consider attributes to be "optional" dimensions of the metrics that can be dropped (aggregated over) without destroying the semantic meaning of the metric. E.g. tagging a metric with "region" is good; one can aggregate over it if a by-region breakdown is not needed. But aggregating metrics from the SDK and Collector into one time series does not make sense semantically (e.g. you would double-count spans if you did), so I would expect them to have different names, not just different attributes.

@tigrannajaryan
Member

I don't know if we have this in the guidelines somewhere, but I consider attributes to be "optional" dimensions of the metrics that can be dropped (aggregated over) without destroying the semantic meaning of the metric. E.g. tagging a metric with "region" is good; one can aggregate over it if a by-region breakdown is not needed. But aggregating metrics from the SDK and Collector into one time series does not make sense semantically (e.g. you would double-count spans if you did), so I would expect them to have different names, not just different attributes.

I don't think we have a guideline like that. I am not sure such a guideline can apply to each and every use case.

Here is an existing use case that would break if we were to adopt such a guideline. Collector processors currently emit metrics about the number of data points processed. There can be multiple processors chained sequentially. Each processor emits a measurement, with the value of the "processor" dimension set equal to the processor name. Aggregating over the "processor" dimension will indeed result in double-counting. It seems you are suggesting that a practice like that is invalid and we shouldn't be doing this in the Collector, but I am not sure what a reasonable alternative would be.

The Collector also has similar metrics for receiver and exporter components, where the component name is a dimension.

As a comparison, Prometheus has the following guideline:

As a rule of thumb, either the sum() or the avg() over all dimensions of a given metric should be meaningful (though not necessarily useful).

Of course the interpretation of "should be meaningful" is up for debate. I could argue that sum() is meaningful: it tells how many times data points were processed overall by different processors (including SDK and Collector). Not a very useful aggregation, but Prometheus doesn't require it to be useful, only meaningful. :-)

If we change this to the "drop" metric (which the Collector also records), the sum() becomes quite useful: it gives the total number of data points dropped by all processors, i.e. anywhere in the Collector.

Here is another scenario that I think shows that a guideline like "aggregating away a dimension should not result in double counting" is unnecessarily restrictive:

Let's imagine a system where an incoming request requires participation of service A and service B, where service A calls service B for help. Both service A and service B emit a very standard "http.request.count" metric with a "service.name" dimension. Do we consider it meaningless to aggregate away the "service.name" of the "http.request.count" metric because it would mean double-counting of incoming requests, and thus that this recording practice is invalid? What do we expect services A and B to do instead?

@yurishkuro
Member

I like the Prometheus guideline. It's practical, and says almost the same thing as I am saying. It's not a strict rule, since there is no one-size-fits-all solution; it expects a judgement call on how generous we want to be with the interpretation of "meaningful". An even more generous interpretation is that every time we bump a counter it's because of some event, so we could call the metric "events", make everything else an attribute, and the sum() would be the total count of events - meaningful, yet impractical.

So we'd need some other criterion for what a practical "meaningful" is. I am not strongly opposed to having identical metrics from the SDK and the collector and only separating them by attribute (named how, btw?), but I am not seeing a strong argument for why that would be a good idea either.

A specific practical challenge we had in the past in Jaeger is that we used the prom-client as the metrics SDK, and it didn't allow redefining a metric with the same name twice in the same process (even with different attributes). So having two unrelated layers in the code emit the same metric required some special tricks and wasn't possible with the off-the-shelf prom-client. I'm not saying it's a problem with the OTel SDK; it just shows that there was a little more to "meaningful" than the Prometheus guideline covers.

If we already have an established pattern of having different components emit identically named metrics, then we could continue with that pattern. It reminds me of another discussion about resource-identifying attributes vs. all metric attributes: here we designate some attributes as "semantics-changing".

@jmacd
Contributor

jmacd commented Jul 18, 2023

I've been diagnosing an OTel Collector that is instrumented with an SDK that exports through OTel Collector components arranged as an SDK exporter. The result is similar to what @yurishkuro described: I find myself confused because I'm getting metrics about dropped spans in two ways - both the collector pipeline's primary drops and the collector's own instrumentation (secondary) drops.

Although they are both literally "spans dropped", I would rather not try to make meaning from an aggregation of the two.

I think it would be nice for us to define the semantics carefully for what we want to count, and then use separate but similar metric names for SDKs and collectors. The meaning of "Drop" has to be made very clear. Definitions for "Refused" and "Failed" and other adjectives we use are the important part here: we can re-use the definitions without re-using the same metric names.

Contributor

@dashpole left a comment

+1 to using the same metrics in SDKs and the collector. Would instrumentation scope be the right way to distinguish between collector and SDK usage of them? Instrumentation scope should become a label in Prometheus (to Yuri's line of questioning).

| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `exporter.dropped` | boolean | Whether the Span was dropped or not. [1] | | Required |
| `exporter.type` | string | Type of exporter being used. | `OtlpGrpcSpanExporter` | Recommended |
Contributor

Is this essentially the same as the instrumentation scope name?

| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `exporter.dropped` | boolean | Whether the Span was dropped or not. [1] | | Required |
| `exporter.type` | string | Type of exporter being used. | `OtlpGrpcSpanExporter` | Recommended |
Contributor

Would different instances of the same "exporter" get a different type, or the same type? E.g. if I have two OTLP grpc span exporters, would I be able to tell how many spans each was exporting? In the collector today, I believe we can tell the difference between two instances of the same component.

Contributor Author

Same type ;) I don't think we have a (clear) notion of differentiating between two processors/exporters of the same type (at least in the SDK).

@tigrannajaryan
Member

So we'd need some other criterion for what a practical "meaningful" is. I am not strongly opposed to having identical metrics from the SDK and the collector and only separating them by attribute (named how, btw?)

@yurishkuro I think for me this is a better argument against using the same metric. I don't know what the attribute name would be.

@dashpole Using the Scope name is a possibility, but I am not sure what values we would use to distinguish between SDKs and the Collector.

@tigrannajaryan
Member

@open-telemetry/collector-approvers @open-telemetry/collector-maintainers any thoughts on this? What are the current metrics we use in the Collector and do you think we could adopt in the Collector the conventions that this PR suggests?

@djaglowski
Member

@open-telemetry/collector-approvers @open-telemetry/collector-maintainers any thoughts on this? What are the current metrics we use in the Collector and do you think we could adopt in the Collector the conventions that this PR suggests?

The spanmetrics connector emits two metrics:

  • `calls`, a sum describing the number of spans observed
  • `duration`, a histogram of span durations

Both metrics can optionally have a namespace prepended via user configuration.

I think `calls` could be changed to `otel.processor.spans`. However, whether or not the span is dropped is not directly determined by the connector, so we would need to identify a way to handle this attribute.

@jpkrohling
Member

any thoughts on this? What are the current metrics we use in the Collector and do you think we could adopt in the Collector the conventions that this PR suggests?

We don't seem to have a list of metrics emitted by the collector. We have metrics that are generated by the collector facilities (`otelcol_exporter_enqueue_failed_metric_points`), and metrics generated by components directly (like `otelcol_loadbalancer_backend_latency_bucket`).

I would absolutely welcome aligning the names/prefixes with the SDKs and I believe the ones @carlosalberto specifically proposed would be relatively easy to adopt.

For reference, these are the ones I have in mind:

https://github.com/open-telemetry/opentelemetry-collector/blob/b5e511ce31f22fd3d4817236792245fe1bd88ef8/obsreport/obsreport_exporter.go#L88-L122

@jmacd
Contributor

jmacd commented Jul 27, 2023

@jpkrohling Thank you for highlighting what is, I think, a source of confusion about the collector's metrics. For a given unit of data (span, metric data point, log record), how many distinct standard outcomes are there? The exporterhelper generates 1 and the obsreport generates 2, I think:

  1. How many spans were successfully sent (obsreport)
  2. How many spans were dropped because they could not be sent successfully (obsreport)
  3. How many spans were dropped because a queue was full, meaning the send was never attempted (helper)

Note that when an export is retried because configuration allows it, the items should not be counted multiple times; an in-flight export is not counted until its retries have been exhausted.

@jmacd
Contributor

jmacd commented Jul 27, 2023

@tigrannajaryan @dashpole

@dashpole Using the Scope name is a possibility, but I am not sure what values we would use to distinguish between SDKs and the Collector.

We haven't been very prescriptive about scope names, but we could be for these special cases. For example, I prefer these names to be short, so I might suggest the scope names "open-telemetry/collector" (w/ collector release version) and "open-telemetry/sdk" (w/ SDK release version - with the language identifiable via the resource).
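
A minimal sketch of what that could look like with the Go metrics API (the version strings are placeholders):

```go
// Metrics emitted by the SDK about itself vs. by a Collector in the same
// process would be distinguishable by their instrumentation scope.
sdkMeter := otel.Meter("open-telemetry/sdk",
	metric.WithInstrumentationVersion("1.21.0"), // placeholder SDK version
)
collectorMeter := otel.Meter("open-telemetry/collector",
	metric.WithInstrumentationVersion("0.90.0"), // placeholder collector version
)
```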

@jmacd
Contributor

jmacd commented Jul 27, 2023

The list of three metrics above (#184 (comment)) is, I think, incomplete when we consider how to handle PartialSuccess responses. When an item is dropped by the recipient because of some kind of malformed-data problem, those spans should not be counted as successfully sent (category 1); they should be counted in a 4th category, which I believe other collector components refer to as "refused" (a rough sketch of all four outcomes follows the list):

  1. How many spans were successfully sent ("sent")
  2. How many spans were dropped because they could not be sent successfully ("failed")
  3. How many spans were dropped because a queue was full, meaning the send was never attempted ("dropped")
  4. How many spans were received and rejected by the consumer ("refused")
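
One way to encode these four outcomes as attribute values on a single counter, sketched in Go (the attribute name `outcome` and the counter are hypothetical; the values follow the list above):

```go
// Hypothetical attribute values for the four per-span outcomes.
const (
	outcomeSent    = "sent"    // successfully sent and acknowledged
	outcomeFailed  = "failed"  // send attempted, retries exhausted
	outcomeDropped = "dropped" // never attempted, e.g. the queue was full
	outcomeRefused = "refused" // rejected by the consumer, e.g. via PartialSuccess
)

// A single counter could then carry the outcome as an attribute:
//
//	spansCounter.Add(ctx, n, metric.WithAttributes(attribute.String("outcome", outcomeRefused)))
```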

@carlosalberto
Contributor Author

Hey @jmacd

I prefer these names to be short so I might suggest scope name "open-telemetry/collector" (w/ collector release version) and "open-telemetry/sdk"

There's the case of different exporters, e.g. "OtlpGrpcExporter", "OtlpHttpExporter", "OtlpHttpMyFeatureExporter", etc.

incomplete when we consider how to handle PartialSuccess responses

IIUC, you want to count the value from PartialSuccess as "refused", rather than failed/dropped, right? In theory, adding a dimension like `status=success|dropped|refused` should help?

@jmacd
Contributor

jmacd commented Oct 24, 2023

@carlosalberto after reviewing the Collector and trying to make its metrics consistent with your proposal here, the result I came to is here: carlosalberto#1

This proposes one metric per component, with three levels of detail to cover basic, normal, and detailed use cases.

@jmacd
Contributor

jmacd commented Oct 24, 2023

This was presented in today's Specification SIG, and while some discussion had already begun on the PR mentioned above (#184 (comment)), the agreement today was that I should open a new PR in this repository with a fresh PR history, then present it at the next meetings of the Collector SIG and the Sem-Conv SIG.

<!-- semconv metric.otel.processor.spans(full) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `processor.dropped` | boolean | Whether the Span was dropped or not. [1] | | Required |

Is this missing the `otel.` prefix? So `otel.processor.dropped` instead of `processor.dropped`?

@jmacd
Contributor

jmacd commented Dec 12, 2023

Closing in favor of #598

@jmacd closed this Dec 12, 2023