Add processed/exported Span metrics. #184

Conversation

carlosalberto
Contributor

Initial stab at adding metrics for processed/exported/dropped Spans. Made these metrics Span-specific, instead of generic across signal types (metrics, logs), as not all signals may have the same semantics.

Any feedback will be greatly appreciated - although I'm especially interested in:

  • `dropped` label - I think we could massage it to make it a common one, maybe `spans.dropped`? @jsuereth may have an opinion here.
  • `exporter.type` and `processor.type` - is there any value in sharing these as well? Maybe `otel.component`?

Fixes part of #83
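For illustration, here's a rough sketch (using the Go metrics API) of how an SDK span exporter could record the proposed `otel.exporter.spans` counter; the scope name and the `exporter.type` value below are just placeholders:

```go
import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// reportExportedSpans sketches how an SDK span exporter could record the
// proposed otel.exporter.spans counter. In practice the instrument would be
// created once and reused; the scope name and exporter.type value here are
// illustrative placeholders.
func reportExportedSpans(ctx context.Context, count int64, dropped bool) error {
	meter := otel.Meter("io.opentelemetry.sdk.trace") // hypothetical scope name
	counter, err := meter.Int64Counter("otel.exporter.spans",
		metric.WithDescription("Number of spans handled by the span exporter."),
	)
	if err != nil {
		return err
	}
	counter.Add(ctx, count,
		metric.WithAttributes(
			attribute.Bool("exporter.dropped", dropped),
			attribute.String("exporter.type", "OtlpGrpcSpanExporter"), // example value
		),
	)
	return nil
}
```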

@carlosalberto requested review from a team July 10, 2023 15:21
<!-- semconv metric.otel.exporter.spans(full) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `exporter.dropped` | boolean | Whether the Span was dropped or not. [1] | | Required |
Member

There are 3 states from the exporter's view (a rough sketch follows the list):

  1. delivered (the exporter has received the confirmation from the ingestion)
  2. dropped (the exporter decided to drop the data knowing that the ingestion didn't accept it)
  3. unknown (the exporter has sent the data, but it has no idea whether the data is accepted or not by the ingestion)
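
As a rough sketch, these could be modelled as values of a single string-valued attribute instead of a boolean (the attribute name and values below are hypothetical):

```go
// Hypothetical values for a string-typed attribute (e.g. "exporter.outcome")
// covering the three states from the exporter's point of view.
const (
	outcomeDelivered = "delivered" // confirmation received from the ingestion
	outcomeDropped   = "dropped"   // exporter gave up; ingestion did not accept the data
	outcomeUnknown   = "unknown"   // data was sent, but acceptance was never confirmed
)
```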

Contributor Author

Oh, good one, yes. So I guess my initial approach took this for granted:

  • `dropped=false`: Delivered, with confirmation from the ingestion step.
  • `dropped=true`: Either consecutive attempts were exhausted, OR it was not confirmed that ingestion was successful (including an error on the server side).

Do you think it would make sense to have three values instead of two here (as suggested by your comment)?

Contributor Author

(On a related note: I will follow up on this PR, once it's merged, with some additional metrics, such as retry counts at the export level, etc.)

Contributor

@jsuereth left a comment

I do think `processor.dropped` and `exporter.dropped` are quite generic in name.

  1. Are we trying to generalize a pipeline semantic where we can generically track items through a pipeline (like an ESB, something like Apache Camel, etc.), or are these meant to be OTel-specific?
  2. Will these be the same between the OTel Collector and the SDK? If not, I think you may want a prefix denoting that.

<!-- semconv metric.otel.processor.spans(full) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `processor.dropped` | boolean | Whether the Span was dropped or not. [1] | | Required |
Contributor

This does not include un-sampled spans, right?

Contributor Author

Right, this doesn't include un-sampled Spans. Will clarify that in a comment.

Member

<!-- semconv metric.otel.exporter.spans(full) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `exporter.dropped` | boolean | Whether the Span was dropped or not. [1] | | Required |
Contributor

I'm trying to understand how to use this counter and what "dropped" actually means.

Does dropped mean you never tried to export, or that you did try and gave up?

Contributor Author

Oh yes. It means that either the spans were dropped because of an unrecoverable error (e.g. 4xx), or the max retry count was reached. Are you looking for this clarification, or are you thinking about some other scenarios as well? (I'll clarify this in the PR in a bit.)

@carlosalberto
Contributor Author

Hey @jsuereth thanks for the questions.

or are these meant to be OTel-specific?

I'd personally go with OTel-specific for now. It feels to me that defining semantic conventions that cover general values outside of OTel would require a lot of prior conversation.

Will these be the same between the OTel Collector and the SDK? If not, I think you may want a prefix denoting that.

So I was thinking they might be shared, but after briefly checking, I'm leaning a little more towards making this SDK-specific - the Collector may have slightly different needs (labels/dimensions).


**Status**: [Experimental][DocumentStatus]

This document defines semantic conventions for OTel components (such as processors, exporters, etc).
Member

OTel SDK components?

@yurishkuro
Member

yurishkuro commented Jul 11, 2023

+1 for this work.

I am a bit troubled by this being defined as a semantic convention. There's certainly an aspect of that in terms of how the metrics are named, but I also think this should be part of the SDK spec, as in all SDKs should be required to provide these metrics about themselves.

Also, there is more to SDK observability than just the exporter metrics. Here, for example, are the metrics that Jaeger SDKs used to expose: https://github.com/jaegertracing/jaeger-client-go/blob/master/metrics.go#L22-L106

@tigrannajaryan
Member

It would be nice to have these metrics shared between SDKs and Collector.

@yurishkuro
Member

It would be nice to have these metrics shared between SDKs and Collector.

There needs to be a clear distinction between SDK-produced metrics and Collector-produced, because they can co-exist in the same process.

I am not fond of the naming scheme being proposed here. I would rather see a hierarchy (example instrument names are sketched after the list):

  • otel
    • sdk
      • trace
        • tracer
          • stats on # of traces / spans started, finished, etc.
        • processor
          • batch
            • queue_length
            • dropped_spans
        • exporter
          • otlp
            • batches
            • spans
    • collector
      • potentially similar sub-hierarchy
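
For illustration, instrument names following that hierarchy might look like the constants below (the exact segments are made up for this example, not a concrete proposal):

```go
// Illustrative instrument names derived from the hierarchy above.
const (
	sdkBatchProcessorQueueLength  = "otel.sdk.trace.processor.batch.queue_length"
	sdkBatchProcessorDroppedSpans = "otel.sdk.trace.processor.batch.dropped_spans"
	sdkOtlpExporterBatches        = "otel.sdk.trace.exporter.otlp.batches"
	sdkOtlpExporterSpans          = "otel.sdk.trace.exporter.otlp.spans"
	collectorProcessorSpans       = "otel.collector.trace.processor.spans" // similar sub-hierarchy
)
```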

@tigrannajaryan
Member

There needs to be a clear distinction between SDK-produced metrics and Collector-produced, because they can co-exist in the same process.

Are you envisioning the Collector using the OTel SDK, and thus the Collector processor needing to emit both sets of metrics? That makes sense; however, do we need the metric names to be different? They could differ by an attribute.

@yurishkuro
Member

yurishkuro commented Jul 11, 2023

Are you envisioning the Collector using the OTel SDK, and thus the Collector processor needing to emit both sets of metrics?

The use case we had in Jaeger land was that Jaeger backend components themselves were instrumented for tracing and would emit SDK-scoped metrics, which would then conflict with collector-specific metrics.

however, do we need the metric names to be different? They could differ by an attribute.

I don't know if we have this in the guidelines somewhere, but I consider attributes to be "optional" dimensions of the metrics that can be dropped (aggregated over) without destroying the semantic meaning of the metric. E.g. tagging a metric with "region" is good; one can aggregate over it if a by-region breakdown is not needed. But aggregating metrics from the SDK and Collector into one time series does not make sense semantically (e.g. you would double-count spans if you did), so I would expect them to have different names, not just different attributes.

@tigrannajaryan
Member

I don't know if we have this in the guidelines somewhere, but I consider attributes to be "optional" dimensions of the metrics that can be dropped (aggregated over) without destroying the semantic meaning of the metric. E.g. tagging a metric with "region" is good; one can aggregate over it if a by-region breakdown is not needed. But aggregating metrics from the SDK and Collector into one time series does not make sense semantically (e.g. you would double-count spans if you did), so I would expect them to have different names, not just different attributes.

I don't think we have a guideline like that. I am not sure such a guideline can apply to each and every use case.

Here is an existing use case that would break if we were to adopt such a guideline. Collector processors currently emit metrics about the number of data points processed. There can be multiple processors chained sequentially. Each processor emits a measurement, with the value of the "processor" dimension set equal to the processor name. Aggregating over the "processor" dimension will indeed result in double-counting. It seems you are suggesting that a practice like that is invalid and we shouldn't be doing this in the Collector, but I am not sure what a reasonable alternative would be.

The Collector also has similar metrics for receiver and exporter components, where the component name is a dimension.

As a comparison, Prometheus has the following guideline:

As a rule of thumb, either the sum() or the avg() over all dimensions of a given metric should be meaningful (though not necessarily useful).

Of course the interpretation of "should be meaningful" is up for debate. I could argue that sum() is meaningful: it tells how many times data points were processed overall by different processors (including SDK and Collector). Not a very useful aggregation, but Prometheus doesn't require it to be useful, only meaningful. :-)

If we change this to the "drop" metric (which the Collector also records), the sum() becomes quite useful: it gives the total number of data points dropped by all processors, i.e. anywhere in the Collector.

Here is another scenario that I think shows that a guideline like "aggregating away a dimension should not result in double counting" is unnecessarily restrictive:

Let's imagine a system where an incoming request requires participation of service A and service B, where service A calls service B for help. Both service A and service B emit a very standard "http.request.count" metric with a "service.name" dimension. Do we consider it meaningless to aggregate away the "service.name" of the "http.request.count" metric because it would mean double-counting of incoming requests, and thus that this recording practice is invalid? What do we expect services A and B to do instead?

@yurishkuro
Member

I like the Prometheus guideline. It's practical, and says almost the same thing as I am saying. It's not a strict rule, since there is no one-size-fits-all solution; it expects a judgement call on how generous we want to be with the interpretation of "meaningful". An even more generous interpretation is that every time we bump a counter it's because of some event, so we could call the metric "events", make everything else an attribute, and the sum() would be the total count of events - meaningful, yet impractical.

So we'd need some other criterion for what a practical "meaningful" is. I am not strongly opposed to having identical metrics from the SDK and the collector and only separating them by attribute (named how, btw?), but I am not seeing a strong argument for why that would be a good idea either.

A specific practical challenge we had in the past in Jaeger is that we used the prom-client as the metrics SDK, and it didn't allow redefining a metric with the same name twice in the same process (even with different attributes). So having two unrelated layers in the code emit the same metric required some special tricks and wasn't possible with the off-the-shelf prom-client. I'm not saying it's a problem with the OTel SDK; it just shows that there was a little more to "meaningful" than the Prometheus guideline covers.

If we already have an established pattern of having different components emit identically named metrics, then we could continue with that pattern. It reminds me of another discussion about resource-identifying attributes vs. all metric attributes: here we designate some attributes as "semantics-changing".

@jmacd
Contributor

jmacd commented Jul 18, 2023

I've been diagnosing an OTel Collector that is instrumented with an SDK that exports through OTel Collector components arranged as an SDK exporter. The result is similar to what @yurishkuro described: I find myself confused because I'm getting metrics about dropped spans in two ways - both the collector pipeline's primary drops and the collector's own instrumentation (secondary) drops.

Although they are both literally "spans dropped", I would rather not try to make meaning from an aggregation of the two.

I think it would be nice for us to define the semantics carefully for what we want to count, and then use separate but similar metric names for SDKs and collectors. The meaning of "Drop" has to be made very clear. Definitions for "Refused" and "Failed" and other adjectives we use are the important part here: we can re-use the definitions without re-using the same metric names.

Contributor

@dashpole left a comment

+1 to using the same metrics in SDKs and the collector. Would instrumentation scope be the right way to distinguish between collector and SDK usage of them? Instrumentation scope should become a label in Prometheus (to Yuri's line of questioning).

| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `exporter.dropped` | boolean | Whether the Span was dropped or not. [1] | | Required |
| `exporter.type` | string | Type of exporter being used. | `OtlpGrpcSpanExporter` | Recommended |
Contributor

Is this essentially the same as the instrumentation scope name?

| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `exporter.dropped` | boolean | Whether the Span was dropped or not. [1] | | Required |
| `exporter.type` | string | Type of exporter being used. | `OtlpGrpcSpanExporter` | Recommended |
Contributor

Would different instances of the same "exporter" get a different type, or the same type? E.g. if I have two OTLP grpc span exporters, would I be able to tell how many spans each was exporting? In the collector today, I believe we can tell the difference between two instances of the same component.

Contributor Author

Same type ;) I don't think we have a (clear) notion of differentiating between two processors/exporters of the same type (at least in the SDK).

@tigrannajaryan
Member

So we'd need some other criterion for what a practical "meaningful" is. I am not strongly opposed to having identical metrics from the SDK and the collector and only separating them by attribute (named how, btw?)

@yurishkuro I think for me this is a better argument against using the same metric. I don't know what the attribute name would be.

@dashpole Using the Scope name is a possibility, but I am not sure what values we would use to distinguish between SDKs and the Collector.

@tigrannajaryan
Member

@open-telemetry/collector-approvers @open-telemetry/collector-maintainers any thoughts on this? What are the current metrics we use in the Collector and do you think we could adopt in the Collector the conventions that this PR suggests?

@djaglowski
Member

@open-telemetry/collector-approvers @open-telemetry/collector-maintainers any thoughts on this? What are the current metrics we use in the Collector and do you think we could adopt in the Collector the conventions that this PR suggests?

The spanmetrics connector emits two metrics:

  • `calls`, a sum describing the number of spans observed
  • `duration`, a histogram of span durations

Both metrics can optionally have a namespace prepended via user configuration.

I think `calls` could be changed to `otel.processor.spans`. However, whether or not the span is dropped is not directly determined by the connector, so we would need to identify a way to handle this attribute.

@jpkrohling
Member

any thoughts on this? What are the current metrics we use in the Collector and do you think we could adopt in the Collector the conventions that this PR suggests?

We don't seem to have a list of metrics emitted by the collector. We have metrics that are generated by the collector facilities (`otelcol_exporter_enqueue_failed_metric_points`), and metrics generated by components directly (like `otelcol_loadbalancer_backend_latency_bucket`).

I would absolutely welcome aligning the names/prefixes with the SDKs and I believe the ones @carlosalberto specifically proposed would be relatively easy to adopt.

For reference, these are the ones I have in mind:

https://github.com/open-telemetry/opentelemetry-collector/blob/b5e511ce31f22fd3d4817236792245fe1bd88ef8/obsreport/obsreport_exporter.go#L88-L122

@jmacd
Contributor

jmacd commented Jul 27, 2023

@jpkrohling Thank you for highlighting what is, I think, a source of confusion about the collector's metrics. For a given unit of data (span, metric data point, log record), how many distinct standard outcomes are there? The exporterhelper generates 1 and the obsreport generates 2, I think:

  1. How many spans were successfully sent (obsreport)
  2. How many spans were dropped because they could not be sent successfully (obsreport)
  3. How many spans were dropped because a queue was full, meaning the send was never attempted (helper)

Note that when an export is retried because configuration allows it, the items should not be counted multiple times; an in-flight export is not counted until its retries have been exhausted.

@jmacd
Contributor

jmacd commented Jul 27, 2023

@tigrannajaryan @dashpole

@dashpole Using the Scope name is a possibility, but I am not sure what values we would use to distinguish between SDKs and the Collector.

We haven't been very prescriptive about scope names, but we could be for these special cases. For example, I prefer these names to be short, so I might suggest the scope names "open-telemetry/collector" (w/ collector release version) and "open-telemetry/sdk" (w/ SDK release version - with the language identifiable via the resource).
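
A minimal sketch of what that could look like with the Go metrics API (the version strings are placeholders):

```go
// Metrics emitted by the SDK about itself vs. by a Collector in the same
// process would be distinguishable by their instrumentation scope.
sdkMeter := otel.Meter("open-telemetry/sdk",
	metric.WithInstrumentationVersion("1.21.0"), // placeholder SDK version
)
collectorMeter := otel.Meter("open-telemetry/collector",
	metric.WithInstrumentationVersion("0.90.0"), // placeholder collector version
)
```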

@jmacd
Contributor

jmacd commented Jul 27, 2023

The list of three metrics above (#184 (comment)) is, I think, incomplete when we consider how to handle PartialSuccess responses. When an item is dropped by the recipient because of some kind of malformed-data problem, those spans should not be counted as successfully sent (category 1); they should be counted in a 4th category, which I believe other collector components refer to as "refused" (a rough sketch of all four outcomes follows the list):

  1. How many spans were successfully sent ("sent")
  2. How many spans were dropped because they could not be sent successfully ("failed")
  3. How many spans were dropped because a queue was full, meaning the send was never attempted ("dropped")
  4. How many spans were received and rejected by the consumer ("refused")
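
One way to encode these four outcomes as attribute values on a single counter, sketched in Go (the attribute name `outcome` and the counter are hypothetical; the values follow the list above):

```go
// Hypothetical attribute values for the four per-span outcomes.
const (
	outcomeSent    = "sent"    // successfully sent and acknowledged
	outcomeFailed  = "failed"  // send attempted, retries exhausted
	outcomeDropped = "dropped" // never attempted, e.g. the queue was full
	outcomeRefused = "refused" // rejected by the consumer, e.g. via PartialSuccess
)

// A single counter could then carry the outcome as an attribute:
//
//	spansCounter.Add(ctx, n, metric.WithAttributes(attribute.String("outcome", outcomeRefused)))
```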

@carlosalberto
Contributor Author

Hey @jmacd

I prefer these names to be short so I might suggest scope name "open-telemetry/collector" (w/ collector release version) and "open-telemetry/sdk"

There's the case of different exporters, e.g. "OtlpGrpcExporter", "OtlpHttpExporter", "OtlpHttpMyFeatureExporter", etc.

incomplete when we consider how to handle PartialSuccess responses

IIUC, you want to count the value from PartialSuccess as "refused", rather than failed/dropped, right? In theory, adding a dimension like `status=success|dropped|refused` should help?

@jmacd
Contributor

jmacd commented Oct 24, 2023

@carlosalberto after reviewing the Collector and trying to make its metrics consistent with your proposal here, the result I came to is here: carlosalberto#1

This proposes one metric per component, with three levels of detail to cover basic, normal, and detailed use cases.

@jmacd
Contributor

jmacd commented Oct 24, 2023

This was presented in today's Specification SIG, and while some discussion had already begun on the PR mentioned above (#184 (comment)), the agreement today was that I should open a new PR in this repository with a fresh PR history, then present it at the next meetings of the Collector SIG and the Sem-Conv SIG.

<!-- semconv metric.otel.processor.spans(full) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `processor.dropped` | boolean | Whether the Span was dropped or not. [1] | | Required |

Is this missing the `otel.` prefix? So `otel.processor.dropped` instead of `processor.dropped`?

@jmacd
Contributor

jmacd commented Dec 12, 2023

Closing in favor of #598

@jmacd closed this Dec 12, 2023