Otel Agent Collector is not showing correct value for dropped spans metrics #34279
Comments
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping
The status code returned is the clue here. 413 means entity too large. The spans were explicitly refused. This is the correct behavior.
Component(s)
No response
Describe the issue you're reporting
I have an OTEL Collector instance deployed in Gateway mode. When I query the metric for dropped spans (Grafana Explore menu) I get no data even though I experienced dropped spans at that exact timestamp. I would like to alert on "dropped span" events and for that I am starting with the following query:
otelcol_processor_dropped_spans_total{cluster_name="orion", service_name="otelcol-contrib"} @1721123498
but the query returns a count of 0:
otelcol_processor_dropped_spans_total{cluster_name="orion",instance=":8888",job="otel-agent",processor="memory_limiter",service_instance_id="6b4xxxxx-fxxx-4xxx-axxx-e1fxxxxxxxxx",service_name="otelcol-contrib",service_version="0.104.0"} 0
The OTEL Gateway receives spans, logs and metrics exported by agents running on multiple K8s clusters. On one of the K8s clusters I experienced data loss on a traces pipeline. Using LogQL I can confirm the dropped spans as below:
{namespace="monitoring", app="opentelemetry-collector", cluster_name="orion"} | json | level=~"error|warn" | ts=~"^1721123498.*"
and the output:
{"level":"error","ts":1721123498.01548,"caller":"exporterhelper/queue_sender.go:90","msg":"Exporting failed. Dropping data.","kind":"exporter","data_type":"traces","name":"zipkin/tempo","error":"no more retries left: failed the request with status code 413","dropped_items":2393,"stacktrace":"go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1\n\tgo.opentelemetry.io/collector/exporter@v0.104.0/exporterhelper/queue_sender.go:90\ngo.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume\n\tgo.opentelemetry.io/collector/exporter@v0.104.0/internal/queue/bounded_memory_queue.go:52\ngo.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1\n\tgo.opentelemetry.io/collector/exporter@v0.104.0/internal/queue/consumers.go:43"}
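Since the drop is reported by an exporter ("kind":"exporter") rather than by a processor, one way to cross-check the numbers is to total the dropped_items field from the log lines themselves. The snippet below is only a minimal sketch, assuming the collector logs are available as JSON lines with the same fields as the entry above; the function name dropped_items_by_exporter and the shortened sample line are illustrative, not part of the collector.

```python
import json
from collections import Counter


def dropped_items_by_exporter(log_lines):
    """Sum dropped_items per (exporter name, data type) from JSON log lines.

    Assumes each line is a JSON object shaped like the collector log entry
    above; lines that are not JSON objects, or that do not come from an
    exporter, are skipped.
    """
    totals = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON, e.g. a plain-text log line
        if not isinstance(entry, dict):
            continue
        if entry.get("kind") != "exporter" or "dropped_items" not in entry:
            continue
        key = (entry.get("name", "unknown"), entry.get("data_type", "unknown"))
        totals[key] += int(entry["dropped_items"])
    return totals


# Shortened copy of the log entry above (stacktrace omitted).
sample = (
    '{"level":"error","ts":1721123498.01548,'
    '"msg":"Exporting failed. Dropping data.","kind":"exporter",'
    '"data_type":"traces","name":"zipkin/tempo",'
    '"error":"no more retries left: failed the request with status code 413",'
    '"dropped_items":2393}'
)

for (exporter, data_type), count in dropped_items_by_exporter([sample]).items():
    print(f"{exporter} ({data_type}): {count} dropped items")
```

Grouping by exporter name and data type keeps the totals comparable with per-pipeline metrics, which are labelled by component.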
What's strange is that when I use the otelcol_processor_refused_spans_total metric:
otelcol_processor_refused_spans_total{cluster_name="orion", service_name="otelcol-contrib"} @1721123498
I get some results:
otelcol_processor_refused_spans_total{cluster_name="orion",instance=":8888",job="otel-agent",processor="memory_limiter",service_instance_id="6bXXXXXX-fXXX-4XXX-aXXX-e1fXXXXXXXXX",service_name="otelcol-contrib",service_version="0.104.0"} 38111
Although this metric may work for alerting, I would ideally expect to get results from the more specific otelcol_processor_dropped_spans_total metric.
What am I missing?