
spanmetrics connector generating extreme grpc traffic #20306

Closed
devrimdemiroz opened this issue Mar 24, 2023 · 7 comments
Labels
bug Something isn't working connector/spanmetrics

Comments

@devrimdemiroz

Component(s)

connector/spanmetrics

What happened?

Description

I replaced the spanmetrics processor config in the OpenTelemetry demo app with the new spanmetrics connector. Traffic observed by the OTLP gRPC receiver increased almost 10,000 times, and calls (previously calls_total) and the related span metrics explode linearly along with it. See the screenshots at the bottom.

Steps to Reproduce

The following configuration was used as a replacement for the spanmetrics processor:

connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [ 100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms ]
    dimensions:
      - name: http.method
        default: GET
      - name: http.status_code
    dimensions_cache_size: 1000
    aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
....

service:
  pipelines:
    traces/spanmetrics:
      receivers: [otlp]
      exporters: [spanmetrics]
    metrics/spanmetrics:
      receivers: [spanmetrics]
      exporters: [prometheus]

Expected Result

The expected result is to be in line with the spanmetrics processor runs.

When processor runs:

[Screenshot: SpanmetricsProcessor]

Actual Result

When connector runs:

[Screenshot: SpanmetricsConnector]

Collector version

0.74.0

Environment information

Environment

Images

IMAGE_VERSION=1.3.1
IMAGE_NAME=ghcr.io/open-telemetry/demo

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
      http:
        cors:
          allowed_origins:
            - "http://*"
            - "https://*"
exporters:
  otlp:
    endpoint: "localhost:4317"
    tls:
      insecure: true
  logging:
  prometheus:
    endpoint: "otelcol:9464"
    resource_to_telemetry_conversion:
      enabled: true
    enable_open_metrics: true
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [ 100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms ]
    dimensions:
      - name: http.method
        default: GET
      - name: http.status_code
    dimensions_cache_size: 1000
    aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"

processors:
  batch:
  transform:
    metric_statements:
      - context: metric
        statements:
          - set(description, "Measures the duration of inbound HTTP requests") where name == "http.server.duration"


service:
  pipelines:
    traces/spanmetrics:
      receivers: [otlp]
      exporters: [spanmetrics]
    metrics/spanmetrics:
      receivers: [spanmetrics]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [transform, batch]
      exporters: [prometheus]

Log output

No response

Additional context

No response

@devrimdemiroz devrimdemiroz added bug Something isn't working needs triage New item requiring triage labels Mar 24, 2023
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@kovrus
Member

kovrus commented Mar 24, 2023

@devrimdemiroz The reason is #19216: spanmetrics now generates metrics from spans per resource scope, so the number of generated metrics will grow with the number of resource scopes. I opened a PR (#19467) to toggle this functionality on/off or filter resource attributes, but we decided to close it because the same result can be achieved with the transform processor's keep_keys function.

@kovrus kovrus removed the needs triage New item requiring triage label Mar 24, 2023
@devrimdemiroz
Author

@kovrus, I truly appreciate your quick response! If you could provide me with a little bit more on the transform processor configuration I need to add, you'll be an absolute time-saver for me. Thanks in advance!

@kovrus
Member

kovrus commented Mar 27, 2023

@devrimdemiroz something like this will reduce the number of resource scopes to the number of services that produce telemetry. If we want to allow the old behavior, one resource scope for everything, we should wrap up #19467.

...

processors:
  transform:
    trace_statements:
    - context: resource
      statements:
      - keep_keys(attributes, ["service.name"])

...
service:
  pipelines:
    traces/spanmetrics:
      receivers: [otlp]
      processors: [transform]
      exporters: [spanmetrics]
    metrics/spanmetrics:
      receivers: [spanmetrics]
      exporters: [prometheus]
...
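
(With this transform in place, spans from all instances of a service collapse into a single resource scope keyed only by service.name, so the connector emits one set of metric streams per service rather than one per original resource scope. Span-level dimensions such as http.method and http.status_code are unaffected, because keep_keys here operates only on resource attributes.)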

@devrimdemiroz
Author

@kovrus, thank you for sharing the precise configuration; it works perfectly. However, I'm unsure whether it's strictly necessary. My goal is a more straightforward and comprehensible configuration using the new connector, yet to achieve it I've had to add a layer that I hadn't used or been familiar with before, and that the previous processor didn't require. I'm not questioning its importance or potential benefits; I'm merely curious about the rationale behind extra lines whose purpose isn't immediately clear. Nevertheless, I would recommend including it as part of the default spanmetrics connector config in the documentation. Since the transform config works, I'll consider this matter resolved. Thanks for your time.

@kovrus
Member

kovrus commented Mar 30, 2023

@devrimdemiroz yes, we should add a more comprehensive readme for the span metrics connector and its differences from the processor. I've tried to call out that more metrics will be generated when using the connector here, but we probably should provide a better explanation.

The transform processor with keep_keys controls the number of generated metrics resource scopes. There will certainly be cases where resource attributes have high cardinality, and that will result in more metrics being generated. I agree that this is not evident from the documentation.

@djaglowski I think we should probably revisit #19467 and allow users to control which attributes are added to the generated metrics resource scopes. Maybe, by default, we can keep the service.name, service.namespace, and service.instance.id resource attributes to define the generated metrics resource scopes (wdyt @gouthamve)? We could use keep_keys for that, but then the dimensions configuration parameter of spanmetrics won't work, since resource attributes will be affected by keep_keys.
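
As a rough sketch, that default could look like the following with the existing keep_keys approach (the attribute names are the standard OpenTelemetry resource semantic conventions; any resource attributes used as spanmetrics dimensions would need to be added to the list, since everything else is dropped):

processors:
  transform:
    trace_statements:
    - context: resource
      statements:
      # keep only the resource attributes that should define a metrics resource scope
      - keep_keys(attributes, ["service.name", "service.namespace", "service.instance.id"])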

@djaglowski
Member

My only concern is that we may find ourselves needing to add more and more "transform" capabilities to this connector, as well as to others. However, if emitting consolidated metrics based on resource attributes turns out to be a particularly common case, then I support it.
