
internal traces are not generated when telemetry.useOtelWithSDKConfigurationForInternalTelemetry feature gate is set #9715

Closed
@SophieDeBenedetto

Description

Describe the bug
Following the documented instructions to enable this feature gate and configure the telemetry service to emit internal spans does not work as expected:

No internal spans are emitted, and no errors are logged either.

Steps to reproduce

  • Add the telemetry.useOtelWithSDKConfigurationForInternalTelemetry feature gate
  • Add this telemetry service configuration for internal span emission
traces:
  processors:
    batch:
      exporter:
        otlp:
          protocol: grpc/protobuf
          endpoint: <pod IP>:4317

Ensure the necessary OTEL_* environment variables are set, such as OTEL_SERVICE_NAME and OTEL_EXPORTER_OTLP_TRACES_HEADERS.

Then, send some trace traffic to your collector.
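For reproduction it may help to have a concrete traffic generator. The sketch below is not part of the original report: it assumes the gate was enabled at collector startup (for example via --feature-gates=telemetry.useOtelWithSDKConfigurationForInternalTelemetry) and that the unauthenticated otlp/octomesh gRPC receiver from the config below is reachable at localhost:14317; the endpoint, tracer name, and span count are placeholders. It uses the OpenTelemetry Go SDK to send a few test spans over OTLP/gRPC.

package main

import (
    "context"
    "log"
    "time"

    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
    ctx := context.Background()

    // OTLP/gRPC exporter pointed at the collector's unauthenticated receiver.
    exp, err := otlptracegrpc.New(ctx,
        otlptracegrpc.WithEndpoint("localhost:14317"),
        otlptracegrpc.WithInsecure(),
    )
    if err != nil {
        log.Fatalf("creating OTLP trace exporter: %v", err)
    }

    // Batch and export spans through the SDK tracer provider.
    tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
    defer func() {
        if err := tp.Shutdown(ctx); err != nil {
            log.Printf("tracer provider shutdown: %v", err)
        }
    }()

    // Emit a handful of test spans so the traces pipelines have data to process.
    tracer := tp.Tracer("internal-telemetry-repro")
    for i := 0; i < 5; i++ {
        _, span := tracer.Start(ctx, "test-span")
        time.Sleep(10 * time.Millisecond)
        span.End()
    }
}

Any other OTLP-capable client (for example telemetrygen from opentelemetry-collector-contrib) should work just as well.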

What did you expect to see?
Internal spans emitted from the collector.

What did you see instead?
No internal spans were emitted, though no errors were reported either.

What version did you use?
0.95.0

What config did you use?

---
service:
  # For now we only ingest traces. For metrics we use datadog and for logs fluent-bit.
  pipelines:
    traces/unsampled:
      receivers:
        - otlp/auth
        - otlp/octomesh
      processors:
        # Ordering matters!
        # https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/README.md
        # In order for unsampled metrics to be correct, we unfortunately have to process all of the trace data before sampling.
        # This ensures that the metrics do not have access to un-redacted attributes.
        - memory_limiter
        - batch
        - transform/octomesh
        - groupbyattrs/compaction
        - transform/peer-service
        - transform/datastores
        - redaction/allow-list
        - attributes/euii
        - transform/error-recording
        - transform/resource-allow-list
      exporters:
        - datadog/connector

    traces/sampled:
      receivers:
        - datadog/connector
      processors:
        # Ordering matters!
        # https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/README.md
        - memory_limiter
        - probabilistic_sampler
      exporters:
        - ${env:OTELCOL_TRACE_EXPORTER}
        # Enable when debugging locally or set `OTELCOL_TRACE_EXPORTER` to `logging`
        # - logging

    metrics/unsampled:
      receivers:
        - prometheus
        - otlp/auth
        - datadog/connector
      processors:
        - memory_limiter
        - batch
      exporters:
        - ${env:OTELCOL_METRICS_EXPORTER}
        # Enable when debugging locally or set `OTELCOL_METRICS_EXPORTER` to `logging`
        # - logging

  extensions:
    - health_check
    - basicauth
  telemetry:
    logs:
      encoding: json
    metrics:
      level: detailed
    # Configure the collector's internal telemetry so that internal spans are emitted
    traces:
      processors:
        batch:
          exporter:
            otlp:
              protocol: grpc/protobuf
              endpoint: localhost:4317
extensions:
  health_check: {}
  basicauth:
    htpasswd:
      inline: |
        ${env:OTELCOL_BASIC_AUTH}

# The pipeline details
receivers:
  otlp/auth:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        auth:
          authenticator: basicauth
      http:
        endpoint: 0.0.0.0:4318
        auth:
          authenticator: basicauth
  otlp/octomesh:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14317
      http:
        endpoint: 0.0.0.0:14318

  # The prometheus receiver scrapes metrics needed for the OpenTelemetry Collector Dashboard.
  # https://app.datadoghq.com/dash/integration/30773/opentelemetry-collector-metrics-dashboard
  prometheus:
    config:
      scrape_configs:
      - job_name: 'otelcol'
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

processors:
  memory_limiter:
    check_interval: 1s
    # Maximum amount of memory, as a percentage of total available memory, targeted to be allocated by the process heap.
    # Note that typically the total memory usage of the process will be about 50MiB higher than this value.
    # This defines the hard limit.
    # https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/memorylimiterprocessor
    limit_percentage: 90
    spike_limit_percentage: 20

  probabilistic_sampler:
    hash_seed: 22
    sampling_percentage:  ${env:OTEL_COL_SAMPLING_PERCENTAGE}

  redaction/allow-list:
    ${file:redaction-allow-list.yaml}

  attributes/euii:
    ${file:attributes-euii.yaml}

  transform/octomesh:
    ${file:octomesh.yaml}

  transform/peer-service:
    ${file:peer-service.yaml}

  transform/resource-allow-list:
    ${file:transform-processor.yaml}

  transform/error-recording:
    ${file:error-recording.yaml}

  transform/datastores:
    ${file:datastores.yaml}

  # TODO: Tweak export batch sizes to DD based on this article
  # https://docs.datadoghq.com/opentelemetry/otel_collector_datadog_exporter/?tab=kubernetesgateway#2-configure-the-datadog-exporter
  batch: {}

  # This processor will compact traces by grouping spans by common resource and instrumentation attributes,
  # so that subsequent steps in the pipeline will have less data to process.
  groupbyattrs/compaction:

connectors:
  # The Datadog Connector is a connector component that computes Datadog APM Stats pre-sampling in the event
  # that your traces pipeline is sampled using components such as the tailsamplingprocessor or probabilisticsamplerprocessor.
  # The sampled pipeline should be duplicated and the datadog connector should be added to the
  # pipeline that is not being sampled to ensure that Datadog APM Stats are accurate in the backend.
  # See https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/connector/datadogconnector
  datadog/connector:
    traces:
      # Configure the Datadog Connector to _only_ compute stats for the root span local to this trace. Those should be `server` and `consumer` spans.
      # Hopefully that will be the ingress span.
      # https://github.com/DataDog/datadog-agent/blob/main/pkg/trace/traceutil/trace.go#L114
      compute_stats_by_span_kind: true
      # We will not use the connector for unsampled client metrics here
      peer_tags_aggregation: true
      peer_tags:
        - _dd.base_service
        # - amqp.destination
        # - amqp.exchange
        # - amqp.queue
        # - aws.queue.name
        # - bucketname
        - cassandra.cluster
        - db.cassandra.contact.points
        - db.couchbase.seed.nodes
        - db.hostname
        - db.instance
        - db.name
        - db.system
        # - grpc.host
        # - hazelcast.instance
        - hostname
        - host.name
        - http.host
        - messaging.destination
        - messaging.destination.name
        - messaging.kafka.bootstrap.servers
        - messaging.rabbitmq.exchange
        - messaging.system
        # - mongodb.db
        # - msmq.queue.path
        - net.peer.name
        - network.destination.name
        - peer.hostname
        - peer.service
        # - queuename
        - rpc.service
        - rpc.system
        - server.address
        # - streamname
        # - tablename
        # - topicname
      trace_buffer: 100

exporters:
  datadog:
    api:
      site: datadoghq.com
      key: ${env:DD_API_KEY}
    traces:
      trace_buffer: 100
  logging:
    verbosity: detailed
    sampling_initial: 1
    sampling_thereafter: 1
  file/no_rotation:
    path: /tmp/trace-output/output.json

Environment

Additional context
I was chatting about this in Slack with @codeboten, and we both looked through the opentelemetry-collector code and found that this feature flag isn't being used to do anything regarding the tracer provider. Searching the code for extendedConfig (https://github.com/search?q=repo%3Aopen-telemetry%2Fopentelemetry-collector%20extendedConfig&type=code), I only see it used when the meter reader is initialized, and that initialization seems to use it only to output a log statement. @codeboten was not able to get internal traces generated either.
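
To make the gap concrete, here is a hypothetical sketch (not the collector's actual code; the function name and endpoint are made up for illustration) of the wiring one would expect the feature gate to drive: when the gate is enabled, build an SDK tracer provider from the OTLP exporter configured under service::telemetry::traces and install it for internal spans, instead of only logging around meter-reader initialization.

package telemetrysketch

import (
    "context"

    "go.opentelemetry.io/collector/featuregate"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// setupInternalTraces illustrates the expected behavior: only when the gate is
// enabled, construct a tracer provider backed by the configured OTLP/gRPC batch
// exporter and install it globally so the collector's internal spans are exported.
func setupInternalTraces(ctx context.Context, gate *featuregate.Gate, endpoint string) (*sdktrace.TracerProvider, error) {
    if !gate.IsEnabled() {
        // Gate off: internal tracing stays a no-op. Per the investigation above,
        // the released collector appears to stop roughly here even with the gate
        // on, since the gate is only consulted to log during meter-reader setup.
        return nil, nil
    }

    // e.g. localhost:4317 from service::telemetry::traces in the config above.
    exp, err := otlptracegrpc.New(ctx,
        otlptracegrpc.WithEndpoint(endpoint),
        otlptracegrpc.WithInsecure(),
    )
    if err != nil {
        return nil, err
    }

    tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
    otel.SetTracerProvider(tp)
    return tp, nil
}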

Metadata

Labels: bug (Something isn't working), collector-telemetry (healthchecker and other telemetry collection issues), release:required-for-ga (Must be resolved before GA release)

Status: Done
