[resourcetotelemetry] - Using resource_to_telemetry_conversion enabled causes broken internal otelcol_ metrics #14900

Open
gillg opened this issue Oct 12, 2022 · 20 comments
Labels
bug · never stale · pkg/resourcetotelemetry · priority:p2

Comments

gillg (Contributor) commented Oct 12, 2022

What happened?

Description

The Prometheus exporter logs the error "failed to convert metric otelcol_xxxxxxxxxxxxx: duplicate label names" if you enable:

resource_to_telemetry_conversion:
      enabled: true

Steps to Reproduce

Add a receiver:

prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 10s
          static_configs:
            - targets: ['localhost:8888']

and enable resource_to_telemetry_conversion at the Prometheus exporter level.
(I have not yet tested with prometheusremotewriteexporter.)

Collector version

0.61.0

Environment information

OS: Windows Server 2019

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 10s
          static_configs:
            - targets: ['localhost:8888']
        - job_name: windows-exporter
          scrape_interval: 10s
          static_configs:
            - targets: ['localhost:9182']

exporters:
  prometheus:
    endpoint: "0.0.0.0:9095"
    send_timestamps: true
    metric_expiration: 30s
    enable_open_metrics: false
    resource_to_telemetry_conversion:
      enabled: true

processors:
  batch:
    send_batch_size: 50
    timeout: 5s
  resourcedetection:
    detectors: ["gke","aks","eks","ec2","gce","azure","ecs"]
    ec2:
      tags:
        - ^Name$
  resource/metrics:
    attributes:
    - key: job_from
      from_attribute: job
      action: insert
    #- key: job
    #  action: delete
  memory_limiter:
    check_interval: 1s
    limit_mib: 256
    spike_limit_percentage: 30

extensions:
  health_check:
  memory_ballast:
    size_mib: 32

service:
  extensions: [memory_ballast,health_check]
  telemetry:
    logs:
      level: info
    metrics:
      level: normal
      address: ":8888"
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, resourcedetection, resource/metrics, batch]
      exporters: [prometheus]

Log output

2022-10-12T12:11:59.062Z        error   prometheusexporter@v0.61.0/collector.go:367     failed to convert metric otelcol_process_cpu_seconds: duplicate label names     {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*collector).Collect
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.61.0/collector.go:367
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1
        github.com/prometheus/client_golang@v1.13.0/prometheus/registry.go:455
2022-10-12T12:11:59.067Z        error   prometheusexporter@v0.61.0/collector.go:367     failed to convert metric otelcol_processor_batch_batch_send_size: duplicate label names {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*collector).Collect
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.61.0/collector.go:367
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1
        github.com/prometheus/client_golang@v1.13.0/prometheus/registry.go:455
2022-10-12T12:11:59.067Z        error   prometheusexporter@v0.61.0/collector.go:367     failed to convert metric otelcol_exporter_enqueue_failed_spans: duplicate label names   {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*collector).Collect
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.61.0/collector.go:367
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1
        github.com/prometheus/client_golang@v1.13.0/prometheus/registry.go:455

Additional context

No response

@gillg gillg added the bug and needs triage labels Oct 12, 2022
@evan-bradley evan-bradley added the priority:p2 and exporter/prometheus labels and removed the needs triage label Oct 12, 2022
github-actions bot commented:

Pinging code owners: @Aneurysm9. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@gillg gillg changed the title [prometheusexporter] - Using resource_to_telemetry_conversion enabled causes broken internal otelc_ metrics [prometheusexporter] - Using resource_to_telemetry_conversion enabled causes broken internal otelcol_ metrics Oct 12, 2022
gillg (Contributor, Author) commented Oct 12, 2022

I found the reason: the duplicated label is service.instance.id.
Internally the value is a random UUID, but once scraped by the prometheus receiver, the "instance" becomes "localhost:8888".
Then if we enable resources as attributes, the instance is converted to service.instance.id and it causes a duplicated label.
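
One way to observe the colliding attributes (a minimal sketch, not from this thread; the logging exporter and the pipeline name are my assumptions): route the same metrics to the logging exporter with debug output, which prints the resource attributes and the data point attributes for each metric, so the duplicated service.instance.id / service_instance_id key is visible before the Prometheus exporter rejects the metric.

exporters:
  logging:
    loglevel: debug   # dumps resource attributes and data point attributes per metric

service:
  pipelines:
    metrics/debug:          # hypothetical pipeline name, reusing the prometheus receiver above
      receivers: [prometheus]
      processors: [batch]
      exporters: [logging]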

jojotong commented Dec 2, 2022

Any updates? I ran into the same problem.

jojotong commented Dec 2, 2022

I tested it with prometheusremotewriteexporter and there is no problem there.
But how do we resolve this with the prometheus exporter?

Aneurysm9 (Member) commented:

This is handled in the PRW exporter by duplicate detection logic added here. @dashpole is this logic Prometheus-specific or should it be incorporated in the resource to telemetry helper? It doesn't seem right for that helper to be creating duplicate attributes.

dashpole (Contributor) commented Jan 3, 2023

is this logic Prometheus-specific or should it be incorporated in the resource to telemetry helper? It doesn't seem right for that helper to be creating duplicate attributes.

I think it should be incorporated into the resource-to-telemetry helper.

@Aneurysm9 Aneurysm9 changed the title [prometheusexporter] - Using resource_to_telemetry_conversion enabled causes broken internal otelcol_ metrics [resourcetotelemetry] - Using resource_to_telemetry_conversion enabled causes broken internal otelcol_ metrics Jan 3, 2023
github-actions bot commented Jan 3, 2023

Pinging code owners for pkg/resourcetotelemetry: @mx-psi. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot commented Mar 6, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Mar 6, 2023
@mx-psi mx-psi removed the Stale label Mar 8, 2023
mx-psi (Member) commented Mar 8, 2023

I am not sure what the intended fix for this issue is. Do we want the helper to do nothing when adding the labels would result in duplicates?

Internally the value is a random UUID, but once scraped by the prometheus receiver, the "instance" becomes "localhost:8888".
Then if we enable resources as attributes, the instance is converted to service.instance.id and it causes a duplicated label.

What does this imply in terms of the generated OTLP payload? I don't quite understand this sentence; I assume this is some Prometheus-specific behavior that I am not aware of.

@github-actions github-actions bot added the Stale label May 8, 2023
mx-psi (Member) commented May 8, 2023

@Aneurysm9 can you help answer #14900 (comment)? Thanks!

@github-actions github-actions bot removed the Stale label May 26, 2023
dashpole (Contributor) commented:

Maybe instead of producing an error, it should prefer one over the other. It seems like the otel-generated one should take precedence? I.e. only change instance to service.instance.id if it doesn't already exist?
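
A minimal sketch of that precedence rule (hypothetical helper using the pdata API as I understand it, not the actual resourcetotelemetry implementation): copy a resource attribute onto the data point only when the data point does not already carry an attribute with that key, so the otel-generated value wins over the one derived from instance.

package resourcetotelemetry

import "go.opentelemetry.io/collector/pdata/pcommon"

// copyResourceAttributes sketches the "existing data point labels take
// precedence" behavior suggested above: a resource attribute is copied onto
// the data point only if no attribute with the same key already exists, so
// e.g. an existing service.instance.id label is left untouched.
func copyResourceAttributes(resourceAttrs, dataPointAttrs pcommon.Map) {
	resourceAttrs.Range(func(k string, v pcommon.Value) bool {
		if _, exists := dataPointAttrs.Get(k); !exists {
			v.CopyTo(dataPointAttrs.PutEmpty(k))
		}
		return true // keep iterating over the remaining resource attributes
	})
}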

oskoi commented Jun 30, 2023

@Aneurysm9 @dashpole Hi!

I ran into the same issue and it's blocking us.

prometheusexporter already has deduplication logic for labels. Maybe it's worth expanding it to cover this case, as is done in the PRW exporter?

krsmanovic commented Aug 3, 2023

I have fixed the issue using the metric relabeling configuration in the receiver:

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector-self'
          scrape_interval: 30s
          static_configs:
            - targets: ['localhost:8888']
          metric_relabel_configs:
            - action: labeldrop
              regex: "service_instance_id|service_name"

Explanation

The Prometheus exporter is automatically generating not only the service.instance.id label (thanks @gillg); in my case it was also rendering a new service.name label value. The original service.name label value on the receiver was otelcol-contrib, however the exporter sets that value to match the job name, in this case otel-collector-self.

The collector version I tested this on is 0.78.

@github-actions github-actions bot added the Stale label Oct 3, 2023
github-actions bot commented Dec 2, 2023

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Dec 2, 2023
@mx-psi mx-psi reopened this Dec 4, 2023
@github-actions github-actions bot added the Stale label Feb 5, 2024
@mx-psi mx-psi added the never stale label and removed the Stale label Feb 5, 2024
netsandbox commented:

(Quoting @krsmanovic's relabeling workaround above.)

Besides service_instance_id|service_name I also had to add http_scheme|net_host_port to get rid of the error messages:

2024-03-11T11:11:21.249Z        error   prometheusexporter@v0.96.0/collector.go:381     failed to convert metric otelcol_http_server_request_size: duplicate label names in constant and variable labels for metric "otelcol_http_server_request_size_total"    {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*collector).Collect
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.96.0/collector.go:381
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1
        github.com/prometheus/client_golang@v1.19.0/prometheus/registry.go:457
2024-03-11T11:11:21.250Z        error   prometheusexporter@v0.96.0/collector.go:381     failed to convert metric otelcol_http_server_duration: duplicate label names in constant and variable labels for metric "otelcol_http_server_duration"  {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*collector).Collect
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.96.0/collector.go:381
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1
        github.com/prometheus/client_golang@v1.19.0/prometheus/registry.go:457
2024-03-11T11:11:21.253Z        error   prometheusexporter@v0.96.0/collector.go:381     failed to convert metric otelcol_http_server_response_size: duplicate label names in constant and variable labels for metric "otelcol_http_server_response_size_total"  {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*collector).Collect
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.96.0/collector.go:381
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1
        github.com/prometheus/client_golang@v1.19.0/prometheus/registry.go:457

acar-ctpe commented Apr 10, 2024

@netsandbox @krsmanovic how did you find out which labels are conflicting?

walnuts1018 added a commit to walnuts1018/infra that referenced this issue Jun 26, 2024
samuelchrist commented:

I see the same error:

github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*collector).Collect
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.100.0/collector.go:386
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1
        github.com/prometheus/client_golang@v1.19.0/prometheus/registry.go:457
2024-07-03T04:21:50.742Z        error   prometheusexporter@v0.100.0/collector.go:386    failed to convert metric otelcol_http_server_response_size: duplicate label names in constant and variable labels for metric "otelcol_http_server_response_size_total"     {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}

I have added the labeldrop, but the issue is still present:

      prometheus:
        config:
          scrape_configs:
          - job_name: otel-collector-agent
            scrape_interval: 15s
            static_configs:
            - targets:
              - ${env:MY_POD_IP}:8888
            metric_relabel_configs:
            - action: labeldrop
              regex: "service_instance_id|service_name|http_scheme|net_host_port"
