
[receiver/prometheus] honor_labels set to true and scraping a prometheus pushgateway not working #33742

Closed
paebersold-tyro opened this issue Jun 25, 2024 · 6 comments
Labels
bug Something isn't working needs triage New item requiring triage receiver/prometheus Prometheus receiver

Comments

@paebersold-tyro
Contributor

Component(s)

receiver/prometheus

What happened?

Description

Scraping a Prometheus pushgateway with honor_labels: true results in a scrape failure for the endpoint. I suspect this is because the scraped metrics carry both instance and job labels (from #15239), but I would like confirmation that this is the problem. Also, is there any workaround other than setting honor_labels: false? I attempted to drop the labels with metric_relabel_configs, but that did not work.
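For context, a sketch of the kind of metric_relabel_configs label drop that was attempted (the exact config tried is not shown in the issue; this is one plausible form, and per the report it did not resolve the failure):

```yaml
# Hypothetical workaround attempt: drop the pushed instance label after
# the scrape, before ingestion. The issue reports this did not help.
metric_relabel_configs:
- action: labeldrop
  regex: instance
```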

Steps to Reproduce

Prometheus receiver config

          - job_name: test-pushgateway
            scrape_interval: 30s
            scrape_timeout: 10s
            honor_labels: true
            scheme: http
            kubernetes_sd_configs:
            - role: pod
              namespaces:
                names:
                - app-platform-monitoring
            relabel_configs:
            # and pod is running
            - source_labels: [__meta_kubernetes_pod_phase]
              regex: Running
              action: keep
            # and pod is ready
            - source_labels: [__meta_kubernetes_pod_ready]
              regex: true
              action: keep
            # and only metrics endpoints
            - source_labels: [__meta_kubernetes_pod_container_port_name]
              action: keep
              regex: metrics
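For background (standard Prometheus semantics, not specific to this receiver): with honor_labels: true, label values already present in the scraped series win over the target's own labels, which is why the pushgateway's job="cluster" and empty instance="" end up on the ingested series. With honor_labels: false, conflicting pushed labels are renamed instead:

```yaml
# With honor_labels: false, the target's job/instance labels are kept and
# conflicting pushed labels are renamed to exported_<label>, e.g.:
#   app_platform_attestation{job="test-pushgateway",
#                            instance="10.18.67.171:9091",
#                            exported_job="cluster", ...}
honor_labels: false
```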

Expected Result

Endpoint is scraped; the job and instance labels from the pushgateway are used.

Actual Result

The endpoint scrape fails (see the log output below).

Collector version

0.102.0

Environment information

Environment

OS: Kubernetes 1.29

OpenTelemetry Collector configuration

receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: test-pushgateway
        scrape_interval: 30s
        scrape_timeout: 10s
        honor_labels: true
        scheme: http
        kubernetes_sd_configs:
        - role: pod
          namespaces:
            names:
            - app-platform-monitoring
        relabel_configs:
        # keep only pods that are running
        - source_labels: [__meta_kubernetes_pod_phase]
          regex: Running
          action: keep
        # keep only pods that are ready
        - source_labels: [__meta_kubernetes_pod_ready]
          regex: true
          action: keep
        # keep only metrics endpoints
        - source_labels: [__meta_kubernetes_pod_container_port_name]
          action: keep
          regex: metrics
exporters:
  debug: {}
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: []
      exporters: [debug]

Log output

2024-06-24T06:20:36.193Z        warn    internal/transaction.go:125     Failed to scrape Prometheus endpoint    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1719210036190, "target_labels": "{__name__=\"up\", instance=\"10.18.67.171:9091\", job=\"test-pushgateway\"}"}

Additional context

A sample of the metrics returned from the pushgateway:

app_platform_attestation{feature="coredns",instance="",job="cluster",team="bob",test="TestCoreDNSNameResolution"} 1
app_platform_attestation{feature="coredns",instance="",job="cluster",team="bob",test="TestIsCoreDNSDeployed"} 1
app_platform_attestation{feature="coredns",instance="",job="cluster",team="bob",test="TestIsCoreDNSServiceAvailable"} 1
push_failure_time_seconds{feature="coredns",instance="",job="cluster"} 0
push_time_seconds{feature="coredns",instance="",job="cluster"} 1.7192055849949868e+09
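The empty instance="" labels above appear to be what trips the receiver. A minimal sketch of the validation implied by the error message (a hypothetical helper written for illustration, not the receiver's actual Go code): with honor_labels: true the pushgateway's own labels win, so the receiver is left with no usable instance value.

```python
def job_and_instance(labels: dict) -> tuple:
    """Return (job, instance) from a label set, or raise if either is
    missing or empty, mirroring the receiver's error message."""
    job = labels.get("job", "")
    instance = labels.get("instance", "")
    if not job or not instance:
        raise ValueError("job or instance cannot be found from labels")
    return job, instance

# The pushed series from this issue fails the check because instance is "":
pushed = {"feature": "coredns", "instance": "", "job": "cluster"}
try:
    job_and_instance(pushed)
except ValueError as e:
    print(e)  # job or instance cannot be found from labels
```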
@paebersold-tyro added the bug and needs triage labels on Jun 25, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the receiver/prometheus Prometheus receiver label Jun 25, 2024
@dashpole
Contributor

Can you set the log level of the collector to debug to see the detailed error message for why the scrape failed?

@dashpole
Contributor

I think it should be:

service:
  telemetry:
    logs:
      level: debug

@paebersold-tyro
Contributor Author

Hello, here is the debug log output. It seems the empty instance label is the issue, as suspected.

2024-06-27T01:40:49.045Z	debug	scrape/scrape.go:1650	Unexpected error	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "test-pushgateway", "target": "http://10.18.67.95:9091/metrics", "series": "app_platform_attestation{feature=\"coredns\",instance=\"\",job=\"cluster\",team=\"bob\",test=\"TestCoreDNSNameResolution\"}", "error": "job or instance cannot be found from labels"}
2024-06-27T01:40:49.045Z	debug	scrape/scrape.go:1346	Append failed	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "test-pushgateway", "target": "http://10.18.67.95:9091/metrics", "error": "job or instance cannot be found from labels"}
2024-06-27T01:40:49.045Z	warn	internal/transaction.go:125	Failed to scrape Prometheus endpoint	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1719452449041, "target_labels": "{__name__=\"up\", instance=\"10.18.67.95:9091\", job=\"test-pushgateway\"}"}

@dashpole
Contributor

This should've been fixed by #33565. Can you try upgrading to v0.103.0?

@paebersold-tyro
Contributor Author

Thank you for that, 0.103.0 fixed the issue.
