All metrics scraped from push gateway have same job label #34237

Closed
kpanic9 opened this issue Jul 24, 2024 · 5 comments
Labels: bug, receiver/prometheus

Comments

kpanic9 commented Jul 24, 2024

Component(s)

receiver/prometheus

What happened?

Description

In our setup, we have a push gateway that jobs push metrics to. Each job pushes its metrics with a different job label, and we want to preserve the job label set by the publishing job. But when we set honor_labels: true in the otel prometheus receiver configuration for the scrape job, all metrics scraped from the push gateway have a single value for the job label. That value comes from the set of metrics pushed by one of the jobs.

Steps to Reproduce

Configure a push gateway and push a few metrics to it with different values for the job label.
Configure OTEL collector to scrape push gateway.
Check the values for the job label.
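
The first step can be sketched with the Go standard library alone. This is a minimal illustration, not part of the original report: the address `http://localhost:9091`, the job names `batch-a` and `batch-b`, the metric name `demo_runs_total`, and the helper names `pushURL` and `pushMetric` are all assumptions. It relies on the Pushgateway convention of pushing to `/metrics/job/<job>` with a body in the Prometheus text exposition format.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
	"time"
)

// pushURL builds the Pushgateway endpoint for one job.
// Job names containing "/" need the Pushgateway's base64 path form,
// which this sketch does not handle.
func pushURL(base, job string) string {
	return strings.TrimRight(base, "/") + "/metrics/job/" + url.PathEscape(job)
}

// pushMetric sends a single untyped sample in the text exposition format.
func pushMetric(client *http.Client, base, job, name string, value float64) error {
	body := fmt.Sprintf("%s %g\n", name, value)
	req, err := http.NewRequest(http.MethodPut, pushURL(base, job), strings.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "text/plain; version=0.0.4")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("push for job %q failed: %s", job, resp.Status)
	}
	return nil
}

func main() {
	base := "http://localhost:9091" // assumed Pushgateway address
	client := &http.Client{Timeout: 5 * time.Second}
	// Push the same metric under two different job labels.
	for _, job := range []string{"batch-a", "batch-b"} {
		if err := pushMetric(client, base, job, "demo_runs_total", 1); err != nil {
			fmt.Println("error:", err)
		}
	}
}
```

After both pushes, the gateway's /metrics endpoint should expose `demo_runs_total` once per job label, which is the state the collector then scrapes.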

Expected Result

Metrics scraped from push gateway should have the job label value set by the metrics publisher.

Actual Result

All metrics scraped from the push gateway have a single value for the job label.

Collector version

v0.103.0

Environment information

No response

OpenTelemetry Collector configuration

receivers:
      prometheus/2:
        config:
          scrape_configs:
          - honor_labels: true
            job_name: app-platform-pushgateway
            kubernetes_sd_configs:
            - namespaces:
                names:
                - app-platform-monitoring
              role: pod
            relabel_configs:
            - action: keep
              regex: Running
              source_labels:
              - __meta_kubernetes_pod_phase
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_pod_ready
            - action: keep
              regex: metrics
              source_labels:
              - __meta_kubernetes_pod_container_port_name
            scheme: http
            scrape_interval: 30s
            scrape_timeout: 10s

Log output

No response

Additional context

No response

kpanic9 added the bug and needs triage labels Jul 24, 2024
github-actions bot added the receiver/prometheus label Jul 24, 2024

bacherfl (Contributor) commented Jul 24, 2024

I was looking into this to get a better understanding of this receiver, and I reproduced the behavior. Looking at the code, it seems that after every scrape for a given scrape config, all gathered metrics are put into the same resource, which is created in the initTransaction method: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/prometheusreceiver/internal/transaction.go#L358. The name of the resource is set from the job label, which is the name of the scrape config if honor_labels is set to false. With honor_labels set to true, however, that label can take multiple different values within one scrape.
In addition, when the data points of metrics with the same name are attached, the job and instance labels are among the labels that are not added to the datapoint attributes (see `func getSortedNotUsefulLabels(mType pmetric.MetricType) []string`). From my understanding, this could explain why all the different instances of a metric are aggregated under the same job label.

Question to the code owners - @Aneurysm9 @dashpole - is the creation of one resource per transaction intended, or does the logic need to be adjusted to account for the possibility of having different values for the job label in case honor_labels is set to true, i.e. should in this case multiple resources be created in the same transaction, based on the set of different values for the job label?
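
A simplified model of the behavior described in this comment may help. It is not the receiver's actual code: the `sample` and `resource` types and the `collapse` function are hypothetical stand-ins, and the assumption that the resource takes its identity from the first sample seen is only an illustration of "one resource per transaction".

```go
package main

import "fmt"

// sample is a hypothetical stand-in for one scraped series.
type sample struct {
	name   string
	labels map[string]string
	value  float64
}

// resource models a single resource: identifying attributes plus data points.
type resource struct {
	job, instance string
	points        []sample
}

// collapse mimics the reported behavior: the resource is created once per
// scrape (here, from the first sample), and job/instance are dropped from
// the attributes of every data point attached to it.
func collapse(samples []sample) *resource {
	var res *resource
	for _, s := range samples {
		if res == nil {
			res = &resource{job: s.labels["job"], instance: s.labels["instance"]}
		}
		attrs := make(map[string]string)
		for k, v := range s.labels {
			if k != "job" && k != "instance" {
				attrs[k] = v
			}
		}
		res.points = append(res.points, sample{name: s.name, labels: attrs, value: s.value})
	}
	return res
}

func main() {
	res := collapse([]sample{
		{"demo_runs_total", map[string]string{"job": "batch-a", "instance": "a"}, 1},
		{"demo_runs_total", map[string]string{"job": "batch-b", "instance": "b"}, 2},
	})
	// Both data points end up under the job of the first sample.
	fmt.Println(res.job, len(res.points)) // prints "batch-a 2"
}
```

In this model the second sample's job value is lost entirely, because it is neither a resource attribute nor a datapoint attribute.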

@dashpole (Contributor)

Good find. The logic needs to be adjusted to account for multiple resources. We should create a new, unique resource for each combination of job + instance.
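
The suggestion above can be sketched as grouping samples by their job and instance pair. This is only an illustration under stated assumptions: `resourceKey`, `sample`, and `groupByResource` are hypothetical names, and the actual receiver change may be structured differently.

```go
package main

import (
	"fmt"
	"sort"
)

// resourceKey identifies one resource by its job and instance labels.
type resourceKey struct{ job, instance string }

// sample is a hypothetical stand-in for one scraped series.
type sample struct {
	name   string
	labels map[string]string
	value  float64
}

// groupByResource places each sample under the resource that matches its
// own job/instance labels, creating a new group for every distinct pair.
func groupByResource(samples []sample) map[resourceKey][]sample {
	groups := make(map[resourceKey][]sample)
	for _, s := range samples {
		key := resourceKey{job: s.labels["job"], instance: s.labels["instance"]}
		groups[key] = append(groups[key], s)
	}
	return groups
}

func main() {
	groups := groupByResource([]sample{
		{"demo_runs_total", map[string]string{"job": "batch-a", "instance": "a"}, 1},
		{"demo_runs_total", map[string]string{"job": "batch-b", "instance": "b"}, 2},
		{"demo_errors_total", map[string]string{"job": "batch-a", "instance": "a"}, 0},
	})

	// Sort the keys so the output order is stable.
	keys := make([]resourceKey, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].job != keys[j].job {
			return keys[i].job < keys[j].job
		}
		return keys[i].instance < keys[j].instance
	})
	for _, k := range keys {
		fmt.Printf("job=%s instance=%s samples=%d\n", k.job, k.instance, len(groups[k]))
	}
}
```

Using a comparable struct as the map key keeps the pair together, so two jobs that happen to share an instance value still land in separate resources.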

dashpole removed the needs triage label Jul 31, 2024
@bacherfl (Contributor)

> Good find. The logic needs to be adjusted to account for multiple resources. We should create a new, unique resource for each combination of job + instance.

Thanks for the response @dashpole - I would be happy to work on a PR for this. I already have a PoC implementation that should fix this. It still needs some polishing and tests, but I should have a PR ready this week.

evan-bradley added a commit that referenced this issue Aug 27, 2024
…rom `job`/`instance` label pairs (#34344)

**Description:** This PR solves a bug where metrics with different
`job`/`instance` labels were added to the same resource. This can
happen when `honor_labels` is set to `true`, in which case those
labels are not taken from the scrape config, but from the individual data
points that are aggregated during a scrape iteration.

This change also affects the use of relabel configs, if they change the
job or instance labels of gathered metrics. In that case a new
resource is created for each distinct job/instance label pair, and the
matching metrics are added to it. The additional scrape metrics
(number of scraped samples, scrape duration, up, etc.) are put into
a resource representing the scrape config.

**Link to tracking Issue:** #34237

**Testing:** Added Unit tests and adapted relevant e2e tests

---------

Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>
f7o pushed a commit to f7o/opentelemetry-collector-contrib that referenced this issue Sep 12, 2024
…rom `job`/`instance` label pairs (open-telemetry#34344)

kpanic9 (Author) commented Sep 17, 2024

Tested the fix, it works. Thank you!


3 participants