
[prometheusreceiver] Scrape fails when trying to labeldrop job or instance labels in metric_relabel_configs #9986

Closed
Mario-Hofstaetter opened this issue May 12, 2022 · 11 comments
Labels
bug · closed as inactive · comp:prometheus · priority:p1 · receiver/prometheus · Stale

Comments

@Mario-Hofstaetter

Describe the bug

Using

          metric_relabel_configs:
            - action: labeldrop
              regex: job

or

          metric_relabel_configs:
            - action: labeldrop
              regex: instance

causes the prometheusreceiver to fail all targets. A full config example is below. Log output:

.\otelcol.exe --config .\fail-config.yaml                                                                                                                                                                                       in pwsh at 21:27:52
2022-05-12T21:28:03.474+0200    info    builder/exporters_builder.go:255        Exporter was built.     {"kind": "exporter", "name": "prometheus"}
2022-05-12T21:28:03.526+0200    info    builder/pipelines_builder.go:224        Pipeline was built.     {"kind": "pipeline", "name": "metrics"}
2022-05-12T21:28:03.526+0200    info    builder/receivers_builder.go:226        Receiver was built.     {"kind": "receiver", "name": "prometheus", "datatype": "metrics"}
2022-05-12T21:28:03.527+0200    info    service/telemetry.go:109        Setting up own telemetry...
2022-05-12T21:28:03.527+0200    info    service/telemetry.go:129        Serving Prometheus metrics      {"address": "localhost:8888", "level": "basic", "service.instance.id": "f28247cd-38ab-4e4a-aac7-16c51d09e868", "service.version": "latest"}
2022-05-12T21:28:03.527+0200    info    service/service.go:76   Starting extensions...
2022-05-12T21:28:03.527+0200    info    service/service.go:81   Starting exporters...
2022-05-12T21:28:03.527+0200    info    builder/exporters_builder.go:40 Exporter is starting... {"kind": "exporter", "name": "prometheus"}
2022-05-12T21:28:03.529+0200    info    builder/exporters_builder.go:48 Exporter started.       {"kind": "exporter", "name": "prometheus"}
2022-05-12T21:28:03.529+0200    info    service/service.go:86   Starting processors...
2022-05-12T21:28:03.529+0200    info    builder/pipelines_builder.go:54 Pipeline is starting... {"kind": "pipeline", "name": "metrics"}
2022-05-12T21:28:03.529+0200    info    builder/pipelines_builder.go:65 Pipeline is started.    {"kind": "pipeline", "name": "metrics"}
2022-05-12T21:28:03.529+0200    info    service/service.go:91   Starting receivers...
2022-05-12T21:28:03.529+0200    info    builder/receivers_builder.go:68 Receiver is starting... {"kind": "receiver", "name": "prometheus"}
2022-05-12T21:28:03.546+0200    info    builder/receivers_builder.go:73 Receiver started.       {"kind": "receiver", "name": "prometheus"}
2022-05-12T21:28:03.547+0200    info    service/collector.go:251        Starting otelcol...     {"Version": "0.51.0", "NumCPU": 12}
2022-05-12T21:28:03.547+0200    info    service/collector.go:146        Everything is ready. Begin running and processing data.
2022-05-12T21:28:27.184+0200    warn    internal/otlp_metricsbuilder.go:161     Failed to scrape Prometheus endpoint    {"kind": "receiver", "name": "prometheus", "scrape_timestamp": 1652383706850, "target_labels": "map[__name__:up app:otelcol instance:localhost:8888 job:localmetrics]"}

Using, for example, the label service_instance_id in regex works fine ✔, so the config itself is not to blame.

Steps to reproduce

Run otelcol with the configuration provided below; scraping localhost:8888 will fail.

What did you expect to see?

This was an ill-fated attempt to get rid of the new job and instance labels added in #9115, which I guess is not possible that way because those are added by the exporter rather than by the receiver.

However, using this configuration should not leave the receiver broken, should it? Using another, non-existent label name such as regex: doesnotexist does not break scraping.

What did you see instead?

Scraping of all targets fails.

What version did you use?
Version: v0.51.0, binary from asset otelcol_0.51.0_windows_amd64.tar.gz

What config did you use?

Full otelcol config yaml:
exporters:
  prometheus:
    endpoint: 0.0.0.0:7299

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: localmetrics
          scrape_interval: 15s

          metric_relabel_configs:
            - action: labeldrop
              regex: job   # this or "instance" causes "Failed to scrape Prometheus endpoint"

          static_configs:
            - targets: [localhost:8888] # Self diagnostic metrics of otelcol
              labels:
                app: otelcol

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [prometheus]

  # Otel-Collector Self-Diagnostics
  telemetry:
    metrics:
      address: localhost:8888

Environment
OS: Windows 10 21H2

Mario-Hofstaetter added the bug label on May 12, 2022
@dmitryax
Member

cc @Aneurysm9 @dashpole

dmitryax added the priority:p1 and comp:prometheus labels on May 13, 2022
@dashpole
Contributor

The first scrape of the self-observability endpoint can sometimes fail, as there is a race between the prometheus receiver starting up and the self-observability endpoint starting up. If it only happens once, that is probably all that is happening.

Are there any other symptoms other than the log line?

@Mario-Hofstaetter
Author

Mario-Hofstaetter commented May 16, 2022

@dashpole

Are there any other symptoms other than the log line?

Every metric target really does fail. No metrics are shown at the exporter endpoint besides the following.
I tried both static_configs and file_sd_configs; each and every target fails, and the up metric is zero.

Replacing the static target with http://localhost:9182 (windows_exporter on my machine) does not change the behavior.

Logs are below. The debug log prints:

"error": "job or instance cannot be found from labels"

(Using debug log level for an "Unexpected error" is not optimal.)

This error message can be found here:

errNoJobInstance = errors.New("job or instance cannot be found from labels")

It looks like job and instance labels are mandatory in OTLP:

if job == "" || instance == "" {
    return errNoJobInstance
}

For the prometheus exporter (no OTLP is used in the example configuration), this is somewhat inconvenient. In the Prometheus world, the Prometheus server generally adds the job and instance labels to its targets; using honor_labels is not the default use case.


I changed the telemetry config to:

  telemetry:
    logs:
      level: debug
    metrics:
      level: detailed
      address: localhost:8888

http://localhost:7299/metrics:

# HELP scrape_duration_seconds Duration of the scrape
# TYPE scrape_duration_seconds gauge
scrape_duration_seconds{app="otelcol"} 0.0039815
# HELP scrape_samples_post_metric_relabeling The number of samples remaining after metric relabeling was applied
# TYPE scrape_samples_post_metric_relabeling gauge
scrape_samples_post_metric_relabeling{app="otelcol"} 0
# HELP scrape_samples_scraped The number of samples the target exposed
# TYPE scrape_samples_scraped gauge
scrape_samples_scraped{app="otelcol"} 1
# HELP scrape_series_added The approximate number of new series in this scrape
# TYPE scrape_series_added gauge
scrape_series_added{app="otelcol"} 0
# HELP up The scraping was successful
# TYPE up gauge
up{app="otelcol"} 0
http://localhost:8888/metrics:
# HELP otelcol_exporter_enqueue_failed_log_records Number of log records failed to be added to the sending queue.
# TYPE otelcol_exporter_enqueue_failed_log_records counter
otelcol_exporter_enqueue_failed_log_records{exporter="prometheus",service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 0
# HELP otelcol_exporter_enqueue_failed_metric_points Number of metric points failed to be added to the sending queue.
# TYPE otelcol_exporter_enqueue_failed_metric_points counter
otelcol_exporter_enqueue_failed_metric_points{exporter="prometheus",service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 0
# HELP otelcol_exporter_enqueue_failed_spans Number of spans failed to be added to the sending queue.
# TYPE otelcol_exporter_enqueue_failed_spans counter
otelcol_exporter_enqueue_failed_spans{exporter="prometheus",service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 0
# HELP otelcol_exporter_sent_metric_points Number of metric points successfully sent to destination.
# TYPE otelcol_exporter_sent_metric_points counter
otelcol_exporter_sent_metric_points{exporter="prometheus",service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 20
# HELP otelcol_process_cpu_seconds Total CPU user and system time in seconds
# TYPE otelcol_process_cpu_seconds gauge
otelcol_process_cpu_seconds{service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 2.09375
# HELP otelcol_process_memory_rss Total physical memory (resident set size)
# TYPE otelcol_process_memory_rss gauge
otelcol_process_memory_rss{service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 5.0855936e+07
# HELP otelcol_process_runtime_heap_alloc_bytes Bytes of allocated heap objects (see 'go doc runtime.MemStats.HeapAlloc')
# TYPE otelcol_process_runtime_heap_alloc_bytes gauge
otelcol_process_runtime_heap_alloc_bytes{service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 1.2428176e+07
# HELP otelcol_process_runtime_total_alloc_bytes Cumulative bytes allocated for heap objects (see 'go doc runtime.MemStats.TotalAlloc')
# TYPE otelcol_process_runtime_total_alloc_bytes gauge
otelcol_process_runtime_total_alloc_bytes{service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 1.03036496e+08
# HELP otelcol_process_runtime_total_sys_memory_bytes Total bytes of memory obtained from the OS (see 'go doc runtime.MemStats.Sys')
# TYPE otelcol_process_runtime_total_sys_memory_bytes gauge
otelcol_process_runtime_total_sys_memory_bytes{service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 3.2546008e+07
# HELP otelcol_process_uptime Uptime of the process
# TYPE otelcol_process_uptime counter
otelcol_process_uptime{service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 60.008410500000004
# HELP otelcol_receiver_accepted_metric_points Number of metric points successfully pushed into the pipeline.
# TYPE otelcol_receiver_accepted_metric_points counter
otelcol_receiver_accepted_metric_points{receiver="prometheus",service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest",transport="http"} 20
# HELP otelcol_receiver_refused_metric_points Number of metric points that could not be pushed into the pipeline.
# TYPE otelcol_receiver_refused_metric_points counter
otelcol_receiver_refused_metric_points{receiver="prometheus",service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest",transport="http"} 0
Logs:
.\otelcol.exe --config .\config-fail.yaml                                                                                                                                                                                                                                 in pwsh at 00:39:03
2022-05-17T00:39:08.279+0200    info    builder/exporters_builder.go:255        Exporter was built.     {"kind": "exporter", "name": "prometheus"}
2022-05-17T00:39:08.326+0200    info    builder/pipelines_builder.go:224        Pipeline was built.     {"kind": "pipeline", "name": "metrics"}
2022-05-17T00:39:08.326+0200    info    builder/receivers_builder.go:226        Receiver was built.     {"kind": "receiver", "name": "prometheus", "datatype": "metrics"}
2022-05-17T00:39:08.326+0200    info    service/service.go:82   Starting extensions...
2022-05-17T00:39:08.326+0200    info    service/service.go:87   Starting exporters...
2022-05-17T00:39:08.326+0200    info    builder/exporters_builder.go:40 Exporter is starting... {"kind": "exporter", "name": "prometheus"}
2022-05-17T00:39:08.328+0200    info    builder/exporters_builder.go:48 Exporter started.       {"kind": "exporter", "name": "prometheus"}
2022-05-17T00:39:08.328+0200    info    service/service.go:92   Starting processors...
2022-05-17T00:39:08.328+0200    info    builder/pipelines_builder.go:54 Pipeline is starting... {"kind": "pipeline", "name": "metrics"}
2022-05-17T00:39:08.328+0200    info    builder/pipelines_builder.go:65 Pipeline is started.    {"kind": "pipeline", "name": "metrics"}
2022-05-17T00:39:08.329+0200    info    service/service.go:97   Starting receivers...
2022-05-17T00:39:08.329+0200    info    builder/receivers_builder.go:68 Receiver is starting... {"kind": "receiver", "name": "prometheus"}
2022-05-17T00:39:08.329+0200    debug   discovery/manager.go:265        Starting provider       {"kind": "receiver", "name": "prometheus", "provider": "static/0", "subs": "map[localmetrics:{}]"}
2022-05-17T00:39:08.329+0200    debug   discovery/manager.go:299        Discoverer channel closed       {"kind": "receiver", "name": "prometheus", "provider": "static/0"}
2022-05-17T00:39:09.199+0200    info    builder/receivers_builder.go:73 Receiver started.       {"kind": "receiver", "name": "prometheus"}
2022-05-17T00:39:09.199+0200    info    service/telemetry.go:109        Setting up own telemetry...
2022-05-17T00:39:09.200+0200    info    service/telemetry.go:129        Serving Prometheus metrics      {"address": "localhost:8888", "level": "detailed", "service.instance.id": "fe3e6bf9-c1c7-4ed5-9275-692a74100e00", "service.version": "latest"}
2022-05-17T00:39:09.200+0200    info    service/collector.go:252        Starting otelcol...     {"Version": "0.48.0", "NumCPU": 4}
2022-05-17T00:39:09.200+0200    info    service/collector.go:142        Everything is ready. Begin running and processing data.
2022-05-17T00:39:14.652+0200    debug   scrape/scrape.go:1522   Unexpected error        {"kind": "receiver", "name": "prometheus", "scrape_pool": "localmetrics", "target": "http://localhost:8888/metrics", "series": "otelcol_exporter_enqueue_failed_log_records{exporter=\"prometheus\",service_instance_id=\"fe3e6bf9-c1c7-4ed5-9275-692a74100e00\",service_version=\"latest\"}", "error": "job or instance cannot be found from labels"}
2022-05-17T00:39:14.652+0200    debug   scrape/scrape.go:1307   Append failed   {"kind": "receiver", "name": "prometheus", "scrape_pool": "localmetrics", "target": "http://localhost:8888/metrics", "error": "job or instance cannot be found from labels"}
2022-05-17T00:39:14.653+0200    warn    internal/otlp_metricsbuilder.go:159     Failed to scrape Prometheus endpoint    {"kind": "receiver", "name": "prometheus", "scrape_timestamp": 1652740754340, "target_labels": "map[__name__:up app:otelcol instance:localhost:8888 job:localmetrics]"}
2022-05-17T00:39:14.653+0200    debug   prometheusexporter@v0.48.0/accumulator.go:81    accumulating metric: up {"kind": "exporter", "name": "prometheus"}
2022-05-17T00:39:14.653+0200    debug   prometheusexporter@v0.48.0/accumulator.go:81    accumulating metric: scrape_duration_seconds    {"kind": "exporter", "name": "prometheus"}
2022-05-17T00:39:14.653+0200    debug   prometheusexporter@v0.48.0/accumulator.go:81    accumulating metric: scrape_samples_scraped     {"kind": "exporter", "name": "prometheus"}
2022-05-17T00:39:14.653+0200    debug   prometheusexporter@v0.48.0/accumulator.go:81    accumulating metric: scrape_samples_post_metric_relabeling      {"kind": "exporter", "name": "prometheus"}
2022-05-17T00:39:14.653+0200    debug   prometheusexporter@v0.48.0/accumulator.go:81    accumulating metric: scrape_series_added        {"kind": "exporter", "name": "prometheus"}
2022-05-17T00:39:29.342+0200    debug   scrape/scrape.go:1522   Unexpected error        {"kind": "receiver", "name": "prometheus", "scrape_pool": "localmetrics", "target": "http://localhost:8888/metrics", "series": "otelcol_exporter_enqueue_failed_log_records{exporter=\"prometheus\",service_instance_id=\"fe3e6bf9-c1c7-4ed5-9275-692a74100e00\",service_version=\"latest\"}", "error": "job or instance cannot be found from labels"}
2022-05-17T00:39:29.342+0200    debug   scrape/scrape.go:1307   Append failed   {"kind": "receiver", "name": "prometheus", "scrape_pool": "localmetrics", "target": "http://localhost:8888/metrics", "error": "job or instance cannot be found from labels"}
2022-05-17T00:39:29.342+0200    warn    internal/otlp_metricsbuilder.go:159     Failed to scrape Prometheus endpoint    {"kind": "receiver", "name": "prometheus", "scrape_timestamp": 1652740769340, "target_labels": "map[__name__:up app:otelcol instance:localhost:8888 job:localmetrics]"}
2022-05-17T00:39:29.343+0200    debug   prometheusexporter@v0.48.0/accumulator.go:81    accumulating metric: scrape_samples_post_metric_relabeling      {"kind": "exporter", "name": "prometheus"}
2022-05-17T00:39:29.343+0200    debug   prometheusexporter@v0.48.0/accumulator.go:81    accumulating metric: scrape_series_added        {"kind": "exporter", "name": "prometheus"}
2022-05-17T00:39:29.343+0200    debug   prometheusexporter@v0.48.0/accumulator.go:81    accumulating metric: up {"kind": "exporter", "name": "prometheus"}
2022-05-17T00:39:29.343+0200    debug   prometheusexporter@v0.48.0/accumulator.go:81    accumulating metric: scrape_duration_seconds    {"kind": "exporter", "name": "prometheus"}
2022-05-17T00:39:29.343+0200    debug   prometheusexporter@v0.48.0/accumulator.go:81    accumulating metric: scrape_samples_scraped     {"kind": "exporter", "name": "prometheus"}

otelcol_exporter_enqueue_failed_log_records happens to be the first metric of http://localhost:8888. If windows_exporter is used, the log prints the first metric name of that endpoint.

@gouthamve
Member

The receiver needs the job and instance labels in order to work; you'll need to upsert or remove the attributes in the attributes processor rather than drop them in the receiver itself.
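
For reference, a minimal sketch of what such a processor entry could look like (the processor name attributes/dropjob and the choice of key are illustrative assumptions, not from this thread; note the issue author reports further down that, for the exporter-side job label, the resource processor worked while the attributes processor did not):

processors:
  attributes/dropjob:
    actions:
      - key: job        # datapoint attribute to remove; illustrative only
        action: delete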

@dashpole
Contributor

In the prometheus world, generally the prometheus server adds the job and instance labels to its targets. Using honor_labels is not the default use case.

Exactly. The collector's Prometheus receiver is basically a prometheus server. Using honor_labels is expected if you are scraping from a prometheus server. The prometheus exporter is the prom equivalent of a federated Prometheus endpoint.
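
For illustration, a minimal sketch of what a downstream Prometheus server's scrape config could look like when scraping the collector's prometheus exporter endpoint from the example config (the job name otel-collector is an assumption, chosen only for this sketch); honor_labels: true keeps the job and instance labels exposed by the exporter instead of overwriting them with the scraping server's own values:

scrape_configs:
  - job_name: otel-collector       # illustrative job name
    honor_labels: true             # keep the exposed job/instance labels
    static_configs:
      - targets: [localhost:7299]  # prometheus exporter endpoint from the config above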

The issue with omitting job and instance labels from the prometheus exporter endpoint is that if you scrape more than one target (e.g. two instances of an application), those metrics will collide. The prometheus exporter would display metrics from only one of the two applications (randomly, based on scrape timings) in the example.

@gouthamve is correct that dropping the service.name and service.instance.id resource attributes would cause job and instance labels to disappear on your Prometheus exporter.

Given the current way the receiver is designed, we need a way to reference a target, which we use job and instance for.

@Mario-Hofstaetter
Author

Mario-Hofstaetter commented May 17, 2022

@gouthamve is correct that dropping the service.name and service.instance.id resource attributes would cause job and instance labels to disappear on your Prometheus exporter.

Thank you @dashpole and @gouthamve. By using the resourceprocessor (not the attributesprocessor; that one did not work), I was able to get rid of the job label in the prometheus exporter by deleting the service.name attribute:

processors:
  resource/nojob:
    attributes:
      - key: service.name
        action: delete
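
For completeness, a sketch of how that processor would then be wired into the metrics pipeline from the original config (pipeline and component names taken from the example above):

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resource/nojob]
      exporters: [prometheus]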

As someone who is new to OTel, this is not immediately obvious; the prometheus exporter docs currently do not mention which OTel attributes the Prometheus labels are generated from.

For me this issue is more or less solved with that. Whether there should be an improved error message when someone attempts what I did in the original post (labeldrop of job or instance in metric_relabel_configs) is up to the maintainers. Maybe this was an exotic thing to try.

@github-actions
Contributor

github-actions bot commented Nov 8, 2022

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions bot added the Stale label on Nov 8, 2022
atoulme added the receiver/prometheus label on Mar 12, 2023
@github-actions
Contributor

Pinging code owners for receiver/prometheus: @Aneurysm9 @dashpole. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot removed the Stale label on May 26, 2023
@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Jul 26, 2023
@github-actions
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Sep 24, 2023