[prometheusreceiver] Scrape fails when trying to labeldrop job or instance labels in metric_relabel_configs #9986
Comments
The first scrape of the self-obs endpoint can sometimes fail, as there is a race between the prometheus receiver starting up and the self-obs endpoint starting up. If it only happens once, then it is probably just that. Are there any other symptoms other than the log line?
Every metric target fails, for real. No metrics are shown on the exporter endpoint besides the following; logs below. The error is only visible at debug level (using debug loglevel for an "Unexpected error" is not optimal). The error message comes from opentelemetry-collector-contrib/receiver/prometheusreceiver/internal/otlp_transaction.go, lines 106 to 107 (commit 3e9cf6c). For the prometheus exporter (no OTLP is used in the example configuration), this is somewhat inconvenient: in the prometheus world, the prometheus server generally adds the job and instance labels itself. Changed telemetry config:
```yaml
telemetry:
  logs:
    level: debug
  metrics:
    level: detailed
    address: localhost:8888
```
http://localhost:8888/metrics:

```
# HELP otelcol_exporter_enqueue_failed_log_records Number of log records failed to be added to the sending queue.
# TYPE otelcol_exporter_enqueue_failed_log_records counter
otelcol_exporter_enqueue_failed_log_records{exporter="prometheus",service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 0
# HELP otelcol_exporter_enqueue_failed_metric_points Number of metric points failed to be added to the sending queue.
# TYPE otelcol_exporter_enqueue_failed_metric_points counter
otelcol_exporter_enqueue_failed_metric_points{exporter="prometheus",service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 0
# HELP otelcol_exporter_enqueue_failed_spans Number of spans failed to be added to the sending queue.
# TYPE otelcol_exporter_enqueue_failed_spans counter
otelcol_exporter_enqueue_failed_spans{exporter="prometheus",service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 0
# HELP otelcol_exporter_sent_metric_points Number of metric points successfully sent to destination.
# TYPE otelcol_exporter_sent_metric_points counter
otelcol_exporter_sent_metric_points{exporter="prometheus",service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 20
# HELP otelcol_process_cpu_seconds Total CPU user and system time in seconds
# TYPE otelcol_process_cpu_seconds gauge
otelcol_process_cpu_seconds{service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 2.09375
# HELP otelcol_process_memory_rss Total physical memory (resident set size)
# TYPE otelcol_process_memory_rss gauge
otelcol_process_memory_rss{service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 5.0855936e+07
# HELP otelcol_process_runtime_heap_alloc_bytes Bytes of allocated heap objects (see 'go doc runtime.MemStats.HeapAlloc')
# TYPE otelcol_process_runtime_heap_alloc_bytes gauge
otelcol_process_runtime_heap_alloc_bytes{service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 1.2428176e+07
# HELP otelcol_process_runtime_total_alloc_bytes Cumulative bytes allocated for heap objects (see 'go doc runtime.MemStats.TotalAlloc')
# TYPE otelcol_process_runtime_total_alloc_bytes gauge
otelcol_process_runtime_total_alloc_bytes{service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 1.03036496e+08
# HELP otelcol_process_runtime_total_sys_memory_bytes Total bytes of memory obtained from the OS (see 'go doc runtime.MemStats.Sys')
# TYPE otelcol_process_runtime_total_sys_memory_bytes gauge
otelcol_process_runtime_total_sys_memory_bytes{service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 3.2546008e+07
# HELP otelcol_process_uptime Uptime of the process
# TYPE otelcol_process_uptime counter
otelcol_process_uptime{service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest"} 60.008410500000004
# HELP otelcol_receiver_accepted_metric_points Number of metric points successfully pushed into the pipeline.
# TYPE otelcol_receiver_accepted_metric_points counter
otelcol_receiver_accepted_metric_points{receiver="prometheus",service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest",transport="http"} 20
# HELP otelcol_receiver_refused_metric_points Number of metric points that could not be pushed into the pipeline.
# TYPE otelcol_receiver_refused_metric_points counter
otelcol_receiver_refused_metric_points{receiver="prometheus",service_instance_id="ff7cd12a-04db-4cf5-b515-19e73b9361ef",service_version="latest",transport="http"} 0
```
So I would say this is working as expected. The way to do this is by setting the correct attributes using the attributesprocessor: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/attributesprocessor
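For reference, a minimal sketch of the attributesprocessor configuration being suggested (the processor name suffix and the attribute key are placeholders, not taken from this issue):

```yaml
processors:
  attributes/drop-example:
    actions:
      # "delete" removes the named attribute from each matching data point
      - key: some.attribute
        action: delete
```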
The receiver needs the job and instance labels?
Exactly. The collector's Prometheus receiver is basically a prometheus server. The issue with omitting job and instance labels from the prometheus exporter endpoint is that if you scrape more than one target (e.g. two instances of an application), those metrics will collide: the prometheus exporter would display metrics from only one of the two applications (randomly, based on scrape timings). @gouthamve is correct that dropping the job and instance labels is better done after the receiver, e.g. in a processor. Given the current way the receiver is designed, we need a way to reference a target, which we use job and instance for.
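To illustrate the collision being described (hypothetical metric name, paths, and target addresses):

```
# Two application instances expose the same series. Without job/instance,
# the exporter endpoint can only publish one of them:
http_requests_total{path="/api"} 10    # scraped from instance A
http_requests_total{path="/api"} 7     # scraped from instance B

# With the target-identifying labels, the series stay distinct:
http_requests_total{path="/api",job="myapp",instance="host-a:8080"} 10
http_requests_total{path="/api",job="myapp",instance="host-b:8080"} 7
```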
Thank you @dashpole, @gouthamve. By using the resourceprocessor (not the attributesprocessor) I was able to remove the label:

```yaml
processors:
  resource/nojob:
    attributes:
      - key: service.name
        action: delete
```

As someone who is new to OTel, this is not immediately understood; the prometheus exporter docs currently do not mention from which OTel attributes the prometheus labels are generated. For me this issue is more or less solved with that. Whether there will be an improved error message if someone attempts what I did in the original post (labeldrop of job or instance in metric_relabel_configs) is another question.
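Background on the mapping: the prometheus exporter derives the job label from service.name (and service.namespace) and the instance label from service.instance.id, which is why deleting service.name removes job. A sketch of how the processor above would be wired into the metrics pipeline (receiver and exporter names follow the earlier comments):

```yaml
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resource/nojob]   # the resource processor defined above
      exporters: [prometheus]
```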
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments.
Pinging code owners for receiver/prometheus: @Aneurysm9 @dashpole. See Adding Labels via Comments if you do not have permissions to add labels yourself.
This issue has been closed as inactive because it has been stale for 120 days with no activity. |
Describe the bug
Using labeldrop with regex: job or regex: instance in metric_relabel_configs causes the prometheusreceiver to fail all targets. Full config example below; the scrape logs an "Unexpected error" (see the comments). Using another label in the regex, for example service_instance_id, works fine ✔, so the config itself is not to blame.
Steps to reproduce
Run otelcol with the configuration provided below; scraping localhost:8888 will fail.
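A sketch of the relevant receiver section, reconstructed from the issue title and the localhost:8888 telemetry address (the job name and scrape interval are assumptions, not taken from the original config):

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: otel-collector          # assumed job name
          scrape_interval: 10s              # assumed interval
          static_configs:
            - targets: ["localhost:8888"]   # the collector's own telemetry endpoint
          metric_relabel_configs:
            - action: labeldrop             # this rule (or regex: instance) makes every target fail
              regex: job
```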
What did you expect to see?
This was an ill-fated attempt to get rid of the new job and instance labels added in #9115, which I guess is not possible that way because those labels are added by the exporter rather than by the receiver. However, using this configuration should not render the receiver broken, should it? Using another non-existing label name like regex: doesnotexist does not break the config.
What did you see instead?
Scraping of all targets fails.
What version did you use?
Version: v0.51.0, binary from the asset otelcol_0.51.0_windows_amd64.tar.gz
What config did you use?
Full otelcol config yaml (collapsed section; contents not preserved here)
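Combining the receiver section sketched under "Steps to reproduce" with the exporter and telemetry settings mentioned in the comments, a minimal reconstruction of the remaining config might look like this (the prometheus exporter endpoint is an assumption):

```yaml
exporters:
  prometheus:
    endpoint: "localhost:8889"      # assumed port for the exporter's scrape endpoint

service:
  telemetry:
    logs:
      level: debug
    metrics:
      level: detailed
      address: localhost:8888       # the self-observability endpoint being scraped
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [prometheus]
```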
Environment
OS: Windows 10 21H2