Prometheus receiver fails on federate endpoint when job and instance labels are missing #32555
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Shouldn't a federated endpoint always have job + instance labels?
I am not sure, it is not the case for aggregated in-cluster metrics on OpenShift. Is the requirement documented somewhere?
How do you know that honor_labels is the cause? The error message indicates that the receiver failed to scrape the endpoint, and it looks like it already has a job + instance label. I believe you need to enable debug logging to see the detailed error message, since it is in the prometheus server code at debug level.
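For context, the collector's own log level can be raised to debug via the `service::telemetry` section. A minimal snippet, assuming it is merged into the collector configuration already in use:

```
service:
  telemetry:
    logs:
      level: debug   # surfaces the detailed scrape error that is otherwise hidden at debug level
```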
After switching
Interesting... Can you enable debug logs to see what the error is that causes the scrape to fail?
There is only the warning log that is pasted in the first comment.
I was able to reproduce it again when scraping a different metric:
The metric
collector logs
Hi all, I have been looking a bit into this, and can confirm that this error does appear for aggregated metrics which do not have a `job`/`instance` label, scraped with the following config:

```
prometheus:
  config:
    scrape_configs:
      - job_name: 'federate'
        scrape_interval: 10s
        honor_labels: false
        params:
          'match[]':
            - '{__name__="cluster:node_cpu:sum_rate5m"}'
        metrics_path: '/federate'
        static_configs:
          - targets:
            - "prometheus-k8s.monitoring:9090"
```

The exact location where the error is happening seems to be in opentelemetry-collector-contrib/receiver/prometheusreceiver/internal/transaction.go, line 368 in f61e5c1. Here it detects that, because this is an aggregated metric, not both of these labels are available.

Now, the question is how to approach fixing this. Would it make sense to, if `honor_labels` is set to `true` and the `job`/`instance` labels are missing from the scraped metric, fall back to the values from the scrape config?
I'm on board with that solution.
Thanks for the swift response, I will keep you posted when I have a PR ready.
Alright, I have just created a draft PR to address this: #33565. @dashpole @Aneurysm9, feel free to have a look when you have a moment to spare to see if this goes in the right direction.
…h no job/instance label (#33565)

**Description:** This PR fixes the retrieval of metrics where either the `job` or `instance` label is missing, and `honor_labels` is set to `true`. This can be the case for aggregated metrics coming from a federate endpoint. This PR introduces a fallback to using the `job`/`instance` labels from the scrape config for such metrics.

**Link to tracking Issue:** Fixes #32555

**Testing:**
- Added a unit test
- Verified using the following config:

```
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'federate'
          scrape_interval: 10s
          honor_labels: true
          params:
            'match[]':
              - '{__name__="cluster:node_cpu:sum_rate5m"}'
          metrics_path: '/federate'
          static_configs:
            - targets:
              - "localhost:9090"

exporters:
  debug:
    verbosity: detailed
  otlphttp:
    endpoint: ${env:OTLP_ENDPOINT}

pipelines:
  metrics:
    receivers: [otlp, prometheus]
    exporters: [otlphttp, debug]
```

This was tested on a `kind` K8s cluster running the prometheus operator, with a port forward for the `prometheus-k8s` service created by the prometheus operator (therefore the `localhost:9090` address in the target).

---------

Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
Co-authored-by: David Ashpole <dashpole@google.com>
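To make the merged behavior easier to follow, here is a minimal, hypothetical Go sketch of the fallback described in the PR above. The function `jobAndInstance` and its wiring are invented for illustration and are not the actual code in `transaction.go`:

```go
package main

import (
	"fmt"

	"github.com/prometheus/prometheus/model/labels"
)

// jobAndInstance picks the job/instance values for a scraped series.
// With honor_labels: true the scraped labels are kept as-is, so aggregated
// series from /federate may carry neither label; in that case, fall back to
// the values coming from the scrape config / target.
func jobAndInstance(scraped labels.Labels, fallbackJob, fallbackInstance string) (string, string) {
	job := scraped.Get("job")
	if job == "" {
		job = fallbackJob
	}
	instance := scraped.Get("instance")
	if instance == "" {
		instance = fallbackInstance
	}
	return job, instance
}

func main() {
	// An aggregated recording rule exposed via /federate, carrying no job/instance labels.
	series := labels.FromStrings("__name__", "cluster:node_cpu:sum_rate5m")
	job, instance := jobAndInstance(series, "federate", "prometheus-k8s.monitoring:9090")
	fmt.Println(job, instance) // federate prometheus-k8s.monitoring:9090
}
```

With a fallback of this shape in place, the receiver no longer rejects the scrape; the resulting data points carry the scrape config's `job`/`instance` values instead of failing outright.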
Component(s)
receiver/prometheus
What happened?
Description
The Prometheus receiver fails to scrape the federate endpoint when `honor_labels: true` and the target metric does not have `instance` and `job` labels, which can be the case for aggregated metrics.

Log message:
Steps to Reproduce
Expected Result
Scraping the federate endpoint with `honor_labels: true` works for metrics that don't have `instance` and `job` labels.

Actual Result
From opentelemetry-collector-contrib/receiver/prometheusreceiver/internal/transaction.go, line 129 in 13fca79:
Collector version
0.93.0
Environment information
Environment
OS: (e.g., "Ubuntu 20.04")
Compiler (if manually compiled): (e.g., "go 14.2")
Kubernetes
OpenTelemetry Collector configuration
Log output
Additional context
honor_labels: true
https://prometheus.io/docs/prometheus/latest/federation/