Prometheus Receiver: scraping metrics from multiple targets that emit metrics of the same name and label keys #4986
Does your repro differ from the example in open-telemetry/opentelemetry-collector#1076?
I'm not sure which repro you are referring to, but I can provide simple steps by which you can reproduce the same error at your end.
Exactly the same problem as this: https://gist.github.com/liamawhite/bc957be682fc3ae5558ef5ab1636858f
@liamawhite I'm really stuck because of this error. Please provide a hotfix.
@arpitjindal97 Could you share a configuration example please (if it doesn't contain sensitive information)? I'm not sure what the root cause of this bug is specifically (probably the internal cache uses a map keyed on labels), but you shouldn't have multiple applications producing the same set of metric labels anyway, as this may cause issues down the pipeline for batching / aggregation etc. The Collector will consider metrics with the same labels to be a single metric, which may lead to incorrect aggregations. @bogdandrutu could you confirm this please?

So ideally you'd want to add instance-specific labels to your metrics as described in open-telemetry/opentelemetry-collector#1076 (comment). This can also be solved on the Prometheus config side, e.g. for Kubernetes Pods you may define additional labels:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        ...
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
```

I suppose we could try to detect this situation and at least make the Prometheus receiver emit errors suggesting to resolve the issue by adding unique labels.
Sorry, I read the issue open-telemetry/opentelemetry-collector#1076 more carefully and apparently it complains about the behavior of the Prometheus Exporter (i.e. when the collector exposes collected metrics via a Prometheus endpoint), not the Prometheus Receiver (when the collector scrapes metrics from other applications' Prometheus endpoints). @arpitjindal97 Based on your comments above I think you're having an issue with the Prometheus Receiver, so it's unrelated to open-telemetry/opentelemetry-collector#1076?
As I described above, your issue seems to be completely different from that example, so please provide a specific example reproducing your issue if you think there is a bug in the Prometheus Receiver that needs to be fixed.
Local Environment (config.yml):

- otelcol Download
- Grafana instance 1
- Grafana instance 2
- Navigate to
@nilebox regarding your comment https://github.com/open-telemetry/opentelemetry-collector/issues/1774#issuecomment-696400632, I agree that this issue could be addressed by additional relabeling inside the Prometheus receiver config.
> I'm not exactly sure what a Prometheus server will do in this situation. Shouldn't the

This is difficult to answer in practice, because if we represent counters with DELTA temporality, then there is simply not a problem interleaving points and/or having multiple identical timeseries. When representing counters with CUMULATIVE temporality, it's not possible to simply interleave points when timeseries collide: you have to aggregate these timeseries into a single cumulative series, which requires a lot more code. Abstractly speaking, I think it should be fine to have multiple targets with the same name and label keys. Concretely speaking, in a Prometheus receiver, this is difficult to reason about.
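To make the temporality point concrete, here is a small, self-contained sketch (my own illustration with made-up numbers, not code from the collector): merging cumulative points from two colliding targets yields a non-monotonic stream, while delta points can simply be summed per interval.

```go
package main

import "fmt"

func main() {
	// Hypothetical cumulative counter samples from two targets that
	// collide on metric name and label keys.
	targetA := []float64{100, 110, 120}
	targetB := []float64{5, 6, 7}

	// Treating both as one cumulative series interleaves the points into
	// a non-monotonic stream (100, 5, 110, 6, ...), which downstream
	// consumers read as repeated counter resets.
	for i := range targetA {
		fmt.Println("merged cumulative:", targetA[i], targetB[i])
	}

	// With delta temporality the colliding points can simply be summed
	// per interval, with no per-target state required.
	prevA, prevB := 0.0, 0.0
	for i := range targetA {
		delta := (targetA[i] - prevA) + (targetB[i] - prevB)
		prevA, prevB = targetA[i], targetB[i]
		fmt.Println("merged delta:", delta)
	}
}
```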
@jmacd This is a great point. From looking at this code though, it seems that two pods will likely have equal OC resources but different OC nodes, since the "host" part is only used in the node, and not in the OC resource. The OT model doesn't have a "node", but it gets converted to OT resource attributes: https://github.com/open-telemetry/opentelemetry-collector/blob/99ea2df3c7ae9f6a24a982d897678cd5d564cffa/translator/internaldata/oc_to_resource.go#L102-L104 So theoretically it should work fine for the collector when grouping metrics by resource?
Unassigned myself as I'm not actively working on the Prometheus receiver anymore.
Just want to add a note that this is still happening, even on 0328a79.
I am fairly sure this is where it is happening: opentelemetry-collector-contrib/exporter/prometheusexporter/accumulator.go, lines 143 to 156 in 351337c
So we have an "accumulator", which stores the lastValue for each metric and then spits out the metric for conversion to Prometheus format here:
Now the problem is that the accumulator creates a signature based only on the metric's attributes, not the resource attributes. This signature is used to deduplicate the metrics, which means that when two targets expose the same metrics, only one metric is stored and the other is overwritten. I'll start working on a test for this and then the fix.
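For illustration, here is a minimal, self-contained sketch of the difference between the two keying strategies; the `signature` helper and the attribute values are hypothetical and not the actual accumulator code from the exporter.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// signature builds a deduplication key from a metric name and one or more
// label maps. Hypothetical helper for illustration only.
func signature(name string, labelSets ...map[string]string) string {
	var parts []string
	for _, labels := range labelSets {
		for k, v := range labels {
			parts = append(parts, k+"="+v)
		}
	}
	sort.Strings(parts)
	return name + "{" + strings.Join(parts, ",") + "}"
}

func main() {
	metricLabels := map[string]string{"handler": "/api"}

	// Two scrape targets exposing the same metric name and label keys,
	// distinguished only by their resource attributes.
	resourceA := map[string]string{"service.instance.id": "grafana-1:3000"}
	resourceB := map[string]string{"service.instance.id": "grafana-2:3001"}

	// Keyed on metric attributes only: both targets produce the same key,
	// so the later value overwrites the earlier one (the reported bug).
	fmt.Println(signature("http_requests_total", metricLabels))
	fmt.Println(signature("http_requests_total", metricLabels))

	// Including resource attributes in the key keeps the targets separate.
	fmt.Println(signature("http_requests_total", resourceA, metricLabels))
	fmt.Println(signature("http_requests_total", resourceB, metricLabels))
}
```

The actual fix may differ in detail; this only illustrates why folding resource-level information into the key avoids the collision between targets.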
Fixes: #4986
See: open-telemetry#4986 (comment)

The new test fails on old code.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
…rrectly (#11463)

* Duplicate metrics from multiple targets
  Fixes: #4986
  See: #4986 (comment)
  The new test fails on old code.
  Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
* Add changelog entry
  Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
* Fix linting issues
  Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
* Redo changelog for new process
  Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
Reopen of open-telemetry/opentelemetry-collector#1076
This issue is not resolved and can be seen in the latest version.
@liamawhite