error encoding and sending metric family: write tcp 172.31.204.123:8889->172.31.42.221:60282: write: broken pipe #32371
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Hello @zhulei-pacvue, did this error happen on startup, or after the collector had been running for some time? Have you seen this happen repeatedly when running the collector, or was this only one time? Can you share more about what kind of environment the collector was running in?
@crobert-1 Thank you! This error often occurs after the collector has been running for some time. After the service is restarted, it can run normally.
We're experiencing the same issue:
Usually there are about 10 or 20 such log entries that happen at about the same time. Some of them happen at the exact same time, while others are a few milliseconds apart. In the last week alone, this has happened about once or twice a day. We're running this on OpenShift using Kubernetes v1.26.13+8f85140. The OpenTelemetry Collector runs as a container in a pod, with a Quarkus service sending metrics and traces to it. Here is the Kubernetes Deployment resource:
The OpenTelemetry Collector config:
CPU and memory usage of the OpenTelemetry Collector show nothing abnormal when the error occurs.
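For readers without access to the attachments above, a minimal collector configuration that exposes the Prometheus exporter on port 8889 (the port appearing in the error message) might look roughly like the sketch below; the OTLP receiver, pipeline layout, and endpoints are illustrative assumptions, not the reporter's actual setup.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  # Prometheus exporter serving /metrics on port 8889 (assumed to match the error logs)
  prometheus:
    endpoint: 0.0.0.0:8889

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```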
Same issue since we upgraded from 0.88.0 to 0.100.0.
Hi @crobert-1, is there any ongoing effort to fix or mitigate this issue?
We also tried upgrading to the latest release, 0.97.0, which didn't change anything. We have the otel-collector running on a dedicated node without a hard limit, only requests:
Memory and CPU usage are well below the requested resources. It seems to start happening when we increase the metrics ingest volume. Is there any fix or mitigation for this issue via a parameter or config change? Thanks
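A resources block with only requests and no hard limits, as described above, would look like this in the collector's Deployment (the values are placeholders, not the poster's actual numbers):

```yaml
resources:
  requests:
    cpu: "2"
    memory: 4Gi
  # intentionally no "limits" section, so the collector is not hard-capped
```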
My apologies, I'm largely unfamiliar with this component and its functionality. I'm not aware of any ongoing effort to address this. @Aneurysm9 do you have any suggestions here?
Adding an extra data point: I stumbled upon this issue when I configured a k8s livenessProbe pointing to the prometheusexporter endpoint.
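For anyone trying to reproduce this, a probe of that shape (port, path, and timings below are assumptions, not taken from the thread) hits the exporter's metrics endpoint directly, and a short probe timeout can close the connection before the full response is written:

```yaml
livenessProbe:
  httpGet:
    path: /metrics   # fetches the full metrics payload on every probe
    port: 8889       # prometheusexporter endpoint
  periodSeconds: 10
  timeoutSeconds: 1  # tight timeout; slow responses get the connection closed early
```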
Getting to this now. Based on https://groups.google.com/g/prometheus-users/c/7UFP7MVJjRk/m/gB7R6goxAwAJ, it sounds like this happens when the client disconnects before the metrics are returned. Maybe you should increase the timeout for the client calling the exporter? Otherwise, the best we could do here is probably silence the error message by removing this line, or offer a config option to do that:
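If the client is a Prometheus server scraping the collector, raising the scrape timeout for that job is one way to give the exporter more time to write the response; the job name, interval, and target below are illustrative:

```yaml
scrape_configs:
  - job_name: otel-collector
    scrape_interval: 30s
    scrape_timeout: 25s   # default is 10s; must not exceed scrape_interval
    static_configs:
      - targets: ["172.31.204.123:8889"]
```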
I suspect this issue is caused by k8s liveness and readiness probes, which may close the connection before the response has been written. I think otherwise you would want to know if this is happening, but in this case it seems like intended behavior. Is there some path or URL parameter we can use for probing that will skip sending metrics but still be a good indicator of health?
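One option along those lines (a suggestion, not something confirmed in this thread) is to point the probes at the collector's health_check extension instead of the metrics endpoint, so probes never touch the Prometheus exporter at all. The probes in the pod spec would then target port 13133 with path /:

```yaml
extensions:
  health_check:
    endpoint: 0.0.0.0:13133

service:
  extensions: [health_check]
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```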
Component(s)
exporter/prometheus
Describe the issue you're reporting
When I use the prometheusexporter, otelcol frequently reports errors like the following:
2024-04-15T01:39:07.597Z error prometheusexporter@v0.97.0/log.go:23 error encoding and sending metric family: write tcp 172.31.204.123:8889->172.31.42.221:60282: write: broken pipe
{"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*promLogger).Println
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.97.0/log.go:23
github.com/prometheus/client_golang/prometheus/promhttp.HandlerForTransactional.func1.2
github.com/prometheus/client_golang@v1.19.0/prometheus/promhttp/http.go:192
github.com/prometheus/client_golang/prometheus/promhttp.HandlerForTransactional.func1
github.com/prometheus/client_golang@v1.19.0/prometheus/promhttp/http.go:210
net/http.HandlerFunc.ServeHTTP
net/http/server.go:2166
net/http.(*ServeMux).ServeHTTP
net/http/server.go:2683
go.opentelemetry.io/collector/config/confighttp.(*decompressor).ServeHTTP
go.opentelemetry.io/collector/config/confighttp@v0.97.0/compression.go:160
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*middleware).serveHTTP
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.49.0/handler.go:225
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.NewMiddleware.func1.1
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.49.0/handler.go:83
net/http.HandlerFunc.ServeHTTP
net/http/server.go:2166
go.opentelemetry.io/collector/config/confighttp.(*clientInfoHandler).ServeHTTP
go.opentelemetry.io/collector/config/confighttp@v0.97.0/clientinfohandler.go:26
net/http.serverHandler.ServeHTTP
net/http/server.go:3137
net/http.(*conn).serve
net/http/server.go:2039
Version:
otelcol:0.97.0
Important configurations: