Memory leak with distributed tracing enabled #13990
Comments
Memory usage (master-1 is the member with tracing enabled around 1:00 PM CEST on the graph - middle of the timeline): pprof heap snapshots without and with tracing enabled, taken every hour (starting just after starting
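For anyone repeating the measurement, a minimal sketch of collecting such hourly heap snapshots, assuming the member was started with --enable-pprof so the standard /debug/pprof endpoints are served on the client URL (the address, certificate paths, and file names are illustrative):
# illustrative loop: dump a heap profile from the member once per hour
while true; do
  curl -sk --cert client.crt --key client.key https://10.10.0.102:2379/debug/pprof/heap -o heap-$(date +%Y%m%d-%H).pb.gz
  sleep 3600
done
# diff two snapshots to see which allocations grew
go tool pprof -base heap-20220101-12.pb.gz heap-20220101-13.pb.gz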
I can't find the
No custom commits by me. This is the etcd distributed as part of OKD.
It looks like OpenShift customized etcd? @hexfusion could you confirm this? I just downloaded the official 3.5.0 and did a quick verification below.
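For reference, a hedged sketch of what such a comparison against the upstream release could look like; the OKD binary path below is a placeholder, and the URL follows the upstream release naming for the linux-amd64 tarball:
# download the official v3.5.0 release and compare it with the binary shipped in OKD
curl -sLO https://github.com/etcd-io/etcd/releases/download/v3.5.0/etcd-v3.5.0-linux-amd64.tar.gz
tar xzf etcd-v3.5.0-linux-amd64.tar.gz
./etcd-v3.5.0-linux-amd64/etcd --version   # upstream build metadata
/path/to/okd/etcd --version                # downstream build metadata
sha256sum ./etcd-v3.5.0-linux-amd64/etcd /path/to/okd/etcd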
cc @lilic
I can confirm it's not an upstream binary; this is the downstream repo the build comes from [1]. The changes to etcd itself would be minimal. 3.5.0 uses a pretty old version of otel (pre-v1), so it's possible that it had a bug as well.
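One way to check which otel modules a given binary was actually built against is to read the module information embedded in it (the binary path is a placeholder):
# list the module dependencies compiled into the binary and filter for OpenTelemetry
go version -m /path/to/etcd | grep -i otel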
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
Required to graduate distributed tracing.
ping @baryluk
Hey @baryluk, can you confirm that the issue was addressed? You closed the issue as "not planned", so I wanted to double-check.
cc @dashpole
@serathius I was not able to reproduce the issue.
Great, closing the issue as fixed. Thanks for looking into this.
What happened?
Adding --experimental-enable-distributed-tracing works, but causes a memory leak of about 1 GB per hour in our setup. Instead of the expected ~2 GB, it got to about 12 GB in 7 hours.
What did you expect to happen?
Stable memory usage around 1.8-2.0 GB of RSS.
How can we reproduce it (as minimally and precisely as possible)?
Run with --experimental-enable-distributed-tracing for a few hours. It is sufficient to enable it on one member.
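To make the growth visible without any tracing backend, one illustrative way is to sample the etcd process RSS periodically after enabling the flag (the one-minute interval is arbitrary):
# sample the resident set size of the local etcd process once per minute
while true; do
  ps -o etime=,rss= -p "$(pgrep -x etcd | head -n1)"
  sleep 60
done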
Anything else we need to know?
The tracing collector endpoint doesn't need to be configured or listening. Having otelcol on 4317 doesn't change anything (beyond actually making tracing work).
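For completeness, a hedged example of putting a collector on the default OTLP gRPC port, for instance via the upstream container image (image tag and port mapping are illustrative):
# run an OpenTelemetry Collector listening on 4317 (OTLP over gRPC)
docker run --rm -p 4317:4317 otel/opentelemetry-collector:latest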
Etcd version (please run commands below)
Etcd configuration (command line flags or environment variables)
etcd, Kubernetes, OKD / Openshift 4.9, 3 members.
etcd --experimental-enable-distributed-tracing --logger=zap --log-level=info --initial-advertise-peer-urls=https://10.10.0.102:2380 --cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-serving-master-1.example.com.crt --key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-serving-master-1.example.com.key --trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-serving-ca/ca-bundle.crt --client-cert-auth=true --peer-cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-peer-master-1.example.com..crt --peer-key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-peer-master-1.example.com.key --peer-trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-peer-client-ca/ca-bundle.crt --peer-client-cert-auth=true --advertise-client-urls=https://10.10.0.102:2379 --listen-client-urls=https://0.0.0.0:2379,unixs://10.10.0.102:0 --listen-peer-urls=https://0.0.0.0:2380 --metrics=extensive --listen-metrics-urls=https://0.0.0.0:9978
running in cri-o
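As a side note, a hedged sketch of pointing a member at a local collector, assuming the experimental tracing flag names shipped with 3.5 (the service name value is illustrative):
# illustrative: enable tracing and send spans to a collector on localhost:4317
etcd --experimental-enable-distributed-tracing --experimental-distributed-tracing-address=localhost:4317 --experimental-distributed-tracing-service-name=etcd-master-1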
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
Relevant log output
No fatal issues in the logs.