opentelemetry-operator manager crashes during instrumentation injection attempt #3303
Comments
Does the manager pod report any reason for its crash? OOMKilled, maybe? I haven't been able to reproduce this. |
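One way to check for a recorded crash reason is to read the `lastState.terminated` field of each container status in the pod object, for example from the output of `kubectl get pod <pod> -n <namespace> -o json`. The snippet below is a minimal sketch, not part of the operator; the sample pod document, the container name `manager`, and the helper name `last_terminations` are made up for illustration. An empty result does not rule out an OOM kill, since the kill may only be visible at the kernel level.

```python
import json


def last_terminations(pod: dict) -> list[tuple[str, str, int]]:
    """Return (container, reason, exit_code) for every container
    that has a recorded previous termination."""
    results = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated:
            results.append(
                (
                    status["name"],
                    terminated.get("reason", ""),
                    terminated.get("exitCode", -1),
                )
            )
    return results


# Hypothetical, trimmed-down pod status used only for illustration.
SAMPLE_POD = json.loads(
    """
    {
      "status": {
        "containerStatuses": [
          {
            "name": "manager",
            "restartCount": 3,
            "lastState": {
              "terminated": {"reason": "OOMKilled", "exitCode": 137}
            }
          }
        ]
      }
    }
    """
)

if __name__ == "__main__":
    for name, reason, code in last_terminations(SAMPLE_POD):
        print(f"{name}: reason={reason} exit_code={code}")
```

Exit code 137 corresponds to the process being terminated by SIGKILL, which is what an OOM kill produces.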
There was no reason at all. It just died and a new pod started. I performed a similar deployment on Minikube and it works fine, but it crashes on our production Kubernetes cluster. |
You can follow the guide here on how to enable debug logs: https://github.com/open-telemetry/opentelemetry-operator/blob/main/DEBUG.md. Is it possible the operator doesn't have permission to perform the mutation? |
We already added The operator gives itself the required permissions, so it's probably not the problem. We use the default resources. The only possible reason for the error I can think of is that the cluster has no direct Internet access, but it can pull Docker images from the configured Docker proxy. |
We upgraded the operator to 0.110.0 and the chart to 0.71.0, and it still crashes with no details at all in the log or in the pod's describe output. These are the parameters we use to deploy the Helm chart:
|
@omerozery Can you share any logs from the operator? Given you have debug logs enabled you should be seeing something. |
@jaronoff97 it's the same issue I described above. You can see the whole log up to the point that the service crashes in the issue description. |
Sorry, that was unclear from Omer's comment. Without more information or a way to reproduce this, there's not much I can do to assist, unfortunately. If you'd like, we can follow up in Slack (CNCF Slack, #otel-operator channel) and go through some more specific Kubernetes debugging steps? |
Maybe there was a nil pointer exception and the operator pod was restarted. I would suggest using |
That's what we did. See the log above. |
Can you share the |
We found the OOM kill in the host's dmesg (kernel messages), not at the kubelet (Kubernetes) level and not at the containerd level. Just to be clear, there was plenty of free memory on the k8s host. We run telemetry with the default configuration, so I guess the problem is there. |
I do not know how to reproduce it, but adding resources solved the problem for us. |
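For anyone hitting the same symptom, the kernel log check described above can be sketched as a small script that scans `dmesg` output for OOM-killer lines. This is only an illustration: the sample log line, the PID, and the process name `manager` are invented and are not taken from the reporter's host, and the exact kernel message wording can vary between kernel versions. A cgroup-level kill is triggered by the container's own memory limit, so it can occur even when the host itself has plenty of free memory.

```python
import re

# Matches the "Killed process <pid> (<name>)" part that the kernel OOM killer
# logs for both host-wide and memory-cgroup kills.
OOM_KILL = re.compile(r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)")


def find_oom_kills(kernel_log: str) -> list[tuple[int, str]]:
    """Return (pid, process_name) for every OOM kill found in the log text."""
    return [
        (int(match.group("pid")), match.group("name"))
        for match in OOM_KILL.finditer(kernel_log)
    ]


# Hypothetical kernel log excerpt used only for illustration.
SAMPLE_LOG = """\
[12345.678901] Memory cgroup out of memory: Killed process 4242 (manager) total-vm:1280000kB, anon-rss:131000kB, file-rss:0kB, shmem-rss:0kB
[12346.000000] eth0: link up
"""

if __name__ == "__main__":
    for pid, name in find_oom_kills(SAMPLE_LOG):
        print(f"OOM kill: pid={pid} process={name}")
```

The script only reads text passed to it, so it can be fed saved `dmesg` output from the node rather than being run on the node itself.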
Component(s)
auto-instrumentation
What happened?
Description
opentelemetry-operator manager crashes
Steps to Reproduce
Expected Result
A sidecar is added to the pod and the service is instrumented with OpenTelemetry.
Actual Result
opentelemetry-operator crashes with the log seen below.
Kubernetes Version
1.25
Operator version
v0.109.0
Collector version
v0.69.0
Environment information
Environment
OS: Rocky Linux 9.3
Log output
Additional context
There are no additional log messages. The manager just disappears.