Receiving SSL handshake error from Kubernetes API server and opentelemetry webhook #2956
Comments
This is certainly odd... have you opened the port in your cluster's firewall rules? We have this recommendation for GKE, but it's probably a similar issue for EKS...
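For reference, the GKE recommendation amounts to allowing the control plane to reach the webhook port on the nodes. A minimal sketch of that kind of rule; the network, CIDR, and tag values are placeholders, not values from this issue:

```sh
# Sketch: allow the GKE control plane to reach the operator's webhook port
# on the nodes. <cluster-network>, <master-ipv4-cidr>, and <node-tag> are
# placeholders you'd substitute for your cluster.
gcloud compute firewall-rules create allow-otel-operator-webhook \
  --network <cluster-network> \
  --source-ranges <master-ipv4-cidr> \
  --allow tcp:9443 \
  --target-tags <node-tag>
```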
There is no firewall issue at any layer: none in the CNI, none in the API server. It's worth mentioning that the webhook server is already working; for example, if you try to configure an OpenTelemetryCollector with a wrong configuration, it either gets an error or gets mutated by the webhook server.
I see... I haven't seen this issue before. I wonder if your API server is actually unhealthy, given it's a bunch of connection resets.
Same problem here.
Same here.
@melquisedequecosta98 @parkedwards can you share:
Hello. I managed to resolve this issue by removing the entire operator and installing it again:

```sh
operator-sdk olm uninstall

kubectl get mutatingwebhookconfiguration -A
# change to the names from your EKS cluster
kubectl delete mutatingwebhookconfiguration minstrumentation.kb.io- mopampbridge.kb.io- mopentelemetrycollectorbeta.kb.io-wrrtn mpod.kb.io-

kubectl get validatingwebhookconfiguration -A
# change to the names from your EKS cluster
kubectl delete validatingwebhookconfiguration vinstrumentationcreateupdate.kb.io- vinstrumentationdelete.kb.io- vopampbridgecreateupdate.kb.io- vopampbridgedelete.kb.io- vopentelemetrycollectorcreateupdatebeta.kb.io- vopentelemetrycollectordeletebeta.kb.io-

operator-sdk olm install
```

The problem is that the "mutatingwebhookconfiguration" and "validatingwebhookconfiguration" objects were causing some kind of conflict with TLS; they are not removed by "operator-sdk olm uninstall" and need to be removed by hand.
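If it helps anyone following this, a minimal sketch for confirming the manual cleanup is complete before reinstalling:

```sh
# Sketch: list any remaining otel-related webhook configurations;
# empty output means the manual cleanup above worked.
kubectl get mutatingwebhookconfiguration,validatingwebhookconfiguration -o name \
  | grep -i -e instrumentation -e opampbridge -e opentelemetry
```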
The following actions have already been done, but the issue still exists.
But I found a few more issues related to the same error in other operators and services (e.g. gatekeeper): the issue seems to be related to the connection pool in newer Golang versions. I am not sure, but at least that's what I've gathered from tracking multiple issues.
I am also facing the same issue in GKE; can anyone help us here? We are using these versions: Helm 3.14.
When I was seeing these errors, we were also seeing OOM errors in our deployment of the operator. Once we increased the resources (memory), this issue appears to have gone away.
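For anyone hitting the same thing, a hedged sketch of bumping the operator's memory via the helm chart. The release name and the `manager.resources` values path are assumptions based on the opentelemetry-operator chart layout, so verify them against your chart version; the numbers are illustrative, not recommendations:

```sh
# Sketch: raise the operator's memory so the webhook server isn't
# OOM-killed mid-handshake. manager.resources path and release name
# are assumptions; check your chart's values.yaml.
helm upgrade opentelemetry-operator open-telemetry/opentelemetry-operator \
  --reuse-values \
  --set manager.resources.requests.memory=128Mi \
  --set manager.resources.limits.memory=256Mi
```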
I'm also experiencing this problem in my self-built k8s cluster.
From the opentelemetry-operator pod log. These are my error logs; I don't understand why this error is reported:

```
{"level":"INFO","timestamp":"2024-11-12T20:43:03.203093427Z","logger":"controllers.OpenTelemetryCollector","message":"pdb field is unset in Spec, creating default"}
```
Component(s)
No response
Describe the issue you're reporting
Description:
We are observing SSL handshake errors between the otel operator's webhook server (default port 9443) and internal IPs of the Kubernetes API server.
Steps to reproduce:
Deploying the operator helm chart with only a few changes (for our use case), including:
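The specific value overrides are not listed in this report; for context, the baseline install we modified is the upstream chart (repo URL from the open-telemetry helm-charts project), something like:

```sh
# Sketch of the baseline install; our values.yaml overrides are elided here,
# and the release/namespace names are illustrative.
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
helm install opentelemetry-operator open-telemetry/opentelemetry-operator \
  --namespace opentelemetry-operator-system --create-namespace \
  -f values.yaml
```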
Expected Result:
The OpenTelemetry operator and collectors work fine, but we are receiving the following logs from the operator pod showing a TLS handshake error happening from time to time between the API server and the otel operator webhook server. We couldn't see any issue, though, with the ValidatingWebhook and MutatingWebhook; they both seem to be working fine.
10.40.76.248 is the internal service IP of the Kubernetes API server
10.40.99.143 is the pod IP of the opentelemetry operator
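For completeness, a sketch of how to attribute those IPs and confirm the webhooks are actually being exercised; the namespace and manifest name are assumptions:

```sh
# Sketch: confirm which IP belongs to what.
kubectl get endpoints kubernetes -n default                  # API server internal IPs
kubectl get pods -n opentelemetry-operator-system -o wide    # operator pod IP

# A server-side dry run round-trips through the mutating/validating webhooks
# without persisting anything; my-collector.yaml is a placeholder manifest.
kubectl apply --dry-run=server -f my-collector.yaml
```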
Troubleshooting steps:
To make sure there is no rate limiting happening between the API server and the otel operator, we've checked the API server logs as well as the priority and fairness settings for handling requests by the API server, and we didn't observe any suspicious behaviour there:
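The captured output is elided above; concretely, a check along these lines using the standard API Priority and Fairness metrics would show whether requests are being throttled:

```sh
# Sketch: look for APF throttling/rejections on the API server; non-zero
# rejected counts would point at rate limiting rather than a TLS problem.
kubectl get --raw /metrics \
  | grep -E 'apiserver_flowcontrol_(rejected_requests_total|current_inqueue_requests)'
```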
The certificate generated for the otel operator was also checked, and it is valid:
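The check output is elided above; a sketch of one way to do it, where the secret name follows the operator's default service-cert naming and is an assumption to verify on your install:

```sh
# Sketch: decode the webhook serving cert and confirm validity dates and SANs.
# Secret name and namespace are assumptions; adjust for your install.
kubectl -n opentelemetry-operator-system get secret \
  opentelemetry-operator-controller-manager-service-cert \
  -o jsonpath='{.data.tls\.crt}' | base64 -d \
  | openssl x509 -noout -subject -dates -ext subjectAltName
```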
Test environment:
Kubernetes version: v1.27.13-eks-3af4770
Provider: EKS
Operator version: 0.96.0