Lots of clueless "Transport endpoint is not connected" messages in ingress controller #7266
Comments
This error indicates that there was a problem accepting the inbound connection (i.e., from an external client). The error message originates here: This might indicate that the client had already disconnected by the time the connection was processed, or it could indicate some OS-level issue.
Is there a way to know the incoming IP and port of this inbound connection? I also searched the Nginx access logs, but they were silent after disabling external health checks, and the proxy warnings were still happening at the same rate.
These connections aren't making it to nginx. You can increase the proxy's log level by setting a pod annotation like:
This will configure the proxy to include the client address on these log lines.
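The annotation itself didn't survive the formatting here; as an assumption, the standard knob for this is the `config.linkerd.io/proxy-log-level` annotation on the ingress controller's pod template. A minimal sketch, with placeholder names and an illustrative filter value:

```sh
# Sketch: raise the proxy log level on the ingress controller so the accept
# errors carry more detail. Namespace, deployment name, and the filter value
# ("warn,linkerd=debug") are placeholders/illustrative, not from this thread.
kubectl -n <ingress-namespace> patch deployment <nginx-ingress-deployment> \
  --type merge \
  -p '{"spec":{"template":{"metadata":{"annotations":{"config.linkerd.io/proxy-log-level":"warn,linkerd=debug"}}}}}'
```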
I tried searching for pods with IPs 10.244.0.146 and .123 (
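As a sketch, one way to run that kind of search (the exact command used here wasn't preserved):

```sh
# List pods across all namespaces with their IPs and filter for the addresses
# seen in the proxy warnings; check node addresses the same way.
kubectl get pods --all-namespaces -o wide | grep -E '10\.244\.0\.(146|123)'
kubectl get nodes -o wide
```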
Looking a little more closely at this: we won't actually have client IPs associated with these accept errors. Either way, though, this is an OS error encountered when the proxy is accepting the inbound connection (i.e., before the connection is processed or forwarded to nginx). I've put up a branch that should at least improve the error messages a bit. If you want to try this build, you can use the following pod annotation:
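The annotation in that comment was also stripped; assuming the branch was published as a proxy image, the usual way to point a workload at a custom proxy build is via the `config.linkerd.io/proxy-image` and `config.linkerd.io/proxy-version` annotations. A sketch with hypothetical values (the real registry and tag for the branch aren't shown here):

```sh
# Hypothetical values: the actual image and tag for the maintainer's branch
# build are not preserved in this thread.
kubectl -n <ingress-namespace> patch deployment <nginx-ingress-deployment> \
  --type merge \
  -p '{"spec":{"template":{"metadata":{"annotations":{"config.linkerd.io/proxy-image":"<registry>/proxy","config.linkerd.io/proxy-version":"<branch-build-tag>"}}}}}'
```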
Thank you for the branch. Now we can see the client IPs in the error messages. However, they are the same ones I noticed before: the public IP address and 2 private ones, which I can't find by running
If everything's working properly except for the log messages, you can disable these warnings by setting a log level like
No, the nodes have 10.131 internal IPs; the exception is the public IP in the logs, which matches one of the nodes. Yes, we have some services with probes, and all of them are in a running state. About disabling the logs: I'm not sure whether this issue is related to some timeouts between nginx and one of our services, so I'm still not confident about ignoring it. However, the logs can't be more helpful than the ones from your branch, I'm afraid. I still don't know what 10.244.0.146 and 10.244.0.123 are. Maybe this is the only way to understand what is really going on.
Maybe these IP addresses are from DOKS-managed master nodes and I should contact them.
For the warnings like:
These connections are definitely not reaching nginx. This is an error from the operating system, encountered when the proxy tries to accept the connection, before it is passed to nginx. It's notable, though, that there are other similar errors on other connections immediately preceding these accept errors:
This may be a
Thank you very much for your support. I'll close this issue and reach out to DO.
Bug Report
What is the issue?
In the nginx ingress controller's proxy, there are 2 messages every 3 seconds:
I have no idea which endpoint it is talking about. I tried to find something related in the debug proxy, but I couldn't.
I disabled external health checks and the messages didn't stop, so the connections are probably from inside the cluster.
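As a sketch, the messages can be observed by following the Linkerd sidecar's logs on the ingress controller pod (pod and namespace names below are placeholders):

```sh
# Follow the Linkerd proxy sidecar on the ingress controller pod and filter
# for the warning in question.
kubectl -n <ingress-namespace> logs <nginx-ingress-pod> -c linkerd-proxy -f \
  | grep 'Transport endpoint is not connected'
```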
How can it be reproduced?
Maybe creating another cluster with the same environment (see Environment section).
Logs, error output, etc
(If the output is long, please create a gist and paste the link here.)
linkerd check output
Environment
Possible solution
Improve the logs to report both ends of the failing connection?
Additional context
Just some context on why I want to solve this warning message. Sometimes, once in a couple of days, linkerd fails to contact a service due to a timeout. However, the service logs show no error. I'm not sure whether these warnings are related to the timeout error, but I'm trying to solve every warning (and error) message I find.