Restarting Lighthouse sometimes stalls due to in-use sockets #2254
Comments
This may also be the source of some CI failures:
https://github.com/sigp/lighthouse/runs/2001822504?check_suite_focus=true |
Turns out libp2p has From the socket docs for
We always bind to From what I have gathered from the stack overflow thread you linked above and https://stackoverflow.com/a/14388707/4914568 Maybe ensuring that we close all libp2p listener streams(https://docs.rs/libp2p/0.35.1/libp2p/swarm/struct.ExpandedSwarm.html#method.remove_listener) before dropping libp2p would ensure that we don't have any hanging connections? |
Properly closing all connections and using |
I just upgraded to v1.3.0 and I did not have this issue compared to recent previous versions. Was this recently fixed? |
Not as far as I'm aware. As @pawanjay176 pointed out we're already using We could maybe try |
We've had a number of version updates since this, and I've not witnessed this issue. Let's reopen if it arises again. |
Several users have reported seeing this recently so I'm going to reopen it (related GitHub issue: #3500). |
Still borked
I stopped Lighthouse half a day ago and can't start it back up; it just keeps failing with this. |
@karalabe Half a day seems too long for socket reuse. Are you 100% sure you don't have something listening on 9000/9001? Since v4.5.0 we also listen on 9001 UDP for QUIC. If there's something listening, it should show under
|
This seems weird. There seems to be a "thing" on macOS where a UDP port sometimes survives the termination of its owner process, and there seems to be no way of closing it afterwards. With netstat, I can indeed see UDP 9000 open:
However, lsof does not find port 9000 assigned to any process. This seems to be a recurring problem on macOS with no real answer anywhere: https://stackoverflow.com/questions/40512274/release-udp-port-used-by-dead-process-on-os-x |
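When netstat and lsof disagree like this, one way to cross-check is to try binding the port directly and see whether the kernel refuses. The following is a minimal sketch, not anything Lighthouse ships: the helper name `port_is_free` and its defaults are made up for illustration, and it only checks IPv4.

```python
import socket


def port_is_free(port, kind=socket.SOCK_DGRAM, host="0.0.0.0"):
    """Return True if a plain bind to (host, port) succeeds.

    Hypothetical helper for illustration. `kind` is socket.SOCK_DGRAM
    for UDP or socket.SOCK_STREAM for TCP. No SO_REUSEADDR is set, so a
    port that the kernel still considers taken makes the bind fail.
    """
    with socket.socket(socket.AF_INET, kind) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically EADDRINUSE; a permission error on a privileged
            # port would also land here.
            return False
        return True
```

Calling `port_is_free(9000)` and `port_is_free(9001)` while Lighthouse is stopped should return `True` on a healthy machine. A `False` result with nothing shown in lsof would be consistent with the orphaned UDP port described above.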
Hah, dafuq ethereum/go-ethereum#18443 |
Description
Some users have reported that Lighthouse cannot be restarted quickly due to TCP ports not being freed immediately after process exit. After a bit of research, it seems that this is a consequence of TCP's design, and that most operating systems wait 30-120 seconds after socket closure in order to avoid delayed packets being sent to a new listener. This thread has a good summary: https://stackoverflow.com/questions/3229860/what-is-the-meaning-of-so-reuseaddr-setsockopt-option-linux/3233022#3233022
If we establish that Lighthouse's networking stack is robust against delayed packets, we could opt into receiving them by setting the SO_REUSEADDR flag when binding TCP sockets. Actually doing this could be a bit tricky, because we might have to punch through Tokio & LibP2P's abstractions, but perhaps they already provide configuration options.
Until then, anyone who experiences issues rebinding sockets can wait out the TIME_WAIT period. You can see sockets in this state using a command like:
Version
v1.1.3, likely v1.2.0 as well
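For reference, the SO_REUSEADDR idea from the description looks roughly like this at the plain socket level. This is only a sketch using Python's standard `socket` module to show the option itself. It is not how Lighthouse, Tokio, or libp2p configure their sockets, and the function name `bind_reusable_listener` is hypothetical.

```python
import socket


def bind_reusable_listener(host, port):
    """Bind a TCP listener with SO_REUSEADDR set before bind().

    Hypothetical helper for illustration. The option has to be set
    before bind(); on Linux and similar systems it lets the bind
    succeed while old connections on the same port are still in
    TIME_WAIT.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock
```

Note that this does not let two live listeners share a port on Linux; it only relaxes the check against lingering sockets, which is the restart case described here.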