-
Notifications
You must be signed in to change notification settings - Fork 75
[FIX] Fix retry config option #827
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5 seconds seems very low for anything to change in circustances. I see the other project modified it to 30 seconds and that might work better with geth
which limits incoming connection attempts from an IP to 30 seconds as well.
Another option would be to set this to 30 seconds and connect-max-retries
to 0, so by default it won't waste time since we know the most probable reason for not connecting is TooManyPeers
and just want to move on quickly to another node.
Okay I see that if the remote peer actually disconnected then this retry doesn't kick in. So it only applies if we failed to reach the other node completely? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on RLPxConnectionHandler
it seems like the retry logic only kicks in if connection or sending a message fails, not when a disconnect message is received.
I think 5 seconds may be too low; perhaps it should be differentiated between initial connection failing vs a write failing to an already established connection, so you can quickly dismiss false leads and try to give more time to recover from temporary glitches with nodes that worked before but are perhaps restarting. But if this has a positive impact then I don't see much harm in it.
This reconnect kicks in when rlpx connection failed either:
If we received disconnect then we then we just finish up. One case it maybe useful to have the reconnect set to Thats why i thought
Overall as i mention whole mechanism should be improved a little, but taking into account upcoming release this change semed as low hanging fruit to speed up peer connection establishment. |
Description
Our
PeerActor
uses this option to retry outgoing connection when initial try fails. Setting it to1 minute
means that this actor sits and do nothing for 1 minute, while takingPeerManagerActor
outgoing connection slot. It can slow down acquiring new connections. In our other project we also tweaked that option.In longer perspective we should probably delete this mechanism, and make
PeerManagerActor
one thing responsible for initiating connections and re-connection.