[FIX] Fix retry config option #827

KonradStaniec · 2020-12-03T10:13:25Z

Description

Our PeerActor uses this option to retry an outgoing connection when the initial attempt fails. Setting it to 1 minute means the actor sits and does nothing for 1 minute while occupying one of PeerManagerActor's outgoing connection slots, which can slow down acquiring new connections. In our other project we also tweaked this option.

In the longer term we should probably delete this mechanism and make PeerManagerActor the single component responsible for initiating connections and reconnections.

aakoshh

5 seconds seems very low for anything to change in the circumstances. I see the other project changed it to 30 seconds, and that might work better with geth, which also throttles incoming connection attempts from the same IP with a 30-second window.

Another option would be to set this to 30 seconds and connect-max-retries to 0, so by default it won't waste time since we know the most probable reason for not connecting is TooManyPeers and just want to move on quickly to another node.
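To compare the values under discussion, here is a rough back-of-the-envelope sketch of how long one unreachable peer could occupy an outgoing connection slot. It assumes a fixed delay between attempts and a hypothetical per-attempt timeout; `slot_hold_seconds` and its parameter names are illustrative only and are not taken from the codebase.

```python
def slot_hold_seconds(retry_delay_s, max_retries, attempt_timeout_s=0.0):
    """Worst-case time a failing peer holds an outgoing slot.

    Assumption: every attempt fails, and the actor waits retry_delay_s
    between consecutive attempts (no backoff, no jitter).
    """
    attempts = max_retries + 1  # the initial try plus each retry
    return attempts * attempt_timeout_s + max_retries * retry_delay_s


# Hypothetical comparison with a single retry allowed:
one_minute = slot_hold_seconds(retry_delay_s=60, max_retries=1)
five_seconds = slot_hold_seconds(retry_delay_s=5, max_retries=1)
# With max retries set to 0 the delay never applies at all:
no_retries = slot_hold_seconds(retry_delay_s=30, max_retries=0)
```

Under these assumptions the hold time grows linearly with both the delay and the retry count, so lowering either one frees the slot sooner.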

aakoshh · 2020-12-03T11:03:15Z

Okay I see that if the remote peer actually disconnected then this retry doesn't kick in. So it only applies if we failed to reach the other node completely?

aakoshh

Based on RLPxConnectionHandler it seems like the retry logic only kicks in if connection or sending a message fails, not when a disconnect message is received.

I think 5 seconds may be too low; perhaps the logic should differentiate between an initial connection failing and a write failing on an already established connection, so you can quickly dismiss false leads and give more time to recover from temporary glitches with nodes that worked before but are perhaps restarting. But if this has a positive impact then I don't see much harm in it.

KonradStaniec · 2020-12-03T11:20:46Z

This reconnect kicks in when the RLPx connection fails either:

- while trying to establish it, or
- after we had a connection established and it died.

If we received a disconnect, then we just finish up.
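The rule described above can be sketched as a small decision function. This is a simplified illustration, not the actual PeerActor code: the event names, `next_action`, and the assumption that a single retry counter is checked against a maximum (as suggested by the `connect-max-retries` option mentioned earlier) are all hypothetical.

```python
from enum import Enum, auto


class Event(Enum):
    CONNECT_FAILED = auto()       # could not establish the connection
    CONNECTION_LOST = auto()      # an established connection died
    DISCONNECT_RECEIVED = auto()  # remote peer sent a disconnect message


class Action(Enum):
    RETRY = auto()
    STOP = auto()


def next_action(event, retries_so_far, max_retries):
    """Decide whether to schedule a reconnect or finish up."""
    if event is Event.DISCONNECT_RECEIVED:
        # An explicit disconnect never triggers a retry.
        return Action.STOP
    # Both failure kinds are treated the same way in this sketch.
    if retries_so_far < max_retries:
        return Action.RETRY
    return Action.STOP
```

Distinguishing `CONNECT_FAILED` from `CONNECTION_LOST` in the event type leaves room for the idea raised in review of giving the two cases different delays.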

One case where it may be useful to have the reconnect set to 1 minute is when we had a connection established and it died (let's say the remote peer restarted): instead of going back to PeerManagerActor we just reconnect at the PeerActor level. I am not sure, though, that this case justifies slowing down connection establishment.

That's why I thought 5s is a good middle ground, i.e.:

- in case of a lost connection we will try to quickly re-establish it
- during initial establishment we will move quickly past a failure

Overall, as I mentioned, the whole mechanism should be improved a little, but given the upcoming release this change seemed like low-hanging fruit to speed up peer connection establishment.

[FIX] Fix retry config option

69861ad

KonradStaniec requested a review from ntallar December 3, 2020 10:13

jmendiola222 requested a review from aakoshh December 3, 2020 10:20

aakoshh reviewed Dec 3, 2020

aakoshh approved these changes Dec 3, 2020

Merge remote-tracking branch 'origin/develop' into fix/fix-retry-conf…

23cad31

…ig-option

KonradStaniec merged commit 23cad31 into develop Dec 3, 2020

KonradStaniec deleted the fix/fix-retry-config-option branch December 3, 2020 16:49
