
feat(peerEvents): add a peerEvent Identified #843

Merged
lchenut merged 9 commits into master from revert-autorelay-sleep on Jun 21, 2024
Conversation

@lchenut (Collaborator) commented Jan 17, 2023

This PR was originally created to remove a temporary sleepAsync, a patch against race conditions that were, in theory, fixed by #807.
As #807 didn't solve the problem (explained in detail in this comment), the goal of this PR changed to adding a peerEvent callback so that the sleepAsync can be removed safely.

As a side note, I also removed some duplicate code. Services were being stopped twice whenever a switch stopped, generating warnings each time it happened.

@Menduist Menduist marked this pull request as draft January 17, 2023 16:00
@kaiserd (Collaborator) commented May 2, 2024

Why is this still in draft state?

@lchenut (Collaborator, Author) commented Jun 13, 2024

After a bit of research on this PR, here's why it exists:
This PR removes await sleepAsync(200.millis) after await self.peerAvailable.wait(). The problem is that removing it creates a race condition. The asyncEvent peerAvailable is fired when a new peer joins. Once it fires, the loop (in the proc innerRun) restarts from the beginning and runs until these two lines of code:

    var connectedPeers = switch.connectedPeers(Direction.Out)    
    connectedPeers.keepItIf(RelayV2HopCodec in switch.peerStore[ProtoBook][it] and    

We get the newly connected peers and keep them only if RelayV2HopCodec is in the switch.peerStore. The issue here is that the peerStore may not be up to date at this point, hence the sleepAsync(200.millis).

The way it's currently handled is certainly not a perfect solution, but if we want to remove this sleep, we either have to change when the peerStore is filled or when the handlePeerJoined callback is called. Either way, after looking at it, this would change the call order of different asynchronous procedures and could cause other race conditions if done badly.

With that in mind, I'm not sure what to do about this PR... What do you think @diegomrsantos @kaiserd?

Edits:

  • Upgrade flow refactoring #807 should have solved this issue, but apparently didn't.
  • Another solution is to add another callback handlePeerJoinedAndIdentified (or another name) which would be called after the peerStore is filled.
  • The alternative is to keep the sleepAsync(200.millis). Even though it's not perfect, if the identify takes longer than that, it'll fail in the same way. (The race is sketched below.)
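
To make the race concrete, here is a condensed sketch of the loop (hypothetical body: innerRun, peerAvailable, RelayV2HopCodec and ProtoBook come from the snippet above; everything else is assumed and is not the actual autorelayservice.nim code):

    import sequtils
    import chronos
    # libp2p imports omitted; Switch, AutorelayService, RelayV2HopCodec
    # and ProtoBook are assumed to be in scope.

    proc innerRun(self: AutorelayService, switch: Switch) {.async.} =
      while true:
        # peerAvailable fires as soon as a peer joins...
        await self.peerAvailable.wait()
        self.peerAvailable.clear()
        # ...but identify may not have filled the peer store yet, so this
        # lookup can miss a relay that just connected. The 200 ms sleep
        # papered over that gap; an event fired after identify closes it.
        var connectedPeers = switch.connectedPeers(Direction.Out)
        connectedPeers.keepItIf(RelayV2HopCodec in switch.peerStore[ProtoBook][it])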

@lchenut lchenut self-assigned this Jun 13, 2024
@lchenut lchenut changed the title from "Revert sleepAsync" to "feat(peerEvents): add a peerEvent Identified" Jun 14, 2024
@lchenut lchenut marked this pull request as ready for review June 14, 2024 11:28

 PeerEvent* = object
   case kind*: PeerEventKind
-  of PeerEventKind.Joined:
+  of PeerEventKind.Joined, PeerEventKind.Identified:
Collaborator:

why do we need this?

Collaborator (Author):

For the autorelay service, we don't need to know whether we're the initiator or not. But for other protocols using this new event, it could be interesting to know whether the identified peer is Incoming or Outgoing. Identified and Joined are, in the end, really similar; they're just fired at two different times.

Collaborator:

It might be fine as this case is minor, but it's better to add functionality only when needed.

Collaborator (Author):

TBH, on second thought, we could have used initiator for autorelay, since we use the direction here: https://github.com/vacp2p/nim-libp2p/blob/master/libp2p/services/autorelayservice.nim#L106

So clearly, it might be needed.
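
For context, a minimal sketch of what the variant object looks like with this change (layout assumed from the diff above, not copied verbatim from connmanager.nim):

    type
      PeerEventKind* {.pure.} = enum
        Left, Joined, Identified

      PeerEvent* = object
        case kind*: PeerEventKind
        of PeerEventKind.Joined, PeerEventKind.Identified:
          # true when we dialed the peer (Outgoing),
          # false when we accepted the connection (Incoming)
          initiator*: bool
        else:
          discard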

@@ -195,6 +195,10 @@ proc internalConnect(
   try:
     self.connManager.storeMuxer(muxed)
     await self.peerStore.identify(muxed)
+    asyncSpawn self.connManager.triggerPeerEvents(
Collaborator:

why asyncSpawn?

Collaborator (Author):

Because we do not know what the callback will be; it could possibly be an infinite loop, for example. asyncSpawn is safer against this kind of behavior.

Collaborator:

Is it? Running an infinite loop in the background doesn't sound like a good alternative. How about we use await with a timeout?

Collaborator (Author):

What I mean by that is that the peerEvent is public; we do not control what a libp2p user could do here.
Another point to consider is that this triggerPeerEvents(Identified) is called in an important part of the code. Take a case where the callback does a lot of computation (say 200 ms): every time we dial a peer or accept a connection, 200 ms would be spent waiting for the callback to finish before anything else can be done.
And as a last point, if you look at how the other events are triggered, they also use asyncSpawn (https://github.com/vacp2p/nim-libp2p/blob/master/libp2p/connmanager.nim#L310-L317)
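
A toy contrast of the two options being discussed (hypothetical userCallback; plain chronos semantics, not code from this PR):

    import chronos

    proc userCallback() {.async.} =
      await sleepAsync(200.millis)  # stand-in for a slow user-provided handler

    proc dialAwaiting() {.async.} =
      await userCallback()       # the dial path waits for the handler to return

    proc dialSpawning() {.async.} =
      asyncSpawn userCallback()  # the future is detached: the dial continues,
                                 # but a failure in the handler is no longer
                                 # reported at this call site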

Collaborator:

I'm still not convinced. AFAIK, chronos is single-threaded, and once this task starts executing it won't yield control until it stops running. It's not executed in parallel.
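
To illustrate this point (toy code): chronos only switches tasks at suspension points, so a CPU-bound callback blocks the event loop whether or not it was spawned.

    import chronos

    proc cpuBound() {.async.} =
      var acc = 0
      for i in 0 ..< 100_000_000:
        acc += i  # no await inside the loop: nothing else can run meanwhile
      echo acc

    # An async proc runs synchronously up to its first suspension point
    # (here: to completion), so this call still monopolizes the thread.
    asyncSpawn cpuBound()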

Collaborator (Author):

In the 200 milliseconds scenario, it'll slow down every dial by that amount. I think it's really preferable to hand the newly created connection to the user directly and await these events after the dial (or accept) has completed.

Collaborator (Author):

After a private conversation, it's indeed better not to use asyncSpawn here 👍
It's now changed to await instead.
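
The call then roughly takes this shape on the dial path (a sketch; argument names assumed from the diffs above, and the accept path in switch.nim presumably passes initiator: false):

    await self.connManager.triggerPeerEvents(
      muxed.connection.peerId,
      PeerEvent(kind: PeerEventKind.Identified, initiator: true))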

@@ -204,6 +204,9 @@ proc upgrader(switch: Switch, trans: Transport, conn: Connection) {.async.} =
   let muxed = await trans.upgrade(conn, Opt.none(PeerId))
   switch.connManager.storeMuxer(muxed)
   await switch.peerStore.identify(muxed)
+  asyncSpawn switch.connManager.triggerPeerEvents(
Collaborator:

why asyncSpawn?

Collaborator (Author):

Same as above.

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.54%. Comparing base (02c96fc) to head (4e2d963).
Report is 25 commits behind head on master.


@@           Coverage Diff            @@
##           master     #843    +/-   ##
========================================
  Coverage   84.53%   84.54%            
========================================
  Files          91       92     +1     
  Lines       15517    16407   +890     
========================================
+ Hits        13118    13872   +754     
- Misses       2399     2535   +136     
Files                                   Coverage Δ
libp2p/connmanager.nim                  92.75% <ø> (-0.18%) ⬇️
libp2p/dialer.nim                       91.25% <100.00%> (+0.53%) ⬆️
libp2p/services/autorelayservice.nim    95.14% <100.00%> (+0.24%) ⬆️
libp2p/switch.nim                       91.48% <100.00%> (-0.23%) ⬇️

... and 74 files with indirect coverage changes

@lchenut lchenut merged commit 100f318 into master Jun 21, 2024
8 checks passed
@lchenut lchenut deleted the revert-autorelay-sleep branch June 21, 2024 11:07
Labels: none
Project status: done
4 participants