-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix PubSub subscribe on connection race condition #809
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## unstable #809 +/- ##
===========================================
Coverage ? 83.69%
===========================================
Files ? 82
Lines ? 14907
Branches ? 0
===========================================
Hits ? 12476
Misses ? 2431
Partials ? 0 |
@@ -382,7 +380,8 @@ method subscribePeer*(p: PubSub, peer: PeerId) {.base, gcsafe.} = | |||
## messages | |||
## | |||
|
|||
discard p.getOrCreatePeer(peer, p.codecs) | |||
let pubSubPeer = p.getOrCreatePeer(peer, p.codecs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This creates a remote peer pubsubpeer
but with the local peer pubsub.codecs
and adds the remote peer to the local peer's peers table. When this executes https://github.com/vacp2p/nim-libp2p/blob/master/libp2p/protocols/pubsub/pubsub.nim#L379 the remote peer continues with the wrong codecs, as it's already present in the table.
This PR fixes a subtle race condition that I discovered while working on #807. Since 807 renders the upgrade "faster" (less callback and asyncSpawn etc), it's highlights some bugs, which is cool
I think this race condition can also happen without 807.
1 - A peer connect to us, and opens the pubsub stream very fast, preloaded with the list of topic its subscribed to
2 - We see that we have a topic in common, so we GRAFT him
3 - "sendConn" stream is setup.
Most of the time, 3 happens before 2. But when 2 happens first, the GRAFT is simply swallowed by this:
https://github.com/status-im/nim-libp2p/blob/1711c204eab2596e4bd42b162d630bc262704daf/libp2p/protocols/pubsub/pubsubpeer.nim#L253
and we end up with an assymetric mesh.
This PR simply wait for both direction to be setup before handling the peer messages.
It's a simplistic fix, and doesn't handle more complex cases (ie, the send stream getting closed, but not the receive one)
The better fix would be to get rid of this silent swallow, and make sure it never happens, but that's a much bigger effort