Peers Cannot Successfully Handshake #70
Description
In local testing with NOISE
enabled, I get failed to negotiate security protocol: error reading handshake message: noise: message is too short
often. For a background on this issue and more context: OffchainLabs/prysm#5372 .
This happens often enough with two peers trying to connect and performing a NOISE handshake with each other. On further exploration this was because each peer regarded themselves as the initiator of the handshake in the same connection, which lead to malformed handshake responses being produced. This was because each peer read the initiating payload from the other peer as a response to their initial handshake. I am not sure how this happens, this would point to the issue being in Swarm rather than Noise. To clarify this only happens when each peer dials each other at the same time.
I explored this issue in SECIO too, and did notice that there were instances where both peers dialed each other the same time and shared the same connection, instead of having separate connections for each respective dial. However SECIO differs from NOISE in the fact that there is no fixed initiator
or responder
. Each peer is expected to write the initial proposal payload into the stream at the same time, and also read the corresponding peer's proposal from the same stream. Therefore SECIO wouldn't fail from the same issue as NOISE.
To reproduce this you can run the following test:
func createHost(t *testing.T, port int) (host.Host, net.IP) {
ipAddr = net.ParseIP("127.0.0.1")
listen, err := multiaddr.NewMultiaddr(fmt.Sprintf("/ip4/%s/tcp/%d", ipAddr, port))
if err != nil {
t.Fatalf("Failed to p2p listen: %v", err)
}
h, err := libp2p.New(context.Background(), []libp2p.Option{libp2p.ListenAddrs(listen), libp2p.Security(noise.ID, noise.New)}...)
if err != nil {
t.Fatal(err)
}
return h, ipAddr
}
func TestPeer_FailedHandshake(t *testing.T) {
h1, ipaddr := createHost(t, 5000)
defer h1.Close()
h1Addr, err := multiaddr.NewMultiaddr(fmt.Sprintf("/ip4/%s/tcp/%d/p2p/%s", ipaddr, 5000, h1.ID()))
if err != nil {
t.Fatal(err)
}
addrInfo1, err := peer.AddrInfoFromP2pAddr(h1Addr)
if err != nil {
t.Fatal(err)
}
h2, ipaddr := createHost(t, 5001)
defer h2.Close()
h2Addr, err := multiaddr.NewMultiaddr(fmt.Sprintf("/ip4/%s/tcp/%d/p2p/%s", ipaddr, 5001, h2.ID()))
if err != nil {
t.Fatal(err)
}
addrInfo2, err := peer.AddrInfoFromP2pAddr(h2Addr)
if err != nil {
t.Fatal(err)
}
wg := new(sync.WaitGroup)
go func() {
wg.Add(1)
for i := 0; i < 20; i++ {
if err := h1.Connect(context.Background(), *addrInfo2); err != nil {
t.Fatal(err)
}
if len(h1.Network().Peers()) != 1 {
t.Error("Peer not connected, has 0 peers")
}
if err := h1.Network().ClosePeer(h2.ID()); err != nil {
t.Fatal(err)
}
if len(h1.Network().Peers()) != 0 {
t.Error("Peer not disconnected, has 1 peer")
}
}
wg.Done()
}()
// use another peer to constantly connect/disconnect with first peer.
for i := 0; i < 20; i++ {
if err := h2.Connect(context.Background(), *addrInfo1); err != nil {
t.Fatal(err)
}
if len(h2.Network().Peers()) != 1 {
t.Error("Peer not connected, has 0 peers")
}
if err := h2.Network().ClosePeer(h1.ID()); err != nil {
t.Fatal(err)
}
if len(h2.Network().Peers()) != 0 {
t.Error("Peer not disconnected, has 1 peer")
}
}
wg.Wait()
}
You can split out noise for secio, and you will notice that the error disappears