Description
Specification
There seems to be a problem with GRPC client calls in a very specific circumstance. In the test at tests/nat/noNAT.test.ts:239,
'agents in different namespaces can ping each other via seed node',
we are trying to connect to the same seed node from two agents. The sequence is:
- seed1 starts
- seed2 starts
- agent1 starts
- agent2 fails to start.
The problem here is that during the polykey.start() sequence in agent2 we call nodeConnectionManager.syncNodeGraph(), which makes an agent-to-agent GRPC call. It seems that any client GRPC call to seed1 causes the error ErrorGRPCClientCall: Generic call error - 13 INTERNAL: Received RST_STREAM with code 2 triggered by internal client error: Protocol error to be thrown.
This only happens within this test. Trying the same procedure manually causes no problem. Replicating the circumstances using polykey agents and NodeConnectionManagers to make the GRPC calls works fine as well.
The error implies that the stream is being closed because of a protocol error during communication. It is hard to tell where this is coming from.
We need to work out why this problem is happening.
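As a starting point for narrowing this down, the error message can be split into its parts. The sketch below is only an illustration: decodeCallError is a hypothetical helper, not part of Polykey or grpc-js, and the code table is copied from the HTTP/2 specification (RFC 7540, section 7). It shows that RST_STREAM code 2 is INTERNAL_ERROR, while PROTOCOL_ERROR would be code 1, which may suggest that the "Protocol error" text comes from an error raised on the client side rather than from the reset code itself.

```typescript
// HTTP/2 error codes as listed in RFC 7540, section 7.
const HTTP2_ERROR_CODES: Record<number, string> = {
  0: 'NO_ERROR',
  1: 'PROTOCOL_ERROR',
  2: 'INTERNAL_ERROR',
  3: 'FLOW_CONTROL_ERROR',
  4: 'SETTINGS_TIMEOUT',
  5: 'STREAM_CLOSED',
  6: 'FRAME_SIZE_ERROR',
  7: 'REFUSED_STREAM',
  8: 'CANCEL',
  9: 'COMPRESSION_ERROR',
  10: 'CONNECT_ERROR',
  11: 'ENHANCE_YOUR_CALM',
  12: 'INADEQUATE_SECURITY',
  13: 'HTTP_1_1_REQUIRED',
};

interface DecodedCallError {
  grpcStatus: number; // gRPC status code, e.g. 13
  grpcStatusName: string; // gRPC status name, e.g. INTERNAL
  rstStreamCode: number; // HTTP/2 error code carried by RST_STREAM
  rstStreamName: string; // name of that HTTP/2 error code
  clientErrorMessage?: string; // text after "internal client error:", if any
}

// Hypothetical helper: splits a call error message of the shape seen in
// this issue into its parts. Returns undefined if the shape does not match.
function decodeCallError(message: string): DecodedCallError | undefined {
  const match = message.match(
    /(\d+) ([A-Z_]+): Received RST_STREAM with code (\d+)(?: triggered by internal client error: (.*))?/,
  );
  if (match == null) return undefined;
  const rstStreamCode = Number(match[3]);
  return {
    grpcStatus: Number(match[1]),
    grpcStatusName: match[2],
    rstStreamCode,
    rstStreamName: HTTP2_ERROR_CODES[rstStreamCode] ?? 'UNKNOWN',
    clientErrorMessage: match[4],
  };
}

const decoded = decodeCallError(
  'ErrorGRPCClientCall: Generic call error - 13 INTERNAL: Received RST_STREAM with code 2 triggered by internal client error: Protocol error',
);
console.log(decoded);
```

If this reading is right, the place to look is whatever raised the client-side error (for example the proxy closing the socket underneath the HTTP/2 session), not the seed node's response.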
Some things to verify:
- If we see WARN:ConnectionForward 127.0.0.1:53659:Client Error: Error: write EPIPE, is it because the clientSocket is closed and the tlsSocket is trying to write data to it, thus emitting an EPIPE?
- Is GRPCClient triggering the clientSocket to be closed on the ConnectionForward?
- Why is GRPCClient closing, and what is causing the subsequent protocol error ErrorGRPCClientCall: Generic call error - 13 INTERNAL: Received RST_STREAM with code 2 triggered by internal client error: Protocol error?
Additional context
- Related Tests for NAT-Traversal and Hole-Punching #357
- Related comments
Tasks
- Explore and discuss the problem further.
- Fix up logger naming and handle connections to 0.0.0.0 to prevent accidental usage: GRPC call failure when multiple nodes connecting to 1 seed node. #369 (comment)
- Refactor how createConnection works in NCM so it's not outputting that it's creating a connection multiple times, even though it fetches it from the cache: GRPC call failure when multiple nodes connecting to 1 seed node. #369 (comment)
- Change the withConnF error handling to use the ResourceRelease and pass the error in there, and reduce all the relevant error instances to a limited set of exceptions, preferably only those that relate to connection failures. Consider how this may be delegated by connectivity state changes though (let's avoid duplicate logic between connectivity state and exceptions there): GRPC call failure when multiple nodes connecting to 1 seed node. #369 (comment)
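To make the last task more concrete, here is a rough sketch of what passing the error into the release callback could look like. Everything in it is an assumption for discussion: the ResourceRelease and ResourceAcquire types, the withF helper, the ErrorConnectionFailed class and the ConnectionCache class are hypothetical stand-ins, not the actual NodeConnectionManager code. The idea is that the release callback sees the error and only evicts the cached connection when the error belongs to a small set of connection-failure exceptions.

```typescript
// Hypothetical types: the release callback optionally receives the error
// that was thrown while the resource was in use.
type ResourceRelease = (e?: Error) => Promise<void>;
type ResourceAcquire<T> = () => Promise<[ResourceRelease, T]>;
type Conn = { nodeId: string };

// Hypothetical exception marking a connection-level failure.
class ErrorConnectionFailed extends Error {}

// Acquire the resource, run f, and always release. Any error thrown by f
// is handed to the release callback and then rethrown to the caller.
async function withF<T, R>(
  acquire: ResourceAcquire<T>,
  f: (resource: T) => Promise<R>,
): Promise<R> {
  const [release, resource] = await acquire();
  let error: Error | undefined;
  try {
    return await f(resource);
  } catch (e) {
    error = e instanceof Error ? e : new Error(String(e));
    throw e;
  } finally {
    await release(error);
  }
}

// Toy stand-in for a connection cache keyed by node ID.
class ConnectionCache {
  public readonly connections = new Map<string, Conn>();

  public acquireConnection(nodeId: string): ResourceAcquire<Conn> {
    return async (): Promise<[ResourceRelease, Conn]> => {
      let conn = this.connections.get(nodeId);
      if (conn == null) {
        conn = { nodeId };
        this.connections.set(nodeId, conn);
      }
      const release: ResourceRelease = async (e) => {
        // Only connection failures evict the cached connection;
        // any other error leaves the connection in place.
        if (e instanceof ErrorConnectionFailed) {
          this.connections.delete(nodeId);
        }
      };
      return [release, conn];
    };
  }

  public async withConnF<R>(
    nodeId: string,
    f: (conn: Conn) => Promise<R>,
  ): Promise<R> {
    return withF(this.acquireConnection(nodeId), f);
  }
}

// Walks through a success, an unrelated failure, and a connection failure,
// recording whether the connection is still cached after each one.
async function demo(): Promise<string[]> {
  const cache = new ConnectionCache();
  const log: string[] = [];
  await cache.withConnF('seed1', async () => 'ok');
  log.push(`after success: cached=${cache.connections.has('seed1')}`);
  await cache
    .withConnF('seed1', async () => {
      throw new Error('handler bug');
    })
    .catch(() => {});
  log.push(`after other error: cached=${cache.connections.has('seed1')}`);
  await cache
    .withConnF('seed1', async () => {
      throw new ErrorConnectionFailed('write EPIPE');
    })
    .catch(() => {});
  log.push(`after connection error: cached=${cache.connections.has('seed1')}`);
  return log;
}

demo().then((log) => console.log(log.join('\n')));
```

Keeping the eviction decision inside the release callback means withConnF itself does not need to know which exceptions count as connection failures. Whether that decision should instead be driven by connectivity state changes is still open, as noted in the task.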