GRPC call failure when multiple nodes connecting to 1 seed node. #369

@tegefaulkes

Description

Specification

There seems to be a problem with the GRPC client calls in a very specific circumstance. In the test 'agents in different namespaces can ping each other via seed node' (tests/nat/noNAT.test.ts:239), two agents try to connect to the same seed node. The sequence is:

  1. seed1 starts
  2. seed2 starts
  3. agent1 starts
  4. agent2 fails to start.

The problem occurs during the polykey.start() sequence in agent2, where we call nodeConnectionManager.syncNodeGraph(). This makes an agent-to-agent GRPC call. It seems that any client GRPC call to seed1 throws the error ErrorGRPCClientCall: Generic call error - 13 INTERNAL: Received RST_STREAM with code 2 triggered by internal client error: Protocol error.

This only happens within this test. Trying the same procedure manually causes no problem. Replicating the circumstances using polykey agents and NodeConnectionManagers to make the GRPC calls works fine as well.

The error implies that the stream is being closed because of a protocol error during communication. It is hard to tell where this is coming from.
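
To make the error message easier to reason about, here is a minimal sketch that splits it into its parts. The helper name parseRstStreamError and the regular expression are assumptions made for illustration and are not part of Polykey or the GRPC library. The code table follows the HTTP/2 error codes in RFC 7540, section 7. Note that code 2 is INTERNAL_ERROR, while PROTOCOL_ERROR would be code 1, which suggests the "Protocol error" text is the client's own internal error detail rather than the reset code sent over the wire.

```typescript
// HTTP/2 error codes as listed in RFC 7540, section 7.
const HTTP2_ERROR_CODES: Record<number, string> = {
  0: 'NO_ERROR',
  1: 'PROTOCOL_ERROR',
  2: 'INTERNAL_ERROR',
  3: 'FLOW_CONTROL_ERROR',
  4: 'SETTINGS_TIMEOUT',
  5: 'STREAM_CLOSED',
  6: 'FRAME_SIZE_ERROR',
  7: 'REFUSED_STREAM',
  8: 'CANCEL',
  9: 'COMPRESSION_ERROR',
  10: 'CONNECT_ERROR',
  11: 'ENHANCE_YOUR_CALM',
  12: 'INADEQUATE_SECURITY',
  13: 'HTTP_1_1_REQUIRED',
};

type RstStreamInfo = {
  grpcStatus: number; // GRPC status code, e.g. 13 (INTERNAL)
  http2Code: number; // code carried by the RST_STREAM
  http2Name: string; // symbolic name of that code
  detail: string | undefined; // trailing client-side detail, if present
};

// Hypothetical helper: extracts the numeric parts of the error message
// quoted in this issue. Returns undefined for any other message shape.
function parseRstStreamError(message: string): RstStreamInfo | undefined {
  const match = message.match(
    /(\d+) [A-Z_]+: Received RST_STREAM with code (\d+)(?: triggered by internal client error: (.*))?$/,
  );
  if (match == null) return undefined;
  const http2Code = Number(match[2]);
  return {
    grpcStatus: Number(match[1]),
    http2Code,
    http2Name: HTTP2_ERROR_CODES[http2Code] ?? 'UNKNOWN',
    detail: match[3],
  };
}
```

Applied to the message above, this yields GRPC status 13, RST_STREAM code 2 (INTERNAL_ERROR), and the detail "Protocol error".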

We need to work out why this problem is happening.

Some things to verify:

  1. If we see WARN:ConnectionForward 127.0.0.1:53659:Client Error: Error: write EPIPE, is it because the clientSocket is closed and the tlsSocket is trying to write data to it, thus emitting an EPIPE?
  2. Is GRPCClient triggering the clientSocket to be closed on the ConnectionForward?
  3. Why is GRPCClient closing, and what is causing the subsequent protocol error ErrorGRPCClientCall: Generic call error - 13 INTERNAL: Received RST_STREAM with code 2 triggered by internal client error: Protocol error?
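
One way to check the ordering questions above is to record which socket emits which event first. The following is a rough debugging sketch and not existing Polykey code: traceEvents, EventSource and TraceEntry are hypothetical names, and the default event list is an assumption about which events matter here.

```typescript
// Minimal shape shared by Node.js sockets and other event emitters.
interface EventSource {
  on(event: string, listener: (...args: unknown[]) => void): unknown;
}

type TraceEntry = {
  seq: number; // position in the shared trace, shows relative ordering
  source: string; // label for the emitter, e.g. 'clientSocket'
  event: string; // event name, e.g. 'close'
  detail: string | undefined; // error message when the event carries an Error
};

// Hypothetical debugging helper: appends an entry to a shared trace every
// time one of the listed events fires on the given emitter.
function traceEvents(
  trace: TraceEntry[],
  source: string,
  emitter: EventSource,
  events: string[] = ['error', 'end', 'close'],
): void {
  for (const event of events) {
    emitter.on(event, (...args: unknown[]) => {
      const first = args[0];
      trace.push({
        seq: trace.length,
        source,
        event,
        detail: first instanceof Error ? first.message : undefined,
      });
    });
  }
}
```

Calling this once for the clientSocket and once for the tlsSocket with the same trace array would show whether a 'close' on the clientSocket precedes the 'error' carrying write EPIPE on the other side.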

Additional context

Tasks

  1. Explore and discuss the problem further.
  2. Fix up logger naming and handle connections to 0.0.0.0 to prevent accidental usage (see comment on #369).
  3. Refactor how createConnection works in NCM so it does not log that it is creating a connection every time, even when it fetches the connection from the cache (see comment on #369).
  4. Change the withConnF error handling to use the ResourceRelease and pass the error in there, and reduce all the relevant error instances to a limited set of exceptions, preferably only those that relate to connection failures. Consider how this may be delegated by connectivity state changes though (let's avoid duplicate logic between connectivity state and exceptions there) (see comment on #369).
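
As a rough illustration of task 3, the sketch below logs only when a connection is actually created and stays silent on cache hits. It is not the NodeConnectionManager implementation: ConnectionCache, getConnection, the Logger shape and the log message are all hypothetical, and keying the cache by a string node ID is an assumption.

```typescript
type Logger = { info(msg: string): void };

// Hypothetical stand-in for a connection cache keyed by node ID.
class ConnectionCache<C> {
  protected connections: Map<string, Promise<C>> = new Map();
  protected create: (nodeId: string) => Promise<C>;
  protected logger: Logger;

  constructor(create: (nodeId: string) => Promise<C>, logger: Logger) {
    this.create = create;
    this.logger = logger;
  }

  // Returns the cached connection if there is one. Only a cache miss
  // creates a connection, and only then is the "creating" message logged.
  public getConnection(nodeId: string): Promise<C> {
    const existing = this.connections.get(nodeId);
    if (existing != null) return existing;
    this.logger.info(`Creating connection to ${nodeId}`);
    const pending = this.create(nodeId).catch((e) => {
      // Drop failed attempts so that a later call can retry.
      this.connections.delete(nodeId);
      throw e;
    });
    // Caching the promise also deduplicates concurrent callers.
    this.connections.set(nodeId, pending);
    return pending;
  }
}
```

Storing the pending promise rather than the resolved connection means two callers racing for the same node share one creation attempt and produce one log line.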
