Mute denied nodes #322
When I test this locally the nodes keep trying to reconnect every ~5 seconds, so there's plenty of churn here. Is there a way to disconnect the socket and trigger an error on the node side? I think (but I'm not sure) that nodes keep retrying on error, but not forever, so if we could do that it might be a solution to the bandwidth consumption.
I was thinking about adding a `Muted` state to `ConnMultiplex` (might need a rename there) to stop parsing messages in `StreamHandler::handle` while still keeping the connection open. There is no way to keep the connection open without reading messages into buffers in actix-web AFAIK, but skipping the deserialization should help a lot (though not if we are limited on bandwidth).
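Roughly what I have in mind, as a sketch only (the type names and the `StreamHandler` signature are approximations for a recent actix-web-actors, not the actual code in this repo):

```rust
use actix::{Actor, ActorContext, StreamHandler};
use actix_web_actors::ws;

/// Illustrative stand-ins; the real ConnMultiplex/NodeConnector have more state.
enum ConnMultiplex {
    /// Normal operation: messages are deserialized and forwarded.
    Connected,
    /// Denied node: keep the socket open but drop frames unparsed.
    Muted,
}

struct NodeConnector {
    multiplex: ConnMultiplex,
}

impl Actor for NodeConnector {
    type Context = ws::WebsocketContext<Self>;
}

impl StreamHandler<Result<ws::Message, ws::ProtocolError>> for NodeConnector {
    fn handle(&mut self, msg: Result<ws::Message, ws::ProtocolError>, ctx: &mut Self::Context) {
        // First thing: if muted, do no work on the frame. actix has already
        // buffered it, so bandwidth is still spent, but no JSON parsing happens.
        if let ConnMultiplex::Muted = self.multiplex {
            return;
        }
        match msg {
            Ok(ws::Message::Text(text)) => {
                // ... deserialize into NodeMessage and forward to the aggregator ...
                let _ = text;
            }
            Ok(ws::Message::Ping(bytes)) => ctx.pong(&bytes),
            Ok(ws::Message::Close(reason)) => {
                ctx.close(reason);
                ctx.stop();
            }
            _ => (),
        }
    }
}
```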
So you're saying, instead of stopping the `NodeConnector` actor when muting a node, we'd change the state to `ConnMultiplex::Muted`? We'd still need the `conn_id`/`msg.id()` to be able to look it up in `self.multiplex`, but maybe you're thinking of using something non-serde to look up that key and avoid deserializing the full `NodeMessage`?

Tell me though, is the gain you're thinking about that it's cheaper to keep the node connected but not deserialize any of its messages than to stop the actor and create a new one every `n` seconds when the node reconnects?
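The cheapest variant I can think of still goes through serde, just not the full `NodeMessage`: deserialize into a tiny struct that only carries the id and let serde ignore everything else. The flat `id` field here is an assumption about the envelope shape, so treat this as illustration only:

```rust
use serde::Deserialize;

/// Hypothetical envelope: only the field we actually need to route the frame.
#[derive(Deserialize)]
struct MessageId {
    id: u64,
}

/// Pull out just the connection id without building a full NodeMessage.
/// serde_json still tokenizes the payload but ignores every other field.
fn peek_conn_id(raw: &str) -> Option<u64> {
    serde_json::from_str::<MessageId>(raw).ok().map(|m| m.id)
}

fn main() {
    let raw = r#"{"id": 42, "payload": {"msg": "system.interval", "peers": 25}}"#;
    assert_eq!(peek_conn_id(raw), Some(42));
}
```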
I tried the following hack: I added a `muted: bool` to the `NodeConnector` actor and, instead of muting a node with `ctx.close(None); ctx.stop();`, I did `self.muted = true` and changed `StreamHandler::handle` to check this boolean as the very first thing. That keeps one `NodeConnector` actor around for every node but doesn't process any of their messages. It feels pretty dirty and I'm not convinced it's better to keep the node connected rather than recreating the actor, but maybe you're right that this is cheaper.

It feels to me like the right way to do this is for telemetry to close the connection with a proper reason and then change substrate to behave accordingly, e.g.:

- telemetry closes the connection with `CloseCode::Policy` (1008) (or possibly `CloseCode::Abnormal` (1006)); substrate does not try to reconnect.
- telemetry closes the connection with `CloseCode::Again` (1013) and the node is free to try to reconnect later (maybe with exponential backoff if we want to get fancy).

That seems like the only way we have to reduce both processing resources and bandwidth. Wdyt?
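For the close-with-a-reason option, the telemetry side could look roughly like this (a hypothetical sketch against actix-web-actors; the actor is a stub, not the real `NodeConnector`, and the substrate-side reconnect behaviour would still need to change):

```rust
use actix::{Actor, ActorContext, StreamHandler};
use actix_web_actors::ws::{self, CloseCode, CloseReason};

/// Minimal stand-in for the real actor; only what the close path needs.
struct NodeConnector;

impl Actor for NodeConnector {
    type Context = ws::WebsocketContext<Self>;
}

impl StreamHandler<Result<ws::Message, ws::ProtocolError>> for NodeConnector {
    fn handle(&mut self, _msg: Result<ws::Message, ws::ProtocolError>, _ctx: &mut Self::Context) {}
}

impl NodeConnector {
    /// Close the socket with an explicit reason so the node can decide
    /// whether to come back: Policy (1008) = "do not reconnect",
    /// Again (1013) = "retry later, ideally with backoff".
    fn deny(&self, ctx: &mut ws::WebsocketContext<Self>, permanent: bool) {
        let code = if permanent { CloseCode::Policy } else { CloseCode::Again };
        ctx.close(Some(CloseReason {
            code,
            description: Some("denied by telemetry".into()),
        }));
        ctx.stop();
    }
}
```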