websocket: Fix connection stability on decrypt messages #393
This PR greatly improves WebSocket connection stability by relying on the internal buffers of tungstenite instead of buffering at a higher level. The fix passes messages through to the tungstenite socket directly.
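As a rough illustration of the pass-through idea, here is a minimal std-only sketch. It is not the litep2p code and does not use the tungstenite API: `BufferedSocket`, `PassThrough`, and `WriteStatus` are hypothetical stand-ins, and the 8-byte capacity is chosen only to keep the example small. The assumption is that the inner socket either accepts a whole frame into its own buffer or refuses it, so the wrapper needs no buffer or pending state of its own.

```rust
use std::collections::VecDeque;

/// Outcome of a non-blocking write attempt (hypothetical stand-in for `Poll`).
#[derive(Debug, PartialEq)]
enum WriteStatus {
    Ready,
    Pending,
}

/// Hypothetical stand-in for a socket that owns its write buffer.
struct BufferedSocket {
    capacity: usize,
    queued: VecDeque<Vec<u8>>,
    queued_bytes: usize,
}

impl BufferedSocket {
    fn new(capacity: usize) -> Self {
        Self { capacity, queued: VecDeque::new(), queued_bytes: 0 }
    }

    /// Accept a whole message or refuse it entirely; never a partial write.
    fn try_send(&mut self, msg: &[u8]) -> WriteStatus {
        if self.queued_bytes + msg.len() > self.capacity {
            return WriteStatus::Pending;
        }
        self.queued_bytes += msg.len();
        self.queued.push_back(msg.to_vec());
        WriteStatus::Ready
    }

    /// Simulate one queued message leaving for the wire.
    fn flush_one(&mut self) -> Option<Vec<u8>> {
        let msg = self.queued.pop_front()?;
        self.queued_bytes -= msg.len();
        Some(msg)
    }
}

/// Pass-through writer: keeps no buffer and no partial-write state of its own.
struct PassThrough {
    socket: BufferedSocket,
}

impl PassThrough {
    /// Hand the frame straight to the socket; on `Pending` the caller retries
    /// the same frame later, so nothing is lost or reordered at this layer.
    fn write(&mut self, frame: &[u8]) -> WriteStatus {
        self.socket.try_send(frame)
    }
}

fn main() {
    let mut writer = PassThrough { socket: BufferedSocket::new(8) };
    assert_eq!(writer.write(b"aaaa"), WriteStatus::Ready);
    assert_eq!(writer.write(b"bbbb"), WriteStatus::Ready);
    // The socket buffer is full: the frame is refused as a whole.
    assert_eq!(writer.write(b"cccc"), WriteStatus::Pending);
    // Once the socket drains, the retried frame is accepted.
    assert_eq!(writer.socket.flush_one(), Some(b"aaaa".to_vec()));
    assert_eq!(writer.write(b"cccc"), WriteStatus::Ready);
    // Frames leave in submission order.
    assert_eq!(writer.socket.flush_one(), Some(b"bbbb".to_vec()));
    assert_eq!(writer.socket.flush_one(), Some(b"cccc".to_vec()));
    println!("frames delivered in order");
}
```

The point of the design is that only one layer owns write buffering, so a `Pending` result cannot leave a second, higher-level buffer in an inconsistent state.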
The underlying issue is long-standing (reproducible on all older versions, where it surfaced silently as IO errors) and manifested as a decryption error after the state fixes:
Poll::Pending state poisoning #327
Issue context:
WebSocketStream already has a 128KiB buffer for writing
Investigation
We have noted several errors that manifested as crypto/noise decoding failures on our Kusama validators:
Upon further investigation, the errors affected only WebSocket connections. The issue could be reproduced by running a local Kusama node with more than 500 inbound and outbound peers, as well as by running subp2p-explorer with adjusted protocols:
The issue was also reproduced on the zombienet PR, which uses litep2p:
Testing Done
Performance
Tested the performance with litep2p-perf using the following branch:
From the performance table, we are within 3% of the original buggy implementation, which I would attribute to normal run-to-run variation in our results. Performance is therefore effectively unchanged.
Repro Case
I have added a custom user protocol as part of our testing to filter out these errors.
Before this PR, TCP was unaffected while WebSocket reproduced the decryption failure. After this PR, the test passes.
Closes: paritytech/polkadot-sdk#8525