-
Notifications
You must be signed in to change notification settings - Fork 16
websocket/stream: Fix unexpected EOF on Poll::Pending
state poisoning
#327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
…em allocs Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
dmitry-markin
approved these changes
Feb 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch with a poisoned state being left!
lexnv
added a commit
that referenced
this pull request
Feb 20, 2025
## [0.9.1] - 2025-01-19 This release enhances compatibility between litep2p and libp2p by backporting the latest Yamux updates. Additionally, it includes various improvements and fixes to boost the stability and performance of the WebSocket stream and the multistream-select protocol. ### Changed - yamux: Switch to upstream implementation while keeping the controller API ([#320](#320)) - req-resp: Replace SubstreamSet with FuturesStream ([#321](#321)) - cargo: Bring up to date multiple dependencies ([#324](#324)) - build(deps): bump hickory-proto from 0.24.1 to 0.24.3 ([#323](#323)) - build(deps): bump openssl from 0.10.66 to 0.10.70 ([#322](#322)) ### Fixed - websocket/stream: Fix unexpected EOF on `Poll::Pending` state poisoning ([#327](#327)) - websocket/stream: Avoid memory allocations on flushing ([#325](#325)) - multistream-select: Enforce `io::error` instead of empty protocols ([#318](#318)) - multistream: Do not wait for negotiation in poll_close ([#319](#319)) cc @paritytech/networking --------- Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
github-merge-queue bot
pushed a commit
to paritytech/polkadot-sdk
that referenced
this pull request
Feb 20, 2025
This PR updates litep2p to version 0.9.1. The yamux config is entirely removed to mirror the libp2p yamux upstream version. While at it, I had to bump indexmap and URL as well. ## [0.9.1] - 2025-01-19 This release enhances compatibility between litep2p and libp2p by using the latest Yamux upstream version. Additionally, it includes various improvements and fixes to boost the stability and performance of the WebSocket stream and the multistream-select protocol. ### Changed - yamux: Switch to upstream implementation while keeping the controller API ([#320](paritytech/litep2p#320)) - req-resp: Replace SubstreamSet with FuturesStream ([#321](paritytech/litep2p#321)) - cargo: Bring up to date multiple dependencies ([#324](paritytech/litep2p#324)) - build(deps): bump hickory-proto from 0.24.1 to 0.24.3 ([#323](paritytech/litep2p#323)) - build(deps): bump openssl from 0.10.66 to 0.10.70 ([#322](paritytech/litep2p#322)) ### Fixed - websocket/stream: Fix unexpected EOF on `Poll::Pending` state poisoning ([#327](paritytech/litep2p#327)) - websocket/stream: Avoid memory allocations on flushing ([#325](paritytech/litep2p#325)) - multistream-select: Enforce `io::error` instead of empty protocols ([#318](paritytech/litep2p#318)) - multistream: Do not wait for negotiation in poll_close ([#319](paritytech/litep2p#319)) cc @paritytech/networking --------- Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
github-actions bot
pushed a commit
to paritytech/polkadot-sdk
that referenced
this pull request
Feb 20, 2025
This PR updates litep2p to version 0.9.1. The yamux config is entirely removed to mirror the libp2p yamux upstream version. While at it, I had to bump indexmap and URL as well. ## [0.9.1] - 2025-01-19 This release enhances compatibility between litep2p and libp2p by using the latest Yamux upstream version. Additionally, it includes various improvements and fixes to boost the stability and performance of the WebSocket stream and the multistream-select protocol. ### Changed - yamux: Switch to upstream implementation while keeping the controller API ([#320](paritytech/litep2p#320)) - req-resp: Replace SubstreamSet with FuturesStream ([#321](paritytech/litep2p#321)) - cargo: Bring up to date multiple dependencies ([#324](paritytech/litep2p#324)) - build(deps): bump hickory-proto from 0.24.1 to 0.24.3 ([#323](paritytech/litep2p#323)) - build(deps): bump openssl from 0.10.66 to 0.10.70 ([#322](paritytech/litep2p#322)) ### Fixed - websocket/stream: Fix unexpected EOF on `Poll::Pending` state poisoning ([#327](paritytech/litep2p#327)) - websocket/stream: Avoid memory allocations on flushing ([#325](paritytech/litep2p#325)) - multistream-select: Enforce `io::error` instead of empty protocols ([#318](paritytech/litep2p#318)) - multistream: Do not wait for negotiation in poll_close ([#319](paritytech/litep2p#319)) cc @paritytech/networking --------- Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io> (cherry picked from commit 42e9de7)
teor2345
pushed a commit
to autonomys/polkadot-sdk
that referenced
this pull request
Mar 24, 2025
…ech#7640) - conflicts resolved using --strategy-option=theirs This PR updates litep2p to version 0.9.1. The yamux config is entirely removed to mirror the libp2p yamux upstream version. While at it, I had to bump indexmap and URL as well. This release enhances compatibility between litep2p and libp2p by using the latest Yamux upstream version. Additionally, it includes various improvements and fixes to boost the stability and performance of the WebSocket stream and the multistream-select protocol. - yamux: Switch to upstream implementation while keeping the controller API ([paritytech#320](paritytech/litep2p#320)) - req-resp: Replace SubstreamSet with FuturesStream ([paritytech#321](paritytech/litep2p#321)) - cargo: Bring up to date multiple dependencies ([paritytech#324](paritytech/litep2p#324)) - build(deps): bump hickory-proto from 0.24.1 to 0.24.3 ([paritytech#323](paritytech/litep2p#323)) - build(deps): bump openssl from 0.10.66 to 0.10.70 ([paritytech#322](paritytech/litep2p#322)) - websocket/stream: Fix unexpected EOF on `Poll::Pending` state poisoning ([paritytech#327](paritytech/litep2p#327)) - websocket/stream: Avoid memory allocations on flushing ([paritytech#325](paritytech/litep2p#325)) - multistream-select: Enforce `io::error` instead of empty protocols ([paritytech#318](paritytech/litep2p#318)) - multistream: Do not wait for negotiation in poll_close ([paritytech#319](paritytech/litep2p#319)) cc @paritytech/networking --------- Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
lexnv
added a commit
that referenced
this pull request
May 22, 2025
This PR greatly improves the WebSocket connection stability by relying on the interval buffers of tungstenite instead of buffering at a higher level. The fix passes through the messages to the tungstenite socket directly. This is a long-lasting issue (reproducible on all older versions silently with IO errors) that manifested as a decryption error after the state fixes: - #325 - #327 Issue context: - node is under stress due to handling multiple substreams - the issue affected only long running WebSocket substreams and manifested as an IO error from crypto/noise decoding - tungstenite `WebSocketStream` already has a 128KiB buffer for writing - litep2p has a **redundant** 8 KiB buffer for writing - litep2p buffered internally multiple packets, tunstenite accepted the batch. I expect this creates a wrongly framed packet that fails to decode at the crypto/noise level ## Investigation We have noted several errors that manifested as crypto/nosie decoding failures on our Kusama validators: - paritytech/polkadot-sdk#8525 ```rust litep2p::crypto::noise: failed to decrypt message error=Decrypt ``` Upon further investigation, the errors affected only WebSocket connections. The issue could be reproduced by running a local node in Kusama with more than 500 peers in and out. As well as running subp2p-explorer with adjusted protocols: ```yaml 2025-05-15T14:58:08.095961Z ERROR {peer_id=peer_id=12D3KooWGsDvWrbApFTCpF8h7YCKHuvJbok6HAq5ZnPgE9LGWnsv}: litep2p::crypto::noise: failed to decrypt message for bigger buffers error=Decrypt peer=PeerId("12D3KooWSa5SbCHGKpNeSs3Qak2TrM5gTkEBrPfvo6TyxhUpEHeu") 2025-05-15T14:58:08.096419Z DEBUG {peer_id=peer_id=12D3KooWGsDvWrbApFTCpF8h7YCKHuvJbok6HAq5ZnPgE9LGWnsv}: litep2p::websocket::connection: connection closed with error peer=PeerId("12D3KooWSa5SbCHGKpNeSs3Qak2TrM5gTkEBrPfvo6TyxhUpEHeu") error=Decode(Io(Custom { kind: Other, error: "failed to decrypt message bigger buffers: decrypt error 12D3KooWSa5SbCHGKpNeSs3Qak2TrM5gTkEBrPfvo6TyxhUpEHeu" })) ``` The issue also reproduced on the zombinet PR, which uses litep2p: - paritytech/polkadot-sdk#8461 ```yaml 2025-05-14 09:37:30.805 INFO tokio-runtime-worker sync: Warp sync is complete, continuing with state sync. 2025-05-14 09:37:33.189 ERROR tokio-runtime-worker litep2p::crypto::noise: failed to decrypt message error=Decrypt 2025-05-14 09:37:33.283 ERROR tokio-runtime-worker litep2p::crypto::noise: failed to decrypt message error=Decrypt 2025-05-14 09:37:34.764 ERROR tokio-runtime-worker litep2p::crypto::noise: failed to decrypt message error=Decrypt 2025-05-14 09:37:35.656 INFO tokio-runtime-worker substrate: ⚙️ State sync, Downloading state, 22%, 2.21 Mib (0 peers), best: #0 (0xc5e7…d059), finalized #0 (0xc5e7…d059), ⬇ 707.8kiB/s ⬆ 0.5kiB/s 2025-05-14 09:37:40.657 INFO tokio-runtime-worker substrate: ⚙️ State sync, Downloading state, 22%, 2.21 Mib (3 peers), best: #0 (0xc5e7…d059), finalized #0 (0xc5e7…d059), ⬇ 1.0kiB/s ⬆ 1.0kiB/s ``` ## Testing Done ### Performance Tested the performance with litep2p-perf using the following branch: - https://github.com/lexnv/litep2p-perf/compare/lexnv/websocket-tests?expand=1 | Status | Data Size | Time (s) | Bandwidth (Mbit/s) | |------------|-----------|----------|-------------------| | **Before** | | | | | Uploaded | 256.00 MiB| 15.1152 | 135.49 | | Downloaded | 256.00 MiB| 13.2296 | 154.80 | | **After** | | | | | Uploaded | 256.00 MiB| 15.7178 | 130.30 | | Downloaded | 256.00 MiB| 13.2435 | 154.64 | From the performance table, we are within 3% of the original buggy implementation. I would lean towards a normal variation in our results. Therefore, the performance remains unimpacted. ### Repro Case Have added a custom user protocol as part of our testing to filter out these errors. - The protocol opens 16 outbound substreams on the connection established event. Therefore, it will handle 16 outbound substreams and 16 inbound substreams - The outbound substreams will push a configurable number of packets, each of size 128 bytes, to the remote peer. While the inbound substreams will read the same number of packets from the remote peer. Before this PR, the TCP was unaffected and the websocket reproduces the decrypt failure. After this PR, the test passes. Closes: paritytech/polkadot-sdk#8525 --------- Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR updates the following dependencies:
The tokio tungstenite update changed the API of webscoket messages, which motivated the following changes:
poll_ready
orpoll_flush
returnedPoll::Pending
, the websocket state would forever remain poisoned.UnexpectedEof
on consecutive calls, which led to the connection being terminatedpoll_read
implementation is simplified as well to make the code more readable