@janos janos commented Dec 1, 2025

Checklist

  • I have read the coding guide.
  • My change requires a documentation update, and I have done it.
  • I have added tests to cover my changes.
  • I have filled out the description and linked the related issues.

Description

Unexpected stream resets were observed on testnet clusters:

"time"="2025-11-25 12:02:15.157880" "level"="debug" "logger"="node/retrieval" "msg"="failed to get chunk" "chunk_address"="11a5d1877aabf2136ec1e8caff88c58be3a2a62ccebaea335cd92581bec66435" "peer_address"="34f289a8f9ac96d725dd3753dce1bbe828dd14c90a5638277c0e444a58064918" "peer_proximity"=2 "error"="read delivery: stream reset (remote): code: 0x0: transport error: stream reset by remote, error code: 0 peer 34f289a8f9ac96d725dd3753dce1bbe828dd14c90a5638277c0e444a58064918"

This PR fixes the issue that causes these errors. The channel must be properly closed when the final message carrying the internal error is sent.

Regression tests were added that use a real libp2p server to reproduce the problem. They may be more integration-heavy than ideal, but they are needed to validate both the issue and the fix. If the FullClose call is removed, the tests fail with the same log messages as seen on real infrastructure.

Open API Spec Version Changes (if applicable)

Motivation and Context (Optional)

Related Issue (Optional)

Screenshots (if appropriate):

4 participants