@michael-yxchen
Contributor

A full node may join new RaptorCast groups from the same validator while it's syncing or stuck. In such cases, it may rebroadcast to an expired group because its local round hasn't advanced, and expired groups are not garbage-collected. As a result, many full nodes end up receiving unsolicited Raptor chunks from groups they no longer belong to.

The root cause is that the node selects the rebroadcast group based on its local round. This PR resolves the issue by overloading the epoch field in the Raptor chunk header: for secondary RaptorCast, the field carries the round number instead. This allows syncing or stuck full nodes to use the round number in the chunk to identify and service the correct group.
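To make the overloading concrete, here is a minimal sketch of how a single 64-bit header field could carry either an epoch or a round. The `GroupId` name comes from this PR, but the variant payload types, the `header_field` and `from_header` helpers, and the `is_secondary` flag are assumptions made for illustration; the actual definitions in `monad-raptorcast/src/udp.rs` may differ.

```rust
/// Illustrative only: which kind of RaptorCast group a chunk belongs to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GroupId {
    /// Primary RaptorCast: the group is identified by epoch.
    Primary(u64),
    /// Secondary RaptorCast (validator to full nodes): identified by round.
    Secondary(u64),
}

impl GroupId {
    /// Value written into the overloaded epoch field of the chunk header.
    pub fn header_field(self) -> u64 {
        match self {
            GroupId::Primary(epoch) => epoch,
            GroupId::Secondary(round) => round,
        }
    }

    /// Reinterpret the raw header field on receipt. `is_secondary` stands in
    /// for whatever header bit distinguishes the two modes (an assumption).
    pub fn from_header(is_secondary: bool, raw: u64) -> Self {
        if is_secondary {
            GroupId::Secondary(raw)
        } else {
            GroupId::Primary(raw)
        }
    }
}
```

With a tagged type like this, a receiver cannot accidentally compare a round against an epoch: the match on `GroupId` forces each code path to say which interpretation it expects.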

@michael-yxchen michael-yxchen force-pushed the michael/raptorcast_groupid branch 2 times, most recently from 11c8ca3 to 18f89d4 Compare October 20, 2025 18:43
@michael-yxchen michael-yxchen marked this pull request as ready for review October 20, 2025 19:11
xinyuan-dev previously approved these changes Oct 21, 2025
@michael-yxchen michael-yxchen force-pushed the michael/chunk-validate branch 2 times, most recently from 17fa315 to 1166d82 Compare October 22, 2025 20:06
@michael-yxchen michael-yxchen marked this pull request as draft October 23, 2025 02:08
Base automatically changed from michael/chunk-validate to master October 23, 2025 15:17
@michael-yxchen michael-yxchen dismissed xinyuan-dev’s stale review October 23, 2025 15:17

The base branch was changed.

@michael-yxchen michael-yxchen force-pushed the michael/raptorcast_groupid branch 3 times, most recently from 27f1a77 to 777d786 Compare October 23, 2025 19:54
@michael-yxchen michael-yxchen marked this pull request as ready for review October 23, 2025 23:15
Copilot AI review requested due to automatic review settings October 23, 2025 23:15
Copilot AI left a comment

Pull Request Overview

This PR addresses an issue where full nodes receive unsolicited Raptor chunks from expired groups during network synchronization. The solution overloads the epoch field in Raptor chunk headers to carry round numbers for secondary RaptorCast (validator-to-fullnode), enabling nodes to correctly identify and service groups based on actual round numbers rather than their potentially stale local state.

Key Changes:

  • Introduced GroupId enum to distinguish between Primary (epoch-based) and Secondary (round-based) RaptorCast groups
  • Replaced epoch-only tracking with group-specific identification in chunk headers and validation logic
  • Updated group lookup and validation to use round numbers for secondary RaptorCast
  • Removed epoch-based pruning from decoder cache in favor of timestamp-based expiration
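The round-based lookup described in these changes can be sketched as follows. This is a hedged illustration, not the code in the PR: the `Group` and `GroupTable` types, their fields, and the assumption that groups from one validator cover non-overlapping half-open round ranges are all hypothetical. The point is that the lookup key is the round carried in the chunk, not the node's local round.

```rust
use std::collections::BTreeMap;

/// Hypothetical group record, valid for rounds in `start_round..end_round`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Group {
    pub start_round: u64,
    pub end_round: u64, // exclusive
    pub peers: Vec<String>,
}

/// Hypothetical table of full-node groups, keyed by starting round.
#[derive(Default)]
pub struct GroupTable {
    by_start: BTreeMap<u64, Group>,
}

impl GroupTable {
    pub fn insert(&mut self, group: Group) {
        self.by_start.insert(group.start_round, group);
    }

    /// Find the group whose range covers `round`: take the latest group that
    /// starts at or before `round`, then check that it has not yet ended.
    pub fn lookup(&self, round: u64) -> Option<&Group> {
        self.by_start
            .range(..=round)
            .next_back()
            .map(|(_, group)| group)
            .filter(|group| round < group.end_round)
    }
}
```

A stuck node that calls `lookup` with the round taken from the chunk header resolves the group the sender intended, even if its own round is far behind; calling it with the stale local round is what produced rebroadcasts to expired groups.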

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `monad-types/src/lib.rs` | Removed unused `From<Epoch>` trait implementation |
| `monad-state/src/consensus.rs` | Added round field to `PublishToFullNodes` command |
| `monad-router-multi/src/lib.rs` | Updated router to propagate round field in full node publishing |
| `monad-raptorcast/src/udp.rs` | Introduced `GroupId` enum and updated message parsing/validation |
| `monad-raptorcast/src/util.rs` | Changed fullnode group storage from heap to `BTreeMap` keyed by round |
| `monad-raptorcast/src/packet/*.rs` | Updated packet building to use `GroupId` instead of raw epoch |
| `monad-raptorcast/src/decoding.rs` | Removed epoch-based cache pruning logic |
| `monad-raptorcast/src/raptorcast_secondary/mod.rs` | Updated secondary RaptorCast to use round-based group IDs |
| `monad-executor-glue/src/lib.rs` | Added round field to `PublishToFullNodes` enum variant |
| `monad-consensus-state/src/command.rs` | Extended consensus commands to include round numbers |
| test files | Updated test cases to use new `GroupId` type |
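Since the header field no longer always holds an epoch, the decoder cache cannot prune by epoch and relies on timestamps instead. The sketch below shows one way timestamp-based expiration might look. `DecoderCache`, its `ttl` field, and the `u64` message key are hypothetical names chosen for illustration and are not taken from `monad-raptorcast/src/decoding.rs`.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical decoder cache that expires entries by age.
pub struct DecoderCache {
    ttl: Duration,
    /// Message id -> time the message was first seen.
    entries: HashMap<u64, Instant>,
}

impl DecoderCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Record a message, keeping the original timestamp if it is already cached.
    pub fn touch(&mut self, id: u64, now: Instant) {
        self.entries.entry(id).or_insert(now);
    }

    /// Evict every entry at least `ttl` old, independent of epoch or round.
    pub fn prune(&mut self, now: Instant) {
        let ttl = self.ttl;
        self.entries
            .retain(|_, seen| now.saturating_duration_since(*seen) < ttl);
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside the cache, keeps the expiry logic deterministic under test.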

