
Conversation


@tklenze tklenze commented Aug 22, 2025

Round up the number of chunks assigned to each validator in the first phase of validator-to-validator RaptorCast.
This change is required to guarantee Byzantine Fault Tolerance. It implements the publicly documented behavior of RaptorCast, ensuring documentation and implementation are aligned.

This feature only affects the leader when sending out a block proposal using RaptorCast. The leader operates as follows:

  • First, we compute the number of packets (num_packets) without rounding up. This is essentially the payload size divided by the payload per chunk, multiplied by the redundancy factor.
  • For each validator, we determine their obligation: the number of chunks they should be assigned, calculated as num_packets * stake / total_stake. This value is a float.
  • Each validator receives a number of chunks equal to their obligation, rounded up. We distinguish between initial chunks (the obligation rounded down, denoted ic[i]) and a rounding chunk (each validator receives exactly one). While this distinction is not currently important, it may become relevant in the future.
  • The new total number of packets is the sum of all ic[i] plus the number of validators n (since each receives one rounding chunk).

As a safeguard, we check that the rounded-up total rounded_up_num_packets falls between the original num_packets and num_packets + n. This should always hold; if it does not, we issue a warning. If num_packets turns out to be larger than the rounded-up total, we use num_packets instead.
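To make the scheme concrete, here is a minimal sketch of the assignment in Rust. The names are illustrative only; the actual implementation in monad-raptorcast/src/udp.rs works on U256 stake values and scales obligations by a unit bias, both of which the sketch omits.

fn assign_chunks(num_packets: usize, stakes: &[u64]) -> (Vec<usize>, usize) {
    let n = stakes.len();
    let total_stake: u64 = stakes.iter().sum();

    // ic[i]: the stake-weighted obligation of validator i, rounded down.
    let ic: Vec<usize> = stakes
        .iter()
        .map(|&stake| {
            let obligation = num_packets as f64 * stake as f64 / total_stake as f64;
            obligation.floor() as usize
        })
        .collect();

    // Each validator receives its initial chunks plus one rounding chunk.
    let per_validator: Vec<usize> = ic.iter().map(|&c| c + 1).collect();

    // New total: sum of all ic[i] plus n. It should always lie in
    // [num_packets, num_packets + n]; the real code warns otherwise and
    // falls back to num_packets if that value is larger.
    let rounded_up_num_packets = (ic.iter().sum::<usize>() + n).max(num_packets);

    (per_validator, rounded_up_num_packets)
}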

Other minor features

  • The validator stake map is shuffled deterministically for chunk assignment using a seed. This does not yet affect chunk assignment, but will likely be important in future updates.

Performance impact

Rounding up requires sending up to n more chunks than strictly necessary, each of which gets re-broadcast by a validator. As a result, each validator sends and receives roughly n more chunks than required. For small messages, the overhead is very large. As such, rounding up is only a temporary solution; a more permanent solution is planned.

We simulated the overhead created by rounding up. The following graph shows the additional bandwidth consumed at validators due to rounding up, for various values of num_packets and numbers of validators n. 4740 chunks corresponds to a full block, whereas 474 chunks corresponds to a block that is only 10% full. (Note that the graph says "current RaptorCast", but this is actually the solution proposed in this PR.)

For the leader's block proposal message, the additional overhead is likely not prohibitive for small-medium network sizes (e.g., n = 200).
[Figure: additional bandwidth at validators due to rounding up, for various values of num_packets and validator counts n]
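As a rough worst-case bound (not the simulated averages shown in the figure): rounding up adds at most n extra chunks, so with n = 200 validators that is at most 200 chunks, about 4% on top of a full block of 4740 chunks, but up to roughly 42% on top of a 10%-full block of 474 chunks.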

Required actions before merge

  • Before deploying this feature, we must ensure that it only affects the leader's block proposal message and not any other messages, in particular any that are sent more frequently than block proposals.
  • Testing could be extended. Manual testing has been done locally on monad-testground with 5-8 validators and various payload sizes (yielding 6-100 packets). These tests were all successful, i.e., I confirmed manually that every validator received the expected number of packets. No testing has been done outside of these parameters yet.
  • Need to think about whether this should affect the encoded_symbol_capacity in the decoder. Right now, this capacity is calculated as max(app_message_len / symbol_len, SOURCE_SYMBOLS_MIN) * MAX_REDUNDANCY. Due to rounding up, more chunks than that could be created for small messages, and they would not be added to the decoder state (see the sketch after this list). I think this might not be an issue, because (i) those chunks are not required for decoding, and (ii) we still broadcast chunks that are assigned to us (self_hash == parsed_message.recipient_hash), even if we don't end up storing them.
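The sketch below restates the capacity concern from the last point. The constant names follow the description above, but the values are hypothetical and only for illustration.

// Hypothetical values, for illustration only.
const SOURCE_SYMBOLS_MIN: usize = 8;
const MAX_REDUNDANCY: usize = 3;

fn encoded_symbol_capacity(app_message_len: usize, symbol_len: usize) -> usize {
    (app_message_len / symbol_len).max(SOURCE_SYMBOLS_MIN) * MAX_REDUNDANCY
}

// For a small message, the rounded-up total (base packets plus up to n rounding
// chunks) can exceed this capacity, so the surplus chunks would not be stored in
// the decoder state, although they can still be re-broadcast.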

Closes https://github.com/category-labs/category-internal/issues/898 (but a follow-up issue is needed, to address the inefficiency of rounding up).

@tklenze tklenze force-pushed the tobias/roundupchunks branch 3 times, most recently from 759da1c to 7f1468c on August 23, 2025, 09:17
@omegablitz omegablitz self-assigned this Aug 25, 2025
@tklenze tklenze force-pushed the tobias/roundupchunks branch 6 times, most recently from 1257918 to 525394e on August 25, 2025, 13:07
Round up the number of chunks assigned to each validator in the
first phase of RaptorCast. This is required to guarantee
Byzantine Fault Tolerance.
@tklenze tklenze force-pushed the tobias/roundupchunks branch from 525394e to bf6268f on August 25, 2025, 14:11
@tklenze tklenze marked this pull request as ready for review August 27, 2025 09:32
Copilot AI review requested due to automatic review settings August 27, 2025 09:32

Copilot AI left a comment


Pull Request Overview

This PR implements the round-up functionality for RaptorCast chunk assignments to guarantee Byzantine Fault Tolerance. It ensures that each validator receives at least one chunk by computing stake-weighted obligations and rounding up the assignments.

Key changes:

  • Added chunk assignment logic that computes initial chunks per validator based on stake weight
  • Introduced a rounding-up mechanism where each validator gets its obligation (rounded down) plus one additional "rounding chunk"
  • Modified the packet computation to account for the rounded-up total

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

  • monad-raptorcast/src/udp.rs: core implementation of the chunk assignment algorithm and packet computation with round-up logic
  • monad-raptorcast/src/util.rs: updated the Group struct to handle validator stake maps and added iterator functionality
  • monad-raptorcast/tests/raptorcast_instance.rs: updated a test to use a specific redundancy value instead of the MAX_REDUNDANCY constant
  • monad-raptorcast/Cargo.toml: added the alloy-primitives dependency for U256 arithmetic
Comments suppressed due to low confidence (2)

monad-raptorcast/src/udp.rs:1

  • This comment contains unclear notation '//TK: ???' which appears to be a TODO or question marker left by a developer. This should either be clarified or removed.

monad-raptorcast/src/udp.rs:1

  • Typo in comment: 'valdiator' should be 'validator'.


// The obligations are scaled by the basis points (bps) to avoid floating point arithmetic on
// the large values of stake and total_weight. Only at the very end we convert to a float.
// In the future, we might want to avoid floats alltogether.
let unit_bias: f64 = 10_000_000.0;

Copilot AI Aug 27, 2025


The magic number 10_000_000.0 should be defined as a named constant with documentation explaining its purpose in avoiding floating point precision issues.

Suggested change
let unit_bias: f64 = 10_000_000.0;
let unit_bias: f64 = STAKE_UNIT_BIAS;
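A possible definition along those lines (the value is taken from the original code; the doc comment is illustrative):

/// Scale factor applied to stake-weighted obligations before the final
/// conversion to f64, limiting precision loss on large stake values.
const STAKE_UNIT_BIAS: f64 = 10_000_000.0;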

@babak1369

If we have 100 equally staked validators, 100 chunks, and say redundancy 3, does it set the total n_chunks to 400 chunks?
Probably not a big issue, but I would add a check: if ic[i] is an integer, then we don't need to add +1.
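A minimal sketch of that check (equivalent to taking the ceiling of the obligation instead of floor + 1); the helper name is hypothetical:

// Hypothetical helper illustrating the suggestion above.
fn chunks_for(obligation: f64) -> usize {
    if obligation.fract() == 0.0 {
        // Already an integer: no extra rounding chunk needed.
        // Note: exact-equality checks on f64 are sensitive to rounding error.
        obligation as usize
    } else {
        obligation.floor() as usize + 1
    }
}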

*addr,
start_idx * segment_size as usize..end_idx * segment_size as usize,
));
trace_string.push_str(&format!("{:?}: {} chunks, ", node_id, num_chunks));
Contributor


we probably do not want to have the tracing code run unconditionally; how about putting a trace statement here instead?
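For example, assuming the crate already uses tracing, something along these lines:

trace!(?node_id, num_chunks, "assigned chunks to validator");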

// The obligations are scaled by the basis points (bps) to avoid floating point arithmetic on
// the large values of stake and total_weight. Only at the very end we convert to a float.
// In the future, we might want to avoid floats alltogether.
let unit_bias: f64 = 10_000_000.0;
Contributor

@xinyuan-dev xinyuan-dev Sep 18, 2025


Is this bias necessary? stake / total_stake is a fractional number between 0 and 1. In IEEE floating-point representation, there are roughly as many representable real numbers between 0 and 1 as there are between 1 and infinity (source), so there should be enough precision as long as stake / total_stake itself is computed precisely enough.

