
Fix wrong columns getting processed on a CGC change #7792


Open

jimmygchen wants to merge 2 commits into unstable from fix-extra-columns-processed

Conversation

@jimmygchen (Member) commented Jul 25, 2025

Issue Addressed

This PR fixes a bug where wrong columns could get processed immediately after a CGC increase.

Scenario:

  • The node's CGC increased because additional validators attached to it (let's say from 10 to 11).
  • The new CGC is advertised and the new subnets are subscribed immediately; however, the change won't take effect in the data availability check until the next epoch (see this). The data availability checker still requires only 10 columns for the current epoch.
  • During this time, data columns for the additional custody column (let's say column 11) may arrive via gossip, since we're already subscribed to the topic. Such a column may incorrectly be used to satisfy the existing data availability requirement (10 columns), so the additional column gets persisted instead of a required one, resulting in database inconsistency (see the sketch below).
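
A minimal sketch of the gating idea, assuming hypothetical names (`should_process_column` and `current_epoch_sampling_columns` are illustrative, not Lighthouse's actual API): a column arriving on a newly subscribed subnet should only be imported if its index is in the sampling set the availability checker uses for the current epoch, which still reflects the old CGC:

```rust
// Hypothetical sketch; not the actual Lighthouse code.
fn should_process_column(column_index: u64, current_epoch_sampling_columns: &[u64]) -> bool {
    // Only columns the availability checker requires for the CURRENT epoch
    // (still the old CGC) may be imported; anything else is verified and
    // propagated but not processed.
    current_epoch_sampling_columns.contains(&column_index)
}

fn main() {
    // Old CGC = 10: the current epoch still requires columns 1..=10 only.
    let current_epoch_sampling_columns: Vec<u64> = (1..=10).collect();
    // Column 11 arrives on a newly subscribed subnet (advertised CGC = 11):
    // it must not satisfy the current epoch's 10-column requirement.
    assert!(!should_process_column(11, &current_epoch_sampling_columns));
    assert!(should_process_column(3, &current_epoch_sampling_columns));
}
```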

Proposed fix

  • In the handle_gossip function, we compute should_process for each data column that arrives and pass it to the beacon processor: the column gets verified and propagated, but NOT processed / imported. I've moved this logic to DataAvailabilityChecker for consistency with RPC and better separation of concerns.
  • This requires us to know the sampling columns for the current epoch, which differ from globals.sampling_columns (already updated to the new CGC).
  • To avoid having to maintain separate column lists for the current and next epoch, we compute the full ordered set of custody groups on startup; this lets us quickly retrieve the custody columns for either epoch without recomputing or maintaining multiple lists (see the sketch after the next paragraph).

The trade-off here is that we have to initialise the CustodyContext with the results from get_custody_groups after the network is initialised, which isn't ideal, but it allows us to simplify and clean things up a lot by keeping custody-related logic in CustodyContext. With this approach we could potentially remove network_globals.sampling_columns too; I haven't done that in this PR, as I wanted to see how this approach looks and get some feedback first.
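
A hypothetical sketch of that single-list idea (`CustodyContextSketch` and its methods are illustrative names, not the PR's actual types): the custody set for any CGC is just a prefix of one precomputed ordering:

```rust
use std::collections::HashSet;

struct CustodyContextSketch {
    // The full, node-specific, deterministic ordering of custody groups,
    // computed once on startup from the node id.
    all_custody_groups_ordered: Vec<u64>,
}

impl CustodyContextSketch {
    // Custody groups for a given custody group count: the first `cgc` entries.
    fn custody_groups_for_cgc(&self, cgc: usize) -> &[u64] {
        let end = cgc.min(self.all_custody_groups_ordered.len());
        &self.all_custody_groups_ordered[..end]
    }

    // Sampling columns derived from the same list (assuming, for simplicity,
    // one column per custody group), so the current epoch (old CGC) and the
    // next epoch (new CGC) need no separately maintained lists.
    fn sampling_columns_for_cgc(&self, cgc: usize) -> HashSet<u64> {
        self.custody_groups_for_cgc(cgc).iter().copied().collect()
    }
}
```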

The other options considered:

  • Load node_id before BeaconChain is constructed, and have the beacon chain also compute the custody columns. I think this leaks networking types into the BeaconChain, so IMO it isn't great for separation of concerns, and it also introduces a circular dependency that will bite us later.
  • Have the network handle all the custody-column-related logic and only pass the relevant data to the beacon processor. This approach splits custody logic between the network and the beacon chain; I believe the low cohesion sacrifices readability and maintainability.

@jimmygchen jimmygchen requested a review from jxs as a code owner July 25, 2025 05:08
@jimmygchen jimmygchen added bug Something isn't working ready-for-review The code is ready for review das Data Availability Sampling do-not-merge labels Jul 25, 2025
@jimmygchen (Member, Author) commented:

I've labeled it as do-not-merge as this still requires test fixes and clean-up on the docs, but the idea is there and I'd like to get some early feedback.

@jimmygchen jimmygchen requested a review from pawanjay176 July 25, 2025 05:41

mergify bot commented Jul 25, 2025

Some required checks have failed. Could you please take a look @jimmygchen? 🙏

@mergify mergify bot added waiting-on-author The reviewer has suggested changes and awaits their implementation. and removed ready-for-review The code is ready for review labels Jul 25, 2025
@jimmygchen jimmygchen force-pushed the fix-extra-columns-processed branch from ab0e887 to fb08ab2 Compare July 25, 2025 07:40
…consistency with RPC and better separation of concerns
        .map_err(|e| format!("Failed to compute custody groups: {:?}", e))?;
    chain
        .data_availability_checker
        .custody_context()
@jimmygchen (Member, Author) commented on this diff:
TODO: we can probably remove the clone in custody_context()

    all_custody_groups_ordered: Vec<CustodyIndex>,
    spec: &ChainSpec,
) -> Result<(), String> {
    let mut ordered_custody_groups = vec![];
A reviewer (Member) commented on this diff:
I don't understand this.
We basically want to store an ordering of data columns for a given NodeId.
Why are we storing columns for every custody index here? Can't we just compute the ordering for the node id and take a slice [..cgc] for the required cgc value?
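
For illustration, the slicing described here matches the hypothetical CustodyContextSketch from the sketch above: the current and next epoch differ only in how long a prefix of the same precomputed ordering they take (values below are made up):

```rust
let ctx = CustodyContextSketch {
    all_custody_groups_ordered: (0..128).collect(),
};
let current = ctx.custody_groups_for_cgc(10); // old CGC
let next = ctx.custody_groups_for_cgc(11); // new CGC: one extra group
assert_eq!(&next[..10], current);
```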

@pawanjay176 (Member) left a review:
I like the direction of this; it's less invasive than I originally thought.
