fix: only subscribe to sampled subnets#8181
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## unstable #8181 +/- ##
=========================================
Coverage 54.22% 54.22%
=========================================
Files 843 843
Lines 63365 63396 +31
Branches 4795 4794 -1
=========================================
+ Hits 34361 34378 +17
- Misses 28928 28943 +15
+ Partials 76 75 -1 🚀 New features to boost your workflow:
|
Performance Report✔️ no performance regression detected Full benchmark results
|
|
with a regular node, we subscribed to only 8 columns: |
|
for each slot we only need to receive/validate 8 DataColumnSidecar verbose: Block processed slot=28648, root=0x4cfe3cc6ff33efc3684500ebf8e2e796cdfdc241079a7c3e62be21a9b2494de3, delaySec=0.7920000553131104 |
shouldn't a node without validators attached just have to custody 4 groups/columns? lodestar/packages/config/src/chainConfig/configs/mainnet.ts Lines 129 to 130 in aac4d9d |
with |
nflaig
left a comment
There was a problem hiding this comment.
also need to handle the case if are custody requirement increases
lodestar/packages/beacon-node/src/chain/chain.ts
Line 1312 in aac4d9d
| // sent peers per topic are logged in network.publishGossip(), here we only track metrics for it | ||
| // starting from fulu, we have to push to 128 subnets so need to make sure we have enough sent peers per topic | ||
| // + 1 because we publish to beacon_block first | ||
| if (sentPeersArr.length < dataColumnSidecars.length + 1) { |
There was a problem hiding this comment.
I wonder if this is too strict, technically if there are supernodes in the network it should be sufficient to just publish >=64 columns as they have to reconstruct and re-gossip the missing columns
There was a problem hiding this comment.
there is a tiny chance that we reach non-supernodes first before supernodes
also the recover_matrix is optional, a new client can join the network and don't implement that at all
I don't see the benefit of not sending to all columns, and block proposal is rarely happen
also as a block proposer, we want to make sure our block is not reorged because data is so slow to become available
There was a problem hiding this comment.
I don't see the benefit of not sending to all columns
we should ideally do that just wondering if throwing an error here if it's <128 is good, could just error if it's <64
There was a problem hiding this comment.
it should always be an array with at least 129 items, first one for beacon_block and remaining ones for DataColumnSidecars
it's an issue if I hard coded it as 129 (or 128 + 1), here I only compared with dataColumnSidecars.length + 1) which should be fine. Notice that for each DataColumnSidecar we send to a topic at line 265 above
it should never reach this line
There was a problem hiding this comment.
I think warn for less than 128 and error if less than 64
There was a problem hiding this comment.
I'm not sure if sentPeersArr is tracking what we think it will so wanted to bring it up just in case. There may be some peers we send multiple things to and there is no guarantee that the peer distribution (128 peer means all 128 were on different columns subnets) is accurately represented.
We may need to do something like
...dataColumnSidecars.map((dataColumnSidecar) => () => network.publishDataColumnSidecar(dataColumnSidecar).then(res => /* parse res for peer and track it relative to dataColumnSidecar.index */))
so that we know which columns were sent to which peers and can be sure that each column was actually published. Also, don't we get an error if there are no peers subscribed to a subnet that we publish to? It could be that we do not have a peer for a given subnet, right?
There was a problem hiding this comment.
don't we get an error if there are no peers subscribed to a subnet that we publish to? It could be that we do not have a peer for a given subnet, right?
Yeah, currently configured js-libp2p-gossipsub is to throw if there are no connected peers listening on a topic.
We should not rely on optional behavior here (reconstruction). Imo just kiss, error if we didn't publish to all.
Based on the current behavior, we can remove this conditional (which will never be executed anyways since the promiseAllMaybeAsync call will reject/throw when something is published w/o peers on that topic.)
There was a problem hiding this comment.
Based on the current behavior, we can remove this conditional (which will never be executed anyways since the promiseAllMaybeAsync call will reject/throw when something is published w/o peers on that topic.)
agree, removing will avoid all the confusion. But this shows a concern that if there are 1/2 topics with 0 peers it'll throw but the overall publish is still a success because supernode will rebuild all columns and publish for us
I think we should allow publishing to 0 peers in this case in order to track this sentPeersPerSubnet metric
| @@ -244,13 +244,15 @@ export function getCoreTopicsAtFork( | |||
| // After fulu also track data_column_sidecar_{index} | |||
| if (ForkSeq[fork] >= ForkSeq.fulu) { | |||
| // TODO: @matthewkeil check if this needs to be updated for custody groups | |||
| for (let i = 0; i < dataColumnSidecars.length; i++) { | ||
| // + 1 because we publish to beacon_block first | ||
| const sentPeers = sentPeersArr[i + 1] as number; | ||
| metrics?.dataColumns.sentPeersPerSubnet.observe(sentPeers); |
There was a problem hiding this comment.
should we track this metric before the error?
There was a problem hiding this comment.
the error should never happen as commented
There was a problem hiding this comment.
Does the send promise error if there is not a peer subscribed to a column topic?
There was a problem hiding this comment.
it depends on allowPublishToZeroTopicPeers option
in the last commit I set it to true so it'll not throw error and return 0 sent peer in that case
**Motivation** - followup to #8181 **Description** - If any columns were published to 0 peers, print a warning that the block may be reorged --------- Co-authored-by: Nico Flaig <nflaig@protonmail.com>
|
🎉 This PR is included in v1.34.0 🎉 |
Motivation
Description
Test result
tested successfully on a devnet node
Resolves Dynamically subscribe to data column subnets #8098