Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attnet revamp: Subnet backbone structure based on beacon nodes #3312

Merged
merged 7 commits into from
May 10, 2023

Conversation

AgeManning
Copy link
Contributor

@AgeManning AgeManning commented Mar 30, 2023

This is in a "LAST CALL" state., with the intention of merging on May 4 if no further issues emerge.

Note, this will be merged without the downscoring option to ensure that it can be rolled out iteratively across clients. Options to downscore if not on appropriate peer-id attnets will be added at a subsequent hard fork

Overview

This PR is for #2749

The issue discussion highlights the main motivations for this change. The fundamental premise is that there currently is no way to enforce validators to subscribe to long-lived subnets, we may be oversubscribed to subnets and this results in significant bandwidth use.

The proposed solution is to get beacon nodes to subscribe to long-lived subnets based on their node-id (can think peer-id). Some benefits of this are:

  • Enforceable (a peer-id should now be subscribed to a pre-determined subnet, if not we can penalize)
  • Tune bandwidth use at a network level by the SUBNETS_PER_NODE parameter
  • Node operators running more validators are no longer directly penalized (in bandwidth use) for running more validators. A node operator running >=64 validators does not have to process every single attestation (defeating the purpose of the subnets). Equivalently, we distribute load to all beacon nodes. Everyone on the network participates a little bit by subscribing to a subnet. This would also mean the desire to connect to any individual node is higher and there is not a fight to connect to nodes that subscribe to the most subnets. I imagine this would improve peer connectivity.
  • We would no longer need the attnet field in the ENR (improving validator privacy) and in the future potentially remove the metadata RPC method if a similar approach can be found for the sync committee subnets.
  • Discovery searches can be optimized. As a node-id's initial prefix is fixed to a specific subnet. Discovery queries can target all nodes on a subnet via finding closest nodes to a known subnet prefix (i.e, nodes are no longer scattered randomly in the DHT, they are now segregated into groups based on the number of subnets).
  • Improve peer management as new nodes connecting can be assessed for subnet subscription before establishing a full connection (based on their peer-id).
  • Is fully backwards compatible during a transition period - A mixture of nodes/clients can implement this without penalty (keeping the attnet field for predetermined subnets). After a hard-fork we can make it a strict requirement, then remove the attnet field etc.
  • The update can be gradual. As clients adopt this and as users update their clients we can observe the effect on the network. If adverse effects start to occur we can adjust SUBNETS_PER_NODE to counter issues.

Some downsides are:

  • Beacon nodes that currently have no validators attached will have a slightly increased bandwidth usage as they must now subscribe to SUBNETS_PER_NODE subnets.
  • It isn't entirely certain how the backbone subnet structure looks for all networks, we likely reduce the number of nodes subscribed to long-lived subnets in total. It's uncertain how this plays out on mainnet. If misconfigured, the network could start seeing participation drops due to attestations being missed due to lack of finding peers subscribed to required subnets.
  • The new approach specifies an exact epoch on which node-id shuffle to a new subnet. This can create a large churn of nodes and this transition period could cause turbulence as nodes rotate subnet subscriptions.
  • Currently, a node can get away with connecting to a small number of peers (i.e those that are subscribed to the most subnets) to get a full subnet coverage. After this change, nodes will need a decent peer count to get a full coverage of subnets to avoid discovery queries.
  • It is known in advance which node-ids will be subscribing to which subnets, which could lead to some additional forms of DOS vectors.
  • There could be sybil attacks by setting node-ids to specific values to hijack subnets (however this is not a new vector as it can already be achieved in the current implementation by setting ENR values).

Useful statistics

I ran though the DHT to get an idea of subnet peer density and what it would look like after this change.

I measured 4.2k nodes on mainnet.

  • 45% (1915) beacon nodes are not subscribed to any long-lived subnets (i.e they claim to have no attached validators)
  • 7% (316) beacon nodes were subscribed to 64 or more subnets (claimed more than 64 attached validators, or are running --subscribe-all-subnets).

There are currently around 500 nodes per subnet. That is, if you search for a specific subnet, there should be around 500 distinct nodes to connect to (many nodes can subscribe to more than 1 subnet)

This change (with SUBNETS_PER_NODE ==2 ) would reduce the density just under 200 nodes per subnet.

This would increase the total number of nodes subscribed to subnets from 2.3k to the full 4.2k (as all nodes now need to subscribe) and I'd expect it to be easier to find connections to peers on subnets.

If we were to go with SUBNETS_PER_NODE ==1 there would be just under 100 nodes per subnet.

I've opted for 2 just to be conservative.

Additional Notes

I've currently kept ATTESTATION_SUBNET_EXTRA_BITS at 0 to avoid complexity in the initial stages to see how discovery functions and debugging (but we can set this to higher now or later).

I've left most of the current logic around the attnets field as is, as this should be kept for the transitional period. I've not touched the sync subnets.

Personally, I'm in favour of this and am keen to push this along. However as always am open to feedback/suggestions and everyone's thoughts on whether we should actually do this.

specs/phase0/validator.md Outdated Show resolved Hide resolved
specs/phase0/validator.md Outdated Show resolved Hide resolved
@Menduist
Copy link

After this change, nodes will need a decent peer count to get a full coverage of subnets to avoid discovery queries.

Just to give some numbers, since I was wondering about that:
The minimum peer count to be a validator would be:

  • dLow in every subnet: 64 * 6 / 2 = 192
  • dHigh in every subscribed subnet: 2 * 12 / 2 = 12
    = 204 peers

That's assuming you want dLow in every subnet (to be able to publish). In nimbus we seek dHigh peers to be safe, so that would be even more.
(non-validators don't need to worry about dLow in every subnet, so they would only need 12 peers)

And if you want to subscribe-all-subnets, that's 64 * 12 / 2 = 384 peers
(handwaving birthday paradox everywhere)

@AgeManning
Copy link
Contributor Author

AgeManning commented Apr 3, 2023

Just to give some numbers, since I was wondering about that: The minimum peer count to be a validator would be:

* dLow in every subnet: `64 * 6 / 2` = 192

* dHigh in every subscribed subnet: `2 * 12 / 2` = 12
  = 204 peers

That's assuming you want dLow in every subnet (to be able to publish). In nimbus we seek dHigh peers to be safe, so that would be even more.

Yep, good point and important to highlight.

If we want to maintain these gossip parameters without discovery, we're probably looking at these numbers. I'm of the opinion that our D_lo and D_high numbers are too high, so in Lighthouse, I'd be targeting D_lo. Also we'd gain extra subscriptions from validators aggregating so will bring the average number of peers subscribed to a subnet up slightly.

I imagine we probably want to keep attnets in the ENR for a while for those peers that still want to --subscribe-all-subnets which should also help.

I agree this change will require higher peer counts than we had before as the network will be less centralized around beacon nodes with lots of validators. It might make sense to tune gossipsub if we are seeing issues around this (perhaps increasing the heartbeat etc).

I was hoping that we could keep lower stable peers on subnets, i.e around 3 or 4 and then use discovery a little more as we can target specific subnet peers to find if needed.

We could also set SUBNETS_PER_NODE to be 3 if it makes everyone more comfortable, which would also drop the peer count if targeting a specific number of peers per subnet.

@ajsutton
Copy link
Contributor

ajsutton commented Apr 3, 2023

I was hoping that we could keep lower stable peers on subnets, i.e around 3 or 4 and then use discovery a little more as we can target specific subnet peers to find if needed.

One thing to be wary of with relying on discovery is that there's a ridiculous number of nodes that don't accept inbound connections and (partly because of that) nodes with open ports often are at their peer limit so can be hard to get a stable connection to. While this change will make it easier to search the DHT for peers that should be on the subnet you need, it won't make it easier to actually connect to them so discovery still may not be a very effective option.

You probably have better data on this than I do, but figured I'd mention it.

@Menduist
Copy link

Menduist commented Apr 3, 2023

I was hoping that we could keep lower stable peers on subnets, i.e around 3 or 4 and then use discovery a little more as we can target specific subnet peers to find if needed.

if needed means when needing to publish an attestation / aggregate? Seems like this would add a lot of complexity / fragility / churn, so would have to be carefully thought out (+1 to @ajsutton point, I don't think the bottleneck is discovery currently, but managing to establish the actual connections)

On the other hand, the less peer you publish to, the more you are susceptible to sybil "black holes" attacks

Personally I would err on the side of more connected peers, a "parked" peer shouldn't cost much resources anyway

@AgeManning
Copy link
Contributor Author

AgeManning commented Apr 3, 2023

Yeah I agree with everything here.

I have no data, but I'm hoping we can improve the connection issue by increasing peer counts and also removing the desire for everyone to connect to a smallish set of nodes that have a lot of subnets.

My thinking was to maintain some low set of peers (3/4) per subnet and potentially also do discovery requests to target 6 or something as an optional extra (if we are not at the target). But it probably is smarter to just have a larger set of "parked" peers and reduce any extra resources in maintaining them.

In the end, I think we'll be fine tuning this if we agree this is a good approach and each client can figure out what works best for them.

edit: To add to the connection issue, there's some NAT hole-punching work being added to discovery and I'm hoping to translate that into our TCP/libp2p stack which should also help with connectivity, i'd imagine

@nisdas
Copy link
Contributor

nisdas commented Apr 4, 2023

Is fully backwards compatible during a transition period - A mixture of nodes/clients can implement this without penalty (keeping the attnet field for predetermined subnets). After a hard-fork we can make it a strict requirement, then remove the attnet field etc.

Just to confirm, nodes running with the new backbone structure would be inserting the predetermined subnets into the current attnets field in their enr , correct ?

The update can be gradual. As clients adopt this and as users update their clients we can observe the effect on the network. If adverse effects start to occur we can adjust SUBNETS_PER_NODE to counter issues.

If we do simply start bumping SUBNETS_PER_NODE won't it just be a permanent increase in bandwidth across the network ? The main worry would be that this change breaks something in the network and we have no easy way to reverse this without either disabling this whole change entirely in a new release or permanently bumping up the bandwidth.

The new approach specifies an exact epoch on which node-id shuffle to a new subnet. This can create a large churn of nodes and this transition period could cause turbulence as nodes rotate subnet subscriptions.

This seems vulnerable to possible participation drops across the network in that epoch. If you do have all validators on long-lived subnets suddenly changing to a new one at the same time, it becomes a lot more chaotic for that particular node to find peers for that subnet simply for the fact that every node in the network is doing it at the same time. You could see large spikes in inbound connection attempts, and peers getting kicked off due to the fact that a node is 'full' . Is it possible to stagger these updates in the shuffling for subnets via the node_id ? If every node in the network is grafting new peers into its subnet mesh at the same time, I could see some wierd things happening on the gossip layer.

@AgeManning
Copy link
Contributor Author

Just to confirm, nodes running with the new backbone structure would be inserting the predetermined subnets into the current attnets field in their enr , correct ?

Correct. Just as they do now.

If we do simply start bumping SUBNETS_PER_NODE won't it just be a permanent increase in bandwidth across the network ? The main worry would be that this change breaks something in the network and we have no easy way to reverse this without either disabling this whole change entirely in a new release or permanently bumping up the bandwidth.

I'd imagine we'd see a drop in bandwidth as there would be less nodes per subnet than there currently are. There is a chance that we make this too low and it becomes hard to find and maintain peers on subnets. This can be corrected by either increasing SUBNETS_PER_NODE or simply subscribing based on validator counts as before (to return to past logic). I imagine we as developers who also run quite a few beacon nodes can support the network immediately in these times simply by running beacon nodes with --subscribe-all-subnets.

This seems vulnerable to possible participation drops across the network in that epoch. If you do have all validators on long-lived subnets suddenly changing to a new one at the same time, it becomes a lot more chaotic for that particular node to find peers for that subnet simply for the fact that every node in the network is doing it at the same time. You could see large spikes in inbound connection attempts, and peers getting kicked off due to the fact that a node is 'full' . Is it possible to stagger these updates in the shuffling for subnets via the node_id ? If every node in the network is grafting new peers into its subnet mesh at the same time, I could see some wierd things happening on the gossip layer.

I couldn't find an easy way to stagger the transition, whilst maintaining a uniform distribution and have it be general of subnet count. In this PR I suggest that nodes don't transition on the exact epoch. Instead they subscribe earlier and unsubscribe later, so there is a period of overlap. The exact epoch where the logic switches, is just about the enforceability property. You have to be subscribed at that epoch, but you can and should be subscribed earlier.
In terms of clients, I'm hoping there isn't too much turbulence here. If you have a stable set of say 3/4 peers per subnet the transition is just a translation (the peers will all just rotate subnets, it might get trickier if we add ATTESTATION_SUBNET_EXTRA_BITS). As the peers should all just rotate subnets, in the end you should still have the 3/4 peers per subnet, they're just different peers now. You shouldn't have to search for a whole new set of peers and form a bunch of new connections (if that is the concern).
In terms of grafts, yeah there could be some turbulence if everyone did it at one instant. Which is why this PR suggests subscribing early at some random interval.

@mcdee
Copy link
Contributor

mcdee commented Apr 6, 2023

I couldn't find an easy way to stagger the transition, whilst maintaining a uniform distribution and have it be general of subnet count.

I'm curious as to the issue with the proposal I made a few days ago in this respect. With the current numbers 1/256 of the nodes would move subnet each epoch, would this skew the distribution compared to them all moving at the same time?

@dapplion
Copy link
Member

dapplion commented Apr 7, 2023

Avoiding the sudden rotation issue with subscribing to next subnet a bit early and unsubscribing from prev subnet a bit latter is nice solution. AFAIK most clients already implement something like this around forks when they rotate all topics to the next fork.

@AgeManning
Copy link
Contributor Author

AgeManning commented Apr 11, 2023

I couldn't find an easy way to stagger the transition, whilst maintaining a uniform distribution and have it be general of subnet count.

I'm curious as to the issue with the proposal I made a few days ago in this respect. With the current numbers 1/256 of the nodes would move subnet each epoch, would this skew the distribution compared to them all moving at the same time?

Yeah this could work, I think.

The absolute worst case would be that 1/256 of the nodes (25% of a subnet) all belonged to the same subnet. Then each epoch 25% of a subnet could transition to another epoch. Although it is unlikely (assuming node-ids are uniformly distributed), it is possible that an entire subnet transitions to another subnet and we have no nodes on a given subnet. The odds of this are very low, perhaps its worth calculating them.

The other change this will make will be that our peers will be constantly shifting long-lived subnets every epoch. Because this is deterministic, it would be manageable from a client perspective as everyone would know in advance which peer goes to which subnet, but it could make peer management a bit trickier.

Other than that, it seems like a pretty good approach to get nodes to transition throughout the epochs.

@nisdas
Copy link
Contributor

nisdas commented Apr 11, 2023

I couldn't find an easy way to stagger the transition, whilst maintaining a uniform distribution and have it be general of subnet count. In this PR I suggest that nodes don't transition on the exact epoch. Instead they subscribe earlier and unsubscribe later, so there is a period of overlap. The exact epoch where the logic switches, is just about the enforceability property. You have to be subscribed at that epoch, but you can and should be subscribed earlier.
In terms of clients, I'm hoping there isn't too much turbulence here. If you have a stable set of say 3/4 peers per subnet the transition is just a translation (the peers will all just rotate subnets, it might get trickier if we add ATTESTATION_SUBNET_EXTRA_BITS). As the peers should all just rotate subnets, in the end you should still have the 3/4 peers per subnet, they're just different peers now. You shouldn't have to search for a whole new set of peers and form a bunch of new connections (if that is the concern).

That makes sense, in that case I would have less of a concern as it would just be a matter of doing a lookahead to the transition epoch and subscribing beforehand. We do the same exact thing for sync committee subnet subscriptions, so it shouldn't be an issue as long as there is a sufficient amount of lookahead for individual nodes. This has the same effect as a staggered transition, so the proposal looks good for this case.

specs/phase0/validator.md Outdated Show resolved Hide resolved
Copy link
Contributor

@djrtwo djrtwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! I'm really excited to get this out.

The biggest downside I see right now is the suggestion of 2 rather than 1 for those stakers currently with 1 validator. I think 2 is the proper number for now, a conservative choice for this critical component. just need to get our node count up...

def compute_subnet(node_id: int, epoch: Epoch, index: int) -> int:
node_id_prefix = node_id >> (256 - ATTESTATION_SUBNET_PREFIX_BITS)
permutation_seed = hash(uint_to_bytes(epoch // EPOCHS_PER_SUBNET_SUBSCRIPTION))
permutated_prefix = compute_shuffled_index(node_id_prefix, 1 << ATTESTATION_SUBNET_PREFIX_BITS, permutation_seed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth putting an historic randao mix (that can be retrieved from state) in here?

It would define how long before a node_id can be grinded to do some attack.

To not run into issues with potential trailing finality windows, you'd probably need this to be on the order of a month or more. which then degrades the utility

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to add this complexity in if we feel it warrants it. We had a small discussion on this in the issue.

My current thinking is that if someone wants to attack a subnet in this manner now, its pretty easy to do without having to generate node-ids. You could just make a bunch of nodes and set your ENR field to a subnet (I realize that the argument that the vulnerability currently exists is not a good argument to not fix it 😅 )

In the new form, I guess if we modify our discovery to start searching for these peer-ids, maybe there is a greater chance of being eclipsed as we are in effect splitting the DHT into smaller sub-DHTs. I think the security guarantees are the same, in that if there are some honest nodes discovery should find them also.

Mixing in RANDAO then ties in fork-choice (as you've pointed out). If it's too new and peers are temporarily on different forks, I'd imagine we'd start penalizing our peers for not being on the right subnets etc.

Do you have a suggestion for mixing in the RANDO. Like maybe current_epoch - 20 or something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with using anything moderately recent is that a chain split past that depth (non-finality period) would segment the p2p network because people (from your perspective) wouldn't be on the correct subnets and yuo would downscore. In practice, if there is actually a split, there is likely some sort of partition anyway.

But it does begin to seem dangerous for recent windows

Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree.

My thoughts on this is to leave it deterministic for the time being. Perhaps we could add it in in a later PR. I imagine if we decide to set ATTESTATION_SUBNET_EXTRA_BITS to something non-zero, we could add it in then with that change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a suggestion for mixing in the RANDO. Like maybe current_epoch - 20 or something?

The number 20 doesn't have to be fixed. I think using the randao mix of the finalized_checkpoint of the head block is the best idea now, since it's unlikely to be reverted.

return [compute_subnet(node_id, epoch, idx) for idx in range(SUBNETS_PER_NODE)]
```

*Note*: Nodes should subscribe to new subnets and remain subscribed to old subnets for at least one epoch. Nodes should pick a random duration to unsubscribe from old subnets to smooth the transition on the exact epoch boundary of which the shuffling changes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to stagger subnet subscriptions to avoid the network churning all at once (and to avoid the issue of having to double-up on subnets for some amount of time).

If the node-id dictated the epoch to churn -- essentially defining a (256, 512] period in which you keep your subscription, then we'd not have the churn all at once

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The random local offset could work too if it's spaced enough but then we lose the determinism (and penalization) in that period

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, seems this is an active convo on the pr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added Jim's suggestion for now.

@AgeManning
Copy link
Contributor Author

It seems the boundary rotation was of concern to a few people. I've added in @mcdee's suggestion, where every epoch 1/256 of the nodes do their transition.

Curious about other's opinions on this solution.

Copy link
Contributor

@djrtwo djrtwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The staggering looks good -- one quick question on groupings

specs/phase0/validator.md Show resolved Hide resolved
@Nashatyrev
Copy link
Member

Nashatyrev commented Apr 21, 2023

It seems the boundary rotation was of concern to a few people. I've added in @mcdee's suggestion, where every epoch 1/256 of the nodes do their transition.

Curious about other's opinions on this solution.

I believe the part of the original idea was to have an ability to search nodes for a subnet by their prefix in DHT.
It looks like that staggering approach disables that feature and one needs the full list of nodes from discovery to filter the nodes for any specific subnet.

The other concern: adding node_offset to the permutation_seed effectively gives us 256 * 64 = 16384 groups in the node_id space which looks like a random assignment given we have around 10K nodes. So basically the permutation have a little sense for that approach

UPD:
Sorry, my concerns are wrong actually.
The staggering approach looks good!
We just need to search for 2 prefixes (for the current and the next period) instead of 1 to find candidate node for a single subnet. If I get it right

@djrtwo
Copy link
Contributor

djrtwo commented Apr 21, 2023

This is in a "LAST CALL" state., with the intention of merging on May 4 if no further issues emerge.

Note, this will be merged without the downscoring option to ensure that it can be rolled out iteratively across clients. Options to downscore if not on appropriate peer-id attnets will be added at a subsequent hard fork

specs/phase0/validator.md Outdated Show resolved Hide resolved
Comment on lines 619 to 626
node_id_prefix = node_id >> (256 - int(ATTESTATION_SUBNET_PREFIX_BITS))
node_offset = node_id % EPOCHS_PER_SUBNET_SUBSCRIPTION
permutation_seed = hash(uint_to_bytes(uint64((epoch + node_offset) // EPOCHS_PER_SUBNET_SUBSCRIPTION)))
permutated_prefix = compute_shuffled_index(
node_id_prefix,
1 << int(ATTESTATION_SUBNET_PREFIX_BITS),
permutation_seed,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AgeManning I added three castings (int() and uint64()) since remerkleable doesn't allow some mix int & unit operations.

specs/phase0/validator.md Outdated Show resolved Hide resolved
*Note*: Short lived beacon committee assignments should not be added in into the ENR `attnets` entry.
```python
def compute_subscribed_subnet(node_id: int, epoch: Epoch, index: int) -> int:
node_id_prefix = node_id >> (256 - int(ATTESTATION_SUBNET_PREFIX_BITS))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about adding assert spec.ATTESTATION_SUBNET_PREFIX_BITS <= 256 in test_config_invariants.py unittest. But before that, should 256 be parameterized?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to add NODE_ID_SIZE = 32 and derive bit count from it?


*Note*: Short lived beacon committee assignments should not be added in into the ENR `attnets` entry.
```python
def compute_subscribed_subnet(node_id: int, epoch: Epoch, index: int) -> int:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the maximum size of node_id?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it was assumed UInt256. Not sure though what's the most appropriate type for pyspec

Macoophonic

This comment was marked as spam.

@hwwhww
Copy link
Contributor

hwwhww commented May 4, 2023

@Nashatyrev @AgeManning

Some updates:

  • add custom types NodeID: uint256 and SubnetID: uint64. It helps improve readability a bit.
  • add constant NODE_ID_BITS = 256 and replace the 256 in compute_subscribed_subnet with it.
  • add invariant test to verify NODE_ID_BITS is set correctly.

@AgeManning
Copy link
Contributor Author

Sounds good. Thanks @hwwhww. The node-id is a 32-byte hash so fixing it to 256 bits is fine :).

@AgeManning
Copy link
Contributor Author

In the process of releasing this thing. I was thinking potentially having a different value of SUBNETS_PER_NODE for different networks. I was going to add this into our chain configuration, so we could increase it for the testnet etc.

But perhaps we want to keep it low for testnets to detect any cracks before we hit mainnet (the testnets have lower node counts).

bors bot pushed a commit to sigp/lighthouse that referenced this pull request May 30, 2023
This PR address the following spec change: ethereum/consensus-specs#3312

Instead of subscribing to a long-lived subnet for every attached validator to a beacon node, all beacon nodes will subscribe to `SUBNETS_PER_NODE` long-lived subnets. This is currently set to 2 for mainnet. 

This PR does not include any scoring or advanced discovery mechanisms. A future PR will improve discovery and we can implement scoring after the next hard fork when we expect all client teams and all implementations to respect this spec change.

This will be a significant change in the subnet network structure for consensus clients and we will likely have to monitor and tweak our peer management logic.
divagant-martian pushed a commit to divagant-martian/lighthouse that referenced this pull request Jun 7, 2023
This PR address the following spec change: ethereum/consensus-specs#3312

Instead of subscribing to a long-lived subnet for every attached validator to a beacon node, all beacon nodes will subscribe to `SUBNETS_PER_NODE` long-lived subnets. This is currently set to 2 for mainnet. 

This PR does not include any scoring or advanced discovery mechanisms. A future PR will improve discovery and we can implement scoring after the next hard fork when we expect all client teams and all implementations to respect this spec change.

This will be a significant change in the subnet network structure for consensus clients and we will likely have to monitor and tweak our peer management logic.
@arnetheduck
Copy link
Contributor

I assumed this was well-known (ie it has been discussed before), but it seems it was not per comments on the latest consensus call:

Regarding the point that nodes can use this structure enforce participation in subnet traffic, this PR does not help: a node can simply subscribe to a topic and not join the mesh to achieve a similar effect to not subscribing at all (there is still no way to detect if a node has a full mesh already and therefore legitimately is pruning) - if we start enforcing this in clients, it's likely we'll see larger subscription tables achieving a small increased book-keeping cost but still no more nodes carrying traffic.

ghost pushed a commit to oone-world/lighthouse that referenced this pull request Jul 13, 2023
This PR address the following spec change: ethereum/consensus-specs#3312

Instead of subscribing to a long-lived subnet for every attached validator to a beacon node, all beacon nodes will subscribe to `SUBNETS_PER_NODE` long-lived subnets. This is currently set to 2 for mainnet. 

This PR does not include any scoring or advanced discovery mechanisms. A future PR will improve discovery and we can implement scoring after the next hard fork when we expect all client teams and all implementations to respect this spec change.

This will be a significant change in the subnet network structure for consensus clients and we will likely have to monitor and tweak our peer management logic.
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024
This PR address the following spec change: ethereum/consensus-specs#3312

Instead of subscribing to a long-lived subnet for every attached validator to a beacon node, all beacon nodes will subscribe to `SUBNETS_PER_NODE` long-lived subnets. This is currently set to 2 for mainnet. 

This PR does not include any scoring or advanced discovery mechanisms. A future PR will improve discovery and we can implement scoring after the next hard fork when we expect all client teams and all implementations to respect this spec change.

This will be a significant change in the subnet network structure for consensus clients and we will likely have to monitor and tweak our peer management logic.
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024
This PR address the following spec change: ethereum/consensus-specs#3312

Instead of subscribing to a long-lived subnet for every attached validator to a beacon node, all beacon nodes will subscribe to `SUBNETS_PER_NODE` long-lived subnets. This is currently set to 2 for mainnet.

This PR does not include any scoring or advanced discovery mechanisms. A future PR will improve discovery and we can implement scoring after the next hard fork when we expect all client teams and all implementations to respect this spec change.

This will be a significant change in the subnet network structure for consensus clients and we will likely have to monitor and tweak our peer management logic.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.