Skip to content

Commit

Permalink
EIP-7594: Decouple network subnets from das-core
Browse files Browse the repository at this point in the history
Currently we use subnets as a unit of custody in the PeerDAS core
protocol because it doesn't make sense to partially custody only some
columns in the subnets and waste the bandwidth to download the columns
the node doesn't custody.

Since subnets correspond to GossipSub topics which are in a layer lower
than the core protocol, using subnets as a unit of custody makes the
core layer and the network layer too coupled to each other and leave no
room for the network layer flexibility.

This commit introduces "custody groups" which are used a unit of custody
instead of subnets.

The immediate benefit of the decoupling is that we can immediately
increase the number of subnets without affecting the expected number of
peers to cover all columns and affecting the network stability and
without touching the core protocol.

The reason we want to increase the number of subnets to match the number
of columns is that the columns will be propagated through the network
faster when they have their own subnets. Just like EIP-4844, each
blob has its own subnet because, if all the blobs are in a single subnet,
the blobs will be propagated more slowly.

Since we keep the number of custody groups the same as the previous
number of subnets (32), the expected number of peers you need to cover
all the columns is not changed. In fact, you need only NUMBER_OF_COLUMNS
and NUMBER_OF_CUSTODY_GROUPS to analyze the expected number, which
makes the core protocol completely decoupled from the network layer.
  • Loading branch information
ppopth committed Oct 4, 2024
1 parent a6294c6 commit cbc4b55
Show file tree
Hide file tree
Showing 11 changed files with 124 additions and 81 deletions.
1 change: 1 addition & 0 deletions configs/mainnet.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -161,6 +161,7 @@ WHISK_PROPOSER_SELECTION_GAP: 2

# EIP7594
NUMBER_OF_COLUMNS: 128
NUMBER_OF_CUSTODY_GROUPS: 128
DATA_COLUMN_SIDECAR_SUBNET_COUNT: 128
MAX_REQUEST_DATA_COLUMN_SIDECARS: 16384
SAMPLES_PER_SLOT: 8
Expand Down
1 change: 1 addition & 0 deletions configs/minimal.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -160,6 +160,7 @@ WHISK_PROPOSER_SELECTION_GAP: 1

# EIP7594
NUMBER_OF_COLUMNS: 128
NUMBER_OF_CUSTODY_GROUPS: 128
DATA_COLUMN_SIDECAR_SUBNET_COUNT: 128
MAX_REQUEST_DATA_COLUMN_SIDECARS: 16384
SAMPLES_PER_SLOT: 8
Expand Down
77 changes: 42 additions & 35 deletions specs/_features/eip7594/das-core.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,27 +13,28 @@
- [Custom types](#custom-types)
- [Configuration](#configuration)
- [Data size](#data-size)
- [Networking](#networking)
- [Custody setting](#custody-setting)
- [Containers](#containers)
- [`DataColumnSidecar`](#datacolumnsidecar)
- [`MatrixEntry`](#matrixentry)
- [Helper functions](#helper-functions)
- [`get_custody_columns`](#get_custody_columns)
- [`get_custody_groups`](#get_custody_groups)
- [`compute_columns_for_custody_group`](#compute_columns_for_custody_group)
- [`compute_matrix`](#compute_matrix)
- [`recover_matrix`](#recover_matrix)
- [`get_data_column_sidecars`](#get_data_column_sidecars)
- [Custody](#custody)
- [Custody requirement](#custody-requirement)
- [Public, deterministic selection](#public-deterministic-selection)
- [Subnet sampling](#subnet-sampling)
- [Custody sampling](#custody-sampling)
- [Extended data](#extended-data)
- [Column gossip](#column-gossip)
- [Parameters](#parameters)
- [Reconstruction and cross-seeding](#reconstruction-and-cross-seeding)
- [FAQs](#faqs)
- [Row (blob) custody](#row-blob-custody)
- [Subnet stability](#subnet-stability)
- [Why don't nodes custody rows?](#why-dont-nodes-custody-rows)
- [Why don't we rotate custody over time?](#why-dont-we-rotate-custody-over-time)
- [Does having a lot of column subnets make the network unstable?](#does-having-a-lot-of-column-subnets-make-the-network-unstable)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->
<!-- /TOC -->
Expand All @@ -54,6 +55,7 @@ The following values are (non-configurable) constants used throughout the specif
| - | - | - |
| `RowIndex` | `uint64` | Row identifier in the matrix of cells |
| `ColumnIndex` | `uint64` | Column identifier in the matrix of cells |
| `CustodyIndex` | `uint64` | Custody group identifier in the set of custody groups |

## Configuration

Expand All @@ -63,18 +65,13 @@ The following values are (non-configurable) constants used throughout the specif
| - | - | - |
| `NUMBER_OF_COLUMNS` | `uint64(CELLS_PER_EXT_BLOB)` (= 128) | Number of columns in the extended data matrix |

### Networking

| Name | Value | Description |
| - | - | - |
| `DATA_COLUMN_SIDECAR_SUBNET_COUNT` | `uint64(128)` | The number of data column sidecar subnets used in the gossipsub protocol |

### Custody setting

| Name | Value | Description |
| - | - | - |
| `SAMPLES_PER_SLOT` | `8` | Number of `DataColumnSidecar` random samples a node queries per slot |
| `CUSTODY_REQUIREMENT` | `4` | Minimum number of subnets an honest node custodies and serves samples from |
| `NUMBER_OF_CUSTODY_GROUPS` | `128` | Number of custody groups available for nodes to custody |
| `CUSTODY_REQUIREMENT` | `4` | Minimum number of custody groups an honest node custodies and serves samples from |

### Containers

Expand Down Expand Up @@ -102,33 +99,39 @@ class MatrixEntry(Container):

## Helper functions

### `get_custody_columns`
### `get_custody_groups`

```python
def get_custody_columns(node_id: NodeID, custody_subnet_count: uint64) -> Sequence[ColumnIndex]:
assert custody_subnet_count <= DATA_COLUMN_SIDECAR_SUBNET_COUNT
def get_custody_groups(node_id: NodeID, custody_group_count: uint64) -> Sequence[CustodyIndex]:
assert custody_group_count <= NUMBER_OF_CUSTODY_GROUPS

subnet_ids: List[uint64] = []
custody_groups: List[uint64] = []
current_id = uint256(node_id)
while len(subnet_ids) < custody_subnet_count:
subnet_id = (
while len(custody_groups) < custody_group_count:
custody_group = CustodyIndex(
bytes_to_uint64(hash(uint_to_bytes(uint256(current_id)))[0:8])
% DATA_COLUMN_SIDECAR_SUBNET_COUNT
% NUMBER_OF_CUSTODY_GROUPS
)
if subnet_id not in subnet_ids:
subnet_ids.append(subnet_id)
if custody_group not in custody_groups:
custody_groups.append(custody_group)
if current_id == UINT256_MAX:
# Overflow prevention
current_id = NodeID(0)
current_id += 1

assert len(subnet_ids) == len(set(subnet_ids))
assert len(custody_groups) == len(set(custody_groups))
return sorted(custody_groups)
```

columns_per_subnet = NUMBER_OF_COLUMNS // DATA_COLUMN_SIDECAR_SUBNET_COUNT
### `compute_columns_for_custody_group`

```python
def compute_columns_for_custody_group(custody_group: CustodyIndex) -> Sequence[ColumnIndex]:
assert custody_group < NUMBER_OF_CUSTODY_GROUPS
columns_per_group = NUMBER_OF_COLUMNS // NUMBER_OF_CUSTODY_GROUPS
return sorted([
ColumnIndex(DATA_COLUMN_SIDECAR_SUBNET_COUNT * i + subnet_id)
for i in range(columns_per_subnet)
for subnet_id in subnet_ids
ColumnIndex(NUMBER_OF_CUSTODY_GROUPS * i + custody_group)
for i in range(columns_per_group)
])
```

Expand Down Expand Up @@ -220,21 +223,21 @@ def get_data_column_sidecars(signed_block: SignedBeaconBlock,

### Custody requirement

Each node downloads and custodies a minimum of `CUSTODY_REQUIREMENT` subnets per slot. The particular subnets that the node is required to custody are selected pseudo-randomly (more on this below).
Columns are grouped into custody groups. Nodes custodying a custody group MUST custody all the columns in that group.

A node *may* choose to custody and serve more than the minimum honesty requirement. Such a node explicitly advertises a number greater than `CUSTODY_REQUIREMENT` through the peer discovery mechanism, specifically by setting a higher value in the `custody_subnet_count` field within its ENR. This value can be increased up to `DATA_COLUMN_SIDECAR_SUBNET_COUNT`, indicating a super-full node.
A node *may* choose to custody and serve more than the minimum honesty requirement. Such a node explicitly advertises a number greater than `CUSTODY_REQUIREMENT` through the peer discovery mechanism, specifically by setting a higher value in the `custody_group_count` field within its ENR. This value can be increased up to `NUMBER_OF_CUSTODY_GROUPS`, indicating a super-full node.

A node stores the custodied columns for the duration of the pruning period and responds to peer requests for samples on those columns.

### Public, deterministic selection

The particular columns that a node custodies are selected pseudo-randomly as a function (`get_custody_columns`) of the node-id and custody size -- importantly this function can be run by any party as the inputs are all public.
The particular columns/groups that a node custodies are selected pseudo-randomly as a function (`get_custody_groups`) of the node-id and custody size -- importantly this function can be run by any party as the inputs are all public.

*Note*: increasing the `custody_size` parameter for a given `node_id` extends the returned list (rather than being an entirely new shuffle) such that if `custody_size` is unknown, the default `CUSTODY_REQUIREMENT` will be correct for a subset of the node's custody.

## Subnet sampling
## Custody sampling

At each slot, a node advertising `custody_subnet_count` downloads a minimum of `subnet_sampling_size = max(SAMPLES_PER_SLOT, custody_subnet_count)` total subnets. The corresponding set of columns is selected by `get_custody_columns(node_id, subnet_sampling_size)`, so that in particular the subset of columns to custody is consistent with the output of `get_custody_columns(node_id, custody_subnet_count)`. Sampling is considered successful if the node manages to retrieve all selected columns.
At each slot, a node advertising `custody_group_count` downloads a minimum of `sampling_size = max(SAMPLES_PER_SLOT, custody_group_count)` total custody groups. The corresponding set of columns is selected by `groups = get_custody_groups(node_id, sampling_size)` and `compute_columns_for_custody_group(group) for group in groups`, so that in particular the subset of columns to custody is consistent with the output of `get_custody_groups(node_id, custody_group_count)`. Sampling is considered successful if the node manages to retrieve all selected columns.

## Extended data

Expand All @@ -246,7 +249,7 @@ In this construction, we extend the blobs using a one-dimensional erasure coding

For each column -- use `data_column_sidecar_{subnet_id}` subnets, where `subnet_id` can be computed with the `compute_subnet_for_data_column_sidecar(column_index: ColumnIndex)` helper. The sidecars can be computed with the `get_data_column_sidecars(signed_block: SignedBeaconBlock, blobs: Sequence[Blob])` helper.

Verifiable samples from their respective column are distributed on the assigned subnet. To custody a particular column, a node joins the respective gossipsub subnet. If a node fails to get a column on the column subnet, a node can also utilize the Req/Resp protocol to query the missing column from other peers.
Verifiable samples from their respective column are distributed on the assigned subnet. To custody columns in a particular custody group, a node joins the respective gossipsub subnets. If a node fails to get columns on the column subnets, a node can also utilize the Req/Resp protocol to query the missing columns from other peers.

## Reconstruction and cross-seeding

Expand All @@ -262,7 +265,7 @@ Once the node obtains a column through reconstruction, the node MUST expose the

## FAQs

### Row (blob) custody
### Why don't nodes custody rows?

In the one-dimension construction, a node samples the peers by requesting the whole `DataColumnSidecar`. In reconstruction, a node can reconstruct all the blobs by 50% of the columns. Note that nodes can still download the row via `blob_sidecar_{subnet_id}` subnets.

Expand All @@ -273,6 +276,10 @@ The potential benefits of having row custody could include:

However, for simplicity, we don't assign row custody assignments to nodes in the current design.

### Subnet stability
### Why don't we rotate custody over time?

To start with a simple, stable backbone, for now, we don't shuffle the custody assignments via the deterministic custody selection helper `get_custody_groups`. However, staggered rotation likely needs to happen on the order of the pruning period to ensure subnets can be utilized for recovery. For example, introducing an `epoch` argument allows the function to maintain stability over many epochs.

### Does having a lot of column subnets make the network unstable?

To start with a simple, stable backbone, for now, we don't shuffle the subnet assignments via the deterministic custody selection helper `get_custody_columns`. However, staggered rotation likely needs to happen on the order of the pruning period to ensure subnets can be utilized for recovery. For example, introducing an `epoch` argument allows the function to maintain stability over many epochs.
No, the number of subnets doesn't really matter. What matters to the network stability is the number of nodes and the churn rate in the network. If the number of the nodes is too low, it's likely to have a network partition when some nodes are down. For the churn rate, if the churn rate is high, we even need to have a higher number of nodes, since nodes are likely to be turned off more often.
9 changes: 5 additions & 4 deletions specs/_features/eip7594/p2p-interface.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@
- [GetMetaData v3](#getmetadata-v3)
- [The discovery domain: discv5](#the-discovery-domain-discv5)
- [ENR structure](#enr-structure)
- [Custody subnet count](#custody-subnet-count)
- [Custody group count](#custody-group-count)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->
<!-- /TOC -->
Expand All @@ -49,6 +49,7 @@

| Name | Value | Description |
|------------------------------------------------|------------------------------------------------|---------------------------------------------------------------------------|
| `DATA_COLUMN_SIDECAR_SUBNET_COUNT` | `128` | The number of data column sidecar subnets used in the gossipsub protocol |
| `MAX_REQUEST_DATA_COLUMN_SIDECARS` | `MAX_REQUEST_BLOCKS_DENEB * NUMBER_OF_COLUMNS` | Maximum number of data column sidecars in a single request |
| `MIN_EPOCHS_FOR_DATA_COLUMN_SIDECARS_REQUESTS` | `2**12` (= 4096 epochs, ~18 days) | The minimum epoch range over which a node must serve data column sidecars |

Expand Down Expand Up @@ -318,10 +319,10 @@ Requests the MetaData of a peer, using the new `MetaData` definition given above

#### ENR structure

##### Custody subnet count
##### Custody group count

A new field is added to the ENR under the key `csc` to facilitate custody data column discovery.
A new field is added to the ENR under the key `cgc` to facilitate custody data column discovery.

| Key | Value |
|--------|------------------------------------------|
| `csc` | Custody subnet count, `uint64` big endian integer with no leading zero bytes (`0` is encoded as empty byte string) |
| `cgc` | Custody group count, `uint64` big endian integer with no leading zero bytes (`0` is encoded as empty byte string) |
4 changes: 2 additions & 2 deletions specs/_features/eip7594/peer-sampling.md
Original file line number Diff line number Diff line change
Expand Up @@ -97,7 +97,7 @@ For reference, the table below shows the number of samples and the number of all

A node SHOULD maintain a diverse set of peers for each column and each slot by verifying responsiveness to sample queries.

A node SHOULD query for samples from selected peers via `DataColumnSidecarsByRoot` request. A node utilizes `get_custody_columns` helper to determine which peer(s) it could request from, identifying a list of candidate peers for each selected column.
A node SHOULD query for samples from selected peers via `DataColumnSidecarsByRoot` request. A node utilizes `get_custody_groups` helper to determine which peer(s) it could request from, identifying a list of candidate peers for each selected column.

If more than one candidate peer is found for a given column, a node SHOULD randomize its peer selection to distribute sample query load in the network. Nodes MAY use peer scoring to tune this selection (for example, by using weighted selection or by using a cut-off threshold). If possible, it is also recommended to avoid requesting many columns from the same peer in order to avoid relying on and exposing the sample selection to a single peer.

Expand All @@ -115,4 +115,4 @@ A DAS provider is a consistently-available-for-DAS-queries, super-full (or high

DAS providers can also be found out-of-band and configured into a node to connect to directly and prioritize. Nodes can add some set of these to their local configuration for persistent connection to bolster their DAS quality of service.

Such direct peering utilizes a feature supported out of the box today on all nodes and can complement (and reduce attackability and increase quality-of-service) alternative peer discovery mechanisms.
Such direct peering utilizes a feature supported out of the box today on all nodes and can complement (and reduce attackability and increase quality-of-service) alternative peer discovery mechanisms.
Original file line number Diff line number Diff line change
Expand Up @@ -7,21 +7,26 @@
)


def _run_get_custody_columns(spec, rng, node_id=None, custody_subnet_count=None):
def _run_get_custody_columns(spec, rng, node_id=None, custody_group_count=None):
if node_id is None:
node_id = rng.randint(0, 2**256 - 1)

if custody_subnet_count is None:
custody_subnet_count = rng.randint(0, spec.config.DATA_COLUMN_SIDECAR_SUBNET_COUNT)
if custody_group_count is None:
custody_group_count = rng.randint(0, spec.config.NUMBER_OF_CUSTODY_GROUPS)

result = spec.get_custody_columns(node_id, custody_subnet_count)
columns_per_group = spec.config.NUMBER_OF_COLUMNS // spec.config.NUMBER_OF_CUSTODY_GROUPS
groups = spec.get_custody_groups(node_id, custody_group_count)
yield 'node_id', 'meta', node_id
yield 'custody_subnet_count', 'meta', int(custody_subnet_count)
yield 'custody_group_count', 'meta', int(custody_group_count)

result = []
for group in groups:
group_columns = spec.compute_columns_for_custody_group(group)
assert len(group_columns) == columns_per_group
result.extend(group_columns)

assert len(result) == len(set(result))
assert len(result) == (
custody_subnet_count * spec.config.NUMBER_OF_COLUMNS // spec.config.DATA_COLUMN_SIDECAR_SUBNET_COUNT
)
assert len(result) == custody_group_count * columns_per_group
assert all(i < spec.config.NUMBER_OF_COLUMNS for i in result)
python_list_result = [int(i) for i in result]

Expand All @@ -31,48 +36,48 @@ def _run_get_custody_columns(spec, rng, node_id=None, custody_subnet_count=None)
@with_eip7594_and_later
@spec_test
@single_phase
def test_get_custody_columns__min_node_id_min_custody_subnet_count(spec):
def test_get_custody_columns__min_node_id_min_custody_group_count(spec):
rng = random.Random(1111)
yield from _run_get_custody_columns(spec, rng, node_id=0, custody_subnet_count=0)
yield from _run_get_custody_columns(spec, rng, node_id=0, custody_group_count=0)


@with_eip7594_and_later
@spec_test
@single_phase
def test_get_custody_columns__min_node_id_max_custody_subnet_count(spec):
def test_get_custody_columns__min_node_id_max_custody_group_count(spec):
rng = random.Random(1111)
yield from _run_get_custody_columns(
spec, rng, node_id=0,
custody_subnet_count=spec.config.DATA_COLUMN_SIDECAR_SUBNET_COUNT)
custody_group_count=spec.config.NUMBER_OF_CUSTODY_GROUPS)


@with_eip7594_and_later
@spec_test
@single_phase
def test_get_custody_columns__max_node_id_min_custody_subnet_count(spec):
def test_get_custody_columns__max_node_id_min_custody_group_count(spec):
rng = random.Random(1111)
yield from _run_get_custody_columns(spec, rng, node_id=2**256 - 1, custody_subnet_count=0)
yield from _run_get_custody_columns(spec, rng, node_id=2**256 - 1, custody_group_count=0)


@with_eip7594_and_later
@spec_test
@single_phase
def test_get_custody_columns__max_node_id_max_custody_subnet_count(spec):
def test_get_custody_columns__max_node_id_max_custody_group_count(spec):
rng = random.Random(1111)
yield from _run_get_custody_columns(
spec, rng, node_id=2**256 - 1,
custody_subnet_count=spec.config.DATA_COLUMN_SIDECAR_SUBNET_COUNT,
custody_group_count=spec.config.NUMBER_OF_CUSTODY_GROUPS,
)


@with_eip7594_and_later
@spec_test
@single_phase
def test_get_custody_columns__max_node_id_max_custody_subnet_count_minus_1(spec):
def test_get_custody_columns__max_node_id_max_custody_group_count_minus_1(spec):
rng = random.Random(1111)
yield from _run_get_custody_columns(
spec, rng, node_id=2**256 - 2,
custody_subnet_count=spec.config.DATA_COLUMN_SIDECAR_SUBNET_COUNT,
custody_group_count=spec.config.NUMBER_OF_CUSTODY_GROUPS,
)


Expand All @@ -81,7 +86,7 @@ def test_get_custody_columns__max_node_id_max_custody_subnet_count_minus_1(spec)
@single_phase
def test_get_custody_columns__short_node_id(spec):
rng = random.Random(1111)
yield from _run_get_custody_columns(spec, rng, node_id=1048576, custody_subnet_count=1)
yield from _run_get_custody_columns(spec, rng, node_id=1048576, custody_group_count=1)


@with_eip7594_and_later
Expand Down
Loading

0 comments on commit cbc4b55

Please sign in to comment.