
EIP-4844: Free the blobs #3244
Merged · 24 commits · Feb 20, 2023
Conversation

@arnetheduck (Contributor) commented Feb 7, 2023

This PR reintroduces and further decouples blocks and blobs in EIP-4844, so as to improve network and processing performance.

Block and blob processing, for the purpose of gossip validation, are independent: they can both be propagated and gossip-validated in parallel - the decoupled design allows 4 important optimizations (or, if you are so inclined, removes 4 unnecessary pessimizations):

  • Blocks and blobs travel on independent meshes allowing for better parallelization and utilization of high-bandwidth peers
  • Re-broadcasting after validation can start earlier allowing more efficient use of upload bandwidth - blocks for example can be rebroadcast to peers while blobs are being downloaded
  • bandwidth-reduction techniques such as per-peer deduplication are more efficient because of the smaller message size
  • gossip verification happens independently for blocks and blobs, allowing better sharing / use of CPU and I/O resources in clients

With growing block sizes and additional blob data to stream, network streaming time becomes a dominant factor in propagation times - on a 100 Mbit/s line, streaming 1 MB to 8 peers takes roughly a second (8 peers × 1 MB × 8 bits per byte = 64 Mbit, i.e. ~0.65 s at line rate, plus protocol overhead) - and this process is repeated for each hop, in both the incoming and outgoing directions.

This design in particular sends each blob on a separate subnet, thus maximising the potential for parallelisation and providing a natural path for growing the number of blobs per block, should the network be judged able to handle it.
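As a rough sketch of how per-index subnets can be derived (the constant value and topic string below are illustrative assumptions, not this PR's normative text):

```python
BLOB_SIDECAR_SUBNET_COUNT = 4  # illustrative: one subnet per blob index at current blob counts

def compute_subnet_for_blob_sidecar(blob_index: int) -> int:
    # One subnet per index today; growing blobs-per-block maps naturally
    # to simply raising the subnet count.
    return blob_index % BLOB_SIDECAR_SUBNET_COUNT

def blob_sidecar_topic(fork_digest: str, blob_index: int) -> str:
    # One gossipsub mesh per subnet, so the blobs of one block travel in parallel.
    subnet_id = compute_subnet_for_blob_sidecar(blob_index)
    return f"/eth2/{fork_digest}/blob_sidecar_{subnet_id}/ssz_snappy"
```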

Changes compared to the current design include:

  • `BlobsSidecar` is split into individual `BlobSidecar` containers - each container is signed individually by the proposer - the signature is used during gossip validation but later dropped (see the container sketch below).
  • KZG commitment verification is moved out of the gossip pipeline and instead done before fork choice addition, when both block and sidecars have arrived
    • clients may verify individual blob commitments earlier
  • more generally and similar to block verification, gossip propagation is performed solely based on trivial consistency checks and proposer signature verification
  • by-root blob requests are done per-blob, so as to retain the ability to fill in blobs one-by-one assuming clients generally receive blobs from gossip
  • by-range blob requests are done per-block, so as to simplify historical sync
  • range and root requests are limited to 128 entries for both blocks and blobs - practically, the current higher limit of 1024 for blocks does not get used and keeping the limits consistent simplifies implementation - with the merge, block sizes have grown significantly and clients generally fetch smaller chunks.

https://hackmd.io/@arnetheduck/BJiJw012j contains further notes and information.
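For reference, the split containers look roughly like the following sketch (field set reconstructed from the discussion in this thread; the normative definitions live in the spec diff):

```python
class BlobSidecar(Container):
    block_root: Root
    index: BlobIndex  # position within block.body.blob_kzg_commitments
    slot: Slot
    block_parent_root: Root  # used by the [IGNORE] parent-seen gossip check
    proposer_index: ValidatorIndex
    blob: Blob
    kzg_commitment: KZGCommitment
    kzg_proof: KZGProof

class SignedBlobSidecar(Container):
    message: BlobSidecar
    signature: BLSSignature  # proposer signature, checked on gossip and later dropped
```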

TODO:

  • implement cryptography properly (Add KZG multi verify function #3236)
  • (conversation moved to Blob spam protection mechanism #3261) consider more carefully the case of a proposer publishing two blobs for the same index, one valid and one invalid, to separate peers hoping for a network split due to the seen-once rule:
    • we can clear the seen flag, then try to download the correct blob from other peers
    • we can propagate both blobs and specify that peers that have seen more than one must drop the block from fork choice
  • should the blobs be signed in the PROPOSER domain or a separate one? (added domain)
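Given the dedicated domain, producing the sidecar signature follows the usual spec pattern - a sketch, assuming a `DOMAIN_BLOB_SIDECAR` constant as added in this PR:

```python
def get_blob_sidecar_signature(state: BeaconState,
                               blob_sidecar: BlobSidecar,
                               privkey: int) -> BLSSignature:
    # Sign the unsigned sidecar under the blob-sidecar domain of its slot's epoch.
    domain = get_domain(state, DOMAIN_BLOB_SIDECAR, compute_epoch_at_slot(blob_sidecar.slot))
    signing_root = compute_signing_root(blob_sidecar, domain)
    return bls.Sign(privkey, signing_root)
```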

@arnetheduck marked this pull request as draft February 7, 2023
@arnetheduck changed the title from "Free the blobs" to "EIP-4844: Free the blobs" Feb 7, 2023
@djrtwo (Contributor) left a comment:

Awesome!

I did a first pass. Looks good, a number of small suggestions and questions

```diff
@@ -44,6 +44,7 @@ This upgrade adds blobs to the beacon chain as part of EIP-4844.
 | Name | SSZ equivalent | Description |
 | - | - | - |
 | `VersionedHash` | `Bytes32` | |
+| `BlobIndex` | `uint64` | |
```
Contributor:

Why define this here? Just because we don't have a precedent for defining types in the p2p spec? It seems a bit more natural to define it next to the `BlobSidecar` def in p2p or validator.

Contributor (author):

That's where we've defined it for other general "index" types (like `ValidatorIndex`) - not an opinion; happy to move.
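(For context on where `BlobIndex` ends up being used: the by-root request identifier pairs a block root with a blob index, roughly as follows.)

```python
class BlobIdentifier(Container):
    block_root: Root
    index: BlobIndex
```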


```diff
 def is_data_available(slot: Slot, beacon_block_root: Root, blob_kzg_commitments: Sequence[KZGCommitment]) -> bool:
     # `retrieve_blobs_sidecar` is implementation and context dependent, raises an exception if not available.
     # Note: the p2p network does not guarantee sidecar retrieval outside of `MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS`
-    sidecar = retrieve_blobs_sidecar(slot, beacon_block_root)
+    sidecars = retrieve_blob_sidecars(slot, beacon_block_root)
```
Contributor:

I don't like that we have two different message deliveries -- either in separate sidecars or in an aggregate sidecar -- which makes this retrieval and validation very context dependent: did you get it from gossip, or did you get it in an aggregate sidecar via byRange?

My preference would be to make this `retrieve_aggregate_sidecar` and assume that if you got it from local gossip, you can bundle it into the aggregate structure. But this depends on which type of proof (individual or aggregate) is sent in the individual sidecars.

Or simply that we abstract the details here such that these fork-choice functions don't care how the blobs were received.

cc: @asn-d6

Contributor (author):

> Or simply that we abstract the details here such that these fork-choice functions don't care how the blobs were received

This would be my preference - the design I'm aiming for is to allow the client to validate some commitments individually and some in aggregate, while still mandating that all have passed some validation before the block is passed to fork choice - we could see the aggregation as a pure optimization (assuming aggregate+verify is faster than multiple single verifies) - this is the case for "batch verification" of attestations, for example.

Actually, that reminds me: we use "batch" to denote any form of aggregation done for the purpose of performance, whereas "aggregate" is done for spec reasons. Named that way, we can treat everything up to here as individual blobs without aggregation, but also expose batched-validation functions in the relevant KZG libraries (again, assuming this makes sense, crypto-wise - I do not have a clear picture of this).

Contributor:

The plan at the moment is to provide individual KZG proofs, as described in #3244 (comment). We will still be able to batch-validate them in an amortized way if you have a bunch of blobs and proofs, but we won't use a special aggregated proof to do so.
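A sketch of what that amortized batch validation could look like client-side, assuming a batch-verification helper along the lines of the multi-verify function proposed in #3236 (names here are assumptions, not this PR's text):

```python
def validate_blob_sidecars(sidecars: Sequence[BlobSidecar]) -> None:
    # Verify all (blob, commitment, proof) triples in one amortized check;
    # the batch passes or fails as a whole, so on failure a client would
    # fall back to per-blob verification to locate the offending sidecar.
    assert verify_blob_kzg_proof_batch(
        [sidecar.blob for sidecar in sidecars],
        [sidecar.kzg_commitment for sidecar in sidecars],
        [sidecar.kzg_proof for sidecar in sidecars],
    )
```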


@SebastianSupreme left a comment:

Here, I think there's a comma missing.

@hwwhww added the "Deneb was called: eip-4844" label Feb 9, 2023
@hwwhww (Contributor) commented Feb 10, 2023

@arnetheduck since this PR is under active development and is still a draft, I'm not going to force-push to fix the conflicts now. Let me know if you need me to fix the conflicts.

* separate constant for blob requests
* pedantry
* expand sidecar gossip conditions
* editing
* add spec text for `BlobSidecar` signatures
@arnetheduck (Contributor Author):
@hwwhww thanks, I pushed a merge commit fixing the renames :)

@Nashatyrev (Member):
Here are the simulation results on block + blobs dissemination delays (coupled vs. decoupled). The decoupled approach definitely wins in that respect:

[chart: dissemination delays, coupled vs. decoupled] (source data)

Decoupling also appears to save global network traffic:

[chart: global network traffic] (original sheet)

@djrtwo (Contributor) left a comment:

excellent! getting close

@hwwhww (Contributor) commented Feb 16, 2023

FYI I'm looking into linter and test errors after fixing conflicts. 👀

@hwwhww (Contributor) commented Feb 16, 2023

@arnetheduck
I fixed conflicts and tests here: https://github.com/ethereum/consensus-specs/tree/pr3244-fix
CI is green now. Let me know if I can push the commits or if you'd like to cherry-pick it.

@asn-d6 @dankrad, it includes some polynomial-commitments.md fixes in a7e45db

@realbigsean mentioned this pull request Feb 16, 2023
@arnetheduck (Contributor Author):
@hwwhww excellent, thanks! I've pushed the commits here too

@djrtwo (Contributor) left a comment:

fantastic work. just a couple of minor comments and a request for input from @michaelsproul

planning on merging tomorrow

> - _[IGNORE]_ The blob's block's parent (defined by `sidecar.block_parent_root`) has been seen (via both gossip and non-gossip sources) (a client MAY queue blocks for processing once the parent block is retrieved).
> - _[REJECT]_ The proposer signature, `signed_blob_sidecar.signature`, is valid with respect to the `sidecar.proposer_index` pubkey.
> - _[IGNORE]_ The sidecar is the only sidecar with valid signature received for the tuple `(sidecar.slot, sidecar.proposer_index, sidecar.index)`.
>   - If full verification of the blob fails at a later processing stage, clients MUST clear the blob from this "seen" cache so as to allow a the valid blob to propagate. The next block producer MAY orphan the block if they have observed multiple blobs signed by the proposer for the same "seen" tuple.
Contributor:

I understand now. It's using the proposer boost (and, if attestation weight is low, they know it will likely work).

Hmm, I'm okay putting this in here, but maybe we should merge the other orphaning spec. Also, the language probably needs to be "MAY attempt to orphan".

CC: @michaelsproul, the re-org god

Contributor (author):

> It's using the proposer boost

I believe lighthouse calls it proposer re-org, which in some ways is the opposite of proposer boost: this does not appear in the spec anywhere but rather is a lighthouse-only feature as of now (afair), in which lighthouse validators penalize nodes that publish late blocks.

The open question here is whether we need the collaboration of other nodes to pull this off: do they have to keep track of the duplicate blobs and "blacklist" the offending block as well, or is it enough that the next proposer does so?

Contributor:

Proposer re-orgs rely on the proposer boost being larger than the weight of the attestations that the previous proposer managed to get (e.g. low because they released the block late).

A few things:

  1. I don't think this really belongs in the networking spec at all. This is a fork-choice capability and a validator option.
  2. The obvious way to map that re-org functionality here is if the duplicate blobs reduce the ability of the block to garner attestations, leaving it re-orgable by proposer-boost weight at the next slot. I'm generally okay with making this into the fc/validator spec.
  3. Doing this in a coordinated fashion (rather than just by the leader in clear times that they can) seems much more dangerous and likely to lead to network splits, missed attestations, and other bad things. So I'm not really open to "everyone can try to re-org it" unless we have a very clear how and a clear analysis of the chances, the impact, the potential losses for honest folks, etc.

My preference for this PR would be to remove this line and move this convo into an issue.

> - _[IGNORE]_ The blob's block's parent (defined by `sidecar.block_parent_root`) has been seen (via both gossip and non-gossip sources) (a client MAY queue blocks for processing once the parent block is retrieved).
> - _[REJECT]_ The proposer signature, `signed_blob_sidecar.signature`, is valid with respect to the `sidecar.proposer_index` pubkey.
> - _[IGNORE]_ The sidecar is the only sidecar with valid signature received for the tuple `(sidecar.slot, sidecar.proposer_index, sidecar.index)`.
>   - If full verification of the blob fails at a later processing stage, clients MUST clear the blob from this "seen" cache so as to allow a the valid blob to propagate. The next block producer MAY orphan the block if they have observed multiple blobs signed by the proposer for the same "seen" tuple.
Contributor:

Suggested change:

```diff
- -- If full verification of the blob fails at a later processing stage, clients MUST clear the blob from this "seen" cache so as to allow a the valid blob to propagate. The next block producer MAY orphan the block if they have observed multiple blobs signed by the proposer for the same "seen" tuple.
+ -- If full verification of the blob fails at a later processing stage, clients MUST clear the blob from this "seen" cache so as to allow a potential valid blob to propagate.
+ -- The next block producer MAY orphan the block if they have observed multiple blobs signed by the proposer for the same "seen" tuple.
```
Contributor:

Do we need to put this note about clearing the seen cache? We have the same problem with attestations and blocks today. IIUC this cache and IGNORE condition can slow down propagation, but IWANT/IHAVE and subsequently querying for message dependencies will still allow getting such messages even if the cache here hits.

Curious about your perspective on this before merge.

Contributor (author):

> We have the same problem with attestations and blocks today.

In the past, we've considered it "risky" enough for the proposer to try to poison this seen cache because of slashing conditions.

Now that I think about it, maybe the simplest solution, rather than playing these "first" games, is to remove the condition altogether - we want duplicates to propagate to a certain extent so they reach the validators that are in a position to enforce the rules.

I've opened #3257, which covers blocks and attestations.

The rule here is a bit more nuanced in that there's no actual slashing message to rely on - instead, we would rely on the "softer" enforcement of having the next honest proposer orphan such a history. Having written #3257, I think the best course of action is actually to remove the rule altogether and allow implementations to do spam-prevention measures individually, for example by limiting to N observed tuples or IGNORE:ing duplicates outside of a much tighter 4s window.

Contributor (author):

> but IWANT/IHAVE and subsequently querying for message dependencies

IGNORE'd messages are also not given to the IWANT message cache; they're gone.

Contributor:

I still don't think we should clear the seen cache. In the event of a malicious proposer, message dependencies are the best path forward, imo. Gossip will be much slower for such a dishonest proposer, potentially leading to orphaning, but, due to dependencies, not to a permanent split.

Note: getting a block from a peer doesn't necessarily mean they have the correct blobs, but getting an attestation or child block from a peer does. So these dependency messages become signals on who to download the blobs from, and won't lead to unresolvable partitions.

Additionally, if you don't get any dependency messages, then the block will be orphaned - which is the direct consequence the proposer risks by trying to DoS the blobs topic.

So I'd say, keep the cache.

Contributor:

My preference is to keep the cache, as we do for other message types, and to take this to an issue for discussion instead of blocking the merge of the bulk of the PR on it.

Contributor (author):
54d2559 - good point about the block likely being orphaned anyway - let's hope that is discouragement enough.

@asn-d6 (Contributor) commented Feb 17, 2023

> @arnetheduck I fixed conflicts and tests here: https://github.com/ethereum/consensus-specs/tree/pr3244-fix CI is green now. Let me know if I can push the commits or if you'd like to cherry-pick it.
>
> @asn-d6 @dankrad, it includes some polynomial-commitments.md fixes in a7e45db

LGTM!

> - _[IGNORE]_ The sidecar is from a slot greater than the latest finalized slot -- i.e. validate that `sidecar.slot > compute_start_slot_at_epoch(state.finalized_checkpoint.epoch)`
> - _[IGNORE]_ The blob's block's parent (defined by `sidecar.block_parent_root`) has been seen (via both gossip and non-gossip sources) (a client MAY queue blocks for processing once the parent block is retrieved).
> - _[REJECT]_ The proposer signature, `signed_blob_sidecar.signature`, is valid with respect to the `sidecar.proposer_index` pubkey.
> - _[IGNORE]_ The sidecar is the only sidecar with valid signature received for the tuple `(sidecar.slot, sidecar.proposer_index, sidecar.index)`.
@terencechain (Contributor) commented Feb 17, 2023:

Implementing this - any thoughts on using `(sidecar.block_root, sidecar.index)`? This is an implementation detail, but I find it nicer to be able to share the same cache for serving `BlobIdentifier` requests.

EDIT: this won't work on equivocation for blocks with different contents.

Contributor (author):

> EDIT: this won't work on equivocation for blocks with different contents

This would be slashable, if done by the same proposer, so it seems fine - or is there another case you're thinking of?
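For illustration, the dedup being debated boils down to a small keyed cache - a hypothetical client-side sketch, not spec text:

```python
from typing import Set, Tuple

SeenKey = Tuple[Slot, ValidatorIndex, BlobIndex]
seen_blob_sidecars: Set[SeenKey] = set()

def accept_for_processing(sidecar: BlobSidecar) -> bool:
    # Keyed on (slot, proposer_index, index) rather than (block_root, index):
    # an equivocating proposer can publish two blocks with different contents
    # (hence different roots) for the same slot, which a root-keyed cache
    # would wrongly treat as two distinct "first" sidecars.
    key = (sidecar.slot, sidecar.proposer_index, sidecar.index)
    if key in seen_blob_sidecars:
        return False  # [IGNORE]: already seen a sidecar for this tuple
    seen_blob_sidecars.add(key)
    return True
```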

* also, use root/index for uniqueness
Co-authored-by: g11tech <develop@g11tech.io>
@djrtwo (Contributor) left a comment:

🔥

Labels: Deneb was called: eip-4844