implement getBlobs and bypass blob gossip validation on successful blob retrievals from EL #6913
base: unstable
Conversation
for devnet testing locally, use
beacon_chain/nimbus_beacon_node.nim (outdated)

when consensusFork >= ConsensusFork.Electra:
  # Pull blobs and proofs from the EL blob pool
  let blobsFromElOpt = await node.elManager.sendGetBlobs(forkyBlck)
  debugEcho "pulled blobs from el"
Obligatory comment about debugEcho
(but yeah, it's a draft PR and useful for now)
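For reference, a minimal sketch of the chronicles-based alternative (the message and field are illustrative, not this PR's code):

```nim
import chronicles

# illustrative only: structured debug logging instead of debugEcho, so the
# output honours the node's log level and formatting
debug "Pulled blobs from EL", blobCount = 4
```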
just for now, until I see things are ok in the local devnet
Force-pushed from 6ad1b3d to 03e11ee
we're finalising with nimbus running
we can get some better decision making on whether to validate blob gossip or pull blobs from EL every slot, by caching getBlobs
https://discordapp.com/channels/595666850260713488/1252403418941624532/1341996653149945927 is interesting mainnet research from Jimmy (Lighthouse) suggesting that there are 0-8 blob misses per minute by the EL; hence we only bypass gossip validation for blobs completely when the EL can serve a complete response (i.e. blobs from EL == kzg commitments len). An alternative would be to fetch most blobs from the EL for every block and the rest via gossip validation, but in that case we'd likely spend some time checking what we missed from the EL, which isn't very ideal.
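A minimal sketch of that rule (the helper name and the doAsserts are illustrative, not part of this PR):

```nim
# Bypass blob gossip validation only when the EL returned exactly as many
# blobs as the block has KZG commitments; anything partial keeps gossip
# validation in the picture.
func shouldBypassBlobGossip(blobsFromEl, kzgCommitments: int): bool =
  blobsFromEl == kzgCommitments

doAssert shouldBypassBlobGossip(6, 6)      # complete EL response: bypass gossip
doAssert not shouldBypassBlobGossip(4, 6)  # partial response: keep validating gossip
```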
let block_root = hash_tree_root(block_header)

let vEl =
  await self.validateBlobSidecarFromEL(block_root)
it seems like in the case of a pathological EL, this could delay things by 1 second?
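For context, a standalone chronos sketch of bounding such an EL round-trip with a deadline (the 1-second figure and the proc names are illustrative, not this PR's code):

```nim
import chronos

proc slowElCall(): Future[int] {.async.} =
  # stand-in for an EL that answers slowly
  await sleepAsync(2.seconds)
  return 42

proc demo() {.async.} =
  try:
    # bound the wait: `wait` raises once the deadline passes
    let blobs = await slowElCall().wait(1.seconds)
    echo "EL answered: ", blobs
  except AsyncTimeoutError:
    echo "EL too slow, fall back to blob gossip validation"

waitFor demo()
```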
Force-pushed from 7a4060a to f1d736e
el_blob_loss = forkyBlck.message.body.blob_kzg_commitments.len - blobsEl.len

debug "Time taken to receive partially response from EL",
  received_percent = float((blobsEl.len div forkyBlck.message.body.blob_kzg_commitments.len) * 100),
div is integer division; this doesn't do what it seems to want, because the float conversion is pointless. It will return 0 all the time (or 100).
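A quick standalone check of that claim, and of the usual fix of converting to float before dividing (the numbers are made up):

```nim
let
  blobsFromEl = 3   # e.g. blobs the EL returned
  commitments = 6   # e.g. blob_kzg_commitments.len
# current expression: `div` truncates before the outer float conversion
doAssert float((blobsFromEl div commitments) * 100) == 0.0
# converting first yields the intended percentage
doAssert blobsFromEl.float / commitments.float * 100.0 == 50.0
```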
el_blob_loss = forkyBlck.message.body.blob_kzg_commitments.len - blobsEl.len

debug "Time taken to receive partially response from EL",
partial response (or "partially receive response", whichever seems more appropriate, but the "receive" here seems done and not in progress, so I'd lean toward the former)
if blobsEl.len == forkyBlck.message.body.blob_kzg_commitments.len:

  # we have got all blobs from EL, now we can
  # conveniently the blobless block from quarantine
verb?
@@ -149,9 +157,41 @@ type

  ValidationRes* = Result[void, ValidationError]

?
@@ -2095,6 +2098,33 @@ proc installMessageValidators(node: BeaconNode) =
        await node.processor.processBlsToExecutionChange(
          MsgSource.gossip, msg)))

  when consensusFork >= ConsensusFork.Electra:
    # blob_sidecar_{subnet_id}
    # https://github.com/ethereum/consensus-specs/blob/v1.4.0-beta.5/specs/deneb/p2p-interface.md#blob_sidecar_subnet_id
Newly introduced URLs should be current version (currently v1.5.0-beta.3).
for idx, req in requests:
  if not(req.finished()):
    warn "Timeout while getting blob and proof",
Is this really warn? It's not necessarily a Nimbus error, and it shouldn't disrupt Nimbus functioning because of the race. So one could end up with a scary-looking WRN but no malfunction.
second = if raced == f1b: f2 else: f1

try:
  let res1 = await first
## Returns a future which will complete and return finished FutureBase,
## when one of the given futures will be completed, failed or canceled.
##
## On success returned Future will hold finished FutureBase.
##
## On cancel futures in ``futs`` WILL NOT BE cancelled.
under what circumstance should first have to be awaited? Explicitly:

Returns a future which will complete and return finished FutureBase

That is, one should just be able to call complete() on it? cc @cheatfate

This might not apply to await second, because that could still be in progress, but by definition the winner of the race should be completeable.
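A standalone sketch of that reading of the race semantics (my interpretation of the chronos doc comment quoted above, not this PR's code):

```nim
import chronos

proc quick(): Future[int] {.async.} =
  return 1

proc slow(): Future[int] {.async.} =
  await sleepAsync(1.seconds)
  return 2

proc demo() {.async.} =
  let
    f1 = quick()
    f2 = slow()
    raced = await race(FutureBase(f1), FutureBase(f2))
    winner = if raced == FutureBase(f1): f1 else: f2
  doAssert winner.finished()
  # the race winner is already finished, so read() should suffice here
  # instead of a second await
  echo winner.read()

waitFor demo()
```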
if (let o = self.quarantine[].getBlobless(block_root); o.isSome):
  let blobless = o.get()
  withBlck(blobless):
    when consensusFork >= ConsensusFork.Electra:
When this isn't true, it's a bug, but also kind of falls off the edge of the control flow of withBlck(blobless) and implicitly returns some default thing. Better to handle this case explicitly somehow and flag it as an error (doAssert, even, if it's really a logic error to enter this function with a pre-Electra block, but only if that can be actually guaranteed, otherwise some usual runtime error).
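A standalone sketch of the explicit handling suggested here (the enum and proc are made-up stand-ins, not nimbus-eth2 types):

```nim
type ConsensusFork = enum Deneb, Electra

proc handleBlobless(fork: static ConsensusFork) =
  when fork >= ConsensusFork.Electra:
    echo "process blobs pulled from the EL"
  else:
    # fail loudly instead of silently falling off the edge of the control
    # flow and returning a default value
    doAssert false, "getBlobs flow entered with a pre-Electra block"

handleBlobless(ConsensusFork.Electra)
```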
else:
  node.dag.cfg.BLOB_SIDECAR_SUBNET_COUNT
# It's safe to try as many times to fetch EL blobs as
# there are blob subnets.
There might be more blobs than blob subnets, so for any given block, each subnet might be triggered multiple times. This doesn't happen in Dencun and won't in Pectra, but in theory it could, so the comment at least is misleading.
The code structure works fine for this -- it will just trigger the toValidationRace(...) per blob, not per blob subnet.
toValidationResult(
  node.processor[].processBlobSidecar(
): Future[ValidationResult] {.async: (raises: [CancelledError]).} =
  return toValidationResult(
Chronos doesn't need the explicit return anymore for async functions.
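A tiny standalone illustration (assumes a recent chronos with implicit returns for async procs):

```nim
import chronos

proc answer(): Future[int] {.async: (raises: [CancelledError]).} =
  # the last expression is the result; no explicit `return` needed
  42

echo waitFor answer()
```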
ethereum/execution-apis#559

if EL supports engine_getBlobsV1
else

Note: getBlobs in nimbus-eth2 here is made to be backward compatible from Electra and NOT Deneb (Deneb is what the spec suggests).

Pros

Cons

RequestManager should be able to fetch back the otherwise malicious blob/proof data sent by a bad EL; the EL can be quickly swapped, and Nimbus would rely on blob gossip for that entire duration and then switch back to optimistically pulling blobs from the EL.

CI failing: because of some nim-web3 upstream incompatibilities.