-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
specify liveness failsafe for builder network #47
Conversation
b90fdcb
to
b58c93f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's worth mentioning that "missing slot" is not explicit only to the canonical chain, otherwise, a reorg’ing adversary could trivially disable mev-boost
This will count orphaned blocks as missing, no? |
I feel like the outcome on the call was that we would just stick to the canonical chain as it would be simpler to implement -- just look at the canonical beacon state and to mitigate any sort of reorg'ing attacks we now want to widen the window to trigger the breaker and if the current suggested range for |
I don't have an issue with this. For prysm it's equally simple looking from the canonical chain's perspective or combining canonical and forked chain's perspectives. We'll stick with counting missing slots as they are truly missed. I think client diversity is a nice to have here |
yes, I think the first goal is: "clients have something implemented" if only to dissuade potential actors who would abuse this attack vector the second goal is: "do clients in aggregate implement something that is hard to attack?" -- this is where it matters a bit who is doing what bc if someone has a more sensitive trigger then it could selectively be used to take the builder network offline |
Beacon clients select randomized values from the following ranges when initializing the circuit breaker (so at boot time | ||
and once for each unique boot). | ||
|
||
* `ALLOWED_FAULTS`: between `1` and `SLOTS_PER_EPOCH // 2` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't 1
be too low?
In combination with 2 * SLOT_PER_EPOCH
it will activate the circuit breaker after two epochs with missed slots. Although, it seems unlikely on mainnet but not unrealistic
The circuit breaker becomes active if a liveness fault is observed locally which is determined by considering the beacon | ||
chain over a rolling window of `FAULT_INSPECTION_WINDOW` slots. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so basically once the node is 'in sync', it will start counting missed slots, and missed slots exceeding ALLOWED_FAULTS
within a defined window deem that we should stop using the builder network...
Rather than random, it may be more appropriate to have actual configured values to use, that way it could be tweaked so that builder can work on 'test' networks that have more missed blocks...
eg mainnet it might be unusual to see 2 missed blocks in 64 slots, but that's basically business as usual on several testnets...
Doesn't this also suggest that the builder can be taken out through no fault of it's own? If a couple of validators were offline at an inopportune moment, then the builder network could be completely shut down for the period defined by the inspection network?
Edge case would be allowed_faults = 1, fault_inspection_window = 64, then 2 missed slots in 2 epochs would stop you using builders until the first one clears... Then another happens, and you're still locked out, which could result in builder not being used for a prolonged period? So 1 missed block in 32 slots is enough in this case to ensure builder is never used?
#### Failsafe initialization | ||
|
||
When a beacon client first boots, the circuit breaker is instantiated with the liveness parameters and is **not** | ||
active. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I am reading it correctly then: circuit breaker is recommended to be "NOT" active on a start/restart, if thats the case then i feel it should be opposite because its highly likely that in an unhealthy network the nodes will be rebooted and should follow the safest/most conservative route.
This could be client specific but still the recommendation should be reverse.
In terms of reducing the scope of the failure domain in the event of liveness faults on chain, given that https://github.com/remyroy/ethstaker/blob/main/MEV-relay-list.md documents multiple relay networks, it's not obvious that it's more effectively centralized than a still-near-Geth-monoculture on the EL side. The implied calculation is that the most likely inference from a bunch of missed slots is that (a) a large portion of the network is using the builder API; and (b) the builder API is less reliable in a correlated-across-CL-nodes way than the engine API. (a) hasn't yet proven as true as some earlier predictions, and I'm not sure (b) is useful to bet on -- even if one relay network has problems, it's reasonable to try another, and While local EL infrastructure for the engine API is definitely less centralized per se, that doesn't protect against already-witnessed situations where all or most EL instances of a certain type across a network fail to propose good blocks in some situation. Rather, the builder and engine APIs act a kind of diversity in themselves, and given that the former isn't simply a FB interface anymore, responding to network trouble by decreasing decentralization seems risky. |
closing in lieu of #95 |
this PR specifies one version of a "circuit breaker" to reduce the scope of the failure domain in the event of liveness faults on chain
note to implementers: you can just count the distinct number of block roots from the
BeaconState
to get a stream of the proposed inputs ("M slots over a window of N slots")