Add Holocene design doc #72

sebastianst · 2024-09-04T22:42:36Z

Description

Holocene design doc, to align on open questions.

Additional context

After getting alignment, the specs can be completed. They are currently in draft at ethereum-optimism/specs#357

This has been reviewed in a public design review session on 2024-09-11, with a public recording.

ajsutton · 2024-09-04T23:51:53Z

protocol/strict-derivation.md

+Writing this, I realize that this mechanism could even be used to encode large spans of empty
+batches, as long as the sequencer is creating unsafe blocks that follow the same L1 origins as the
+auto-derivation would for gaps. However, this would need to be investigated more deeply.


We will need to very carefully specify the rules for L1 origin selection when inserting a deposit-only block prior to the sequencing window elapsing. There's quite a lot of potential corner cases around that.

Yes this is also a concern I have, and is probably the most underspecified bit. Random thoughts around that:

A simple default rule could be to generate blocks in a way to maintain a steady L1/L2 block ratio, e.g. bumping the L1 origin selection every 6 blocks in the case of mainnet.

Edge case: we were already very near the sequencer drift limit, and need to select a new origin faster.

Edge case: L1 missed a slot, and the L1 origin cannot advance as expected.

So maybe a better rule is to first eagerly advance the L1 origin as quickly as possible, and only if a newer L1 origin isn't available, keep it. This solves for missed slots and will implicitly and automatically maintain a good L1/L2 block ratio. We then just need a clear definition of "L1 origin is/isn't available".

To mimic sequencer behavior and avoid being hit by shallow L1 reorgs, we could add an in-protocol L1 validation depth: eagerly advance the L1 origin while maintaining a timestamp distance of this validation depth times the L1 block time.
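The eager-advance rule with a validation depth could look roughly like the sketch below. It is only an illustration of the idea in this thread, not the specified rule: `L1Block`, `select_l1_origin`, the default `validation_depth` of 4 and the way "available" is modeled are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class L1Block:
    number: int
    timestamp: int  # seconds


def select_l1_origin(
    current: L1Block,
    next_l1: Optional[L1Block],
    l2_timestamp: int,
    validation_depth: int = 4,  # hypothetical value, not taken from the design doc
    l1_block_time: int = 12,    # Ethereum mainnet slot time in seconds
) -> L1Block:
    """Advance to the next L1 origin as soon as it is 'available', else keep the current one.

    'Available' is modeled here (as an assumption) as: the next L1 block exists and is
    at least validation_depth * l1_block_time seconds older than the L2 block being built.
    """
    if next_l1 is None:
        # No newer L1 block is known, so the origin cannot advance.
        return current
    min_distance = validation_depth * l1_block_time
    if next_l1.timestamp + min_distance <= l2_timestamp:
        return next_l1
    return current
```

Because the check compares timestamps rather than counting L2 blocks, a missed L1 slot simply delays the point at which the next origin becomes available.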

I'll add this to the design doc as an open design question and proposal for Steady Batch Derivation.

ajsutton · 2024-09-04T23:56:57Z

protocol/strict-derivation.md

+invalid batches will be derived as deposit-only blocks. So in case of a reorg, the batcher should
+e.g. wait on the sequencer it is connected to until it has derived all blocks from L1 in order to
+only start batching new blocks on top of the possibly derived deposit-only chain segment.


It seems very problematic that an L1 reorg might cause deposit-only blocks to be included. That would trigger a reorg of the entire unsafe L2 chain and break full nodes. If the batcher is faulty we don't need to give it a "second chance" to submit a valid batch, but are there cases where an L1 reorg could cause previously submitted batches to become invalid, so that these new rules make the reorg larger than it would be from the changed block origins alone?

Thinking through this a bit, I'm having trouble deciding if this is or isn't a problem.

If an L1 reorg occurs, it only affects the L2 if a batch is wiped out.

If a batch is wiped out, then the batcher's nonce is reverted. Per Seb's proposal "a fixed nonce to block-range assignment", the batcher wants to resubmit on the same range as before, so there isn't a risk of the batcher posting a batch-gap (which would create empty blocks).

🤔 🤔 🤔 but I'm not sure that that's the only way your edge case would present

This is a valid concern, and has been flagged by @tynes in his original design doc's risk section.

The current rules of Steady Batch Derivation can indeed cause a previously valid submitted batch to become invalid and then immediately be derived as deposit-only blocks, causing a long L2 unsafe reorg. The same batcher tx may still be included on L1. However, given that span batches are only forward-invalidated with Holocene, I think the L2 unsafe reorg would be limited to the L2 section that references the reorged-out L1 section. That said, any further batcher txs that have already landed on L1 would cause more deposit-only blocks to be derived.

I think this is the tradeoff of Steady Batch Derivation.

One solution that may alleviate this problem is to reference the last L1 origin in the channel metadata (in a new channel format), and then drop a channel with a mismatching origin directly in the channel bank, before even decoding any batches from it that would otherwise at some point be derived as deposit-only blocks. This way, the batcher would get a "second chance" to submit a channel that references the correct reorged-to L1 origin chain. This is very similar to how span batches contain the last L1 origin as l1_origin_check in their prefix, just moved one layer up to the channel container. With such a new channel format, the L1 origin check could arguably be dropped from span batches. Having the channel, rather than any span or singular batch, contain such an L1 origin check has the advantage that the derivation pipeline (DP) wouldn't derive deposit-only blocks too eagerly and could recover from L1 reorgs. I think this solution would also still maintain the nice properties of Strict Batch Ordering: there's only one staging channel, and we don't buffer out-of-order frames or batches.

If Proofs and Interop experts can confirm that such a solution would still lead to the sought after improvements for Proofs and Interop, resp., we could consider including it as part of Holocene.

Added this to the design doc in a slightly modified way.

Instead of doing this channel-level L1 origin check, just throw away span batches whose L1 origin check fails, give the batcher a second chance to replace them, and don't generate deposit-only blocks for the whole span batch range.

To fully benefit from this, decode only the prefix first and do the check before decoding the rest of the batch.
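A minimal sketch of such a prefix-only check, assuming the prefix has already been decoded into a small struct. `SpanBatchPrefix`, `PrefixCheck`, `check_prefix` and the `canonical_hashes` lookup are hypothetical names for illustration; the `UNDECIDED` outcome for a not-yet-known L1 origin is also an assumption, not something decided in this thread.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping


class PrefixCheck(Enum):
    OK = "ok"                # continue and decode the span batch payload
    DROP_BATCH = "drop"      # discard the span batch, derive nothing from it
    UNDECIDED = "undecided"  # assumption: L1 origin not known yet, check again later


@dataclass(frozen=True)
class SpanBatchPrefix:
    l1_origin_num: int      # number of the last L1 origin referenced by the span batch
    l1_origin_check: bytes  # first 20 bytes of that L1 block's hash


def check_prefix(prefix: SpanBatchPrefix,
                 canonical_hashes: Mapping[int, bytes]) -> PrefixCheck:
    """Compare l1_origin_check against the canonical L1 chain before decoding any blocks."""
    block_hash = canonical_hashes.get(prefix.l1_origin_num)
    if block_hash is None:
        return PrefixCheck.UNDECIDED
    if block_hash[:20] != prefix.l1_origin_check:
        # E.g. after an L1 reorg: drop the batch without generating deposit-only
        # blocks, so a replacement batch with the right origin can still land.
        return PrefixCheck.DROP_BATCH
    return PrefixCheck.OK
```

The point of checking before decoding is that a batch built on a reorged-out L1 chain is rejected as a whole, rather than block by block.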

ajsutton · 2024-09-05T00:06:00Z

protocol/strict-derivation.md

+I see two options on how to handle a non-empty batch queue at this point:
+- Option 1: Drop future batches, continue resolving undecided batches, if any are left, and apply
+new Holocene rules.
+- Option 2: The Batch Queue will just start applying Holocene rules from this moment onwards. This will then


Option 3: When the L1 origin reaches the Holocene activation block, discard all batches in the batch queue. Could also be applied to the Channel Bank to give us a nice clean starting point.

op-batcher would have to be aware of that rule and consider any blocks it submitted in channels that didn't close prior to the holocene activation block as needing to be resubmitted.

Thanks for flagging, I also thought about this option, but deemed it too drastic. Hearing that you support it makes me consider it! I like its simplicity.

We just need to be careful that the batcher doesn't end up in a very unlikely situation where we're batching with calldata and have a single block that needs to be sent over two calldata frame txs that span across the Holocene activation.

Even if we did find ourselves in that situation, discarding that in-progress submission would work right? It would need to be retried after the threshold.

Or, we could follow Adrian's suggestion and drop all batches, but do it some time before holocene activation, holding a ban on batches until holocene passes. This would allow for the unlikely big-block to resolve prior to passing the threshold.

Yes I think it is reasonable to implement some sort of safety behavior for the batcher close to the Holocene activation. We didn't do a good job of adding special hardfork activation logic to the batcher in the past but I think past batcher operating experience has shown that it may be worthwhile adding some.

Made this the new and preferred Option 1.
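As a rough sketch of the clean-slate option, assuming the pipeline keeps its buffered channels and batches in simple lists and that "reaching the activation block" can be modeled as a timestamp comparison. `PipelineBuffers` and `on_new_l1_origin` are hypothetical names, not taken from any implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PipelineBuffers:
    channel_bank: List[bytes] = field(default_factory=list)
    batch_queue: List[bytes] = field(default_factory=list)
    holocene_active: bool = False


def on_new_l1_origin(buffers: PipelineBuffers, origin_time: int, holocene_time: int) -> None:
    """Discard all buffered data exactly once, when the L1 origin first reaches activation."""
    if not buffers.holocene_active and origin_time >= holocene_time:
        buffers.channel_bank.clear()
        buffers.batch_queue.clear()
        buffers.holocene_active = True
```

Anything the batcher submitted in channels that were still open at this point is gone from the pipeline, which is why the batcher has to treat those blocks as needing resubmission.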

ajsutton · 2024-09-05T00:08:06Z

protocol/strict-derivation.md

+Another open question is how to handle span batches that come from pre-Holocene channels.
+I propose that, for simplicity of implementation, if they are found to be valid as a span batch, to
+just apply the new Partial Span Batch Validity rules even though those span batches were derived
+from pre-Holocene L1 blocks.


This would only apply to channels that span the Holocene activation, right? Anything fully pre-Holocene would use the old rules and anything fully post-Holocene would use the new rules. If so I agree, though it wouldn't be an issue if we discarded any incomplete channels at Holocene activation.

It would technically also apply to fully pre-Holocene span batches, if the channel that contained this span batch was already closed before Holocene activation, but included on L1 only after Holocene activation.

Since correct batcher implementations don't violate this property anyways, I just want to pick the rule with the lowest implementation lift.

axelKingsley · 2024-09-05T14:28:19Z

protocol/strict-derivation.md

+
+## Out of order frames
+
+There's an open design space around how to handle some scenarios for missing or out of order frames:


I think the cause of missing and out-of-order frames currently is batcher restarts, is that correct? The batcher submits some frames of a channel, but then crashes. Upon restart, a new channel is created.

Yes, batcher restarts can in theory cause the batcher to not close an open channel and then submit a new one.
Then there's also the scenario where a batcher cannot get a tx on-chain and requeues the blocks and then attempts sending them again in a new channel. Depending on the specifics of the batcher implementation, this can also lead to out of order frames. All quite unlikely and almost never happening, but we need to take extra care to harden the batcher against such behavior in the future.

axelKingsley · 2024-09-05T15:25:09Z

protocol/strict-derivation.md

+invalid batches will be derived as deposit-only blocks. So in case of a reorg, the batcher should
+e.g. wait on the sequencer it is connected to until it has derived all blocks from L1 in order to
+only start batching new blocks on top of the possibly derived deposit-only chain segment.


Thinking through this a bit, I'm having trouble deciding if this is or isn't a problem.

If an L1 reorg occurs, it only affects the L2 if a batch is wiped out.

If a batch is wiped out, then the batcher's nonce is reverted. Per Seb's proposal "a fixed nonce to block-range assignment", the batcher wants to resubmit on the same range as before, so there isn't a risk of the batcher posting a batch-gap (which would create empty blocks).

🤔 🤔 🤔 but I'm not sure that that's the only way your edge case would present

axelKingsley · 2024-09-05T15:26:56Z

protocol/strict-derivation.md

+
+Note that the new strict ordering rules of the batch queue will always lead to an empty batch queue
+when the origin of the derivation pipeline progresses to the next L1 block (what about
+`undecided` though?).


I am not familiar with undecided, and I see another reference to it below. Can you explain how these behave briefly?

The undecided status is still not 💯 clear to me, which is why I added those questions in brackets. What I understand from the implementation (files batches.go and batch_queue.go) is that if there's missing L1 or L2 data (e.g. due to temporarily broken L1 or L2 connections?), the batch is re-queued for later checking and no batch is processed at this point. I think we will just retain this behavior.

The difference from future is that a future batch has already been determined to be out of order and to lie in the future. The future case (a gap) will be treated differently in Holocene, and the gap will immediately be derived as deposit-only blocks. It is noteworthy that the future batch can then only be valid if it was built on top of a gap of empty batches.

Added this to the design doc.
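To make the distinction concrete, here is a simplified sketch that classifies a batch by timestamp only. The four status names mirror the statuses discussed here; everything else (`classify_by_timestamp`, modeling "missing data" as `None`, the 2-second default block time) is an assumption for illustration, and the real validity checks cover far more than the timestamp.

```python
from enum import Enum
from typing import Optional


class BatchValidity(Enum):
    ACCEPT = "accept"
    DROP = "drop"
    UNDECIDED = "undecided"
    FUTURE = "future"


def classify_by_timestamp(batch_timestamp: int,
                          safe_head_timestamp: Optional[int],
                          block_time: int = 2) -> BatchValidity:
    """Timestamp-only classification of a batch relative to the L2 safe head."""
    if safe_head_timestamp is None:
        # Required data is temporarily unavailable: nothing can be decided yet,
        # so the batch is kept and checked again later.
        return BatchValidity.UNDECIDED
    next_timestamp = safe_head_timestamp + block_time
    if batch_timestamp > next_timestamp:
        # The batch is known to be out of order: there is a gap before it.
        return BatchValidity.FUTURE
    if batch_timestamp < next_timestamp:
        # The batch is for a block that has already been derived.
        return BatchValidity.DROP
    return BatchValidity.ACCEPT
```

In this model an undecided batch carries no verdict at all, while a future batch has a definite verdict that depends on the gap in front of it.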

axelKingsley · 2024-09-05T15:30:18Z

protocol/strict-derivation.md

+I see two options on how to handle a non-empty batch queue at this point:
+- Option 1: Drop future batches, continue resolving undecided batches, if any are left, and apply
+new Holocene rules.
+- Option 2: The Batch Queue will just start applying Holocene rules from this moment onwards. This will then


Even if we did find ourselves in that situation, discarding that in-progress submission would work right? It would need to be retried after the threshold.

Or, we could follow Adrian's suggestion and drop all batches, but do it some time before holocene activation, holding a ban on batches until holocene passes. This would allow for the unlikely big-block to resolve prior to passing the threshold.

tynes · 2024-09-05T21:56:14Z

One thing to consider here is that we have generally assumed that the batcher will not be malicious and submit batches that are expensive for derivation to process. This assumption can work well for a single chain that has a single sequencer, but in the world of interop the finality of a single chain becomes tied to the finality of another chain. This means that it could be possible for a batcher of a remote chain to influence the cost of proof generation for the local chain. This means that we need to start thinking about untrusted batchers in the context of interop.

There will be some reputation at play since the interop set will be managed by governance, but we should generally strive to minimize the amount of reputation required for security. This just means that we don't need to ship the absolute most denial of service proof thing in the world as the first iteration.

protocol/strict-derivation.md

* spellcheck
* define forwards/backwards invalidation
* define principle of fastest derivation
* define "foreign frame"
* Update protocol/strict-derivation.md

---------

Co-authored-by: Sebastian Stammler <stammler.s@gmail.com>

protolambda · 2024-09-11T19:42:38Z

protocol/strict-derivation.md

+
+# Partial Span Batch Validity
+
+## Problem Statement + Context


One aspect of the problem that is worth mentioning, although solved by the same partial-validity idea, is that interop fault-proofs require a block to be processed optimistically, and then the proof might later abort on cross-L2 interop-dependencies. After having aborted, the alternative chain continuation should be as straight-forward and minimal as possible, to avoid looping interop dependency checks. Falling back to a deposit-only block, when a batch is invalid, would serve this well.

protolambda · 2024-09-11T19:57:29Z

protocol/strict-derivation.md

+- Option 1: When the L1 origin reaches the Holocene activation block, discard all frames in the
+  Frame Queue, channels in the Channel Bank, and batches in the Batch Queue.
+  This gives us a nice clean starting point.


👍 clean slate during upgrade is nice. We should add some warnings to the batch-submitter about this though, to prevent operational mistakes.

sebastianst · 2024-09-11T20:54:05Z

protocol/strict-derivation.md

+The batcher would have to be aware of those rules and consider any blocks it submitted in channels
+that didn't close prior to the Holocene activation block as needing to be resubmitted.


We also need to make sure that there aren't queued-up batcher txs that get included after the activation. These would cause gaps, which would be auto-derived and then cause an L2 reorg. We should add special behavior to the batcher: don't parallelize in the 1h before Holocene, stay on blobs, etc.
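One way such a batcher safety mode could be expressed is sketched below. Only the one-hour window and the two behaviors come from the comment above; `BatcherMode`, `batcher_mode`, the field names and the choice to end the window exactly at activation are assumptions, and "stay on blobs" is interpreted here as forcing blob submission during the window.

```python
from dataclasses import dataclass

SAFETY_WINDOW_SECONDS = 3600  # the "1h before Holocene" mentioned above


@dataclass(frozen=True)
class BatcherMode:
    max_pending_txs: int  # 1 means no parallel tx submission
    use_blobs: bool


def batcher_mode(now: int, holocene_time: int, normal: BatcherMode) -> BatcherMode:
    """Return a conservative mode inside the window before activation, else the normal mode."""
    if holocene_time - SAFETY_WINDOW_SECONDS <= now < holocene_time:
        # Send one tx at a time so no queued-up txs can land out of order
        # around the activation, and avoid switching data availability type.
        return BatcherMode(max_pending_txs=1, use_blobs=True)
    return normal
```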

sebastianst · 2024-09-11T21:00:56Z

protocol/strict-derivation.md

+The design space and some proposed solutions will be discussed, together with practical implications
+for batcher implementations that have to satisfy the stricter rules.
+
+# Partial Span Batch Validity


TODO: span batch prefix invalidation doesn't lead to auto-derivation, give a new span batch with correct l1 origin check a chance. This protects against L1 reorg induced deep L2 reorgs.

sebastianst · 2024-09-13T11:55:29Z

Further public discussion in Discord made us reconsider the idea of deriving invalid batches as empty batches. The new proposal is to simply drop invalid, and future, batches, and instead only derive invalid payloads at the engine stage as deposit-only payloads. An invalid payload wouldn't trigger the generation of future empty batches, but instead just forward-invalidate any remaining batches and the origin channel.
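The revised proposal can be illustrated with a small sketch of how one channel's payloads would be processed at the engine stage. `Payload`, `derive_from_channel` and the `engine_accepts` callback are hypothetical stand-ins; the sketch only shows the replace-then-forward-invalidate behavior described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Payload:
    deposits: Tuple[bytes, ...]
    user_txs: Tuple[bytes, ...]


def derive_from_channel(payloads: List[Payload],
                        engine_accepts: Callable[[Payload], bool]) -> List[Payload]:
    """Return the blocks derived from one channel's payloads, in order."""
    derived: List[Payload] = []
    for payload in payloads:
        if engine_accepts(payload):
            derived.append(payload)
            continue
        # Invalid payload: derive it as a deposit-only payload instead ...
        derived.append(Payload(payload.deposits, ()))
        # ... and forward-invalidate the remaining batches and the channel.
        # No further empty batches are generated for what is skipped.
        break
    return derived
```

Payloads after the first invalid one are not derived from this channel at all; the batcher has to resubmit them in a new channel.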

BlocksOnAChain

I think this design doc is now in a "good-enough" state and that we can merge it, so we can move towards the actual development work for Holocene.

sebastianst · 2024-09-18T09:55:01Z

I think this design doc is now in a "good-enough" state and that we can merge it, so we can move towards the actual development work for Holocene.

@BlocksOnAChain
The design changed significantly due to the async discussions after the design session. I want to first finish the spec, then adapt the design doc so it can serve as a historical reference, before we merge it.

BlocksOnAChain · 2024-09-18T14:41:04Z

I think this design doc is now in a "good-enough" state and that we can merge it, so we can move towards the actual development work for Holocene.

@BlocksOnAChain The design changed significantly due to the async discussions after the design session. I want to first finish the spec, then adapt the design doc so it can serve as a historical reference, before we merge it.

@sebastianst Got it, that makes sense to me. I was just reviewing the current state, before the last edits and after we chatted on Discord. Fully agree.

Add Holocene design doc

00eae0f

sebastianst requested review from ajsutton, geoknee, tynes and protolambda September 4, 2024 22:42

clabby reviewed Sep 4, 2024

refcell reviewed Sep 4, 2024

refcell reviewed Sep 4, 2024

refcell reviewed Sep 4, 2024

refcell reviewed Sep 5, 2024

ajsutton reviewed Sep 5, 2024

BlocksOnAChain added the documentation and specs labels Sep 5, 2024

axelKingsley reviewed Sep 5, 2024

geoknee reviewed Sep 9, 2024

geoknee reviewed Sep 9, 2024

geoknee and others added 4 commits September 9, 2024 20:18

Add note about span batch forward-invalidation

4d805d5

Address initial async feedback & extend

401eb8c

add comment to L1 origin selection section

3c87bf8

BlocksOnAChain assigned sebastianst Sep 10, 2024

sebastianst added 2 commits September 11, 2024 20:28

improve section on activation

49c0118

improve L1 origin selection section

261d5ea

protolambda approved these changes Sep 11, 2024

sebastianst commented Sep 11, 2024

sebastianst commented Sep 11, 2024

sebastianst commented Sep 11, 2024

BlocksOnAChain approved these changes Sep 16, 2024

add historical notice, some improvements due to design discussion

28be03b

sebastianst merged commit 85194ab into main Oct 4, 2024

sebastianst deleted the seb/holocene-derivation branch October 4, 2024 18:53
