-
Notifications
You must be signed in to change notification settings - Fork 50
feat: retain proofs of non-target nodes in certain edge-cases. #109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: retain proofs of non-target nodes in certain edge-cases. #109
Conversation
c95a020
to
bb4e492
Compare
There are specific edge-cases where it's necessary to retain the proofs for nodes which aren't given in the target set, namely leaf removals which result in the deletion of a branch, and leaf additions which result in the creation of a branch. Documentation of each case is provided in the code at the point it is handled. This change will cause more proofs than are strictly necessary to be retained, because the `target` set we are given does not tell us if paths are added, updated or removed. This extra work is made up for by us not needing to fetch the nodes which would otherwise be missing later down the pipeline, and in benchmarking the overall change comes out very slightly faster.
bb4e492
to
cfaaac3
Compare
Fixes #17571 **Background** The Serial/ParallelSparseTrie's have their nodes revealed using the DecodedMultiProofs generated based on the previous database state and the changed leafset. Once revealed, the leaf updates/removals are applied to the sparse tries and root hashes are calculated. There are specific edge-cases during leaf adding/removal where a node which is outside the changeset is required to complete the operation. In these cases we were falling back to a singular database lookup to fill in the gap. By modifying the generation of the DecodedMultiProofs to include those extra required nodes we can simplify the sparse trie implementations significantly, as well as the surrounding engine code which supports these one-off lookups. **Changes** This primarily relies on a change to the ProofRetainer in alloy-trie, made here: alloy-rs/trie#109 These changes are enabled by a flag which the payload validator enables; other places where proofs are generated remain unnaffected. There is a further change here to the trie walker to not skip over branch nodes when they might be involved in a leaf removal. A change to support leaf addition is not required, because that case involves children of extension nodes, and we don't ever skip over extension nodes anyway. The sparse trie code to fallback to the db for missing nodes remains in place for now, but we emit a warning when it happens. This will let us track if there's any edge-cases we're missing. **Benchmarks - 2k blocks on main** No significant change ``` Timestamp: 2025-07-28 10:37:12 UTC Baseline: main Feature: origin/mediocregopher/17571-leaf-updates-removals Performance Changes: NewPayload Latency: -1.24% FCU Latency: -1.74% Total Latency: -1.26% Gas/Second: +1.27% Blocks/Second: +1.27% Baseline Summary: Blocks: 2000, Gas: 36118922440, Duration: 74.02s Avg NewPayload: 35.69ms, Avg FCU: 1.29ms, Avg Total: 36.99ms Started: 2025-07-28 10:48:20 UTC, Ended: 2025-07-28 10:56:39 UTC Feature Summary: Blocks: 2000, Gas: 36118922440, Duration: 73.08s Avg NewPayload: 35.25ms, Avg FCU: 1.27ms, Avg Total: 36.52ms Started: 2025-07-28 10:58:35 UTC, Ended: 2025-07-28 11:06:55 UTC ```
CodSpeed Performance ReportMerging #109 will not alter performanceComparing Summary
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, this makes sense conceptually, I do wonder if it would be worth adding more granular prefixsets to the proof retainer so we don't retain more nodes than we need
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, pending @Rjected
graphs are very helpful
src/proof/retainer.rs
Outdated
// When a new leaf is being added to a trie, it can happen that an extension node's | ||
// path is a prefix of the new leaf's, but the extension child's path is not. In | ||
// this case a new branch node is created; the new leaf is one child, the previous | ||
// branch node is its other, and the extension node is its parent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is helpful
wdym? |
Co-authored-by: Dan Cline <6798349+Rjected@users.noreply.github.com>
The ProofRetainer only knows which leaves changed, but not in what way they changed (added, updated, or removed). If we had that more fine-grained information we could return the extra edge-case proofs only precisely when they would be required. As it is, the ProofRetainer only knows when it might need to keep one of these extra nodes, and so it ends up keeping more than are strictly required. |
I would have sworn I tested all this on mainnet already, but when I reran it today I discovered a hole in the logic. It can happen that a branch has three children, two of which are removed leaves and the last which isn't in the target set. In this case we still need the proof of the nontarget child. Same with a 4-child branch where 3 leaves are removed, etc... |
This is now fully tested on mainnet, ran 2k blocks using paradigmxyz/reth#17643 and saw zero requests being made by the sparse tries, indicating that all required trie nodes are being revealed by the multiproof tasks. A big change has been made here, which is the introduction of the The trait is designed so we have a fair amount of flexibility in how we keep track of the keys; ultimately the ProofRetainer only cares about whether certain paths are prefixes of those which have been added/removed. There's still more work which could be done here to narrow down which extra nodes are returned, specifically with regard to leaf removals. If we were to track a stack of masks within the ProofRetainer it might be possible to fully accurately determine which branches will be removed and only retain the extra branch child proof for those cases. Right now we are pessimistically assuming a branch will be removed if all but one of its children are a prefix of a removed leaf, and returning more proofs than are necessary as a result. |
This reverts commit 1e4be6f.
07bff5b
to
263708f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mainly would like to avoid the generic if we can, because it seems like the default type is the only one being used? the logic for retaining based on added / removed keys makes sense to me
pub struct ProofRetainer<K = AddedRemovedKeys> { | ||
/// The nibbles of the target trie keys to retain proofs for. | ||
targets: Vec<Nibbles>, | ||
/// The map retained trie node keys to RLP serialized trie nodes. | ||
proof_nodes: ProofNodes, | ||
/// Provided by the user to give the necessary context to retain extra proofs. | ||
added_removed_keys: Option<K>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we have a generic here? I think we could just have this be AddedRemovedKeys
, although if it needs to be shared maybe it should be a reference type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AddedRemovedKeys needs to also be read by the TrieWalker during proof generation:
So it definitely cannot just be AddedRemovedKeys
without cloning it for every proof. We could make it instead Option<&'k AddedRemovedKeys>
, but that leaks the 'k
onto the HashBuilder and that kind of infects everything because lifetimes can't have a default. The only other option would be to make this Arc<AddedRemovedKeys>
, as that is what is actually being used by reth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, yeah I think an arc makes sense here then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I spoke too soon, it's not as simple as passing an Arc<AddedRemovedKeys>
.
The problem is that in the reth PR we have MultiAddedRemovedKeys which is cloned and put in an Arc, and then that Arc is passed to all the multiproof chunks.
From there a reference to either account or storage AddedRemovedKeys instances are queried from the Arc<MultiAddedRemovedKeys>
(here and here) and passed to the TrieWalker and ProofRetainer/HashBuilder.
So in order for an Arc<AddedRemovedKeys>
to be used here it would be necessary to have something like MultiAddedRemovedKeysMut which is held by the MultiProofTask and has AddedRemovedKeys internally, and then a MultiAddedRemovedKeys which can be created from a MultiAddedRemovedKeysMut, and which has Arc<AddedRemovedKeys>
instances internally.
Imo at that point the generic is the cleaner solution, and more performant because there's only one Arc (for the MultiAddedRemovedKeys) rather than one for every individual AddedRemovedKeys. I'll defer to you though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so is the issue that: we have an Arc
to the higher level MultiAddedRemovedKeys
in reth, and no arc to the AddedRemovedKeys
themselves, we can only obtain regular references. So we need the generic to avoid lifetime leaks, and so we don't need to convert a map, to a map of references every time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we don't need to convert a map, to a map of references every time
to a map of Arcs, but otherwise yes that's right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me, yeah it seems like the generic is the best option here
There are specific edge-cases where it's necessary to retain the proofs for nodes which aren't given in the target set, namely leaf removals which result in the deletion of a branch, and leaf additions which result in the creation of a branch.
Documentation of each case is provided in the code at the point it is handled.
This change will cause more proofs than are strictly necessary to be retained, because the
target
set we are given does not tell us if paths are added, updated or removed. This extra work is made up for by us not needing to fetch the nodes which would otherwise be missing later down the pipeline, and in benchmarking the overall change comes out very slightly faster.Partially addresses paradigmxyz/reth#17571