-
Notifications
You must be signed in to change notification settings - Fork 840
Firewood static sync #4361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Firewood static sync #4361
Conversation
|
This PR has become stale because it has been open for 30 days with no activity. Adding the |
34c131b to
5bed3e3
Compare
5bed3e3 to
2375dd7
Compare
x/firewood/sync_test.go
Outdated
|
|
||
| require.NoError(syncer.Start(ctx)) | ||
| err = syncer.Wait(ctx) | ||
| if errors.Is(err, xsync.ErrFinishedWithUnexpectedRoot) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most common error I encountered while trying to make this work as a timeout, so I wonder if making a context with a timeout to log differences would be more beneficial than letting it stall for 2 minutes without helpful logs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not aware of a great way of doing this - if we use our own timeout logic we don't respect the runtime's testing timeout and although t.Deadline exists trying to fire something right before the timeout is race-y.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I'll leave it for now, and see if we run into more issues like this in the future.
x/firewood/sync_db.go
Outdated
| } | ||
|
|
||
| // Commit the range proof to the database. | ||
| func (db *syncDB) CommitRangeProof(_ context.Context, start, end maybe.Maybe[[]byte], proof *RangeProof) (maybe.Maybe[[]byte], error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't look to closely at this function. it's all a little janky until Firewood has a proper FindNextKey implementation, in which case we don't have to copy the byte slice, and we don't have to analyze the returned values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements initial Firewood static sync functionality by integrating Firewood with the xsync.Manager. The implementation enables creating range proofs and syncing database state, though change proofs remain unimplemented stubs.
Key Changes:
- Added xsync.DB interface implementation for Firewood with range proof support
- Introduced EmptyRoot configuration parameter to handle empty state checks
- Implemented comprehensive sync testing with various database sizes
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| x/sync/network_server.go | Added compile-time interface checks for handler types |
| x/sync/manager.go | Added EmptyRoot config field and updated empty state checks |
| x/firewood/sync_test.go | Added comprehensive sync tests for empty and populated databases |
| x/firewood/sync_db.go | Implemented xsync.DB interface with range proof support and change proof stubs |
| x/firewood/proof.go | Added proof marshaling/unmarshaling implementations |
| go.mod | Changed firewood-go-ethhash/ffi from indirect to direct dependency |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return nil, errors.New("not implemented") | ||
| } | ||
|
|
||
| type ChangeProof struct{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We could remove this struct definition and satisfy the generic w/ struct{} in places where this is used to avoid prematurely introducing this type - but the intent of this type is pretty clear so it's also fine to leave this as a TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did consider that, but it felt more intuitive to provide the infrastructure here, since we need some dummy marshaller anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do need the dummy marshaler but this type isn't needed - we don't implement change proofs so struct{}{} works as a sentinel value (and actually this type being exposed out of firewood/syncer suggests that they are implemented).
x/firewood/proof.go
Outdated
| type RangeProof struct { | ||
| ffi *ffi.RangeProof |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think we went with ffi as a name to make it clear that we're talking to the ffi - but although it makes clear the domain boundary we're communicating across it makes it more unclear what the type we're working with is (r.ffi.MarshalBinary() reads like we're serializing the ffi).
Some recommendations for other names would be:
rangeProof- verbose, defaulting to the type name is never badffiRangeProof- if you want to highlight that we're working with the ffi's version of this domain typeinner- since this is a wrapper patternrorrp- Can be used for shorter scoped code and since this a wrapper pattern we can argue that the context is given by the wrapping type sor.r.MarshalBinary()is clearly us marshaling the type we wrap.
If I was the author I would probably have used r - but this comment is entering opinionated territory so don't feel bound to my preferences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that is unclear. I'll choose rp out of preference, but I don't mind any of those names
| work.start, | ||
| work.end, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a separate diff/bugfix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just unnecessary to convert from proto when we have it right there. So yes? I forgot this was in the diff, I can remove it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid unrelated changes where possible - I'm open to merging this in separately but we want to keep PRs scoped to a logical change. As an example reverting a feature by reverting the commit inadvertently also reverting unrelated cleanups means that we now need to manually revert a diff.
| SimultaneousWorkLimit int | ||
| Log logging.Logger | ||
| TargetRoot ids.ID | ||
| EmptyRoot ids.ID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is firewood's value for the empty root? Is it not just 32 zeroes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ethereum's empty root is not 32 zeros. An empty trie has the following hash:
https://github.com/ava-labs/libevm/blob/749b6cefda2894bab9bbdbe2d27086d57ca0d393/core/types/hashes.go#L27
| return nil, errors.New("change proofs are not implemented") | ||
| } | ||
|
|
||
| //nolint:revive // TODO: implement this method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What linter are we breaking here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's unused variables. I left them for clarity, so you don't have to go look at the interface definition at change-proof time to know what ids.ID is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should listen to the linter here - it doesn't matter what the parameters are/mean because they're not used and this code is clear/self-documenting enough to convey this to the reader
| type Config struct { | ||
| RangeProofClient *p2p.Client | ||
| ChangeProofClient *p2p.Client | ||
| SimultaneousWorkLimit int | ||
| Log logging.Logger | ||
| TargetRoot ids.ID | ||
| StateSyncNodes []ids.NodeID | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x/sync's Config type is abused as a DI container + a go config pattern - can we abstract over this to avoid leaking the bad implementation into this package? The proof clients are really dependencies of the syncer and the TargetRoot is really a parameter for which to start the syncer - I feel like we could probably do something like have Config know about stuff with safe zero values that have reasonable defaults to make tests easier to write + each VM integrating with this won't have to re-define their own defaults (which will all probably be the same). I think SimultaneousWorkLimit and StateSyncNodes have reasonable default values, and the logger can default to not logging. We could also pass the metrics instance through the config and the default can be a new registry - only prod cares to configure to use its own metrics instance.
TargetRoot i actually think is not a true config value because as part of state sync we need to define the point to sync to - so I think the caller should be explicitly providing this through a parameter. Similarly I think there aren't reasonable default values for the network clients so those should be defined explicitly as well.
| StateSyncNodes []ids.NodeID | ||
| } | ||
|
|
||
| func NewSyncer(fw *ffi.Database, config Config, register prometheus.Registerer) (*xsync.Syncer[*RangeProof, *ChangeProof], error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- nit:
fwis deviating from the abstractions offfiandDatabase- would calling this something likedbmake more sense? Similarlyregistermight want to beregistererorr. - Locality for
NewSyncerandConfig's definition might want to be after all of the function definitions ondbor beforedb's definition - currently this locality reads like a constructor fordb.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we update the file name here? Maybe syncer?
| // No error indicates the range is complete. | ||
| return maybe.Nothing[[]byte](), err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although semantically correct - this is not clear/deviates from Go's typical pattern of error handling where err typically implies a non-nil error. This can also cause unexpected behavior if the ffi returns a non-nil next key range and a non-nil error (we shouldn't because it breaks the aforementioned idiom of a meaningful error and a meaningful return value, but our repository breaks this idiom without good reason at times). At any rate - I would break this into two separate if's for clarity.
| return nil, errors.New("change proofs are not implemented") | ||
| } | ||
|
|
||
| //nolint:revive // TODO: implement this method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should listen to the linter here - it doesn't matter what the parameters are/mean because they're not used and this code is clear/self-documenting enough to convey this to the reader
| for _, serverSize := range []int{0, 1, 1_000, 10_000, 100_000} { | ||
| for _, clientSize := range []int{0, 1_000} { | ||
| t.Run(fmt.Sprintf("numKeys=%d_clientKeys=%d", serverSize, clientSize), func(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works but it's deviating from the idiomatic table pattern - could we use a testing table instead? They tend to be easier to maintain over time and are more easily readable and don't require the string formatting that we're doing.
| SimultaneousWorkLimit: 5, | ||
| Log: logging.NoLog{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to my earlier comment about defining some defaults - but it would simplify this test setup
| ctx := context.WithoutCancel(t.Context()) | ||
| ctx, cancel := context.WithTimeout(ctx, 5*time.Second) // allow some time for garbage collection | ||
| defer cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this racey? If Close takes more than 5 seconds we can get a false positive by canceling early. Isn't it safe for us to timeout while closing because these are temporary db instances for testing purposes? If we mess up closure subsequent runs of the tests should not be impacted.
If there is a memory management bug in firewood - it's also not something that the syncer tests should care about anyways - it's an implementation detail of firewood. If we need better ways of debugging it I think we would want to add tests on Close closer to the ffi/firewood implementation than adding complexity to the syncer tests.
| t.Logf("%d keys missing from DB1 starting with %x", missingCount, prevKey) | ||
| } | ||
|
|
||
| t.Logf("DB1 had %d keys, DB2 had %d keys", count1, count2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we update some logs/naming of db1/db2 to be something like got and want to make it clear what the expected value was?
| work.start, | ||
| work.end, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid unrelated changes where possible - I'm open to merging this in separately but we want to keep PRs scoped to a logical change. As an example reverting a feature by reverting the commit inadvertently also reverting unrelated cleanups means that we now need to manually revert a diff.
| return nil, errors.New("not implemented") | ||
| } | ||
|
|
||
| type ChangeProof struct{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do need the dummy marshaler but this type isn't needed - we don't implement change proofs so struct{}{} works as a sentinel value (and actually this type being exposed out of firewood/syncer suggests that they are implemented).
Why this should be merged
This is the first of many PRs to enable state-syncing Firewood in the EVM. Although not in the perfect state to be used, this marks being able to create the
xsync.Managerobject with range proofs, and this functionality is tested.How this works
Creates the interfaces needed to be a
xsync.DB, with stubs for everything change-proof related. Additionally, the FFI'sFindNextKeysimply returns the last key in the range proof, so there's some hacky stuff to make that workHow this was tested
New UT
Need to be documented in RELEASES.md?
No