Skip to content

Conversation

@alarso16
Copy link
Contributor

@alarso16 alarso16 commented Sep 29, 2025

Why this should be merged

This is the first of many PRs to enable state-syncing Firewood in the EVM. Although not in the perfect state to be used, this marks being able to create the xsync.Manager object with range proofs, and this functionality is tested.

How this works

Creates the interfaces needed to be a xsync.DB, with stubs for everything change-proof related. Additionally, the FFI's FindNextKey simply returns the last key in the range proof, so there's some hacky stuff to make that work

How this was tested

New UT

Need to be documented in RELEASES.md?

No

@alarso16 alarso16 moved this to In Progress 🏗️ in avalanchego Sep 29, 2025
@alarso16 alarso16 removed this from avalanchego Sep 29, 2025
@alarso16 alarso16 linked an issue Sep 29, 2025 that may be closed by this pull request
@github-actions
Copy link

github-actions bot commented Nov 2, 2025

This PR has become stale because it has been open for 30 days with no activity. Adding the lifecycle/frozen label will cause this PR to ignore lifecycle events.

@github-actions github-actions bot added the lifecycle/stale Inactive for 60 days label Nov 2, 2025
@alarso16 alarso16 added lifecycle/frozen and removed lifecycle/stale Inactive for 60 days labels Nov 3, 2025
@alarso16 alarso16 force-pushed the alarso16/firewood-range-proofs branch from 34c131b to 5bed3e3 Compare November 19, 2025 19:04
@alarso16 alarso16 force-pushed the alarso16/firewood-range-proofs branch from 5bed3e3 to 2375dd7 Compare November 25, 2025 21:54
@alarso16 alarso16 added testing This primarily focuses on testing go Pull requests that update Go code coreth labels Nov 25, 2025
@alarso16 alarso16 added storage This involves storage primitives and removed testing This primarily focuses on testing lifecycle/frozen labels Dec 5, 2025
@alarso16 alarso16 self-assigned this Dec 5, 2025

require.NoError(syncer.Start(ctx))
err = syncer.Wait(ctx)
if errors.Is(err, xsync.ErrFinishedWithUnexpectedRoot) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most common error I encountered while trying to make this work as a timeout, so I wonder if making a context with a timeout to log differences would be more beneficial than letting it stall for 2 minutes without helpful logs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aware of a great way of doing this - if we use our own timeout logic we don't respect the runtime's testing timeout and although t.Deadline exists trying to fire something right before the timeout is race-y.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I'll leave it for now, and see if we run into more issues like this in the future.

}

// Commit the range proof to the database.
func (db *syncDB) CommitRangeProof(_ context.Context, start, end maybe.Maybe[[]byte], proof *RangeProof) (maybe.Maybe[[]byte], error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't look to closely at this function. it's all a little janky until Firewood has a proper FindNextKey implementation, in which case we don't have to copy the byte slice, and we don't have to analyze the returned values.

@alarso16 alarso16 marked this pull request as ready for review December 8, 2025 21:24
@alarso16 alarso16 requested a review from joshua-kim as a code owner December 8, 2025 21:24
Copilot AI review requested due to automatic review settings December 8, 2025 21:24
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements initial Firewood static sync functionality by integrating Firewood with the xsync.Manager. The implementation enables creating range proofs and syncing database state, though change proofs remain unimplemented stubs.

Key Changes:

  • Added xsync.DB interface implementation for Firewood with range proof support
  • Introduced EmptyRoot configuration parameter to handle empty state checks
  • Implemented comprehensive sync testing with various database sizes

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
x/sync/network_server.go Added compile-time interface checks for handler types
x/sync/manager.go Added EmptyRoot config field and updated empty state checks
x/firewood/sync_test.go Added comprehensive sync tests for empty and populated databases
x/firewood/sync_db.go Implemented xsync.DB interface with range proof support and change proof stubs
x/firewood/proof.go Added proof marshaling/unmarshaling implementations
go.mod Changed firewood-go-ethhash/ffi from indirect to direct dependency

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@alarso16 alarso16 moved this to In Progress 🏗️ in avalanchego Dec 9, 2025
return nil, errors.New("not implemented")
}

type ChangeProof struct{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We could remove this struct definition and satisfy the generic w/ struct{} in places where this is used to avoid prematurely introducing this type - but the intent of this type is pretty clear so it's also fine to leave this as a TODO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did consider that, but it felt more intuitive to provide the infrastructure here, since we need some dummy marshaller anyway

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do need the dummy marshaler but this type isn't needed - we don't implement change proofs so struct{}{} works as a sentinel value (and actually this type being exposed out of firewood/syncer suggests that they are implemented).

Comment on lines 42 to 43
type RangeProof struct {
ffi *ffi.RangeProof
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we went with ffi as a name to make it clear that we're talking to the ffi - but although it makes clear the domain boundary we're communicating across it makes it more unclear what the type we're working with is (r.ffi.MarshalBinary() reads like we're serializing the ffi).

Some recommendations for other names would be:

  • rangeProof - verbose, defaulting to the type name is never bad
  • ffiRangeProof - if you want to highlight that we're working with the ffi's version of this domain type
  • inner - since this is a wrapper pattern
  • r or rp - Can be used for shorter scoped code and since this a wrapper pattern we can argue that the context is given by the wrapping type so r.r.MarshalBinary() is clearly us marshaling the type we wrap.

If I was the author I would probably have used r - but this comment is entering opinionated territory so don't feel bound to my preferences.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that is unclear. I'll choose rp out of preference, but I don't mind any of those names

Comment on lines +520 to +521
work.start,
work.end,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a separate diff/bugfix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just unnecessary to convert from proto when we have it right there. So yes? I forgot this was in the diff, I can remove it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid unrelated changes where possible - I'm open to merging this in separately but we want to keep PRs scoped to a logical change. As an example reverting a feature by reverting the commit inadvertently also reverting unrelated cleanups means that we now need to manually revert a diff.

SimultaneousWorkLimit int
Log logging.Logger
TargetRoot ids.ID
EmptyRoot ids.ID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is firewood's value for the empty root? Is it not just 32 zeroes?

Copy link
Contributor Author

@alarso16 alarso16 Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ethereum's empty root is not 32 zeros. An empty trie has the following hash:
https://github.com/ava-labs/libevm/blob/749b6cefda2894bab9bbdbe2d27086d57ca0d393/core/types/hashes.go#L27

return nil, errors.New("change proofs are not implemented")
}

//nolint:revive // TODO: implement this method.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What linter are we breaking here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's unused variables. I left them for clarity, so you don't have to go look at the interface definition at change-proof time to know what ids.ID is

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should listen to the linter here - it doesn't matter what the parameters are/mean because they're not used and this code is clear/self-documenting enough to convey this to the reader

@alarso16 alarso16 requested review from joshua-kim and rkuris December 9, 2025 22:05
@github-actions github-actions bot removed the coreth label Dec 12, 2025
@JonathanOppenheimer JonathanOppenheimer added the coreth Related to the former coreth standalone repository label Dec 15, 2025
@alarso16 alarso16 removed the coreth Related to the former coreth standalone repository label Dec 15, 2025
Comment on lines +32 to +39
type Config struct {
RangeProofClient *p2p.Client
ChangeProofClient *p2p.Client
SimultaneousWorkLimit int
Log logging.Logger
TargetRoot ids.ID
StateSyncNodes []ids.NodeID
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x/sync's Config type is abused as a DI container + a go config pattern - can we abstract over this to avoid leaking the bad implementation into this package? The proof clients are really dependencies of the syncer and the TargetRoot is really a parameter for which to start the syncer - I feel like we could probably do something like have Config know about stuff with safe zero values that have reasonable defaults to make tests easier to write + each VM integrating with this won't have to re-define their own defaults (which will all probably be the same). I think SimultaneousWorkLimit and StateSyncNodes have reasonable default values, and the logger can default to not logging. We could also pass the metrics instance through the config and the default can be a new registry - only prod cares to configure to use its own metrics instance.

TargetRoot i actually think is not a true config value because as part of state sync we need to define the point to sync to - so I think the caller should be explicitly providing this through a parameter. Similarly I think there aren't reasonable default values for the network clients so those should be defined explicitly as well.

StateSyncNodes []ids.NodeID
}

func NewSyncer(fw *ffi.Database, config Config, register prometheus.Registerer) (*xsync.Syncer[*RangeProof, *ChangeProof], error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. nit: fw is deviating from the abstractions of ffi and Database - would calling this something like db make more sense? Similarly register might want to be registerer or r.
  2. Locality for NewSyncer and Config's definition might want to be after all of the function definitions on db or before db's definition - currently this locality reads like a constructor for db.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we update the file name here? Maybe syncer?

Comment on lines +97 to +98
// No error indicates the range is complete.
return maybe.Nothing[[]byte](), err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although semantically correct - this is not clear/deviates from Go's typical pattern of error handling where err typically implies a non-nil error. This can also cause unexpected behavior if the ffi returns a non-nil next key range and a non-nil error (we shouldn't because it breaks the aforementioned idiom of a meaningful error and a meaningful return value, but our repository breaks this idiom without good reason at times). At any rate - I would break this into two separate if's for clarity.

return nil, errors.New("change proofs are not implemented")
}

//nolint:revive // TODO: implement this method.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should listen to the linter here - it doesn't matter what the parameters are/mean because they're not used and this code is clear/self-documenting enough to convey this to the reader

Comment on lines +27 to +29
for _, serverSize := range []int{0, 1, 1_000, 10_000, 100_000} {
for _, clientSize := range []int{0, 1_000} {
t.Run(fmt.Sprintf("numKeys=%d_clientKeys=%d", serverSize, clientSize), func(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works but it's deviating from the idiomatic table pattern - could we use a testing table instead? They tend to be easier to maintain over time and are more easily readable and don't require the string formatting that we're doing.

Comment on lines +50 to +51
SimultaneousWorkLimit: 5,
Log: logging.NoLog{},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to my earlier comment about defining some defaults - but it would simplify this test setup

Comment on lines +107 to +109
ctx := context.WithoutCancel(t.Context())
ctx, cancel := context.WithTimeout(ctx, 5*time.Second) // allow some time for garbage collection
defer cancel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this racey? If Close takes more than 5 seconds we can get a false positive by canceling early. Isn't it safe for us to timeout while closing because these are temporary db instances for testing purposes? If we mess up closure subsequent runs of the tests should not be impacted.

If there is a memory management bug in firewood - it's also not something that the syncer tests should care about anyways - it's an implementation detail of firewood. If we need better ways of debugging it I think we would want to add tests on Close closer to the ffi/firewood implementation than adding complexity to the syncer tests.

t.Logf("%d keys missing from DB1 starting with %x", missingCount, prevKey)
}

t.Logf("DB1 had %d keys, DB2 had %d keys", count1, count2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we update some logs/naming of db1/db2 to be something like got and want to make it clear what the expected value was?

Comment on lines +520 to +521
work.start,
work.end,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid unrelated changes where possible - I'm open to merging this in separately but we want to keep PRs scoped to a logical change. As an example reverting a feature by reverting the commit inadvertently also reverting unrelated cleanups means that we now need to manually revert a diff.

joshua-kim
joshua-kim previously approved these changes Dec 15, 2025
return nil, errors.New("not implemented")
}

type ChangeProof struct{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do need the dummy marshaler but this type isn't needed - we don't implement change proofs so struct{}{} works as a sentinel value (and actually this type being exposed out of firewood/syncer suggests that they are implemented).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go Pull requests that update Go code storage This involves storage primitives

Projects

Status: In Progress 🏗️

Development

Successfully merging this pull request may close these issues.

Firewood database in x/sync - Range Proof only

4 participants