
Generalize expiry based de-duplication, dsmr #1810

Open

wants to merge 49 commits into base: main
Conversation

tsachiherman
Contributor

@tsachiherman tsachiherman commented Nov 22, 2024

What?

Integrate the generic de-duplication logic into dsmr.

@tsachiherman tsachiherman self-assigned this Nov 22, 2024
@tsachiherman tsachiherman marked this pull request as ready for review November 25, 2024 18:20

tsachiherman commented Nov 25, 2024 via email

Comment on lines 73 to 81
// make sure we have no repeats within the block itself.
blkTxsIDs := make(map[ids.ID]bool, len(blk.Txs()))
for _, tx := range blk.Txs() {
id := tx.GetID()
if _, has := blkTxsIDs[id]; has {
return fmt.Errorf("%w: duplicate in block", ErrDuplicateContainer)
}
blkTxsIDs[id] = true
}
Collaborator

We now do this check twice in chain, since we previously handled it within NewExecutionBlock.

We should only apply the check once, which do you think is the better place for it? The other addition within NewExecutionBlock is pre-populating the signature job. It would probably be better to remove that from the execution block and handle it in AsyncVerify. One good reason to do this is that handling this within ParseBlock can be a DoS vector.

Contributor Author

I don't see why testing this twice is an issue on its own, and I think I have a better solution here.
Testing it in NewExecutionBlock is not ideal from my perspective, as it should be a construction step (i.e. no errors).
How do you feel about the following:
in ExecutionBlock, we would add the following function:

func (b *ExecutionBlock) ValidateDuplicateTransactions() error {
    if len(b.Txs) != b.txsSet.Len() {
        return ErrDuplicateTx
    }
    return nil
}
then, in validitywindow.go's VerifyExpiryReplayProtection, we can just call this method:

   ...
   if err := blk.ValidateDuplicateTransactions(); err != nil {
      return err
   }
   ...

and remove the check from NewExecutionBlock.

x/dsmr/node.go Outdated
Comment on lines 247 to 250
if block.Tmstmp <= parentBlock.Tmstmp && parentBlock.Hght > 0 {
return ErrTimestampNotMonotonicallyIncreasing
}

Collaborator

Can we remove this check from this PR?

We should enforce timestamp >= parent timestamp (we should allow them to be equal, in case a block builder pushes the timestamp ahead of wall clock time for some nodes).

We should not allow a malicious node to build a block with a timestamp that we consider valid (less than 1s ahead of our wall clock time, but still ahead of it), such that when we build a block on top of it, we fail because its timestamp is ahead of our local time.

We should update the check applied in BuildBlock imo.

We should also never execute the genesis block, so the check for parentBlock.Hght > 0 should be removed.

if blk, has := ti.blocks[id]; has {
return blk, nil
}
return nil, nil
Collaborator

Fetching a non-existing block should return an error, can we return database.ErrNotFound if the block is not available?

Contributor

I find the UX of using the sentinel database.ErrNotFound to be very awkward. This is due to the existing interface definition and not the changes in this PR though... if we had something like GetExecutionBlock() (ExecutionBlock[T], bool, error) we wouldn't have to introspect the error: we could just return an error when it's non-nil and check the bool when the block simply isn't there.
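A sketch of the (value, ok, error) shape being suggested; the concrete types here are illustrative stand-ins for the real generic interface:

```go
package main

import "fmt"

// executionBlock and chainIndex are illustrative stand-ins for the
// real generic types.
type executionBlock struct{ id string }

type chainIndex struct {
	blocks map[string]*executionBlock
}

// GetExecutionBlock returns (nil, false, nil) when the block simply
// isn't indexed, reserving the error for genuine failures (e.g. I/O),
// so callers never introspect a sentinel error.
func (c *chainIndex) GetExecutionBlock(blkID string) (*executionBlock, bool, error) {
	blk, ok := c.blocks[blkID]
	return blk, ok, nil
}

func main() {
	idx := &chainIndex{blocks: map[string]*executionBlock{"a": {id: "a"}}}

	blk, ok, err := idx.GetExecutionBlock("a")
	fmt.Println(blk.id, ok, err)

	_, ok, err = idx.GetExecutionBlock("missing")
	fmt.Println(ok, err)
}
```

This mirrors Go's comma-ok map convention: absence is an expected outcome, not an error condition.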

@@ -69,6 +70,15 @@ func (v *TimeValidityWindow[Container]) VerifyExpiryReplayProtection(
if dup.Len() > 0 {
return fmt.Errorf("%w: duplicate in ancestry", ErrDuplicateContainer)
}
// make sure we have no repeats within the block itself.
blkTxsIDs := make(map[ids.ID]bool, len(blk.Txs()))
Contributor

Can we use set.Set here?

Comment on lines 58 to 60
func (c ChunkCertificate) GetExpiry() int64 { return c.Expiry }

func (c *ChunkCertificate) GetSlot() int64 { return c.Expiry }
Contributor

I think these are duplicated - GetSlot is meant to be Expiry

x/dsmr/node.go Outdated
log logging.Logger,
tracer trace.Tracer,
chainIndex ChainIndex,
validityWindowDuration int64,
Contributor

Should we use time.Duration here?

x/dsmr/node.go Outdated
Comment on lines 204 to 207
oldestAllowed := timestamp - n.validityWindowDuration
if oldestAllowed < 0 {
oldestAllowed = 0
}
Contributor

nit: max(0, oldestAllowed)
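The nit above, sketched with Go 1.21's built-in max (the function and parameter names are illustrative):

```go
package main

import "fmt"

// oldestAllowed clamps timestamp-window at zero using Go 1.21's
// built-in max, replacing the explicit if-statement in the diff.
func oldestAllowed(timestamp, window int64) int64 {
	return max(0, timestamp-window)
}

func main() {
	fmt.Println(oldestAllowed(100, 30))
	fmt.Println(oldestAllowed(10, 30))
}
```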

blocks map[ids.ID]validitywindow.ExecutionBlock[*ChunkCertificate]
}

func (ti *testingChainIndex) GetExecutionBlock(_ context.Context, id ids.ID) (validitywindow.ExecutionBlock[*ChunkCertificate], error) {
Contributor

Nits:

  1. receiver name should just be t
  2. id -> blkID or blockID

x/dsmr/block.go Outdated
Comment on lines 127 to 131
func (b Block) GetID() ids.ID {
return b.blkID
}

func (b Block) Parent() ids.ID {
Contributor

I don't care which we pick, but we should be consistent on the naming of either Foo() or GetFoo().

Contributor Author

I'm going to dodge this one by saying that this method is no longer needed.

x/dsmr/block.go Outdated
Comment on lines 143 to 145
func (b Block) Txs() []*ChunkCertificate {
return b.ChunkCerts
}
Contributor

This seems weird because these are not returning txs

Contributor Author

Yes, I agree. Let's discuss this as a group?

Collaborator

Can we use the term containers?

x/dsmr/node.go Outdated
Comment on lines 58 to 59
log logging.Logger,
tracer trace.Tracer,
Contributor

nit: make these (logger + tracer) the first args in this fn

x/dsmr/node.go Outdated
Comment on lines 37 to 40
type (
ChainIndex = validitywindow.ChainIndex[*ChunkCertificate]
timeValidityWindow = *validitywindow.TimeValidityWindow[*ChunkCertificate]
)
Contributor

Seems like we're doing this to avoid the caller depending on an internal package. I'm wondering if it even makes sense for validitywindow to be in internal at all... should this just be merged into dsmr?

Contributor Author

@aaronbuchwald asked to remove this section completely.

x/dsmr/block.go Outdated
"github.com/ava-labs/hypersdk/utils"
)

const InitialChunkSize = 250 * 1024

type Tx interface {
GetID() ids.ID
GetExpiry() int64
emap.Item
Contributor

This exposes the internal/emap package into the caller. I think the previous wrapping pattern where we wrapped this interface w/ a type that implemented the emap interface actually looked cleaner.

Contributor Author

Once merged, we won't need the wrapping interface anymore... unless I'm missing something?

Collaborator

I think @joshua-kim 's point is to have the internal type implemented as makes sense in this package and then wrap it with another type that converts between that structure and the required interface when we need to use it in the validity window or expiry map where we need a specific interface. Correct me if I'm wrong @joshua-kim
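A sketch of the wrapping pattern being described: the package keeps its own transaction type, and a thin adapter satisfies the emap-style interface only at the boundary. The emapItem interface below is a stand-in for internal/emap's real one, and all names are illustrative:

```go
package main

import "fmt"

// emapItem stands in for the interface required by internal/emap.
type emapItem interface {
	GetID() string
	GetExpiry() int64
}

// tx is the package's own transaction type, free of emap concerns.
type tx struct {
	id     string
	expiry int64
}

// emapTx is a thin adapter that satisfies emapItem only where the
// expiry map or validity window needs it.
type emapTx struct{ inner tx }

func (e emapTx) GetID() string    { return e.inner.id }
func (e emapTx) GetExpiry() int64 { return e.inner.expiry }

func main() {
	var item emapItem = emapTx{inner: tx{id: "tx1", expiry: 42}}
	fmt.Println(item.GetID(), item.GetExpiry())
}
```

The design choice here is directional: the internal package's interface never leaks into the public Tx type, so callers of dsmr stay decoupled from internal/emap.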

Contributor Author

done

x/dsmr/block.go Outdated
return e.innerBlock.ParentID
}

func (e ExecutionBlock) Txs() []*emapChunkCertificate {
Collaborator

@aaronbuchwald aaronbuchwald Dec 12, 2024

Can we rename this to a neutral name instead of txs like containers?

Comment on lines 75 to 82
blkTxsIDs := set.NewSet[ids.ID](len(blk.Containers()))
for _, tx := range blk.Containers() {
id := tx.GetID()
if blkTxsIDs.Contains(id) {
return fmt.Errorf("%w: duplicate in block", ErrDuplicateContainer)
}
blkTxsIDs.Add(id)
}
Collaborator

Can we rename the local variables here?

@@ -110,7 +121,7 @@ func (v *TimeValidityWindow[Container]) isRepeat(
if marker.Contains(i) {
continue
}
if ancestorBlk.ContainsTx(tx.GetID()) {
if ancestorBlk.Contains(tx.GetID()) {
Collaborator

same comment on tx.GetID() here

x/dsmr/block.go Outdated
Comment on lines 171 to 175
func (b Block) init() {
for _, c := range b.ChunkCerts {
b.certSet.Add(c.ChunkID)
}
}
Collaborator

Can we move the cert set to the execution block type?

Comment on lines 1259 to 1264
_, err := node.BuildBlock(context.Background(), parentBlk, testCase.timestamp)
r.ErrorIs(err, testCase.buildWantErr)

r.ErrorIs(node.Verify(context.Background(), parentBlk, newBlk), testCase.verifyWantErr)
})
}
Collaborator

This test case looks like it should pass because calling Accept should mark the block as accepted and make sure it's part of the index.

}
r.NoError(node.Accept(context.Background(), blk))

r.NoError(node.storage.AddLocalChunkWithCert(chunk, &chunkCert))
Collaborator

Agree with @joshua-kim here that it does not make sense to test this way since a user of this package should never call AddLocalChunkWithCert directly.

The way we'd hit this case is by attempting to build a block with a chunk that is in our pool, but has not been included in an accepted block yet (and therefore has not been removed from that pool).

Comment on lines 1199 to 1200
nodes, indexer := newNodes(t, 1)
node = nodes[0]
Collaborator

I'm not sure the validity window should be separate from the nodes, since each node maintains its own index

Contributor Author

it's not the validity window itself, but rather the validity window duration. The index remains the same.

Collaborator

The indexer is usually part of the node itself so sharing it across each node in tests seems like an anti-pattern

Contributor Author

I could create an indexer per node if you think it would be better, but that would be a bit of an over-engineered solution given that it's only being used to retrieve previously accepted blocks. Please let me know if you think it's the right move and I'll make it so.

context.Background(),
[]tx{
{
ID: ids.Empty, // ids.GenerateTestID(),
Collaborator

Is this meant to be using ids.GenerateTestID() ?

Contributor Author

No, but good catch. I needed a way to reproduce consistent duplicate chunks. I've updated the code to use ids.Empty.Prefix(uint64(chunkExpiry)).

parentBlk = blk
}

// create the block so that we can test it against the execute directly.
Collaborator

Suggested change
// create the block so that we can test it against the execute directly.
// create the block so that we can test it against Execute directly.

Comment on lines 1214 to 1224
_, chunkCert, err := node.BuildChunk(
context.Background(),
[]tx{
{
ID: ids.Empty,
Expiry: int64(chunkExpiry),
},
},
int64(chunkExpiry),
codec.Address{},
)
Collaborator

Should we generate actual IDs instead of ids.Empty since these will technically be invalid?

Comment on lines 1259 to 1263
_, err := node.BuildBlock(context.Background(), parentBlk, testCase.timestamp)
r.ErrorIs(err, testCase.buildWantErr)

r.ErrorIs(node.Verify(context.Background(), parentBlk, newBlk), testCase.verifyWantErr)
})
Collaborator

If BuildBlock succeeds, it seems that we are expecting it to build the same block that we have produced locally. Should we be checking that they're the same here?

Contributor Author

If BuildBlock doesn't return any of the expected errors, then it returns a valid block. However, I'm not sure how easy it would be to compare it to the testing block we've created. I'll add checks for ParentID, Height, and Timestamp.

@@ -169,6 +197,38 @@ func TestNode_GetChunk_AvailableChunk(t *testing.T) {
<-done
}

func TestIndexerMissingBlock(t *testing.T) {
Contributor

Isn't this a test over the duplicate chunk case?

Contributor Author

@tsachiherman tsachiherman Dec 23, 2024

no; after

	r.NoError(node.Verify(context.Background(), node.LastAccepted, blk))
	r.NoError(node.Accept(context.Background(), blk))

I intentionally did not add the newly accepted block to the indexer.
As a result, when

r.ErrorIs(node.Verify(context.Background(), node.LastAccepted, blkNext), database.ErrNotFound)

is called, it wouldn't have that block in the indexer, and would generate the database.ErrNotFound error.

Contributor Author

As far as I can tell, this is the only test we have that covers this case.
(Obviously, it shouldn't happen on any real VM that indexes all of its accepted blocks.)

blkNext, err := node.BuildBlock(context.Background(), node.LastAccepted, 4)
r.NoError(err)

r.ErrorIs(node.Verify(context.Background(), node.LastAccepted, blkNext), database.ErrNotFound)
Contributor

Why are we getting database.ErrNotFound here? It seems to me like this would fail verification because this block references a chunk that contains duplicated txs

Contributor Author

They wouldn't be the same transactions, since ids.GenerateTestID() is random.

x/dsmr/block.go Outdated
@@ -105,6 +106,47 @@ func ParseChunk[T Tx](chunkBytes []byte) (Chunk[T], error) {
return c, c.init()
}

// ExecutionBlock bridges the gap between dsmr's block implementation and the validity window's execution block interface.
type ExecutionBlock struct {
Contributor

  1. I think we should unexport this type since it's not needed downstream of dsmr, since we're only using this type within the validity window.
  2. Can we call this something like validityWindowBlock? We're going to be introducing another ExecutedBlock type which will be too similar to this type name.

Contributor Author

Sure, wasn't aware of that, and would gladly rename the data structure.


chainID = ids.Empty
)

type testValidityWindowChainIndex struct {
Contributor

nit: move all testing types to bottom of file

Height: 1,
Timestamp: 1,
blkID: ids.GenerateTestID(),
}
Contributor

This block isn't valid, I think we need to call BuildChunk -> BuildBlock -> Verify before calling Accept. As an unrelated note I don't think we need this block at all for this test.

}

return result
return result, indexer
Contributor

Do we want this indexer to be shared across all node instances?

Collaborator

I would prefer not since the "indexer" is unique per node in practice

}
r.NoError(node.Accept(context.Background(), blk))

_, err = node.BuildBlock(context.Background(), blk, 3)
Contributor

nit: Use node.LastAccepted instead of referencing blk for the parent to build off of

Comment on lines +1101 to +1114
// make sure that it's not the case with any other chunk.
_, _, err = node.BuildChunk(
context.Background(),
[]tx{
{
ID: ids.GenerateTestID(),
Expiry: 4,
},
},
4,
codec.Address{},
)
r.NoError(err)
r.NoError(node.Accept(context.Background(), blk))
Contributor

I feel like instead of having a single test case that checks:

  1. Build a duplicate chunk, have it fail, and build a new chunk and see if it passes

It would just make more sense to have this split into separate test cases:

  1. Build a Chunk, it should pass
  2. Build a duplicate chunk, it should fail
  3. Build a chunk, build a duplicate chunk, should pass with a single chunk in the block

Comment on lines 1203 to 1224
blk := Block{
ParentID: parentBlk.GetID(),
Height: uint64(int(node.LastAccepted.Height) + 1),
Timestamp: int64(int(node.LastAccepted.Timestamp) + 1),
blkID: ids.GenerateTestID(),
}

for _, chunkExpiry := range chunkList {
_, chunkCert, err := node.BuildChunk(
context.Background(),
[]tx{
{
ID: ids.Empty.Prefix(uint64(chunkExpiry)),
Expiry: int64(chunkExpiry),
},
},
int64(chunkExpiry),
codec.Address{},
)
r.NoError(err)
blk.ChunkCerts = append(blk.ChunkCerts, &chunkCert)
}
Contributor

Can we use BuildBlock after calling BuildChunk instead of initializing the block through the struct?

r.NoError(node.Verify(context.Background(), parentBlk, blk))

r.NoError(node.Accept(context.Background(), blk))
indexer.set(blk.GetID(), NewExecutionBlock(blk))
Contributor

Should dsmr be updating the chain index? Or are we expecting that someone else is responsible for updating it (current behavior)?

Collaborator

The snow refactor separates the chain index from Accept, so it's reasonable imo to do the same thing here.


var node *Node[tx]
nodes, indexer := newNodes(t, 1)
node = nodes[0]
node.validityWindowDuration = validationWindow
Collaborator

Should this be an argument to create new nodes instead of setting it after constructing the nodes?



3 participants