
Conversation

Contributor

@yacovm yacovm commented Jul 24, 2024

Make BatchedParseBlock parse blocks (and verify their signature) in parallel.

Prior to this change, batch parsing of blocks in the proposer VM only worked if the wrapped VM supports batch processing. Now, if the wrapped VM doesn't support batch processing, the inner blocks are simply parsed one by one, but the proposer VM still parses its own layer in parallel.

Why this should be merged

Makes the proposer VM parse blocks in parallel, shortening the time it takes to process Ancestors messages from remote nodes.

How this works

The core logic is unchanged: parsing now happens in a precomputation step, and the parsing results are consumed exactly as they were in the serial execution before this change.

How this was tested

I added a unit test that plants an unparseable block at various positions in the input and makes sure the results are as expected.

Contributor

@marun marun left a comment

Is there a benchmark test that would enable quantifying the impact of this change against common usage scenarios?


blockThatCantBeParsed := snowmantest.BuildChild(snowmantest.Genesis)

var blockBatch1 [][]byte
Contributor

(No action required) Maybe extract this common code into a helper for reuse?

Contributor Author

Can you be more specific about which code (which lines)?

Contributor

I noticed that creation of the values for rawBlocks is the same modulo adding an unparseable block for the first 2 cases, which suggested to me extracting makeSignedBlock and its dependencies and the loops below to a helper e.g. getRawBlocks and adding the unparseable blocks separately. Just food for thought.

Contributor Author

Great idea, done!


results := make([]parseResults, len(blks))

// If we can't parse the first block, no point in parsing anything after
Contributor

(No action required) Is the first block more likely to be unparseable than the rest, or is this just a cheap enough optimization that it's worth specializing for?

Contributor Author

The idea is that the blocks you get in a response to a message should all be in close proximity to each other.

So if the first one cannot be parsed, the others most likely can't be either, and vice versa.

This stems from the fact that blocks are produced in a certain format starting from a point in time.

Contributor

@abi87 abi87 Jul 25, 2024

My two cents here: I still would not special-case the first block, and would instead reap the full benefit of parallel parsing rather than waiting for the first block to be parsed serially.
That is, unless we have proof that block parsing fails often and fails mostly because of the first block.
I'm also not sure how "heavy" an operation block parsing is. In general, the lighter it is, the less conservative we should be about trying to spare work when an invalid block is present in the list.

Contributor Author

The way I understand it, if we are bootstrapping, we will fail to parse all blocks produced before the proposer VM was introduced. It's true that failing to parse is not heavy, but I did not want to further degrade performance when processing blocks that predate the proposer VM.

@StephenButtolph what is your opinion?

Contributor

I think unless we've run a mainnet benchmark with results showing the opposite, I'd probably prefer to just parallelize all of the parsing rather than special casing the first block.

Contributor Author

Got it, thanks. Changed accordingly!


parsingResults := vm.parseBlocks(blks)

for ; blocksIndex < len(blks); blocksIndex++ {
Contributor

(No action required) Do blocks need to be handled by this loop serially or would it be possible (or sane) to optimize this loop further to process in the order that blocks are successfully parsed (e.g. via a channel)?

Contributor Author

That's a good question. I don't know if the caller code ensures that the blocks are passed in order.

I would guess that if it doesn't, we probably could.

However, I wanted to preserve the existing semantics in this PR.

@StephenButtolph wdyt?

Contributor

Not at all a required change, just food for thought.

Contributor

@marun I think we want to keep the batched call to the innerVM here, so we are bound to wait for all proposerVM-level parsing to finish before pushing the innerBlocks content down to the innerVM.
IIUC, you're suggesting instead to avoid batching the innerVM calls and fire them as soon as the proposerVM has done its work on a single block.
I am not sure this is helpful, especially if the innerVM is an rpcVM, which is the case that brought us to introduce batching in the first place.

Contributor

@StephenButtolph StephenButtolph Jul 25, 2024

Do blocks need to be handled by this loop serially or would it be possible (or sane) to optimize this loop further to process in the order that blocks are successfully parsed (e.g. via a channel)?

As this PR is currently written - there is no real change in behavior... It is just an optimization.

The first block bytes provided must remain the first block returned in the slice (this is required by the engine). Technically I don't think anything would break if the rest of the blocks were potentially shuffled... But I feel like that would be a pretty weird API. IMO we should document on the BatchedParseBlock interface method (and helper function) that the order is expected to be maintained.

If we were to use a channel here, we would need to remove the:

if err != nil {
	break
}

behavior, as we may still need to process additional blocks.

All in all, I think using a channel would increase the complexity (at least of this PR) pretty significantly.

Contributor

@ARR4N ARR4N left a comment

If you have the context, can you please add a method comment explaining what it does? The best-effort block parsing doesn't make sense in isolation so that might help.

expectedBlocks = append(expectedBlocks, block.Bytes())
}

require.Equal(expectedBlocks, testCase.rawBlocks)
Contributor

This instance of require is bound to the outer testing.T, so errors will be reported against the wrong test, and your other test cases probably won't run if one fails. The unused t lint failure is flagging exactly this problem.

Contributor Author

Yep, I am sloppy :-)

Nice catch!


for i, block := range blocks {
if i < testCase.preForkIndex {
require.Equal(reflect.TypeOf(block).String(), "*proposervm.postForkBlock")
Contributor

reflect.Type can be compared directly so you don't need to call String(), and you'll also get compiler checks on the expected value. Same applies below.

Contributor Author

Not sure I quite understood your comment; let me know if this is what you had in mind.

func TestBatchedParseBlockParallel(t *testing.T) {
require := require.New(t)

parentID := ids.ID{1}
Contributor

(fyi; no action required) ids.GenerateTestID() was introduced recently so you may not see it in other tests, but just letting you know that it's there.

Contributor Author

Used as suggested, thanks

Contributor Author

yacovm commented Jul 24, 2024

Is there a benchmark test that would enable quantifying the impact of this change against common usage scenarios?

I'm not sure of the typical batch size, but the parsing logic includes an ECDSA signature verification, so parallelizing should be worth it almost all the time.

That said, I have no idea how heavy the wrapped chain VM's parsing is further down the line, so any benchmark I write will probably look artificially good, and I'm unsure how realistic it would be for the bottom line.

I can write one just for the sake of trying, but I wouldn't suggest merging the benchmark itself.

@yacovm yacovm force-pushed the propVMParallel branch 3 times, most recently from 7dacd51 to fbbcc44 Compare July 25, 2024 09:36
Contributor Author

yacovm commented Jul 25, 2024

If you have the context, can you please add a method comment explaining what it does? The best-effort block parsing doesn't make sense in isolation so that might help.

Added some comments, let me know if that helps

Contributor

@abi87 abi87 left a comment

My main comment is about special-casing the parsing of the first block. I'd be less conservative and just parse everything in parallel, but I may lack data indicating that's a bad idea.

@yacovm yacovm force-pushed the propVMParallel branch 2 times, most recently from e4e9cd4 to e99ab79 Compare July 25, 2024 17:00
Contributor

@StephenButtolph StephenButtolph left a comment

On my laptop downloading the Fuji P-chain on

  • master took 16s 651ms
  • this PR took 7s 900ms

Currently looking into the improvement mainnet gets.

Make BatchedParseBlock parse blocks (and verify their signature) in parallel.


Signed-off-by: Yacov Manevich <yacov.manevich@avalabs.org>
Contributor Author

yacovm commented Jul 25, 2024

Many thanks everyone (@marun , @ARR4N , @abi87 and @StephenButtolph ) for the high quality code review!

Contributor

@StephenButtolph StephenButtolph left a comment

On my laptop downloading the Mainnet P-chain on

  • master took 36m 27s
  • this PR took 20m 4s

@yacovm yacovm requested a review from ARR4N July 25, 2024 21:01
@StephenButtolph StephenButtolph added this pull request to the merge queue Jul 26, 2024
Merged via the queue into ava-labs:master with commit 070aaac Jul 26, 2024