Parallelize BatchedParseBlock #3227
Conversation
Is there a benchmark test that would enable quantifying the impact of this change against common usage scenarios?
vms/proposervm/batched_vm_test.go (Outdated)

    blockThatCantBeParsed := snowmantest.BuildChild(snowmantest.Genesis)

    var blockBatch1 [][]byte
(No action required) Maybe extract this common code into a helper for reuse?
Can you be more specific about which code (which lines)?
I noticed that the creation of the values for rawBlocks is the same for the first 2 cases, modulo adding an unparseable block. That suggested extracting makeSignedBlock, its dependencies, and the loops below into a helper (e.g. getRawBlocks) and adding the unparseable blocks separately. Just food for thought.
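One possible shape for the suggested refactor, with getRawBlocks as the shared helper; makeBlock below is an illustrative stand-in for the test's makeSignedBlock, not the actual code:

```go
package main

import "fmt"

// makeBlock stands in for the test's makeSignedBlock helper.
func makeBlock(i int) []byte {
	return []byte(fmt.Sprintf("blk-%d", i))
}

// getRawBlocks builds the batch of raw blocks shared by every test case,
// so each case only adds its unparseable block where needed.
func getRawBlocks(n int) [][]byte {
	raw := make([][]byte, 0, n)
	for i := 0; i < n; i++ {
		raw = append(raw, makeBlock(i))
	}
	return raw
}

func main() {
	// A case that needs an unparseable block prepends it separately.
	unparseable := []byte("garbage")
	case1 := append([][]byte{unparseable}, getRawBlocks(3)...)
	fmt.Println(len(case1)) // 4
}
```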
Great idea, done!
vms/proposervm/batched_vm.go (Outdated)

    results := make([]parseResults, len(blks))

    // If we can't parse the first block, no point in parsing anything after
(No action required) Is the first block more likely to be unparseable than the rest, or is this just a cheap enough optimization that it's worth specializing for?
The idea is that you get a bunch of blocks upon a response to a message, and they should all be within proximity.
So if the first cannot be parsed, the others most likely can't either, and vice versa.
It stems from the fact that blocks are produced in a certain way starting from a point in time.
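For illustration only, the first-block special case under discussion (which was later dropped) might look roughly like this; parse is a stub, and the real code fanned the remaining blocks out in parallel:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parse is a stub for proposer VM block parsing: blocks from before the
// fork fail to parse at this layer.
func parse(b string) (string, error) {
	if strings.HasPrefix(b, "pre-fork") {
		return "", errors.New("unparseable block")
	}
	return "parsed:" + b, nil
}

// parseBatch probes the first block serially; only if it parses does it go
// on to the rest. The bet is that blocks in one response come from the same
// era, so the first block predicts the others.
func parseBatch(raw []string) ([]string, error) {
	if len(raw) == 0 {
		return nil, nil
	}
	first, err := parse(raw[0])
	if err != nil {
		// First block is unparseable: skip the fan-out for the rest.
		return nil, err
	}
	out := []string{first}
	for _, b := range raw[1:] {
		p, err := parse(b)
		if err != nil {
			return out, err
		}
		out = append(out, p)
	}
	return out, nil
}

func main() {
	_, err := parseBatch([]string{"pre-fork-0", "pre-fork-1"})
	fmt.Println(err) // unparseable block

	out, _ := parseBatch([]string{"blk-0", "blk-1"})
	fmt.Println(out) // [parsed:blk-0 parsed:blk-1]
}
```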
My two cents here is that I still would not special-case the first block, and would try to reap the full benefit of parallel parsing instead of waiting for the first block to be deserialized.
That is, unless we have proof that block parsing fails often and fails mostly because of the first block.
Also, I'm not sure how "heavy" an operation block parsing is. In general, the lighter it is, the less conservative we should be in trying to spare work when an invalid block is present in the list.
The way I understand it, if we are bootstrapping, then we will fail to parse all blocks from before the proposer VM was introduced. It's true that failing to parse is not heavy, but I did not want to further degrade the performance of processing old, pre-proposer-VM blocks.
@StephenButtolph what is your opinion?
I think unless we've run a mainnet benchmark with results showing the opposite, I'd probably prefer to just parallelize all of the parsing rather than special casing the first block.
Got it, thanks. Changed accordingly!
    parsingResults := vm.parseBlocks(blks)

    for ; blocksIndex < len(blks); blocksIndex++ {
(No action required) Do blocks need to be handled by this loop serially or would it be possible (or sane) to optimize this loop further to process in the order that blocks are successfully parsed (e.g. via a channel)?
That's a good question. I don't know if the caller code ensures that the blocks are passed in-order.
I would guess that if not, then probably we could.
However I wanted to preserve the semantics of this change in this PR.
@StephenButtolph wdyt?
Not at all a required change, just food for thought.
@marun I think we want to keep the batched call to the innerVM here, so we are bound to wait for all proposerVM level parsing to be done before pushing the innerBlocks content down to the innerVM.
IIUC, you're suggesting instead to avoid batching innerVM calls and shoot them as soon as proposerVM has done its work on a single block.
I am not sure this is helpful, especially if innerVM is an rpcVM one, which is the case that brought us to introduce batching in the first place.
Do blocks need to be handled by this loop serially or would it be possible (or sane) to optimize this loop further to process in the order that blocks are successfully parsed (e.g. via a channel)?
As this PR is currently written - there is no real change in behavior... It is just an optimization.
The first block bytes provided must remain the first block returned in the slice (this is required by the engine). Technically I don't think anything would break if the rest of the blocks were potentially shuffled... But I feel like that would be a pretty weird API. IMO we should document on the BatchedParseBlock interface method (and helper function) that the order is expected to be maintained.
If we were to use a channel here, we would need to remove the:

    if err != nil {
        break
    }

behavior, as we may still need to process additional blocks.
All-in-all, I think using a channel would increase the complexity (at least of this PR) pretty significantly
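The indexed-slice approach being discussed can be sketched as follows (illustrative names, not the PR's actual code): each goroutine writes its result at its input index, so order is preserved, and the consuming loop stays serial with the original break-on-error semantics.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// parseResult pairs a parsed block with its parse error.
type parseResult struct {
	block string
	err   error
}

// parseBlocks parses every raw block concurrently but stores each result at
// its original index, so the output order matches the input order.
func parseBlocks(raw []string) []parseResult {
	results := make([]parseResult, len(raw))
	var wg sync.WaitGroup
	for i, r := range raw {
		wg.Add(1)
		go func(i int, r string) {
			defer wg.Done()
			if strings.HasPrefix(r, "bad") {
				results[i].err = errors.New("unparseable block")
				return
			}
			results[i].block = "parsed:" + r
		}(i, r)
	}
	wg.Wait()
	return results
}

func main() {
	results := parseBlocks([]string{"blk0", "blk1", "bad2", "blk3"})

	// The consuming loop remains serial and stops at the first failure,
	// preserving the "break on error" semantics.
	var parsed []string
	for _, res := range results {
		if res.err != nil {
			break
		}
		parsed = append(parsed, res.block)
	}
	fmt.Println(parsed) // [parsed:blk0 parsed:blk1]
}
```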
If you have the context, can you please add a method comment explaining what it does? The best-effort block parsing doesn't make sense in isolation so that might help.
vms/proposervm/batched_vm_test.go (Outdated)

        expectedBlocks = append(expectedBlocks, block.Bytes())
    }

    require.Equal(expectedBlocks, testCase.rawBlocks)
This instance of require is bound to the outer testing.T, so errors will be reported incorrectly, and your other test cases probably won't run if one fails. The unused t lint failure is a false flag.
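The pitfall can be simulated without testify: an assertion sink captured from the outer scope attributes a subtest's failure to the wrong test. The fix is to rebind inside each subtest closure (with testify, require := require.New(t)). The reporter type below is a toy stand-in for *testing.T:

```go
package main

import "fmt"

// reporter is a toy stand-in for *testing.T that records which test a
// failure was attributed to.
type reporter struct {
	name     string
	failures []string
}

func (r *reporter) Errorf(format string, args ...any) {
	r.failures = append(r.failures, fmt.Sprintf(format, args...))
}

// runSubtest mimics t.Run: it hands the closure its own reporter.
func runSubtest(name string, fn func(t *reporter)) *reporter {
	sub := &reporter{name: name}
	fn(sub)
	return sub
}

func main() {
	outer := &reporter{name: "TestOuter"}

	// Wrong: the sink is captured from the outer scope, so the subtest's
	// failure is attributed to the outer test.
	sub1 := runSubtest("case1", func(t *reporter) {
		outer.Errorf("mismatch")
	})
	fmt.Println(len(sub1.failures), len(outer.failures)) // 0 1

	// Right: rebind inside the closure so failures land on the subtest.
	sub2 := runSubtest("case2", func(t *reporter) {
		require := t // with testify: require := require.New(t)
		require.Errorf("mismatch")
	})
	fmt.Println(len(sub2.failures), len(outer.failures)) // 1 1
}
```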
Yep, I am sloppy :-)
Nice catch!
vms/proposervm/batched_vm_test.go (Outdated)

    for i, block := range blocks {
        if i < testCase.preForkIndex {
            require.Equal(reflect.TypeOf(block).String(), "*proposervm.postForkBlock")
reflect.Type values can be compared directly, so you don't need to call String(), and you'll also get compiler checks on the expected value. Same applies below.
Not sure I quite understand your comment, let me know if that's what you had in mind.
    func TestBatchedParseBlockParallel(t *testing.T) {
        require := require.New(t)

        parentID := ids.ID{1}
(fyi; no action required) ids.GenerateTestID() was introduced recently so you may not see it in other tests, but just letting you know that it's there.
Used as suggested, thanks
I'm not sure of the size of the batch of blocks, but the parsing logic contains an ECDSA signature verification inside it, so parallelizing should be worth it almost all the time. Having said that, I have no clue how heavy the chain VM parsing is down the line, so any benchmark test I do will probably look artificially good, and I'm unsure how realistic it would be for the bottom line. I can do that just for the sake of trying, but I wouldn't suggest merging it.
Force-pushed from 7dacd51 to fbbcc44
Added some comments, let me know if that helps
My main comment is around special-casing first-block parsing. I'd be less conservative and parse all in parallel, but I may lack data indicating this is not a good idea.
Force-pushed from e4e9cd4 to e99ab79
On my laptop, downloading the Fuji P-chain on master took 16s 651ms; on this PR it took 7s 900ms.
Currently looking into the improvement mainnet gets.
Make BatchedParseBlock parse blocks (and verify their signature) in parallel. Prior to this code change, batch parsing of blocks in the proposer VM would only work if the wrapped VMs support batch processing. Now, if the wrapped VM doesn't support batch processing, it simply parses them one by one, but the proposer VM still parses in parallel.

Signed-off-by: Yacov Manevich <yacov.manevich@avalabs.org>
Many thanks everyone (@marun, @ARR4N, @abi87, and @StephenButtolph) for the high-quality code review!
On my laptop, downloading the Mainnet P-chain on master took 36m 27s; on this PR it took 20m 4s.
Make BatchedParseBlock parse blocks (and verify their signature) in parallel.
Prior to this code change, batch parsing of blocks in the proposer VM would only work if the wrapped VMs support batch processing. Now, if the wrapped VM doesn't support batch processing, it simply parses them one by one, but the proposer VM still parses in parallel.
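The one-by-one fallback described above can be sketched with an interface assertion; the interfaces below are illustrative stand-ins (in avalanchego the batched interface is block.BatchedChainVM), not the actual code:

```go
package main

import "fmt"

// parser stands in for the inner VM's single-block parsing.
type parser interface {
	ParseBlock(b []byte) (string, error)
}

// batchedParser stands in for the optional batch-parsing interface.
type batchedParser interface {
	BatchedParseBlock(blks [][]byte) ([]string, error)
}

// parseInner uses the batched call when the inner VM supports it, and
// falls back to parsing one by one otherwise.
func parseInner(vm parser, blks [][]byte) ([]string, error) {
	if bvm, ok := vm.(batchedParser); ok {
		return bvm.BatchedParseBlock(blks)
	}
	out := make([]string, 0, len(blks))
	for _, b := range blks {
		blk, err := vm.ParseBlock(b)
		if err != nil {
			return out, err
		}
		out = append(out, blk)
	}
	return out, nil
}

// simpleVM supports only single-block parsing, triggering the fallback.
type simpleVM struct{}

func (simpleVM) ParseBlock(b []byte) (string, error) { return string(b), nil }

func main() {
	blks, err := parseInner(simpleVM{}, [][]byte{[]byte("a"), []byte("b")})
	fmt.Println(blks, err) // [a b] <nil>
}
```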
Why this should be merged
Makes the proposer VM parse blocks in parallel, shortening the time it takes to process Ancestors messages from remote nodes.

How this works
The code logic is retained: the parsing is now done in a precomputation step, and the parsing results are used exactly as they were in the serial execution before the change.
How this was tested
I added a unit test that plants a block that cannot be parsed in various places in the input and made sure it returns results as expected.