
Full-Trie Mark in Parallel #1219

Merged

RatanRSur merged 56 commits into hyperledger:master from pruning-parallel on Oct 23, 2020
Conversation

RatanRSur (Contributor) commented Jul 14, 2020

PR description

See javadoc and comments in MarkSweepPruner for an explanation of the algorithm.

Validation

https://grafana-metrics-ohio.ops.pegasys.tech/d/97M3nMWGk/besu-pruning?orgId=1&from=now-7d&to=now&refresh=5s&var-system=dev-besu-ohio-mainnet-pruning-parallel-16-threads&var-system=dev-besu-ohio-mainnet-pruning-parallel-2-threads

The 16-thread version, which is obviously overkill, is useful for showing that the node can perform multiple mark/sweep cycles without deleting any nodes that are still needed. The 2-thread version, which is what is in this PR, shows that some form of parallelization can be accomplished without falling behind in block processing more than we already do on our production nodes; concretely, the maximum lag was 6 blocks. The majority of the fix for that lag will come from the flat database work in progress by @shemnon.

With today's state size, we can now mark mainnet with 2 threads in roughly 14 days. This is an improvement over the previous mark time, which was well over a month.

Errorprone Changes

I think this is a bug in spotless. I didn't intend to make any changes there.

@RatanRSur RatanRSur force-pushed the pruning-parallel branch 2 times, most recently from 2c2fe05 to 63a744d on August 7, 2020 21:54
@RatanRSur RatanRSur marked this pull request as ready for review October 21, 2020 16:26
@RatanRSur RatanRSur requested review from mbaxter and ajsutton October 21, 2020 16:27
private static final Logger LOG = LogManager.getLogger();
private static final byte[] IN_USE = Bytes.of(1).toArrayUnsafe();

private static final int DEFAULT_OPS_PER_TRANSACTION = 50_000;
Contributor
Curious about this: is it really very advantageous to bundle changes into large transactions, versus just picking a reasonable size and committing more frequently? I also wonder, if the node were under heavy load, whether it's possible we might hit the timeout.

Contributor Author
Sounds reasonable to me, I'll lower it to 10k. Let me know if you think it should go lower.
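For readers following the thread, here is a minimal sketch of the "commit every N operations" batching being discussed, assuming a generic key-value transaction API; the `KvTransaction` interface, field names, and the 10k threshold are illustrative stand-ins, not Besu's exact code.

```java
import java.util.function.Supplier;

// Minimal sketch (illustrative names, not Besu's storage API): marks accumulate
// in a key-value transaction and are committed every OPS_PER_TRANSACTION puts,
// so no single transaction grows without bound.
final class BatchingMarkerSketch {

  interface KvTransaction {
    void put(byte[] key, byte[] value);

    void commit();
  }

  private static final byte[] IN_USE = new byte[] {1};
  private static final int OPS_PER_TRANSACTION = 10_000; // lowered from 50_000 per this review

  private final Supplier<KvTransaction> transactionFactory;
  private KvTransaction currentTransaction;
  private long pendingOps = 0;

  BatchingMarkerSketch(final Supplier<KvTransaction> transactionFactory) {
    this.transactionFactory = transactionFactory;
    this.currentTransaction = transactionFactory.get();
  }

  // Mark one trie node hash as in use, committing and opening a fresh
  // transaction whenever the batch threshold is reached.
  synchronized void markNode(final byte[] nodeHash) {
    currentTransaction.put(nodeHash, IN_USE);
    if (++pendingOps % OPS_PER_TRANSACTION == 0) {
      currentTransaction.commit();
      currentTransaction = transactionFactory.get();
    }
  }

  // Flush whatever remains at the end of the mark phase.
  synchronized void flush() {
    currentTransaction.commit();
    currentTransaction = transactionFactory.get();
  }
}
```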

@Override
public CompletableFuture<Void> visitAll(
final Consumer<Node<V>> nodeConsumer, final ExecutorService executorService) {
nodeConsumer.accept(root);
Contributor
Alternatively, I'd be tempted to just expose getRoot() on the Trie interface, and move this logic into the Pruner...

Contributor Author
I kinda like it being on the Trie because we're adding other traversal methods there. It becomes one place to look for the various types of trie iteration. I'm not sure I understood what you were suggesting, though; does that make sense?
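As an illustration of the two placements being weighed here, a hedged sketch with simplified, non-Besu names of what each option's surface looks like:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

// Sketch of the design question in this thread (illustrative names only).
// Option A keeps every traversal entry point on the trie, so callers have one
// place to look; option B only exposes the root and leaves the traversal
// strategy to the caller (e.g. the Pruner).
interface TrieTraversalSketch<V> {

  interface Node<T> {}

  // Option B: expose the root and let the Pruner drive the walk.
  Node<V> getRoot();

  // Option A (the PR's choice): sequential and parallel traversals live side
  // by side on the trie interface.
  void visitAll(Consumer<Node<V>> nodeConsumer);

  CompletableFuture<Void> visitAll(
      Consumer<Node<V>> nodeConsumer, ExecutorService executorService);
}
```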

@RatanRSur RatanRSur merged commit a25d3f1 into hyperledger:master Oct 23, 2020
@RatanRSur RatanRSur deleted the pruning-parallel branch October 23, 2020 17:14
Comment on lines +115 to +116
Stream.of(
CompletableFuture.runAsync(() -> nodeConsumer.accept(root), executorService)),
mbaxter (Contributor) commented Oct 23, 2020
For this change, I think you need to bump up your queue size to 17 in the Pruner. This is part of why I suggested moving this logic to the Pruner - it feels a bit tightly coupled in terms of how the traversal is happening and the required queue size, etc.

Contributor Author
Maybe this is wrong, but in my mental model the 16 never mattered much; it could just as well be 2, or 7, or whatever. The reason is that the main thing we expect is for threads to start marking sub-tries almost immediately through the CallerRuns policy.
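A minimal sketch of the executor behavior described above, assuming a plain `ThreadPoolExecutor` with a bounded queue and `CallerRunsPolicy`; the pool and queue sizes used here are illustrative, not the PR's exact configuration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// With CallerRunsPolicy, a submission that finds the bounded queue full simply
// runs on the submitting thread, so the exact queue capacity (16 vs 17) only
// changes when the producer starts marking inline, not whether the traversal
// completes.
final class MarkingExecutorSketch {

  static ThreadPoolExecutor create(final int markingThreads, final int queueCapacity) {
    return new ThreadPoolExecutor(
        markingThreads,
        markingThreads,
        0L,
        TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(queueCapacity),
        new ThreadPoolExecutor.CallerRunsPolicy());
  }

  public static void main(final String[] args) throws InterruptedException {
    // 2 marking threads, as in the PR's mainnet validation run.
    final ThreadPoolExecutor executor = create(2, 16);
    for (int i = 0; i < 64; i++) {
      final int task = i;
      // Once 16 tasks are queued, this call runs the task on the caller thread.
      executor.execute(
          () -> System.out.println(Thread.currentThread().getName() + " marked task " + task));
    }
    executor.shutdown();
    executor.awaitTermination(1, TimeUnit.MINUTES);
  }
}
```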

AbdelStark pushed a commit to AbdelStark/besu that referenced this pull request Oct 28, 2020
The parallel task production is by sub-trie, so calling `visitAll` on a 
root node will eventually spawn up to 16 tasks (for a hexary trie).

If we marked each sub-trie in its own thread, with no common queue of
tasks, our mark speed would be limited by the sub-trie with the maximum
number of nodes. In practice on Ethereum mainnet we see a large
imbalance in sub-trie size, so without a common task pool there would be
a substantial period during which only one thread is left, marking its
big sub-trie.

If we were to leave all threads to produce mark tasks before starting 
to mark, we would run out of memory quickly.

If we were to have a constant number of threads producing the mark 
tasks with the others consuming them, we would have to optimize the 
production/consumption balance.

To get the best of both worlds, the marking executor has a 
ThreadPoolExecutor.CallerRunsPolicy which causes the producing tasks to 
essentially consume their own mark task immediately when the task queue 
is full. The resulting behavior is threads that mark their own sub-trie 
until they finish that sub-trie, at which point they switch to marking 
the sub-trie tasks produced by another thread.

Signed-off-by: Ratan Rai Sur <ratan.r.sur@gmail.com>
Signed-off-by: Abdelhamid Bakhta <abdelhamid.bakhta@consensys.net>
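To make the commit message above concrete, here is a simplified sketch of the traversal shape it describes, assuming an illustrative `TrieNode` type rather than Besu's actual node classes; it mirrors the `runAsync` pattern quoted earlier in the review thread.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Simplified sketch: the root and each of its up-to-16 children become one
// task apiece on the shared marking executor; each sub-trie task then marks
// its sub-trie depth-first on whichever thread ends up running it (a pool
// thread, or the producer itself via CallerRunsPolicy when the queue is full).
final class ParallelMarkSketch {

  interface TrieNode {
    List<TrieNode> children();
  }

  static CompletableFuture<Void> markAll(
      final TrieNode root, final Consumer<TrieNode> mark, final ExecutorService executor) {
    return CompletableFuture.allOf(
        Stream.concat(
                Stream.of(CompletableFuture.runAsync(() -> mark.accept(root), executor)),
                root.children().stream()
                    .map(
                        child ->
                            CompletableFuture.runAsync(() -> markSubTrie(child, mark), executor)))
            .toArray(CompletableFuture[]::new));
  }

  // Sequential depth-first mark of a single sub-trie; no further tasks are
  // produced below this level, which keeps the shared task queue small.
  private static void markSubTrie(final TrieNode node, final Consumer<TrieNode> mark) {
    mark.accept(node);
    node.children().forEach(child -> markSubTrie(child, mark));
  }
}
```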