Full-Trie Mark in Parallel #1219
Conversation
The branch was force-pushed from 2c2fe05 to 63a744d.
The branch was force-pushed from 63a744d to 0a252f4.
Two resolved (outdated) review threads on ethereum/core/src/main/java/org/hyperledger/besu/ethereum/worldstate/MarkSweepPruner.java
Resolved (outdated) review thread on ethereum/trie/src/main/java/org/hyperledger/besu/ethereum/trie/StoredMerklePatriciaTrie.java
private static final Logger LOG = LogManager.getLogger();
private static final byte[] IN_USE = Bytes.of(1).toArrayUnsafe();

private static final int DEFAULT_OPS_PER_TRANSACTION = 50_000;
Curious about this: is it really that advantageous to bundle changes into such large transactions, versus picking a smaller batch size and committing more frequently? I also wonder whether, if the node were under heavy load, it's possible we might hit the timeout.
Sounds reasonable to me, I'll lower it to 10k. Let me know if you think it should go lower.
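For readers following along, here is a minimal sketch of the batching pattern being discussed: marks are written into a storage transaction, and the transaction is committed every N operations so no single transaction grows without limit. The KvStorage/KvTransaction interfaces and the MarkBatcher class are illustrative stand-ins, not Besu's actual storage API, and the batch size shown is simply the value agreed on in this thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal stand-ins for a key-value storage abstraction; names and
// signatures are illustrative only, not Besu's real interfaces.
interface KvTransaction {
  void put(byte[] key, byte[] value);
  void commit();
}

interface KvStorage {
  KvTransaction startTransaction();
}

// Buffers "mark" writes and commits them in bounded batches.
final class MarkBatcher {
  private static final int OPS_PER_TRANSACTION = 10_000; // lowered from 50_000 per this thread
  private static final byte[] IN_USE = {1};

  private final KvStorage markStorage;
  private final AtomicInteger pendingOps = new AtomicInteger();
  private KvTransaction tx;

  MarkBatcher(final KvStorage markStorage) {
    this.markStorage = markStorage;
    this.tx = markStorage.startTransaction();
  }

  // Record that a node hash is in use; commit once the batch reaches its limit.
  synchronized void mark(final byte[] nodeHash) {
    tx.put(nodeHash, IN_USE);
    if (pendingOps.incrementAndGet() >= OPS_PER_TRANSACTION) {
      tx.commit();
      tx = markStorage.startTransaction();
      pendingOps.set(0);
    }
  }

  // Commit whatever is pending, e.g. at the end of the mark phase.
  synchronized void flush() {
    tx.commit();
    tx = markStorage.startTransaction();
    pendingOps.set(0);
  }
}
```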
Resolved review threads on:
ethereum/core/src/main/java/org/hyperledger/besu/ethereum/worldstate/MarkSweepPruner.java
ethereum/trie/src/main/java/org/hyperledger/besu/ethereum/trie/SimpleMerklePatriciaTrie.java (outdated)
ethereum/trie/src/main/java/org/hyperledger/besu/ethereum/trie/StoredMerklePatriciaTrie.java (outdated)
@Override
public CompletableFuture<Void> visitAll(
    final Consumer<Node<V>> nodeConsumer, final ExecutorService executorService) {
  nodeConsumer.accept(root);
Alternatively, I'd be tempted to just expose getRoot() on the Trie interface and move this logic into the Pruner ...
I kinda like it being on the Trie because we're adding other traversal methods there; it becomes one place to look for the various types of trie iteration. Not sure I understood what you were suggesting, though, so does that make sense?
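For context, a rough sketch of the "traversal lives on the Trie" shape being debated here, assuming Java 9+ (private interface methods). The Node/Trie types and method names below are simplified placeholders for illustration, not Besu's actual trie interfaces.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Simplified placeholder for a trie node; real nodes also carry paths,
// values, and hashes.
interface Node<V> {
  Stream<Node<V>> children();
}

// Keeping visitAll on the trie gives every implementation the same entry
// point for full traversals, alongside the other traversal methods.
interface Trie<V> {
  Node<V> getRootNode();

  default CompletableFuture<Void> visitAll(
      final Consumer<Node<V>> nodeConsumer, final ExecutorService executorService) {
    final Node<V> root = getRootNode();
    // Visit the root on the calling thread, then fan each child sub-trie
    // out onto the executor.
    nodeConsumer.accept(root);
    return CompletableFuture.allOf(
        root.children()
            .map(
                child ->
                    CompletableFuture.runAsync(
                        () -> visitSubTrie(child, nodeConsumer), executorService))
            .toArray(CompletableFuture[]::new));
  }

  private void visitSubTrie(final Node<V> node, final Consumer<Node<V>> nodeConsumer) {
    nodeConsumer.accept(node);
    node.children().forEach(child -> visitSubTrie(child, nodeConsumer));
  }
}
```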
This reverts commit 26ec138.
This reverts commit 5700395.
Stream.of(
    CompletableFuture.runAsync(() -> nodeConsumer.accept(root), executorService)),
For this change, I think you need to bump up your queue size to 17 in the Pruner. This is part of why I suggested moving this logic to the Pruner - it feels a bit tightly coupled in terms of how the traversal is happening and the required queue size, etc.
Maybe this is wrong, but in my mental model the 16 never mattered too much. It could just as well be 2, or 7, or whatever, because the main thing we expect is to start marking sub-tries almost immediately through the CallerRuns policy.

Parallel task production is per sub-trie, so calling `visitAll` on a root node will eventually spawn up to 16 tasks (for a hexary trie). If we marked each sub-trie in its own thread, with no common queue of tasks, our mark speed would be limited by the sub-trie with the most nodes. In practice on Ethereum mainnet we see a large imbalance in sub-trie sizes, so without a common task pool we would spend a substantial amount of time with only one thread left marking its big sub-trie. If we let all threads produce mark tasks before any marking starts, we would quickly run out of memory. If we dedicated a fixed number of threads to producing mark tasks while the others consume them, we would have to tune the production/consumption balance.

To get the best of both worlds, the marking executor uses ThreadPoolExecutor.CallerRunsPolicy, which causes a producing thread to consume its own mark task immediately when the task queue is full. The resulting behavior is that each thread marks its own sub-trie until it finishes, at which point it switches to marking the sub-trie tasks produced by other threads.
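To make the described behavior concrete, here is a small, self-contained illustration of an executor configured this way: a bounded queue plus CallerRunsPolicy, so a producer that finds the queue full ends up running (marking) a sub-trie task itself. The pool size, queue size, class name, and printed output are arbitrary examples for illustration, not the values or code used in this PR.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public final class MarkingExecutorExample {

  // A fixed pool with a bounded queue; when the queue is full, CallerRunsPolicy
  // makes the submitting thread execute the task itself, so producers throttle
  // themselves by marking their own sub-trie.
  public static ThreadPoolExecutor newMarkingExecutor(final int threads) {
    return new ThreadPoolExecutor(
        threads,
        threads,
        0L,
        TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(16),
        new ThreadPoolExecutor.CallerRunsPolicy());
  }

  public static void main(final String[] args) throws InterruptedException {
    final ThreadPoolExecutor executor = newMarkingExecutor(2);
    // Each submitted task stands in for "mark one sub-trie".
    for (int i = 0; i < 20; i++) {
      final int subTrie = i;
      executor.execute(
          () ->
              System.out.println(
                  "marking sub-trie " + subTrie + " on " + Thread.currentThread().getName()));
    }
    executor.shutdown();
    executor.awaitTermination(1, TimeUnit.MINUTES);
  }
}
```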
PR description
See the javadoc and comments in MarkSweepPruner for an explanation of the algorithm.
Validation
https://grafana-metrics-ohio.ops.pegasys.tech/d/97M3nMWGk/besu-pruning?orgId=1&from=now-7d&to=now&refresh=5s&var-system=dev-besu-ohio-mainnet-pruning-parallel-16-threads&var-system=dev-besu-ohio-mainnet-pruning-parallel-2-threads
The 16-thread version, which is obviously overkill, is useful for showing that the node can perform multiple mark/sweep cycles without deleting any necessary nodes. The 2-thread version, which is what this PR uses, shows that some form of parallelization can be achieved without falling behind in block processing more than our production nodes already do; to make it concrete, the max lag was 6 blocks. The majority of the fix for that lag will come from the flat database work in progress by @shemnon. With today's state size, we can now mark mainnet with 2 threads in ~14 days, which is better than before, when a full mark took more than a month.
Errorprone Changes
I think this is a bug in spotless. I didn't intend to make any changes there.