Full-Trie Mark in Parallel #1219
Conversation
The branch was force-pushed from 2c2fe05 to 63a744d.
The branch was force-pushed from 63a744d to 0a252f4.
Two resolved (outdated) review threads on ethereum/core/src/main/java/org/hyperledger/besu/ethereum/worldstate/MarkSweepPruner.java
Resolved (outdated) review thread on ethereum/trie/src/main/java/org/hyperledger/besu/ethereum/trie/StoredMerklePatriciaTrie.java
private static final Logger LOG = LogManager.getLogger();
private static final byte[] IN_USE = Bytes.of(1).toArrayUnsafe();

private static final int DEFAULT_OPS_PER_TRANSACTION = 50_000;
Curious about this: is it really that advantageous to bundle changes into such large transactions, versus picking a smaller batch size and committing more frequently? I also wonder whether, if the node were under heavy load, it's possible we might hit the timeout.
Sounds reasonable to me, I'll lower it to 10k. Let me know if you think it should go lower.
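For readers following along, here is a minimal sketch of the batching pattern being discussed: marks are written into a storage transaction, and the transaction is committed every N operations so no single transaction grows without limit. The KvStorage/KvTransaction interfaces and the MarkBatcher class are illustrative stand-ins, not Besu's actual storage API, and the batch size shown is simply the value agreed on in this thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal stand-ins for a key-value storage abstraction; names and
// signatures are illustrative only, not Besu's real interfaces.
interface KvTransaction {
  void put(byte[] key, byte[] value);
  void commit();
}

interface KvStorage {
  KvTransaction startTransaction();
}

// Buffers "mark" writes and commits them in bounded batches.
final class MarkBatcher {
  private static final int OPS_PER_TRANSACTION = 10_000; // lowered from 50_000 per this thread
  private static final byte[] IN_USE = {1};

  private final KvStorage markStorage;
  private final AtomicInteger pendingOps = new AtomicInteger();
  private KvTransaction tx;

  MarkBatcher(final KvStorage markStorage) {
    this.markStorage = markStorage;
    this.tx = markStorage.startTransaction();
  }

  // Record that a node hash is in use; commit once the batch reaches its limit.
  synchronized void mark(final byte[] nodeHash) {
    tx.put(nodeHash, IN_USE);
    if (pendingOps.incrementAndGet() >= OPS_PER_TRANSACTION) {
      tx.commit();
      tx = markStorage.startTransaction();
      pendingOps.set(0);
    }
  }

  // Commit whatever is pending, e.g. at the end of the mark phase.
  synchronized void flush() {
    tx.commit();
    tx = markStorage.startTransaction();
    pendingOps.set(0);
  }
}
```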
Resolved review threads on:
ethereum/core/src/main/java/org/hyperledger/besu/ethereum/worldstate/MarkSweepPruner.java
ethereum/trie/src/main/java/org/hyperledger/besu/ethereum/trie/SimpleMerklePatriciaTrie.java (outdated)
ethereum/trie/src/main/java/org/hyperledger/besu/ethereum/trie/StoredMerklePatriciaTrie.java (outdated)
@Override
public CompletableFuture<Void> visitAll(
    final Consumer<Node<V>> nodeConsumer, final ExecutorService executorService) {
  nodeConsumer.accept(root);
Alternatively, I'd be tempted to just expose getRoot() on the Trie interface and move this logic into the Pruner ...
I kinda like it being on the Trie because we're adding other traversal methods there; it becomes one place to look for the various types of trie iteration. Not sure I understood what you were suggesting, though, so does that make sense?
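For context, a rough sketch of the "traversal lives on the Trie" shape being debated here, assuming Java 9+ (private interface methods). The Node/Trie types and method names below are simplified placeholders for illustration, not Besu's actual trie interfaces.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Simplified placeholder for a trie node; real nodes also carry paths,
// values, and hashes.
interface Node<V> {
  Stream<Node<V>> children();
}

// Keeping visitAll on the trie gives every implementation the same entry
// point for full traversals, alongside the other traversal methods.
interface Trie<V> {
  Node<V> getRootNode();

  default CompletableFuture<Void> visitAll(
      final Consumer<Node<V>> nodeConsumer, final ExecutorService executorService) {
    final Node<V> root = getRootNode();
    // Visit the root on the calling thread, then fan each child sub-trie
    // out onto the executor.
    nodeConsumer.accept(root);
    return CompletableFuture.allOf(
        root.children()
            .map(
                child ->
                    CompletableFuture.runAsync(
                        () -> visitSubTrie(child, nodeConsumer), executorService))
            .toArray(CompletableFuture[]::new));
  }

  private void visitSubTrie(final Node<V> node, final Consumer<Node<V>> nodeConsumer) {
    nodeConsumer.accept(node);
    node.children().forEach(child -> visitSubTrie(child, nodeConsumer));
  }
}
```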
This reverts commit 26ec138.
This reverts commit 5700395.
Stream.of(
    CompletableFuture.runAsync(() -> nodeConsumer.accept(root), executorService)),
For this change, I think you need to bump up your queue size to 17 in the Pruner. This is part of why I suggested moving this logic to the Pruner - it feels a bit tightly coupled in terms of how the traversal is happening and the required queue size, etc.
Maybe this is wrong, but in my mental model the 16 never mattered too much. It could just as well be 2, or 7, or whatever, because the main thing we expect is to start marking sub-tries almost immediately through the CallerRuns policy.

Parallel task production is per sub-trie, so calling `visitAll` on a root node will eventually spawn up to 16 tasks (for a hexary trie). If we marked each sub-trie in its own thread, with no common queue of tasks, our mark speed would be limited by the sub-trie with the most nodes. In practice on Ethereum mainnet we see a large imbalance in sub-trie sizes, so without a common task pool we would spend a substantial amount of time with only one thread left marking its big sub-trie. If we let all threads produce mark tasks before any marking starts, we would quickly run out of memory. If we dedicated a fixed number of threads to producing mark tasks while the others consume them, we would have to tune the production/consumption balance.

To get the best of both worlds, the marking executor uses ThreadPoolExecutor.CallerRunsPolicy, which causes a producing thread to consume its own mark task immediately when the task queue is full. The resulting behavior is that each thread marks its own sub-trie until it finishes, at which point it switches to marking the sub-trie tasks produced by other threads.
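To make the described behavior concrete, here is a small, self-contained illustration of an executor configured this way: a bounded queue plus CallerRunsPolicy, so a producer that finds the queue full ends up running (marking) a sub-trie task itself. The pool size, queue size, class name, and printed output are arbitrary examples for illustration, not the values or code used in this PR.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public final class MarkingExecutorExample {

  // A fixed pool with a bounded queue; when the queue is full, CallerRunsPolicy
  // makes the submitting thread execute the task itself, so producers throttle
  // themselves by marking their own sub-trie.
  public static ThreadPoolExecutor newMarkingExecutor(final int threads) {
    return new ThreadPoolExecutor(
        threads,
        threads,
        0L,
        TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(16),
        new ThreadPoolExecutor.CallerRunsPolicy());
  }

  public static void main(final String[] args) throws InterruptedException {
    final ThreadPoolExecutor executor = newMarkingExecutor(2);
    // Each submitted task stands in for "mark one sub-trie".
    for (int i = 0; i < 20; i++) {
      final int subTrie = i;
      executor.execute(
          () ->
              System.out.println(
                  "marking sub-trie " + subTrie + " on " + Thread.currentThread().getName()));
    }
    executor.shutdown();
    executor.awaitTermination(1, TimeUnit.MINUTES);
  }
}
```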
PR description
See the javadoc and comments in MarkSweepPruner for an explanation of the algorithm.
Validation
https://grafana-metrics-ohio.ops.pegasys.tech/d/97M3nMWGk/besu-pruning?orgId=1&from=now-7d&to=now&refresh=5s&var-system=dev-besu-ohio-mainnet-pruning-parallel-16-threads&var-system=dev-besu-ohio-mainnet-pruning-parallel-2-threads
The 16-thread version, which is obviously overkill, is useful for showing that the node can perform multiple mark/sweep cycles without deleting any necessary nodes. The 2-thread version, which is what this PR uses, shows that some form of parallelization can be achieved without falling behind in block processing more than our production nodes already do; to make it concrete, the max lag was 6 blocks. The majority of the fix for that lag will come from the flat database work in progress by @shemnon. With today's state size, we can now mark mainnet with 2 threads in ~14 days, which is better than before, when a full mark took more than a month.
Errorprone Changes
I think this is a bug in spotless. I didn't intend to make any changes there.