
Reduce memory reallocation in prefetch_blocks #598

Merged: 25 commits merged from prefetch into master on Mar 31, 2022
Conversation

@yperbasis (Member) commented Mar 2, 2022

This is a small optimization to reduce memory reallocation in prefetch_blocks.

[Benchmark charts: master@e1dd4c37 vs prefetch@871e241e and master_e1dd4c37 vs prefetch_e73be231; linux-x86_64, gcc 11.2.0]

codecov bot commented Mar 4, 2022

Codecov Report

Merging #598 (86ac24c) into master (5de7d92) will increase coverage by 0.03%.
The diff coverage is 28.84%.

@@            Coverage Diff             @@
##           master     #598      +/-   ##
==========================================
+ Coverage   82.12%   82.15%   +0.03%     
==========================================
  Files         173      173              
  Lines       14438    14433       -5     
==========================================
+ Hits        11857    11858       +1     
+ Misses       2581     2575       -6     
Impacted Files Coverage Δ
node/silkworm/db/util.hpp 86.36% <ø> (ø)
node/silkworm/stagedsync/stage_execution.cpp 34.37% <2.63%> (+0.32%) ⬆️
node/silkworm/db/access_layer.cpp 82.83% <100.00%> (-0.04%) ⬇️
node/silkworm/db/util.cpp 79.83% <100.00%> (ø)
node/silkworm/stagedsync/stage_execution.hpp 100.00% <100.00%> (ø)
core/silkworm/state/in_memory_state.cpp 95.12% <0.00%> (+0.97%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Comment on lines 120 to 121
const size_t n{std::min(static_cast<size_t>(to - from + 1), max_blocks)};
prefetched_blocks_.resize(n);
Contributor commented:

My two cents. Avoiding block reallocation is good, but it also means more heap memory is required, which translates into less memory available for the OS to cache MDBX data pages. In my original implementation I chose (totally arbitrarily) 1024 blocks, which get progressively deleted as soon as they're consumed, hence leaving more memory for state lookups (if any). Choosing too small a prefetch segment, imho, partially defeats the purpose of prefetching.
Besides, the heavy part of the reallocations occurs in retrieving a Block's ommers and transactions (the block header is relatively small compared to the other two).

@yperbasis (Member, Author) commented Mar 28, 2022:

  1. I restored the prefetch size to be 10240, the same as in master.
  2. Execution::prefetch_blocks now calls prefetched_blocks_.clear() so that all memory occupied by ommers and transactions is freed.
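For illustration, a tiny sketch of why clear() helps here, using an assumption-laden stand-in for silkworm's Block type rather than the real one: clearing a std::vector<Block> destroys each element, which releases the heap memory owned by that element's ommers and transactions, even though the outer vector keeps its own (comparatively tiny) capacity.

#include <vector>

// Illustrative stand-in; the real silkworm Block holds richer payloads.
struct Block {
    std::vector<int> ommers;        // stand-in for the ommer headers
    std::vector<int> transactions;  // stand-in for the transaction bodies
};

std::vector<Block> prefetched_blocks_;

void free_prefetched_payloads() {
    // Destroys every Block: each element's ommers/transactions buffers are
    // deallocated. Only the outer vector's capacity of Block objects is
    // retained for reuse by the next prefetch.
    prefetched_blocks_.clear();
}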

Contributor commented:

  1. I restored the prefetch size to be 10240, the same as in master.

Ok

  2. Execution::prefetch_blocks now calls prefetched_blocks_.clear() so that all memory occupied by ommers and transactions is freed.

I don't see an improvement here: the previous version popped blocks out immediately after they were processed, whereas here we keep them in memory until the batch is completed, or at least until another prefetch is requested. Isn't the goal to free memory for MDBX ASAP?

@yperbasis (Member, Author) commented:
Fair enough. I've reworked my change and now use a circular buffer, popping blocks out immediately after they are processed.
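For illustration, a minimal sketch of the pop-as-you-go idea; the container choice (a std::deque standing in for the circular buffer), the Block stub, and the function name are assumptions, not the exact code in the PR.

#include <deque>

// Illustrative stand-in; silkworm's Block holds a header plus ommers and transactions.
struct Block { /* header, ommers, transactions ... */ };

std::deque<Block> prefetched_blocks_;  // FIFO used in place of a fixed circular buffer

void execute_next_prefetched() {
    if (prefetched_blocks_.empty()) return;  // nothing prefetched yet
    // ... execute prefetched_blocks_.front() ...
    // Pop immediately afterwards so the heap memory held by the block's ommers
    // and transactions is released right away, leaving more RAM for the OS to
    // cache MDBX data pages.
    prefetched_blocks_.pop_front();
}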

@yperbasis marked this pull request as ready for review March 28, 2022 13:09
@yperbasis changed the title from "[WIP] Optimization: reduce memory reallocation in prefetch_blocks" to "Reduce memory reallocation in prefetch_blocks" Mar 28, 2022

break;
}
const size_t n{std::min(static_cast<size_t>(to - from + 1), max_blocks)};
prefetched_blocks_.resize(n);
Contributor commented:

Here we may allocate more blocks than read_blocks can possibly return (in case of error). I understand such an error would be unrecoverable; nevertheless, I find this conceptually arguable.

@yperbasis (Member, Author) commented:

no resize anymore

Comment on lines 205 to 206
key = block_key(block_number);
auto data{canonical_hashes_cursor.find(to_slice(key), false)};
Contributor commented:

Why build the key (which implies a Bytes allocation and an endianness swap) for each block number and force a find (which is slower than a move-next)?
kCanonicalHashes is already sorted by block number ... use a loop with a single find of the initial key and then move next, move next until EOF or until the count of retrieved elements reaches the limit.
Optionally, endian-load the data key from the cursor to check that the blocks are in order.

@yperbasis (Member, Author) commented:

Fair enough, I've switched to db::cursor_for_count.
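For illustration, a rough sketch of the single-find-then-move-next pattern suggested above. The cursor calls mirror the snippets in this thread; the result member names (done, value, key), the consume_hash helper, and the surrounding variables are assumptions, and this is not the actual db::cursor_for_count implementation.

// Position the cursor once at the first canonical hash, then walk forward.
size_t num_read{0};
Bytes key{block_key(from)};                                  // single key build + endianness swap
auto data{hashes_table.find(to_slice(key), /*throw_notfound=*/false)};
while (data.done && num_read < count) {
    // data.value is the canonical hash of block (from + num_read);
    // optionally endian-load data.key to verify the blocks are in order.
    consume_hash(from + num_read, data.value);               // hypothetical consumer
    ++num_read;
    data = hashes_table.to_next(/*throw_notfound=*/false);   // sequential move-next, no per-block find
}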


// Flush whole buffer if time to
if (gas_batch_size >= gas_max_batch_size || block_num_ >= max_block_num) {
prefetched_blocks_.clear(); // free the memory held by transactions, etc
@AndreaLanfranchi (Contributor) commented Mar 28, 2022:

I don't see any usefulness in making prefetched_blocks_ a private member variable if we end up clearing it on batch commit: its life cycle is ultimately confined to execute_batch.
A possible optimization, imho, is this: instead of clearing its contents on batch commit, save a pointer to the last processed block and, on re-entry of execute_batch, process the residual blocks instead of refetching them from the db.
Example: we load 10240 blocks, but after the first 100 the buffer overflow gets triggered. We destroy the 10140 remaining loaded blocks and on re-entry we reload them again.

@yperbasis (Member, Author) commented:

I've implemented the suggested optimization.
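A rough sketch of the combined behavior under discussion: prefetch only when the buffer runs dry, pop each block right after executing it, and leave any residual blocks in place so a commit and re-entry of execute_batch resumes from them instead of re-reading the same blocks from the database. The member names mirror the snippets above; execute_block is a hypothetical stand-in and the gas accounting is elided, so this is not the exact code in the PR.

while (block_num_ <= max_block_num) {
    if (prefetched_blocks_.empty()) {
        prefetch_blocks(block_num_, max_block_num);  // refill only when exhausted
    }
    execute_block(prefetched_blocks_.front());       // hypothetical; gas accounting elided
    prefetched_blocks_.pop_front();                  // free ommers/transactions ASAP
    ++block_num_;

    if (gas_batch_size >= gas_max_batch_size) {
        break;  // commit the batch; leftover prefetched blocks survive for re-entry
    }
}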

@yperbasis marked this pull request as draft March 28, 2022 15:18
@yperbasis marked this pull request as ready for review March 31, 2022 08:20
++from;
data = hashes_table.to_next(false);
if (num_read != count) {
throw std::runtime_error("Missing block " + std::to_string(from + num_read));
Contributor commented:

This shows incorrect info: from is altered within the walk function, so the printed value is not coherent.

@yperbasis (Member, Author) commented:

Indeed. Fixed in 18d9e63

@AndreaLanfranchi (Contributor) left a review:

Looks good.

@yperbasis merged commit 0874035 into master Mar 31, 2022
@yperbasis deleted the prefetch branch March 31, 2022 15:02
@AndreaLanfranchi (Contributor) commented:

Just realized one last thing. Although in the current context it cannot happen ... if we look at prefetch_blocks from a generic POV, the function is badly protected against underflow when from > to: granted, it falls back to kMaxPrefetchedBlocks, but this implies it would return blocks with higher numbers than to.

const size_t count{std::min(static_cast<size_t>(to - from + 1), kMaxPrefetchedBlocks)};
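For illustration, a minimal defensive variant of that line (an assumption, not code from the PR) that clamps the range instead of letting the unsigned subtraction wrap:

// If from > to, the subtraction would wrap around; clamp the requested range
// to zero so the function never prefetches blocks beyond `to`.
const size_t requested{from <= to ? static_cast<size_t>(to - from + 1) : 0};
const size_t count{std::min(requested, kMaxPrefetchedBlocks)};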
