
Reduce memory reallocation in prefetch_blocks #598

Merged: 25 commits merged from prefetch into master on Mar 31, 2022
Conversation

@yperbasis (Member) commented Mar 2, 2022

This is a small optimization to reduce memory reallocation in prefetch_blocks.

[Benchmark charts: master@e1dd4c37 vs prefetch@871e241e and master_e1dd4c37 vs prefetch_e73be231; linux-x86_64, gcc 11.2.0]

codecov bot commented Mar 4, 2022

Codecov Report

Merging #598 (86ac24c) into master (5de7d92) will increase coverage by 0.03%.
The diff coverage is 28.84%.

@@            Coverage Diff             @@
##           master     #598      +/-   ##
==========================================
+ Coverage   82.12%   82.15%   +0.03%     
==========================================
  Files         173      173              
  Lines       14438    14433       -5     
==========================================
+ Hits        11857    11858       +1     
+ Misses       2581     2575       -6     
Impacted Files Coverage Δ
node/silkworm/db/util.hpp 86.36% <ø> (ø)
node/silkworm/stagedsync/stage_execution.cpp 34.37% <2.63%> (+0.32%) ⬆️
node/silkworm/db/access_layer.cpp 82.83% <100.00%> (-0.04%) ⬇️
node/silkworm/db/util.cpp 79.83% <100.00%> (ø)
node/silkworm/stagedsync/stage_execution.hpp 100.00% <100.00%> (ø)
core/silkworm/state/in_memory_state.cpp 95.12% <0.00%> (+0.97%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Comment on lines 120 to 121
const size_t n{std::min(static_cast<size_t>(to - from + 1), max_blocks)};
prefetched_blocks_.resize(n);
Contributor commented:

My two cents. Avoiding block reallocation is good, but it also means more heap memory is required, which translates into less memory available for the OS to cache MDBX data pages. In my original implementation I chose (totally arbitrarily) 1024 blocks, which get progressively deleted as soon as they're consumed, hence leaving more memory for state lookups (if any). Choosing too small a prefetch segment, imho, partially defeats the purpose of prefetching.
Besides, the heavy part of the reallocations occurs in retrieving a Block's ommers and transactions (the block header is relatively small compared to the other two).

@yperbasis (Member, Author) commented Mar 28, 2022:

  1. I restored the prefetch size to be 10240, the same as in master.
  2. Execution::prefetch_blocks now calls prefetched_blocks_.clear() so that all memory occupied by ommers and transactions is freed.
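For illustration, a tiny sketch of why clear() helps here, using an assumption-laden stand-in for silkworm's Block type rather than the real one: clearing a std::vector<Block> destroys each element, which releases the heap memory owned by that element's ommers and transactions, even though the outer vector keeps its own (comparatively tiny) capacity.

#include <vector>

// Illustrative stand-in; the real silkworm Block holds richer payloads.
struct Block {
    std::vector<int> ommers;        // stand-in for the ommer headers
    std::vector<int> transactions;  // stand-in for the transaction bodies
};

std::vector<Block> prefetched_blocks_;

void free_prefetched_payloads() {
    // Destroys every Block: each element's ommers/transactions buffers are
    // deallocated. Only the outer vector's capacity of Block objects is
    // retained for reuse by the next prefetch.
    prefetched_blocks_.clear();
}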

Contributor commented:

  1. I restored the prefetch size to be 10240, the same as in master.

Ok

  2. Execution::prefetch_blocks now calls prefetched_blocks_.clear() so that all memory occupied by ommers and transactions is freed.

I don't see an improvement here: the previous version popped blocks out immediately after they were processed, whereas here we keep them in memory until the batch is completed, or at least until another prefetch is requested. Isn't the goal to free memory for MDBX ASAP?

@yperbasis (Member, Author) commented:
Fair enough. I've reworked my change and now use a circular buffer, popping blocks out immediately after they are processed.
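For illustration, a minimal sketch of the pop-as-you-go idea; the container choice (a std::deque standing in for the circular buffer), the Block stub, and the function name are assumptions, not the exact code in the PR.

#include <deque>

// Illustrative stand-in; silkworm's Block holds a header plus ommers and transactions.
struct Block { /* header, ommers, transactions ... */ };

std::deque<Block> prefetched_blocks_;  // FIFO used in place of a fixed circular buffer

void execute_next_prefetched() {
    if (prefetched_blocks_.empty()) return;  // nothing prefetched yet
    // ... execute prefetched_blocks_.front() ...
    // Pop immediately afterwards so the heap memory held by the block's ommers
    // and transactions is released right away, leaving more RAM for the OS to
    // cache MDBX data pages.
    prefetched_blocks_.pop_front();
}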

@yperbasis marked this pull request as ready for review March 28, 2022 13:09
@yperbasis changed the title from "[WIP] Optimization: reduce memory reallocation in prefetch_blocks" to "Reduce memory reallocation in prefetch_blocks" Mar 28, 2022

break;
}
const size_t n{std::min(static_cast<size_t>(to - from + 1), max_blocks)};
prefetched_blocks_.resize(n);
Contributor commented:

Here we may allocate more blocks than read_blocks can possibly return (in case of error). I understand such an error would be unrecoverable; nevertheless, I find this conceptually arguable.

@yperbasis (Member, Author) commented:

no resize anymore

Comment on lines 205 to 206
key = block_key(block_number);
auto data{canonical_hashes_cursor.find(to_slice(key), false)};
Contributor commented:

Why build the key (which implies a Bytes allocation and an endianness swap) for each block number and force a find (which is slower than a move-next)?
kCanonicalHashes is already sorted by block number ... use a loop with a single find of the initial key and then move next, move next until EOF or until the count of retrieved elements reaches the limit.
Optionally, endian-load the data key from the cursor to check that the blocks are in order.

@yperbasis (Member, Author) commented:

Fair enough, I've switched to db::cursor_for_count.
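For illustration, a rough sketch of the single-find-then-move-next pattern suggested above. The cursor calls mirror the snippets in this thread; the result member names (done, value, key), the consume_hash helper, and the surrounding variables are assumptions, and this is not the actual db::cursor_for_count implementation.

// Position the cursor once at the first canonical hash, then walk forward.
size_t num_read{0};
Bytes key{block_key(from)};                                  // single key build + endianness swap
auto data{hashes_table.find(to_slice(key), /*throw_notfound=*/false)};
while (data.done && num_read < count) {
    // data.value is the canonical hash of block (from + num_read);
    // optionally endian-load data.key to verify the blocks are in order.
    consume_hash(from + num_read, data.value);               // hypothetical consumer
    ++num_read;
    data = hashes_table.to_next(/*throw_notfound=*/false);   // sequential move-next, no per-block find
}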


// Flush whole buffer if time to
if (gas_batch_size >= gas_max_batch_size || block_num_ >= max_block_num) {
prefetched_blocks_.clear(); // free the memory held by transactions, etc
@AndreaLanfranchi (Contributor) commented Mar 28, 2022:

I don't see any usefulness in making prefetched_blocks_ a private member variable if we end up clearing it on batch commit: its life cycle is ultimately confined to execute_batch.
A possible optimization, imho, is this: instead of clearing its contents on batch commit, save a pointer to the last processed block and, on re-entry of execute_batch, process the residual blocks instead of refetching them from the db.
Example: we load 10240 blocks, but after the first 100 the buffer overflow gets triggered. We destroy the 10140 remaining loaded blocks and on re-entry we reload them again.

@yperbasis (Member, Author) commented:

I've implemented the suggested optimization.
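A rough sketch of the combined behavior under discussion: prefetch only when the buffer runs dry, pop each block right after executing it, and leave any residual blocks in place so a commit and re-entry of execute_batch resumes from them instead of re-reading the same blocks from the database. The member names mirror the snippets above; execute_block is a hypothetical stand-in and the gas accounting is elided, so this is not the exact code in the PR.

while (block_num_ <= max_block_num) {
    if (prefetched_blocks_.empty()) {
        prefetch_blocks(block_num_, max_block_num);  // refill only when exhausted
    }
    execute_block(prefetched_blocks_.front());       // hypothetical; gas accounting elided
    prefetched_blocks_.pop_front();                  // free ommers/transactions ASAP
    ++block_num_;

    if (gas_batch_size >= gas_max_batch_size) {
        break;  // commit the batch; leftover prefetched blocks survive for re-entry
    }
}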

@yperbasis marked this pull request as draft March 28, 2022 15:18
@yperbasis marked this pull request as ready for review March 31, 2022 08:20
++from;
data = hashes_table.to_next(false);
if (num_read != count) {
throw std::runtime_error("Missing block " + std::to_string(from + num_read));
Contributor commented:

This shows incorrect info: from is altered within the walk function, so the printed value is not coherent.

@yperbasis (Member, Author) commented:

Indeed. Fixed in 18d9e63

@AndreaLanfranchi (Contributor) left a review:

Looks good.

@yperbasis merged commit 0874035 into master Mar 31, 2022
@yperbasis deleted the prefetch branch March 31, 2022 15:02
@AndreaLanfranchi (Contributor) commented:

Just realized one last thing. Although in the current context it cannot happen ... if we look at prefetch_blocks from a generic POV, the function is badly protected against underflow when from > to: granted, it falls back to kMaxPrefetchedBlocks, but this implies it would return blocks with higher numbers than to.

const size_t count{std::min(static_cast<size_t>(to - from + 1), kMaxPrefetchedBlocks)};
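For illustration, a minimal defensive variant of that line (an assumption, not code from the PR) that clamps the range instead of letting the unsigned subtraction wrap:

// If from > to, the subtraction would wrap around; clamp the requested range
// to zero so the function never prefetches blocks beyond `to`.
const size_t requested{from <= to ? static_cast<size_t>(to - from + 1) : 0};
const size_t count{std::min(requested, kMaxPrefetchedBlocks)};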
