Reduce memory reallocation in prefetch_blocks #598
Conversation
Codecov Report
@@ Coverage Diff @@
## master #598 +/- ##
==========================================
+ Coverage 82.12% 82.15% +0.03%
==========================================
Files 173 173
Lines 14438 14433 -5
==========================================
+ Hits 11857 11858 +1
+ Misses 2581 2575 -6
Continue to review full report at Codecov.
const size_t n{std::min(static_cast<size_t>(to - from + 1), max_blocks)};
prefetched_blocks_.resize(n);
My two cents: avoiding block reallocation is good, but it also means more heap memory is required, which translates into less memory available for the OS to cache MDBX data pages. In my original implementation I chose (totally arbitrarily) 1024 blocks, which get progressively deleted as soon as they're consumed, hence leaving more memory for state lookups (if any). Choosing too-small prefetch segments, imho, partially voids the purpose of prefetching.
Besides, the heavy part of the reallocations occurs in the retrieval of a Block's ommers and transactions (the block header is relatively small compared to the other two).
- I restored the prefetch size to 10240, the same as in master.
- Execution::prefetch_blocks now calls prefetched_blocks_.clear() so that all memory occupied by ommers and transactions is freed.
> I restored the prefetch size to 10240, the same as in master.

Ok

> Execution::prefetch_blocks now calls prefetched_blocks_.clear() so that all memory occupied by ommers and transactions is freed.

I don't see an improvement here: the previous version popped blocks out immediately after they were processed. Here instead we keep them in memory until the batch is completed, or at least until another prefetch is requested. Isn't the goal to free memory for MDBX ASAP?
Fair enough. I've reworked my change and now use a circular buffer, popping blocks out immediately after they're processed.
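A minimal sketch of that consumption pattern, assuming boost::circular_buffer (the Block payload and capacity value here are illustrative placeholders, not silkworm's actual definitions):

```cpp
#include <boost/circular_buffer.hpp>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Block {                            // stand-in for silkworm's Block
    std::vector<uint8_t> ommers_and_txs;  // the heavy heap-allocated payload
};

int main() {
    constexpr std::size_t kMaxPrefetched{10240};
    boost::circular_buffer<Block> prefetched_blocks(kMaxPrefetched);

    // Prefetch phase: fill the buffer (placeholder data).
    for (std::size_t i{0}; i < kMaxPrefetched; ++i) {
        prefetched_blocks.push_back(Block{});
    }

    // Execution phase: pop each block as soon as it has been processed,
    // so its ommers/transactions are destroyed immediately rather than
    // lingering until batch commit.
    while (!prefetched_blocks.empty()) {
        const Block& block{prefetched_blocks.front()};
        // ... execute block ...
        (void)block;
        prefetched_blocks.pop_front();  // frees this block's heap memory now
    }
    return 0;
}
```

The fixed capacity also bounds the worst-case heap footprint, which addresses the earlier concern about leaving memory free for the OS page cache.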
break;
}
const size_t n{std::min(static_cast<size_t>(to - from + 1), max_blocks)};
prefetched_blocks_.resize(n);
Here we may allocate more blocks than read_blocks can possibly return (in case of error). I understand such an error would be unrecoverable; nevertheless I find this conceptually arguable.
No resize anymore.
node/silkworm/db/access_layer.cpp
key = block_key(block_number);
auto data{canonical_hashes_cursor.find(to_slice(key), false)};
Why build the key (which implies a Bytes allocation and an endianness swap) for each block number, and force a find (which is slower than a move-next)?
kCanonicalHashes is already sorted by block number ... use a loop with a single find of the initial key, then move next, move next, until EOF or until the count of retrieved elements reaches the limit.
Optionally, endian-load the data key from the cursor to check that the blocks are in order.
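A minimal sketch of the suggested pattern, built around the find/to_next calls visible in the snippets above (block_key, to_slice, from_slice, and to_bytes32 are assumed from silkworm's db layer; the function name and signature are illustrative):

```cpp
// Hypothetical sketch: position the cursor once with find, then walk
// sequentially with to_next instead of rebuilding a key per block.
size_t read_canonical_hashes(::mdbx::cursor& canonical_hashes_cursor, BlockNum from,
                             size_t max_count, std::vector<evmc::bytes32>& out) {
    const Bytes start_key{block_key(from)};  // single key build + endian swap
    auto data{canonical_hashes_cursor.find(to_slice(start_key), /*throw_notfound=*/false)};
    size_t count{0};
    while (data.done && count < max_count) {
        // Optionally endian-load data.key here to verify the blocks are contiguous.
        out.emplace_back(to_bytes32(from_slice(data.value)));
        ++count;
        data = canonical_hashes_cursor.to_next(/*throw_notfound=*/false);
    }
    return count;  // the caller can detect gaps by comparing against max_count
}
```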
Fair enough, I've switched to db::cursor_for_count.
// Flush whole buffer if time to
if (gas_batch_size >= gas_max_batch_size || block_num_ >= max_block_num) {
    prefetched_blocks_.clear(); // free the memory held by transactions, etc
I don't see any usefulness in making prefetched_blocks_ a private member variable if we end up clearing it on batch commit: in effect its life cycle is confined to execute_batch.
A possible optimization, imho, is this: instead of clearing its contents on batch commit, save a pointer to the last processed block and, on re-entry of execute_batch, process the residual blocks instead of refetching them from the db.
Example: we loaded 10240 blocks, but after the first 100 the batch-size overflow gets triggered. We destroy the 10140 loaded blocks and on re-entry we reload them again (see the sketch below).
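A minimal sketch of that re-entry flow, assuming the circular-buffer member discussed above (the member names, flush condition, and method shape are illustrative, not silkworm's actual code):

```cpp
// Hypothetical sketch: commit batches without destroying residual
// prefetched blocks, so re-entry resumes from the buffer, not the db.
void Execution::execute_batch() {
    size_t gas_batch_size{0};
    while (block_num_ <= max_block_num_) {
        // Hit the database only when the residual buffer is exhausted.
        if (prefetched_blocks_.empty()) {
            prefetch_blocks(block_num_, max_block_num_);
        }
        const Block& block{prefetched_blocks_.front()};
        // ... execute block, add its gas used to gas_batch_size ...
        prefetched_blocks_.pop_front();  // free this block's memory right away
        ++block_num_;

        if (gas_batch_size >= gas_max_batch_size_) {
            // Commit the batch but keep prefetched_blocks_ intact: the
            // residual blocks are consumed on re-entry instead of being
            // destroyed and refetched.
            break;
        }
    }
}
```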
I've implemented the suggested optimization.
++from;
data = hashes_table.to_next(false);
if (num_read != count) {
    throw std::runtime_error("Missing block " + std::to_string(from + num_read));
This shows incorrect info: from is altered within the walk function, so the printed value is not coherent.
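One way to address this (a hedged sketch; the actual fix landed in 18d9e63 and may differ) is to capture the starting block number before the walk mutates from:

```cpp
// Hypothetical sketch: report the missing block against the original start.
const BlockNum first_block{from};  // capture before the walk increments `from`
// ... walk: ++from; data = hashes_table.to_next(false); ...
if (num_read != count) {
    throw std::runtime_error("Missing block " + std::to_string(first_block + num_read));
}
```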
Indeed. Fixed in 18d9e63
Looks good.
Just realized one last thing. Although in the current context it cannot happen ... if we look at
This is a small optimization to reduce memory reallocation in prefetch_blocks.
linux-x86_64, gcc 11.2.0