-
Notifications
You must be signed in to change notification settings - Fork 4.6k
Improve cache eviction policy for LoadedPrograms #34391
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #34391 +/- ##
========================================
Coverage 81.8% 81.8%
========================================
Files 820 820
Lines 220865 221037 +172
========================================
+ Hits 180736 180903 +167
- Misses 40129 40134 +5 |
Also, running a node against MB. No perceptible difference in eviction rate compared to the rest of the cluster. |
/// Evicts programs using 2's random selection, choosing the least used program out of the two entries. | ||
/// The eviction is performed enough number of times to reduce the cache usage to the given percentage. | ||
pub fn evict_using_2s_random_selection(&mut self, shrink_to: PercentageInteger, now: Slot) { | ||
let mut candidates = self.get_flattened_entries(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could switch self.entries
from HashMap
to IndexMap
. then the draw is just two random samplings from 0..self.entries.len()
. no allocation necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing to IndexMap
will impact other parts of the code in the cache, not just eviction logic. If we agree to do it, maybe best to do it in a separate PR (in case we want to backport this). Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah would definitely want to do it in a preliminary pr. their APIs are pretty compatible and perf is par iirc.
then again, just calling .iter().nth()
directly on the HashMap
might be sufficient 🤔
mainly trying to kill the allocation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each entry in entries
is a Vec<>
of cached programs (since a given program ID can potentially have different versions of the program loaded simultaneously in cache corresponding to different slot/fork). So we need to flatten the two dimensional list to actually compute the total cache usage, and count how much to evict. I don't think we can avoid allocation even if we were to use IndexMap here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I think that github is having issues with PRs, my comments keep appearing and disappearing, but since this disappeared again)
I think this sounds like we should use a BTreeMap with keys (pubkey, slot)? So we can key ranges and still have a total length. It's a tiny map so perf wise it's almost certainly better than juggling a two level hash table + vec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do a reverse lookup for entries of a program during extract()
. This helps find the most latest version of the program applicable to the given slot. That was one of the main reason/advantage of doing two level table.
Maybe this is not a big perf hit if we were to use BTreeMap. But, I think we should analyze it more carefully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTreeMap::range supports reverse iteration, I don't think it's going to be any slower than what we're currently doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can/should certainly explore it. I believe it'll intersect with the other PR that does cooperative loading of programs.
Filed this issue to track it: #34442
4556a50
to
c8bb69b
Compare
5fc1e44
to
9aede9f
Compare
@@ -5259,7 +5260,10 @@ impl Bank { | |||
self.loaded_programs_cache | |||
.write() | |||
.unwrap() | |||
.sort_and_unload(Percentage::from(SHRINK_LOADED_PROGRAMS_TO_PERCENTAGE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sort_and_unload is now dead code and can probably be removed right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left it in as an escape hatch. We can remove it once the new eviction policy is live and working on MB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
The flattening isn't great but I agree that it would make this diff considerably larger and it can be tried in a separate PR
Thanks @alessandrod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we intend to let this ride 1.18 stabilization, right? no 1.17 bp
Yes, we are not backporting it to any of the branches. I am waiting to merge it to let other PRs go in, as it might create merge conflicts (or make it harder to backport the other PRs). |
2259cbd
to
21807c0
Compare
21807c0
to
b790e74
Compare
@@ -935,6 +956,26 @@ impl<FG: ForkGraph> LoadedPrograms<FG> { | |||
}) | |||
} | |||
|
|||
/// Returns the list of loaded programs which are verified and compiled. | |||
fn get_flattened_entries(&self) -> Vec<(Pubkey, Arc<LoadedProgram>)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could take two filter parameters include_program_runtime_v1
and include_program_runtime_v2
. Then get_entries_sorted_by_tx_usage()
, which is only used by the recompilation phase could be inlined there.
); | ||
} | ||
|
||
fn decayed_usage_counter(&self, now: Slot) -> u64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be used for the sorting at the beginning of the recompilation phase as well.
let _ = self.latest_access_slot.fetch_update( | ||
Ordering::Relaxed, | ||
Ordering::Relaxed, | ||
|last_access| (last_access < slot).then_some(slot), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a manual implementation of fetch_max.
@@ -862,7 +883,7 @@ impl<FG: ForkGraph> LoadedPrograms<FG> { | |||
if let LoadedProgramType::Unloaded(_environment) = &entry.program { | |||
break; | |||
} | |||
|
|||
entry.update_access_slot(loaded_programs_for_tx_batch.slot); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why only update it in this case here and not below where we also increment the usage counter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other cases the entry is a tombstone. The slot value will likely be useless there, as we don't evict tombstones. We can set the slot value there if you think in future it could be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I would prefer to have both updates (to the usage counter and the access slot) in the same place. Also, I think it can be inlined as update_access_slot()
only has one call site.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test uses it too test_usage_counter_decay
. But it could be inlined there as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are sticking with Atomics, it'll be good to have a wrapper for test and the code.
…ana-labs#34391)"" This reverts commit 4488299.
|
||
pub fn decayed_usage_counter(&self, now: Slot) -> u64 { | ||
let last_access = self.latest_access_slot.load(Ordering::Relaxed); | ||
let decaying_for = now.saturating_sub(last_access); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decaying_for
must be limited by .min(64)
otherwise the line below might overflow on the bitshift.
Problem
Current eviction policy for
LoadedPrograms
cache relies on program'susage_counter
to decide which entry to evict. Theusage_counter
increases monotonically. This means if an old program which was frequently used in the past, but not used anymore, has higher chance of staying in the cache. This can make it harder for new programs to stay in the cache once the cache depth limits are reached.Summary of Changes
usage_counter
of a program if the program does not get used.Fixes #