Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

Improve cache eviction policy for LoadedPrograms #34391

Merged
merged 8 commits into from
Dec 18, 2023

Conversation

pgarg66
Copy link
Contributor

@pgarg66 pgarg66 commented Dec 10, 2023

Problem

Current eviction policy for LoadedPrograms cache relies on program's usage_counter to decide which entry to evict. The usage_counter increases monotonically. This means if an old program which was frequently used in the past, but not used anymore, has higher chance of staying in the cache. This can make it harder for new programs to stay in the cache once the cache depth limits are reached.

Summary of Changes

  • Decay the usage_counter of a program if the program does not get used.
  • Use 2's random selection to pick a program entry to be evicted from the cache.
  • Add unit tests for the new code

Fixes #

Copy link

codecov bot commented Dec 10, 2023

Codecov Report

Merging #34391 (1d30f3b) into master (22bfcd9) will increase coverage by 0.0%.
Report is 8 commits behind head on master.
The diff coverage is 98.3%.

Additional details and impacted files
@@           Coverage Diff            @@
##           master   #34391    +/-   ##
========================================
  Coverage    81.8%    81.8%            
========================================
  Files         820      820            
  Lines      220865   221037   +172     
========================================
+ Hits       180736   180903   +167     
- Misses      40129    40134     +5     

@pgarg66 pgarg66 marked this pull request as ready for review December 10, 2023 18:13
@pgarg66
Copy link
Contributor Author

pgarg66 commented Dec 10, 2023

Also, running a node against MB. No perceptible difference in eviction rate compared to the rest of the cluster.

program-runtime/src/loaded_programs.rs Outdated Show resolved Hide resolved
/// Evicts programs using 2's random selection, choosing the least used program out of the two entries.
/// The eviction is performed enough number of times to reduce the cache usage to the given percentage.
pub fn evict_using_2s_random_selection(&mut self, shrink_to: PercentageInteger, now: Slot) {
let mut candidates = self.get_flattened_entries();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could switch self.entries from HashMap to IndexMap. then the draw is just two random samplings from 0..self.entries.len(). no allocation necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing to IndexMap will impact other parts of the code in the cache, not just eviction logic. If we agree to do it, maybe best to do it in a separate PR (in case we want to backport this). Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah would definitely want to do it in a preliminary pr. their APIs are pretty compatible and perf is par iirc.

then again, just calling .iter().nth() directly on the HashMap might be sufficient 🤔

mainly trying to kill the allocation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each entry in entries is a Vec<> of cached programs (since a given program ID can potentially have different versions of the program loaded simultaneously in cache corresponding to different slot/fork). So we need to flatten the two dimensional list to actually compute the total cache usage, and count how much to evict. I don't think we can avoid allocation even if we were to use IndexMap here.

Copy link
Contributor

@alessandrod alessandrod Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think that github is having issues with PRs, my comments keep appearing and disappearing, but since this disappeared again)

I think this sounds like we should use a BTreeMap with keys (pubkey, slot)? So we can key ranges and still have a total length. It's a tiny map so perf wise it's almost certainly better than juggling a two level hash table + vec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do a reverse lookup for entries of a program during extract(). This helps find the most latest version of the program applicable to the given slot. That was one of the main reason/advantage of doing two level table.

Maybe this is not a big perf hit if we were to use BTreeMap. But, I think we should analyze it more carefully.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTreeMap::range supports reverse iteration, I don't think it's going to be any slower than what we're currently doing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can/should certainly explore it. I believe it'll intersect with the other PR that does cooperative loading of programs.

Filed this issue to track it: #34442

program-runtime/src/loaded_programs.rs Outdated Show resolved Hide resolved
program-runtime/src/loaded_programs.rs Outdated Show resolved Hide resolved
program-runtime/src/loaded_programs.rs Show resolved Hide resolved
program-runtime/src/loaded_programs.rs Show resolved Hide resolved
@@ -5259,7 +5260,10 @@ impl Bank {
self.loaded_programs_cache
.write()
.unwrap()
.sort_and_unload(Percentage::from(SHRINK_LOADED_PROGRAMS_TO_PERCENTAGE));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort_and_unload is now dead code and can probably be removed right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left it in as an escape hatch. We can remove it once the new eviction policy is live and working on MB.

alessandrod
alessandrod previously approved these changes Dec 13, 2023
Copy link
Contributor

@alessandrod alessandrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

The flattening isn't great but I agree that it would make this diff considerably larger and it can be tried in a separate PR

@pgarg66
Copy link
Contributor Author

pgarg66 commented Dec 13, 2023

lgtm

The flattening isn't great but I agree that it would make this diff considerably larger and it can be tried in a separate PR

Thanks @alessandrod
Leaving it open till tomorrow to give others a chance to take another look.

t-nelson
t-nelson previously approved these changes Dec 15, 2023
Copy link
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we intend to let this ride 1.18 stabilization, right? no 1.17 bp

@pgarg66
Copy link
Contributor Author

pgarg66 commented Dec 15, 2023

we intend to let this ride 1.18 stabilization, right? no 1.17 bp

Yes, we are not backporting it to any of the branches. I am waiting to merge it to let other PRs go in, as it might create merge conflicts (or make it harder to backport the other PRs).

@pgarg66 pgarg66 dismissed stale reviews from t-nelson and alessandrod via 21807c0 December 18, 2023 16:51
program-runtime/src/loaded_programs.rs Show resolved Hide resolved
@@ -935,6 +956,26 @@ impl<FG: ForkGraph> LoadedPrograms<FG> {
})
}

/// Returns the list of loaded programs which are verified and compiled.
fn get_flattened_entries(&self) -> Vec<(Pubkey, Arc<LoadedProgram>)> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could take two filter parameters include_program_runtime_v1 and include_program_runtime_v2. Then get_entries_sorted_by_tx_usage(), which is only used by the recompilation phase could be inlined there.

);
}

fn decayed_usage_counter(&self, now: Slot) -> u64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be used for the sorting at the beginning of the recompilation phase as well.

let _ = self.latest_access_slot.fetch_update(
Ordering::Relaxed,
Ordering::Relaxed,
|last_access| (last_access < slot).then_some(slot),
Copy link
Contributor

@Lichtso Lichtso Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a manual implementation of fetch_max.

@@ -862,7 +883,7 @@ impl<FG: ForkGraph> LoadedPrograms<FG> {
if let LoadedProgramType::Unloaded(_environment) = &entry.program {
break;
}

entry.update_access_slot(loaded_programs_for_tx_batch.slot);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only update it in this case here and not below where we also increment the usage counter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other cases the entry is a tombstone. The slot value will likely be useless there, as we don't evict tombstones. We can set the slot value there if you think in future it could be useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would prefer to have both updates (to the usage counter and the access slot) in the same place. Also, I think it can be inlined as update_access_slot() only has one call site.

Copy link
Contributor Author

@pgarg66 pgarg66 Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test uses it too test_usage_counter_decay. But it could be inlined there as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are sticking with Atomics, it'll be good to have a wrapper for test and the code.

@pgarg66 pgarg66 requested a review from Lichtso December 18, 2023 21:16
@pgarg66 pgarg66 merged commit 6f0133b into solana-labs:master Dec 18, 2023
19 checks passed
@pgarg66 pgarg66 deleted the cache-eviction branch December 18, 2023 22:51
ryoqun added a commit to ryoqun/solana that referenced this pull request Dec 21, 2023
ryoqun added a commit to ryoqun/solana that referenced this pull request Dec 21, 2023

pub fn decayed_usage_counter(&self, now: Slot) -> u64 {
let last_access = self.latest_access_slot.load(Ordering::Relaxed);
let decaying_for = now.saturating_sub(last_access);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decaying_for must be limited by .min(64) otherwise the line below might overflow on the bitshift.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants