
Prepare findEviction to be extended for multiple memory tiers #166


Closed
igchor wants to merge 9 commits

Conversation

igchor
Contributor

@igchor igchor commented Oct 11, 2022

This PR refactors the eviction logic (inside findEviction) so that it will be easy to add support for multiple memory tiers. The problem with the multi-tier configuration is that the critical section under the MMContainer lock is too long. To fix that, we have implemented an algorithm which utilizes WaitContext to shrink the critical section (this will be part of future PRs).

The idea is to use the moving (now exclusive) bit to synchronize eviction with SlabRelease (and, in the future, with readers). In this PR, I only changed how findEviction synchronizes with SlabRelease.

This PR is a subset of: #132
The above PR introduced some performance regressions in the single-memory-tier version which we weren't able to fix yet, so we decided to first implement this part (which should not affect performance); later we can add a separate path for multi-tier or try to optimize the original patch.

Remove the item from mmContainer and drop the lock before attempting eviction, to avoid races with remove/replace with predicate.

Also, refactor tryEvictRegularItem and tryEvictChainedItem into a single function. The moving bit is used to give a particular thread the exclusive right to evict the item.

Originally, there was an assumption that whoever marked the item as moving would keep trying to free it until it succeeded. Since we don't want to do that in findEviction (it can potentially take a long time), we need to make sure that unmarking is safe.

This patch checks the flags after unmarking (atomically) and, if the ref is zero, it also recycles the item. This is needed because there might be some concurrent thread releasing the item (and decrementing the ref count). If the moving bit is set, that thread would not free the memory back to the allocator, resulting in a memory leak on unmarkMoving().
under MMContainer combined_lock.
Checking token validity should not be needed right after creating it.
in findEviction, and rely only on the refcount being zero when releasing items back to the allocator.
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 11, 2022
@facebook-github-bot
Contributor

@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@therealgymmy therealgymmy left a comment

The performance looks good. No noticeable diff in A/B tests. I left comments on the correctness of a couple of places.

if (!itr) {
itr.resetToBegin();
}
itr.resetToBegin();
Contributor

We should keep the original logic. With this change, we can still get here by skipping an item due to refcount > 0 when checking the hash table, and we should continue forward.

    // If we destroyed the itr to possibly evict and failed, we restart
    // from the beginning again
    if (!itr) {
      itr.resetToBegin();
    }
  }

Contributor Author

Right, done.

Contributor

In the version that's imported into Meta this is still not fixed and caused problems in production.

We would need to revert this and re-import it. We will try our best to prioritize the re-import. @igchor

Contributor

This needs to be fixed.

Contributor Author

Done. Sorry about this, I think something went wrong when I rebased onto the latest main.

@@ -2468,7 +2340,7 @@ void CacheAllocator<CacheTrait>::releaseSlabImpl(
// Need to mark an item for release before proceeding
// If we can't mark as moving, it means the item is already freed
const bool isAlreadyFreed =
!markMovingForSlabRelease(releaseContext, alloc, throttler);
!markExclusiveForSlabRelease(releaseContext, alloc, throttler);
Contributor

This is no longer safe. The original intent is that only a freed item cannot be marked as moving, and no allocations can allocate a freed item once that slab was set to be released.
With this change, we can get false here if the item was already marked as exclusive (e.g. by an eviction thread). This means here we would treat that item as one already freed, which isn't the case.

Contributor Author

I think this cannot happen. If we fail to mark the item as exclusive inside processAllocForRelease, we will just spin on the while(true) inside markExclusiveForSlabRelease.

Contributor

^ yup you're right.

// and the item is not expired.
Item& item = *itr;
const bool evictToNvmCache = shouldWriteToNvmCache(item);

Contributor

@therealgymmy therealgymmy Oct 17, 2022

We still need this check that returns early if the token is invalid. Otherwise we trigger the assert at L2677.

That said, we could just skip evicting to NvmCache on eviction if we couldn't acquire a valid PutToken (which can happen due to contention on the in-flight-puts queues). It may impact hit rate, so it's best done as a separate change. We could make this particular change internally (easier to test performance as well).

  // record the in-flight eviction. If not, we move on to next item to avoid
  // stalling eviction.
  if (evictToNvmCache && !token.isValid()) {
    ++itr;
    stats_.evictFailConcurrentFill.inc();
    return WriteHandle{};
  }

Contributor Author

Got it, fixed.

XDCHECK(evictHandle->getRefCount() == 1);

if (evictToNvmCache && shouldWriteToNvmCacheExclusive(item)) {
XDCHECK(token.isValid());
Contributor

This failed our internal tests. You can repro this by running this cachebench test config cachelib/cachebench/test_configs/consistency/navy.json. The issue is related to comments at L2644.

F1013 14:57:11.677257   633 CacheAllocator-inl.h:2677] Check failed: token.isValid() 
*** Aborted at 1665698231 (Unix time, try 'date -d @1665698231') ***
*** Signal 6 (SIGABRT) (0x6300000161) received by PID 353 (pthread TID 0x7f2645c00640) (linux TID 633) (maybe from PID 353, UID 99) (code: -6), stack trace: ***
    @ 000000000698b4be folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
                       ./folly/experimental/symbolizer/SignalHandler.cpp:449
    @ 0000000000000000 (unknown)
    @ 000000000009c9d3 __GI___pthread_kill
    @ 00000000000444ec __GI_raise
    @ 000000000002c432 __GI_abort
    @ 00000000069ac304 folly::LogCategory::admitMessage(folly::LogMessage const&) const
                       ./folly/logging/LogCategory.cpp:71
    @ 00000000069aa42d folly::LogStreamProcessor::logNow()
                       ./folly/logging/LogStreamProcessor.cpp:190
    @ 00000000069aa91c folly::LogStreamVoidify<true>::operator&(std::ostream&)
                       ./folly/logging/LogStreamProcessor.cpp:222
    @ 0000000005c608b6 facebook::cachelib::CacheAllocator<facebook::cachelib::LruCacheTrait>::advanceIteratorAndTryEvictRegularItem(facebook::cachelib::MMLru::Container<facebook::cachelib::CacheItem<facebook::cachelib::LruCacheTrait>, &facebook::cachelib::CacheItem<facebook::cachelib::LruCacheTrait>::mmHook_>&, facebook::cachelib::MMLru::Container<facebook::cachelib::CacheItem<facebook::cachelib::LruCacheTrait>, &facebook::cachelib::CacheItem<facebook::cachelib::LruCacheTrait>::mmHook_>::Iterator&)
                       ./cachelib/allocator/CacheAllocator-inl.h:2677
                       -> ./cachelib/allocator/CacheAllocator.cpp
    @ 0000000005c5810d facebook::cachelib::CacheAllocator<facebook::cachelib::LruCacheTrait>::findEviction(signed char, signed char)
                       ./cachelib/allocator/CacheAllocator-inl.h:1251
                       -> ./cachelib/allocator/CacheAllocator.cpp
    @ 0000000005c5897c facebook::cachelib::CacheAllocator<facebook::cachelib::LruCacheTrait>::allocateChainedItemInternal(facebook::cachelib::detail::ReadHandleImpl<facebook::cachelib::CacheItem<facebook::cachelib::LruCacheTrait> > const&, unsigned int)
                       ./cachelib/allocator/CacheAllocator-inl.h:409
                       -> ./cachelib/allocator/CacheAllocator.cpp
    @ 0000000005c8bee0 facebook::cachelib::CacheAllocator<facebook::cachelib::LruCacheTrait>::allocateChainedItem(facebook::cachelib::detail::ReadHandleImpl<facebook::cachelib::CacheItem<facebook::cachelib::LruCacheTrait> > const&, unsigned int)
                       ./cachelib/allocator/CacheAllocator-inl.h:381
                       -> ./cachelib/allocator/CacheAllocator.cpp
    @ 00000000059c0e0a facebook::cachelib::cachebench::CacheStressor<facebook::cachelib::CacheAllocator<facebook::cachelib::LruCacheTrait> >::stressByDiscreteDistribution(facebook::cachelib::cachebench::ThroughputStats&)
                       ./cachelib/cachebench/cache/Cache-inl.h:355
                       -> ./cachelib/cachebench/runner/Stressor.cpp
    @ 00000000000df4e4 execute_native_thread_routine
    @ 000000000009ac0e start_thread
    @ 000000000012d1db __clone3

Contributor Author

Fixed.

auto& parent = candidate->getParentItem(compressor_);

const bool evictToNvmCache = shouldWriteToNvmCache(parent);

Contributor

Same here: we should return early if the token isn't valid. Otherwise we can trigger the assertion at L2734.

Contributor Author

Fixed

@@ -2940,7 +2916,7 @@ bool CacheAllocator<CacheTrait>::markMovingForSlabRelease(
// Since this callback is executed, the item is not yet freed
itemFreed = false;
Item* item = static_cast<Item*>(memory);
if (item->markMoving()) {
if (item->markExclusive()) {
Contributor

This now returns false if the item was already marked as exclusive. This causes issues in slab release.

Contributor Author

In case it returns false, we will just spin in the while loop until the item is either freed or we succeed in marking it as exclusive.

Contributor

Yeah, you're right. This will just repeat in the loop. I noticed one more thing at L2956: I think we should remove the assert there.

*
* Unmarking moving does not depend on `isInMMContainer`
* Unmarking exclusive does not depend on `isInMMContainer`
Contributor

Can you add this to the comments:

Unmarking exclusive will also return the refcount at the moment of unmarking.

Contributor Author

Done.

*
* Unmarking moving does not depend on `isInMMContainer`
* Unmarking exclusive does not depend on `isInMMContainer`
Contributor

Same here. Can you add this to the comments:

Unmarking exclusive will also return the refcount at the moment of unmarking.

Contributor Author

Done.

// among all the control bits is set. This indicates an item is already on
// its way out of cache and does not need to be moved.
bool isOnlyExclusive() const noexcept {
// An item is only exclusive when its refcount is zero and only the
Contributor

// An item is only exclusive when its refcount is zero and only the
// exclusive bit among all the control bits is set. This indicates an item
// is exclusive to the current thread. No other thread is allowed to
// do anything with it.

Contributor Author

Done.

@igchor igchor force-pushed the upstream_initial_cs branch from ce5a74a to 549108b on October 18, 2022 20:23
@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@igchor
Contributor Author

igchor commented Oct 18, 2022

By the way, is there any fundamental reason for NvmCache::put to accept a handle instead of a raw pointer? I'm working on the next set of changes (based on this PR) and passing a raw ptr would be much simpler

@facebook-github-bot
Contributor

@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -2962,7 +2953,7 @@ bool CacheAllocator<CacheTrait>::markMovingForSlabRelease(
itemFreed = true;

if (shutDownInProgress_) {
XDCHECK(!static_cast<Item*>(alloc)->isMoving());
XDCHECK(!static_cast<Item*>(alloc)->isExclusive());
Contributor

Should we remove this assert? I think isExclusive() can be true now if an item is marked as exclusive by an eviction thread while the slab release gets to this line.

Contributor Author

Yes, you are right. Removed.


@therealgymmy
Contributor

therealgymmy commented Oct 31, 2022

@igchor:

By the way, is there any fundamental reason for NvmCache::put to accept a handle instead of a raw pointer? I'm working on the next set of changes (based on this PR) and passing a raw ptr would be much simpler

By raw pointer do you mean Item&? Why would it make your changes much simpler?

There is no strong reason to use WriteHandle&. Mostly the code was just written this way.

@igchor igchor force-pushed the upstream_initial_cs branch from 549108b to c5e6f2a on October 31, 2022 20:36
@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@igchor
Contributor Author

igchor commented Oct 31, 2022

@igchor:

By the way, is there any fundamental reason for NvmCache::put to accept a handle instead of a raw pointer? I'm working on the next set of changes (based on this PR) and passing a raw ptr would be much simpler

By raw pointer do you mean Item&? Why would it make your changes much simpler?

There is no strong reason to use WriteHandle&. Mostly the code was just written this way.

Ok, I overstated it a bit :) It will make it slightly easier because in the patch, in some places we no longer acquire a handle to the item (e.g. in findEviction we only operate on pointers/references). Once this PR is merged I'll rebase my second patch and push it so you can take a look.

@facebook-github-bot
Contributor

@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@therealgymmy therealgymmy left a comment

LGTM

@facebook-github-bot
Contributor

This pull request has been reverted by 928114c.

Contributor

@haowu14 haowu14 left a comment

Hi @igchor. This change caused an issue in production and we reverted it.

Once you get a chance to fix the comments, I'll import it again and run tests against the service it failed on.


// if token is invalid, return. iterator is already advanced.
if (evictToNvmCache && !token.isValid()) {
++itr;
Contributor

We already advanced the iterator at line 2700 so we shouldn't do it again here.

Contributor Author

Done.

if (toReleaseHandle) {
if (toReleaseHandle->hasChainedItem()) {
const auto ref = candidate->unmarkExclusive();
if (ref == 0u) {
Contributor

I think we still need to check if toReleaseHandle is valid. Not because we want to make sure it's not being evicted by another thread, but to know whether we actually removed the candidate in the advanceIteratorAndTryEvict*Item function call.

If an item was not removed, we should skip the logic inside the if-else block and simply unmarkExclusive and move on. If we don't want to return toReleaseHandle because we don't want to increment the refcount, we should find another way to check if the iterator was advanced or not.

Contributor Author

Does it matter whether we actually removed the candidate? If it was some other thread that did this and the ref is 0 after unmarkExclusive, we should still be able to recycle the item. unmarkExclusive in this case is like decRef(): for ref == 0 we are the last owner, but instead of returning the item to the allocator we reuse it.

I even believe that we have to recycle the item (or release it back to the allocator) in this case. Otherwise we could have a memory leak.

Is my thinking correct, or am I missing something?

Contributor

I think you are right. unmarkExclusive returns both admin and access refcounts, right? If so, you are correct that we would have to recycle as this is the last owner.

I think there is just one small problem now:

C1: removed item and ref == 0 (no problem; the golden path)
C2: removed item and ref != 0 (shouldn't happen; no access ref can be created after removal from the AccessContainer)
C3: not removed and ref == 0 (no problem; if this happens, we should still recycle because another thread removed the item from the Access and MMContainers but didn't recycle it, since the current thread was holding the exclusive bit)
C4: not removed and ref > 0. In this case, we would increment the eviction failure counters again in the else block, while we already incremented them inside advanceIteratorAndTryEvict.

I think we just need to fix C4 here.

Contributor Author

Yes, ref will contain both access and admin bits.

You are also right about those 4 cases. Since C2 cannot happen, all the stats are always updated inside advanceIteratorAndTryEvict*. So, I just removed the code which updates stats outside of those functions and added an assert for C2.

if (!itr) {
itr.resetToBegin();
}
itr.resetToBegin();
Contributor

This needs to be fixed.

@haowu14 haowu14 reopened this Nov 16, 2022
@igchor igchor force-pushed the upstream_initial_cs branch from c5e6f2a to 707cf1f on November 17, 2022 20:33
@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

attempts. Mark the item as exclusive before trying
to remove from AC or MMContainer.
@igchor igchor force-pushed the upstream_initial_cs branch from 707cf1f to 5b6d18c on November 18, 2022 19:44
@facebook-github-bot
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Contributor

@haowu14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@haowu14
Contributor

haowu14 commented Nov 22, 2022

@igchor Update: we will be testing this in a prod-like setup in the week of Nov 28.

@haowu14
Contributor

haowu14 commented Nov 30, 2022

Tested on Monday and landed just now. I'll move on to the next PR.

Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Reverted
4 participants