Skip to content

[upstream] Initial framework for background item eviction/promotion #43

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 44 commits into from

Conversation

igchor
Copy link
Member

@igchor igchor commented Dec 15, 2022

Implement stubs for periodic workers for item eviction/promotion. Actual implementation of eviction and promotion will be provided in the next patch.

The idea behing introducing those workers is the following:

  • BG eviction: to keep certain amount of free memory and decreasing allocate latency
  • BG promotion: to move hot items to memory decresing latency

This change is Reviewable

@igchor
Copy link
Member Author

igchor commented Dec 15, 2022

If and when this PR is merged into upstream we can followup with PRs that:

  • implement params (watermarks, etc.) for BG workers and config support
  • implement calculateBatchSize()
  • implement eviction for single tier (to NVMe) + optionally promotion from NVMe to DRAM

@guptask
Copy link

guptask commented Dec 20, 2022

cachelib/allocator/CacheAllocator.h line 1063 at r1 (raw file):

                      util::Throttler::Config reaperThrottleConfig);

  bool startNewBackgroundPromoter(

Is it better to set default values for interval and strategy ? I understand doing so will require a background thread attributes class since multiple default values won't be possible.
e.g. bool startNewBackgroundEvictor(BackgroundThreadOpts opts);

@guptask
Copy link

guptask commented Dec 20, 2022

cachelib/allocator/CacheAllocatorConfig.h line 480 at r1 (raw file):

  std::shared_ptr<BackgroundMoverStrategy> backgroundPromoterStrategy;
  // time interval to sleep between runs of the background evictor
  std::chrono::milliseconds backgroundEvictorInterval{

I see now that default values are set in config. So maybe you can ignore my previous comment about background evictor/promoter thread creation api carrying default values for interval and strategy.

@guptask
Copy link

guptask commented Dec 20, 2022

cachelib/allocator/CacheStats.h line 293 at r1 (raw file):

// Mover Stats
struct BackgroundMoverStats {

Should this class be renamed to BackgroundMoverLocalStats ? I think adding the term "local" adds clarity to the per-thread stats collection mode.

Copy link

@guptask guptask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the PR looks good. Please review the few cosmetic comments.
:lgtm:

Reviewed 14 of 14 files at r1, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @igchor)

Copy link
Member Author

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @guptask)


cachelib/allocator/CacheAllocator.h line 1063 at r1 (raw file):

Previously, guptask (Sounak Gupta) wrote…

Is it better to set default values for interval and strategy ? I understand doing so will require a background thread attributes class since multiple default values won't be possible.
e.g. bool startNewBackgroundEvictor(BackgroundThreadOpts opts);

We don't have too many params, so I think it's fine as it is for now. startNewPoolOptimizer and others also don't have default params.


cachelib/allocator/CacheAllocatorConfig.h line 480 at r1 (raw file):

Previously, guptask (Sounak Gupta) wrote…

I see now that default values are set in config. So maybe you can ignore my previous comment about background evictor/promoter thread creation api carrying default values for interval and strategy.

Yeah, I think it's probably better to have the default values here and perhaps in enableBackgroundEvictor function as well eventually.


cachelib/allocator/CacheStats.h line 293 at r1 (raw file):

Previously, guptask (Sounak Gupta) wrote…

Should this class be renamed to BackgroundMoverLocalStats ? I think adding the term "local" adds clarity to the per-thread stats collection mode.

Well, we're using this class to aggregate stats from all Bg workers at the end so I'm not sure. This is done inside getBackgroundMoverStats

Open Source Bot and others added 24 commits February 15, 2023 19:44
Summary:
GitHub commits:

facebook/wangle@43e5620
facebookexperimental/edencommon@3934bec
facebookincubator/katran@a4c94aa

Reviewed By: jurajh-fb

fbshipit-source-id: 720310662068e4aa245198d817666700c9dde1d5
Summary: As mentioned in the tasks, these stats make more sense in rates.

Reviewed By: jiayuebao

Differential Revision: D43243498

fbshipit-source-id: 556b0b4a062235bb3b34dd076e5f7a5a88316f8c
Summary:
GitHub commits:

facebook@ba9dd68
facebook/fbthrift@23208ff
facebookincubator/velox@68b8a45

Reviewed By: jurajh-fb

fbshipit-source-id: f8581e13cd1c0a2510d87f42c813ca2c65eaef1d
Summary:
GitHub commits:

facebook/fbthrift@3b00d58

Reviewed By: jurajh-fb

fbshipit-source-id: 23ecb07e7e9ebd4dfbf3980de141ea1ebe36bb12
Summary:
GitHub commits:

facebook/fbthrift@df0eb61
facebook/proxygen@7fc3fea

Reviewed By: jurajh-fb

fbshipit-source-id: fa982baf35fd98edc52be391f03d15a9c2c333bf
Summary:
GitHub commits:

facebook/fbthrift@1d01d76

Reviewed By: jurajh-fb

fbshipit-source-id: bdefba88e8d002c65d8b7c11e86c8aa772996d9b
Summary:
GitHub commits:

facebook/fbthrift@1b848d8
facebook/rocksdb@cfe50f7
pytorch/FBGEMM@c55d800

Reviewed By: bigfootjon

fbshipit-source-id: e840d1116311ac52336e598442cf66cdc50d7725
Summary:
GitHub commits:

facebook/fbthrift@90fddab

Reviewed By: bigfootjon

fbshipit-source-id: cf7ac48c00c548805fb74663fe3e9bd9f5906cc2
Summary: Currently `replacedPtr` is passed as an argument and such usage isn't easy to understand (we kept receiving user feedback on that). Now we want to return it.

Reviewed By: therealgymmy, antonf, jaesoo-fb

Differential Revision: D43143989

fbshipit-source-id: eaae055107050cd67a7ca8b26b716fffc0c002f2
Summary:
GitHub commits:

facebook/folly@fdb333b
facebook/watchman@0b238b0
pytorch/FBGEMM@d142cdf

Reviewed By: bigfootjon

fbshipit-source-id: ac185dc9dc24ba7d2765dca010958ce6b38bd029
Summary:
GitHub commits:

facebook/fbthrift@cbc3de5
facebook/rocksdb@229297d
facebook/watchman@d873e11
facebookexperimental/rust-shed@2f45b88

Reviewed By: bigfootjon

fbshipit-source-id: a65fa2098f6b6f1704154e0509e2f5423f9679eb
Summary:
Change build script to pin fmt version to same as folly's to minimize future breaks.

Pull Request resolved: facebook#196

Test Plan:
Built successfully on a fresh clone of CacheLib. (Also had to change `external_git_branch=dev` for zstd to deal with the cmake/zstd issue in facebook#194, but that should resolve when gets merged into release)

**Context:** OSS build broke between 3-5 Jan 2023, likely due to changes in folly. While switching to v9.1.0 or 9.0.0 fixes the issue at hand, it seems sensible to match folly, which specifies fmt v8.0.1: https://github.com/facebook/folly/blob/main/build/fbcode_builder/manifests/fmt

> facebook#62 agordon: For the other packages, you'll notice we do use a specific git tag or branch… I notice `fmt` is an exception - not pinned to a specific git tag or revision - likely an omission that can be fixed.

Related CacheLib issues: facebook#186, facebook#189, facebook#107, facebook#97, facebook#62
Possibly related CacheLib commit: 67cc11a

Last working (Jan 3): https://github.com/facebook/CacheLib/actions/runs/3826992478
First failed (Jan 5): https://github.com/facebook/CacheLib/actions/runs/3844002307/jobs/6546742348
Error: `error: static assertion failed: Cannot format an argument. To make type T formattable provide a formatter<T> specialization: https://fmt.dev/latest/api.html#udt`

Reviewed By: therealgymmy

Differential Revision: D43517927

Pulled By: jiayuebao

fbshipit-source-id: 2d28791f7804d862b646263b96b10b835f843d8c
Summary:
Memcached's WSA logger will now emits GET_LEASE and SET_LEASE operations as well. This changes makes
the cachebench treats those as GET and SET, respectively, for compatibility.

Reviewed By: therealgymmy

Differential Revision: D43336316

fbshipit-source-id: c9b842d567b9fb2128b822bd429f5dce30b378da
Summary: update to >6.0.1 version

Reviewed By: antonk52

Differential Revision: D43575577

fbshipit-source-id: 7143da212ab6f124bffdfdaf3be29ff3ab986ffb
Summary:
Fix OSS builds by adding numa deps to build files. Currently some fail on missing `numa.h`.

Context: facebook#161 added the dependencies to the centOS, debian, and ubuntu18 build files. The PR was opened in Sep 2022 but only landed in Dec 2022, and so probably missed out on the fedora, rocky and arch build files which were added in-between those dates. Having had those build actions run on PRs would have caught this (currently, they are only scheduled.)

Pull Request resolved: facebook#197

Test Plan: Github Actions builds (ideally, facebook#198 would be landed first.) I've checked that those packages exist for the respective repositories but didn't run them myself.

Reviewed By: jaesoo-fb

Differential Revision: D43587970

Pulled By: jiayuebao

fbshipit-source-id: 8c59e48528042350e576a45ffc3bf2520699f5a9
Reviewed By: avalonalex

Differential Revision: D43616310

fbshipit-source-id: 3367ad01ba31e5dc561d63a4f3b9746170e64912
Summary:
Run GitHub action builds on every pull request, in addition to currently daily scheduled runs.

Benefit: avoid accidentally breaking other OS builds. This would have caught facebook#197.
This triggers whenever PRs are opened or when commits are added to a PR.

Frequency of PRs (total of 76 PRs over last 18 months since CacheLib was open-sourced) is much less than frequency  of daily scheduled builds, so this shouldn't add too many builds overall.

Pull Request resolved: facebook#198

Reviewed By: therealgymmy

Differential Revision: D43625609

Pulled By: jiayuebao

fbshipit-source-id: 1572f6da32584ce6a1983d5e64afedf17ff17457
Summary:
Add a custom deleter class that stores a `handle`. This allows object-cache to access the Item Handle via a shared_ptr. Both size-awareness feature and getting/updating object's TTL require that.

Deleter class is marked as private because we don't want to expose `Handle` to object-cache users.

Reviewed By: therealgymmy, jaesoo-fb

Differential Revision: D42503594

fbshipit-source-id: 16ac14e6a84a1cfa80a3c145d440790002734a34
Summary:
This change fixes following flaky tests in NvmCacheTests.
* NvmCacheTest.Delete
* NvmCacheTest.NvmEvicted
* NvmCacheTest.EvictToNvmGetCheckCtime

The root cause of the failures are essentially the same as D42443647 (facebook@5e7ff9a) which fixed the problem for
NvmCacheTest.EvictToNvmGet; we are inserting enough items that could be spilled to NVM cache, where
the NvmCache::put() can be dropped and the item is evicted completely when the delete operations
(and tombstones) issued as part of the insertion are still outstanding. In order to fix the problem,
this change flushes the NVM cache periodically during the insertions.

Also, since this could cause more regions are used, the size of NVM cache needs to be increased.
This change bumps the default size of NVM cache to 200MB (previous 100MB). Also, the size of persist
storage used in the test PersistenceCache has been bumped by 100MB accordingly, i.e., from 400MB to
500MB.

This change addresses the github issue facebook#169

Reviewed By: therealgymmy

Differential Revision: D43592888

fbshipit-source-id: f0968884eb39fb5728b59129e98345df3240f01e
Summary: The Samsung PM9A3 does not report samsung in the model number so I added the specific model number to the vendorMap.

Reviewed By: jaesoo-fb

Differential Revision: D43676582

fbshipit-source-id: 6df19c40dd9da9563b75aa5847a1d1f9eb6a9aef
Summary:
XDCHECK (or XCHECK) fails here for gcc-8.x in a lambda, so we move it outside. This occurs in CacheBench's AsyncCacheStressor.h.

```
/__w/CacheLib/CacheLib/cachelib/../cachelib/cachebench/runner/AsyncCacheStressor.h:306:7:
internal compiler error: in cp_build_addr_expr_1, at cp/typeck.c:5965
```

This line compiled fine with 8.5 before Jan 2023 so I suspect a regression in folly or other external library. It still compiles fine with gcc-7.5, 9.4, and 11.3.1. Possibly related commits: facebook/folly@1aafad4 and facebook/folly@e6d09f6

Line of gcc-8.5 that it fails on: https://github.com/gcc-mirror/gcc/blob/releases/gcc-8.5.0/gcc/cp/typeck.c#L5965

The version that isn't in a lambda compiles just fine: https://github.com/facebook/CacheLib/blob/df5b9f6ef35c55e432b6713c52397a03dd19c34c/cachelib/cachebench/runner/CacheStressor.h#L399

Pull Request resolved: facebook#201

Test Plan: GitHub actions. Built fine on my fork with CentOS 8.5/gcc-8.5. This issue currently causes 3 builds that use gcc-8.5 to fail (2 CentOS and RockyLinux-8.6) and 1 build using gcc-8.3 (Debian).

Reviewed By: therealgymmy

Differential Revision: D43681854

Pulled By: jaesoo-fb

fbshipit-source-id: f3a65aefedcd98a26a80bb6ad009ad0d64e2395b
Summary:
Add the following TTL-related APIs:
- getConfiguredTtlSec
- getExpiryTimeSec
- extendTtlSec
- updateExpiryTimeSec

Usage:
```
auto ptr = objcache->find<T>("key");
auto configuredTtl = objcache->getConfiguredTtl(ptr);
auto expiryTime = objcache->getExpiryTimeSec(ptr);
objcache->extendTtl(ptr, std::chrono::seconds(3));
objcache->updateExpiryTimeSec(ptr, newExpiryTimeSecs);

```

Reviewed By: therealgymmy, jaesoo-fb

Differential Revision: D43167879

fbshipit-source-id: 3b11fb0a2b9a3b5c38fcfd856ade100e6ae27470
Summary: Based on our discussion, this API would be confusing if a reference of ReadHandle is obtained from a WriteHandle. We also don't want to make it virtual because this will increase `sizeof(ReadHandle)` / `sizeof(WriteHandle)` by 8 bytes.

Reviewed By: therealgymmy

Differential Revision: D43667308

fbshipit-source-id: a77a17113f1a23f84332ebfa7ea6772d7647339c
Jaesoo Lee and others added 20 commits March 1, 2023 16:16
Summary:
This diff has been automatically generated by the inpage editor.

NOTE: If you want to update this diff, go via the preview link inside the static docs section below.
Ensure you are editing the same page that was used to create this diff.

Reviewed By: therealgymmy

Differential Revision: D43667238

fbshipit-source-id: 0be4c1ef376a5a1a2de92afc311af24f66d10afd
Summary:
1. Workaround for Debian Docker image bug that is breaking Debian build on GitHub (Explicitly mark Git repo as safe).
2. Pin zstd to a commit that resolves problems with older CMakes (note: affects all OSes, not just Debian)

Context for 1: In latest Debian Docker image , there is a regression that affects the checkout action.

From actions/checkout#1169:
> - Checkout runs, and runs /usr/bin/git config --global --add safe.directory <path>
> - The global .gitconfig does not exist
> - Any calls to git remain unsafe/dubious

The suggested workaround was to use --system instead of --global.

Pull Request resolved: facebook#200

Test Plan: See if GitHub Action Debian build is fixed.

Reviewed By: therealgymmy

Differential Revision: D43720363

Pulled By: jaesoo-fb

fbshipit-source-id: 54f3586cc7f8e72045e60d8dd454c7a77725e6b2
Summary:
An user in github reported an issue that the installation link is broken. Somehow,
docs/installation/installation.md cannot be referenced by `/docs/installation/installation`, but
only by `/docs/installation`. The root cause could not figured out yet, but this change fixes it as
such for now.

Reviewed By: jiayuebao

Differential Revision: D43757739

fbshipit-source-id: 4abd3208800c0b68e9162d381f6395897f047b24
Summary: There is another missing link in index.js for installation page. This change fixes it.

Reviewed By: jiayuebao

Differential Revision: D43774697

fbshipit-source-id: 1c2ff1c801ae41ea6b50e6e687714f422bcb0c92
Summary:
We need to fix builds for the entire Folly stack first before this will work again

Pull Request resolved: facebook#205

Reviewed By: michel-slm, jiayuebao

Differential Revision: D43786012

Pulled By: jaesoo-fb

fbshipit-source-id: dfd1947d7d2c294fc622496d6054c2395ef0d5a3
…itor

Summary:
This diff has been automatically generated by the inpage editor.

NOTE: If you want to update this diff, go via the preview link inside the static docs section below.
Ensure you are editing the same page that was used to create this diff.

Reviewed By: therealgymmy

Differential Revision: D43781316

fbshipit-source-id: 18091011f9e4592240fb41c5fb9cc8cb40e62205
Reviewed By: jiayuebao

Differential Revision: D43869027

fbshipit-source-id: 6f805a86b072e2fc96d4357672175e8283e06d0d
Summary:
As suggested by wonglkd in facebook#199, this change adds the
build status for all targets in the main README.md (https://github.com/facebook/CacheLib)

Reviewed By: therealgymmy

Differential Revision: D43854616

fbshipit-source-id: add32de29c160ed06e1778fef41bb37ffc359fd7
Summary:
**Context:** To enable user/intern simulations, D41134915 added 'itemValue' support in cachebench simulations. Any string that appears in the 'itemValue' column in the trace, is used to populate the cachelib item value (previously empty). For user/intern, we set this string to "0" for user and "1" for intern. Then in FBDep.cpp, we tell the admission policy to distinguish user/intern with a lambda function that parses item value. Prod uses a different lambda that reads item *header*, which has a user/intern flag set by Proxygen as part of the request to BigCache. We cannot use the same function and item header flag in simulations because the simulator does not run Proxygen. Instead, it emulates its behavior with trace + PieceWiseReplayGenerator.

**This diff:** Currently, the logic that populates item value in cachebench (setStringItem below) takes a std::string and sets the item value with that. When we want to read the item in FBDep, we don't know where the end of the string is because the item can be bigger than the data. Right now, we're avoiding this by adding a null character at the end of the itemValue data string. FBDep then uses strnlen to find the length of the data until the null string. **This is not working as intended**, and the string read in FBDep includes gibberish after the "0" or "1" because it is not properly terminated. This diff fixes the issue.

Reviewed By: jaesoo-fb

Differential Revision: D43861361

fbshipit-source-id: d4974281e28165ecf8c23c44dc4a6b50d5b1b806
Differential Revision: D43869116

fbshipit-source-id: c62c2518691313ec82c15c9f65ac304378222257
Differential Revision: D43869054

fbshipit-source-id: e90b16576404b8eb5819c65d61fe972f0f518c66
Differential Revision: D43869046

fbshipit-source-id: 288bc98b8146424f71ece5d0685d08c59e3ed523
Differential Revision: D43869073

fbshipit-source-id: 52fea96db9cefe735ba56b564bf6af5566cd6d76
Differential Revision: D43869045

fbshipit-source-id: 120149fa4cac2590b93f0c9a9fa5f7827efca3fc
Summary:
It is similar to 'moving' but requires ref count to be 0.

An item which is marked for eviction causes all incRef() calls to that item to fail.

This will be used to ensure that once item is selected for eviction, no one can interfere and prevent the eviction from suceeding.

'markedForEviction' relies on the same 'exlusive' bit as the 'moving' state. To distinguish between those two states, 'moving' add 1 to the refCount. This is hidden from the user, so getRefCount() will not return that extra ref.

Pull Request resolved: facebook#183

Test Plan:
flatmemcache shadow result:

It seems there is a slight latency regression in allocation (~1 micro second)
https://fburl.com/ods/u3isn8b4
It's probably small enough to ignore. But we can come back if this starts showing up in large scale in production.

Imported from GitHub, without a `Test Plan:` line.

Reviewed By: therealgymmy

Differential Revision: D42089974

Pulled By: haowu14

fbshipit-source-id: 90477faa9443080a7ddefa37a524965a04b6f084
Differential Revision: D43869057

fbshipit-source-id: 84aa3aec8d75f04b640d6aa8393002190c97842f
Differential Revision: D43869089

fbshipit-source-id: 57d359fa8cfeaca395f636f6d3997a934afca94a
Differential Revision: D43904241

fbshipit-source-id: 3d7484bee3a62495b2f2ee299fd3b0b855a3be58
Summary: Putting extra note of separating read/write traffic when using find APIs.

Reviewed By: jaesoo-fb

Differential Revision: D43960690

fbshipit-source-id: 56d3a8b65c7b66b3319bee6e7ffdd959826c8dcf
Implement stubs for periodic workers for item eviction/promotion.
Actual implementation of eviction and promotion will be provided
in the next patch.

The idea behing introducing those workers is the following:
- BG eviction: to keep certain amount of free memory and
  decreasing allocate latency
- BG promotion: to move hot items to memory decresing latency
@igchor igchor closed this Mar 24, 2023
byrnedj pushed a commit that referenced this pull request Jul 23, 2023
Compensation results in ratios being different than originially specified.
byrnedj pushed a commit that referenced this pull request Jul 23, 2023
Compensation results in ratios being different than originially specified.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants