Add simple heuristics for experimental mempurge. #8583

@bjlemaire bjlemaire commented Jul 23, 2021

Add the experimental_mempurge_policy option flag and introduce two new MemPurge (Memtable Garbage Collection) policies: 'ALWAYS' and 'ALTERNATE'. Default value: ALTERNATE.
ALWAYS: every flush first goes through a MemPurge process. If the output is too big to fit into a single memtable, the mempurge is aborted and a regular flush carries on. ALWAYS is designed for users who need to reduce the number of L0 SST files created to a strict minimum and can afford a small dent in performance (possibly hits to CPU usage, read efficiency, and maximum burst write throughput).
ALTERNATE: a flush is transformed into a MemPurge unless one of the memtables being flushed is the product of a previous MemPurge. ALTERNATE is a good tradeoff between performance and the reduction in the number of L0 SST files created. ALTERNATE performs particularly well for completely random garbage ratios, for garbage ratios anywhere in (0%,50%], and for even higher ratios when the garbage ratio varies widely.
This PR also includes support for experimental_mempurge_policy in db_bench.
Testing was done locally by replacing all the MemPurge policies of the unit tests with ALTERNATE, as well as by running db_crashtest.py whitebox and blackbox locally. Overall, if an ALWAYS mempurge policy passes the tests, there is no reason why an ALTERNATE policy would fail; the mempurge policy was therefore set to ALWAYS for all mempurge unit tests.
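
For readers skimming the description, the two policies can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: the names MemPurgePolicy, MemtableInfo, ShouldAttemptMemPurge and RunFlush are hypothetical, the "fits into a single memtable" check is reduced to a byte count, and applying the abort-on-overflow fallback to ALTERNATE as well as ALWAYS is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is an illustration, not RocksDB code.
enum class MemPurgePolicy { kAlternate, kAlways };
enum class FlushOutcome { kMemPurged, kFlushedToL0 };

struct MemtableInfo {
  std::size_t live_bytes;  // bytes expected to survive garbage collection
  bool from_mempurge;      // true if this memtable came out of a mempurge
};

// Decide whether a flush should first be attempted as a mempurge.
bool ShouldAttemptMemPurge(MemPurgePolicy policy,
                           const std::vector<MemtableInfo>& mems) {
  if (policy == MemPurgePolicy::kAlways) {
    return true;  // ALWAYS: every flush first goes through a mempurge
  }
  // ALTERNATE: skip the mempurge if any input is itself a mempurge output.
  for (const MemtableInfo& m : mems) {
    if (m.from_mempurge) {
      return false;
    }
  }
  return true;
}

// Attempt a mempurge when the policy allows it; fall back to a regular
// flush if the surviving data would not fit into a single memtable.
FlushOutcome RunFlush(MemPurgePolicy policy,
                      const std::vector<MemtableInfo>& mems,
                      std::size_t memtable_capacity) {
  assert(memtable_capacity > 0);
  if (!ShouldAttemptMemPurge(policy, mems)) {
    return FlushOutcome::kFlushedToL0;
  }
  std::size_t total_live = 0;
  for (const MemtableInfo& m : mems) {
    total_live += m.live_bytes;
  }
  // Mempurge is aborted when its output exceeds one memtable.
  return total_live <= memtable_capacity ? FlushOutcome::kMemPurged
                                         : FlushOutcome::kFlushedToL0;
}
```

Under these assumptions, a flush whose inputs include a previously purged memtable goes straight to L0 under ALTERNATE, while ALWAYS still attempts the mempurge and only falls back when the output overflows.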

@facebook-github-bot

@bjlemaire has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@bjlemaire force-pushed the enhanced_heuristics_experimental_mempurge branch from 5b0eb03 to e0040fe on July 23, 2021 22:35
@facebook-github-bot

@bjlemaire has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot

@bjlemaire has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@pdillinger pdillinger left a comment

I might call these "trivial" or "simple" heuristics but LGTM

db/c.cc (outdated, resolved)
db/flush_job.cc (outdated, resolved)
@pdillinger

can afford a small dent in performance.

Specifically, likely hits to CPU usage, read efficiency, and maximum burst write throughput, though not always so.

@bjlemaire changed the title from "Add enhanced heuristics for experimental mempurge." to "Add simple heuristics for experimental mempurge." on Jul 26, 2021
@facebook-github-bot

@bjlemaire has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot

@bjlemaire has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@bjlemaire force-pushed the enhanced_heuristics_experimental_mempurge branch from fccf5eb to a40adb1 on July 26, 2021 16:54
@facebook-github-bot

@bjlemaire has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot

@bjlemaire has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@bjlemaire merged this pull request in 4361d6d.

bjlemaire added a commit to bjlemaire/rocksdb that referenced this pull request Aug 16, 2021
Summary:
Add the `experimental_mempurge_policy` option flag and introduce two new `MemPurge` (Memtable Garbage Collection) policies: 'ALWAYS' and 'ALTERNATE'. Default value: ALTERNATE.
`ALWAYS`: every flush first goes through a `MemPurge` process. If the output is too big to fit into a single memtable, the mempurge is aborted and a regular flush carries on. `ALWAYS` is designed for users who need to reduce the number of L0 SST files created to a strict minimum and can afford a small dent in performance (possibly hits to CPU usage, read efficiency, and maximum burst write throughput).
`ALTERNATE`: a flush is transformed into a `MemPurge` unless one of the memtables being flushed is the product of a previous `MemPurge`. `ALTERNATE` is a good tradeoff between performance and the reduction in the number of L0 SST files created. `ALTERNATE` performs particularly well for completely random garbage ratios, for garbage ratios anywhere in (0%,50%], and for even higher ratios when the garbage ratio varies widely.
This PR also includes support for `experimental_mempurge_policy` in `db_bench`.
Testing was done locally by replacing all the `MemPurge` policies of the unit tests with `ALTERNATE`, as well as by running `db_crashtest.py` `whitebox` and `blackbox` locally. Overall, if an `ALWAYS` mempurge policy passes the tests, there is no reason why an `ALTERNATE` policy would fail; the mempurge policy was therefore set to `ALWAYS` for all mempurge unit tests.

Pull Request resolved: facebook#8583

Reviewed By: pdillinger

Differential Revision: D29888050

Pulled By: bjlemaire

fbshipit-source-id: e2cf26646d66679f6f5fb29842624615610759c1