Make it possible to enable blob files starting from a certain LSM tree level #10077

gangliao · 2022-05-31T03:58:19Z

Summary:

Currently, if blob files are enabled (i.e. `enable_blob_files` is true), large values are extracted both during flush/recovery (when SST files are written into level 0 of the LSM tree) and during compaction into any LSM tree level. For certain use cases that have a mix of short-lived and long-lived values, it might make sense to support extracting large values only during compactions whose output level is greater than or equal to a specified LSM tree level (e.g. compactions into L1/L2/... or above). This could reduce the space amplification caused by large values that become garbage shortly after being written, at the price of some write amplification incurred by long-lived values whose extraction to blob files is delayed.
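To make the trade-off concrete, here is a minimal back-of-the-envelope sketch of it. It is not RocksDB code. It assumes a deliberately simplified model: a value is flushed into L0, each compaction moves it down exactly one level, blob garbage collection is ignored, and rewriting a blob reference costs nothing. `ValueCost`, `EstimateValueCost`, and `death_level` (the level the value would have been compacted into when it was overwritten) are all hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified cost model (not RocksDB code).
struct ValueCost {
  std::uint64_t inline_bytes_written = 0;  // value bytes written into SST files
  std::uint64_t blob_bytes_written = 0;    // value bytes written into blob files
  std::uint64_t blob_garbage_bytes = 0;    // blob bytes left as garbage on overwrite
};

// Assumptions: the value is written to L0 by a flush, then to L1, L2, ... by
// compactions, and is overwritten before it reaches `death_level` (pass
// death_level >= num_levels for a value that is never overwritten). Levels
// below `starting_level` store the value inline; the first write to a level
// >= `starting_level` moves it into a blob file.
ValueCost EstimateValueCost(std::uint64_t value_size, int starting_level,
                            int death_level, int num_levels) {
  assert(num_levels > 0);
  ValueCost cost;
  bool in_blob = false;
  for (int level = 0; level < num_levels && level < death_level; ++level) {
    if (level >= starting_level) {
      // Extracted once; later compactions only rewrite a small blob
      // reference, which this model ignores.
      cost.blob_bytes_written += value_size;
      in_blob = true;
      break;
    }
    cost.inline_bytes_written += value_size;  // value rewritten inline
  }
  // A value that was extracted and later overwritten leaves blob garbage.
  if (in_blob && death_level < num_levels) {
    cost.blob_garbage_bytes = value_size;
  }
  return cost;
}
```

With a 1000-byte value and 7 levels, a value overwritten while still in L0 leaves 1000 bytes of blob garbage when the starting level is 0, but none when it is 2. A value that is never overwritten, by contrast, pays 2000 extra bytes of inline SST writes (L0 and L1) when the starting level is 2.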

In order to achieve this, we would like to do the following:

- Add a new configuration option `blob_file_starting_level` (default: 0) to `AdvancedColumnFamilyOptions` (and `MutableCFOptions`, extending the related logic)
- Instantiate `BlobFileBuilder` in `BuildTable` (used during flush and recovery, where the LSM tree level is L0) and `CompactionJob` iff `enable_blob_files` is set and the LSM tree level is `>= blob_file_starting_level`
- Add unit tests for the new functionality, and add the new option to our stress tests (`db_stress` and `db_crashtest.py`)
- Add the new option to our benchmarking tool `db_bench` and the BlobDB benchmark script `run_blob_bench.sh`
- Add the new option to the `ldb` tool (see https://github.com/facebook/rocksdb/wiki/Administration-and-Data-Access-Tool)
- Ideally, extend the C and Java bindings with the new option
- Update the BlobDB wiki to document the new option
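The gating rule in the second item can be sketched as a small predicate. This is a minimal illustration under stated assumptions, not the actual patch: `BlobOptionsSketch`, `ShouldExtractBlobs`, `MaybeCreateBlobFileBuilder`, and `FakeBlobFileBuilder` are hypothetical stand-ins, and only the option names `enable_blob_files` and `blob_file_starting_level` come from the description above. Flush and recovery are modeled as writing to output level 0.

```cpp
#include <cassert>
#include <memory>

// Hypothetical placeholder for RocksDB's BlobFileBuilder (not the real class).
struct FakeBlobFileBuilder {};

// Hypothetical subset of the column family options relevant here.
struct BlobOptionsSketch {
  bool enable_blob_files = false;
  int blob_file_starting_level = 0;  // proposed default: extract from L0 on
};

// Flush and recovery write into L0 (output_level == 0); compactions pass
// their own output level.
bool ShouldExtractBlobs(const BlobOptionsSketch& opts, int output_level) {
  assert(output_level >= 0);
  return opts.enable_blob_files &&
         output_level >= opts.blob_file_starting_level;
}

// Returns a builder only when large values should go to blob files;
// nullptr means values are written inline into the SST file.
std::unique_ptr<FakeBlobFileBuilder> MaybeCreateBlobFileBuilder(
    const BlobOptionsSketch& opts, int output_level) {
  if (!ShouldExtractBlobs(opts, output_level)) {
    return nullptr;
  }
  return std::unique_ptr<FakeBlobFileBuilder>(new FakeBlobFileBuilder());
}
```

With `blob_file_starting_level` left at its default of 0, the predicate reduces to `enable_blob_files`, so existing configurations keep extracting large values during flush/recovery exactly as before. Setting it to 2, for example, means flushes and L0-to-L1 compactions write values inline and extraction first happens on compactions into L2.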

Reviewers:

gangliao · 2022-05-31T04:03:38Z

TODO:

- Add a new configuration option `blob_file_starting_level` (default: 0) to `AdvancedColumnFamilyOptions` (and `MutableCFOptions`, extending the related logic)
- Instantiate `BlobFileBuilder` in `BuildTable` (used during flush and recovery, where the LSM tree level is L0) and `CompactionJob` iff `enable_blob_files` is set and the LSM tree level is `>= blob_file_starting_level`
- Add unit tests for the new functionality, and add the new option to our stress tests (`db_stress` and `db_crashtest.py`)
- Add the new option to our benchmarking tool `db_bench` and the BlobDB benchmark script `run_blob_bench.sh`
- Add the new option to the `ldb` tool (see https://github.com/facebook/rocksdb/wiki/Administration-and-Data-Access-Tool)
- Ideally, extend the C and Java bindings with the new option
- Update the BlobDB wiki to document the new option (see here: https://github.com/facebook/rocksdb/wiki/BlobDB#column-family-options)

gangliao · 2022-06-02T17:07:48Z

Add an entry for the new feature to HISTORY.md

ltamasi

Thanks a lot for the patch @gangliao! 🎉

db/blob/db_blob_compaction_test.cc

db/builder.cc

db_stress_tool/db_stress_gflags.cc

docs/_posts/2021-05-26-integrated-blob-db.markdown

ltamasi

LGTM (with one minor comment), thanks for the updates! Please add an entry to HISTORY.md :)

java/rocksjni/options.cc


facebook-github-bot · 2022-06-02T22:59:41Z

@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-06-02T23:05:23Z

@gangliao has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-06-02T23:05:51Z

@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

ltamasi

Just a few more minor things before we land this...

options/cf_options.cc

tools/db_bench_tool.cc

HISTORY.md

facebook-github-bot · 2022-06-02T23:30:33Z

@gangliao has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-06-02T23:42:00Z

@gangliao has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-06-02T23:43:07Z

@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-06-03T00:34:23Z

@gangliao has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-06-03T00:35:11Z

@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot added the CLA Signed label May 31, 2022

gangliao requested review from ltamasi, riversand963 and akankshamahajan15 May 31, 2022 12:52

ltamasi reviewed Jun 2, 2022


ltamasi approved these changes Jun 2, 2022


java/rocksjni/options.cc (outdated, resolved)

gangliao added 7 commits June 2, 2022 15:48

fix error and add stress test options

08effcc

add unit test

8621b98

Add blob_file_starting_level option to wiki

b2b3d6a

fix typo

b395582

fix typo

d7452ab

follow the comments

8bdda14

gangliao force-pushed the blob_start_level branch from de18e1e to 8bdda14 Compare June 2, 2022 22:48

gangliao added 2 commits June 2, 2022 15:58

Update blob_file_starting_level in HISTORY.md

3c5d80a

Remove cast

fe4e9e8

Add a more accurate description for blob_file_starting_level

d76200b

ltamasi reviewed Jun 2, 2022


options/cf_options.cc (outdated, resolved)

tools/db_bench_tool.cc (outdated, resolved)

HISTORY.md (outdated, resolved)

HISTORY.md (outdated, resolved)

Remove unused variable 'num_keys'

fd35f9e

follow the comments

559756b

hot fix

839a8ca

facebook-github-bot closed this in e6432df Jun 3, 2022

gangliao deleted the blob_start_level branch June 3, 2022 07:30
