Skip to content

Commit

Permalink
Add Bloom/Ribbon hybrid API support (#8679)
Browse files Browse the repository at this point in the history
Summary:
This is essentially resurrection and fixing of the part of
facebook/rocksdb#8198 that was reverted in facebook/rocksdb#8212, using data added in facebook/rocksdb#8246. Basically,
when configuring Ribbon filter, you can specify an LSM level before which
Bloom will be used instead of Ribbon. But Bloom is only considered for
Leveled and Universal compaction styles and file going into a known LSM
level. This way, SST file writer, FIFO compaction, etc. use Ribbon filter as
you would expect with NewRibbonFilterPolicy.

So that this can be controlled with a single int value and so that flushes
can be distinguished from intra-L0, we consider flush to go to level -1 for
the purposes of this option. (Explained in API comment.)

I also expect the most common and recommended Ribbon configuration to
use Bloom during flush, to minimize slowing down writes and because according
to my estimates, Ribbon only pays off if the structure lives in memory for
more than an hour. Thus, I have changed the default for NewRibbonFilterPolicy
to be this mild hybrid configuration. I don't really want to add something like
NewHybridFilterPolicy because at least the mild hybrid configuration (Bloom for
flush, Ribbon otherwise) should be considered a natural choice.

C APIs also updated, but because they don't support overloading,
rocksdb_filterpolicy_create_ribbon is kept pure ribbon for clarity and
rocksdb_filterpolicy_create_ribbon_hybrid must be called for a hybrid
configuration. While touching C API, I changed bits per key options from
int to double.

BuiltinFilterPolicy is needed so that LevelThresholdFilterPolicy doesn't inherit
unused fields from BloomFilterPolicy.

Pull Request resolved: facebook/rocksdb#8679

Test Plan: new + updated tests, including crash test

Reviewed By: jay-zhuang

Differential Revision: D30445797

Pulled By: pdillinger

fbshipit-source-id: 6f5aeddfd6d79f7e55493b563c2d1d2d568892e1
  • Loading branch information
pdillinger authored and facebook-github-bot committed Aug 21, 2021
1 parent baf22b4 commit 2a383f2
Show file tree
Hide file tree
Showing 13 changed files with 341 additions and 70 deletions.
2 changes: 2 additions & 0 deletions HISTORY.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,10 +23,12 @@
* Added a stat rocksdb.secondary.cache.hits
* Added a PerfContext counter secondary_cache_hit_count
* The integrated BlobDB implementation now supports the tickers `BLOB_DB_BLOB_FILE_BYTES_READ`, `BLOB_DB_GC_NUM_KEYS_RELOCATED`, and `BLOB_DB_GC_BYTES_RELOCATED`, as well as the histograms `BLOB_DB_COMPRESSION_MICROS` and `BLOB_DB_DECOMPRESSION_MICROS`.
* Added hybrid configuration of Ribbon filter and Bloom filter where some LSM levels use Ribbon for memory space efficiency and some use Bloom for speed. See NewRibbonFilterPolicy. This also changes the default behavior of NewRibbonFilterPolicy to use Bloom for flushes under Leveled and Universal compaction and Ribbon otherwise. The C API function `rocksdb_filterpolicy_create_ribbon` is unchanged but adds new `rocksdb_filterpolicy_create_ribbon_hybrid`.

## Public API change
* Added APIs to decode and replay trace file via Replayer class. Added `DB::NewDefaultReplayer()` to create a default Replayer instance. Added `TraceReader::Reset()` to restart reading a trace file. Created trace_record.h, trace_record_result.h and utilities/replayer.h files to access the decoded Trace records, replay them, and query the actual operation results.
* Added Configurable::GetOptionsMap to the public API for use in creating new Customizable classes.
* Generalized bits_per_key parameters in C API from int to double for greater configurability.

### Performance Improvements
* Try to avoid updating DBOptions if `SetDBOptions()` does not change any option value.
Expand Down
23 changes: 16 additions & 7 deletions db/c.cc
Original file line number Diff line number Diff line change
Expand Up @@ -3840,7 +3840,8 @@ void rocksdb_filterpolicy_destroy(rocksdb_filterpolicy_t* filter) {
delete filter;
}

rocksdb_filterpolicy_t* rocksdb_filterpolicy_create_bloom_format(int bits_per_key, bool original_format) {
rocksdb_filterpolicy_t* rocksdb_filterpolicy_create_bloom_format(
double bits_per_key, bool original_format) {
// Make a rocksdb_filterpolicy_t, but override all of its methods so
// they delegate to a NewBloomFilterPolicy() instead of user
// supplied C functions.
Expand Down Expand Up @@ -3875,16 +3876,17 @@ rocksdb_filterpolicy_t* rocksdb_filterpolicy_create_bloom_format(int bits_per_ke
return wrapper;
}

rocksdb_filterpolicy_t* rocksdb_filterpolicy_create_bloom_full(int bits_per_key) {
rocksdb_filterpolicy_t* rocksdb_filterpolicy_create_bloom_full(
double bits_per_key) {
return rocksdb_filterpolicy_create_bloom_format(bits_per_key, false);
}

rocksdb_filterpolicy_t* rocksdb_filterpolicy_create_bloom(int bits_per_key) {
rocksdb_filterpolicy_t* rocksdb_filterpolicy_create_bloom(double bits_per_key) {
return rocksdb_filterpolicy_create_bloom_format(bits_per_key, true);
}

rocksdb_filterpolicy_t* rocksdb_filterpolicy_create_ribbon_format(
int bloom_equivalent_bits_per_key) {
double bloom_equivalent_bits_per_key, int bloom_before_level) {
// Make a rocksdb_filterpolicy_t, but override all of its methods so
// they delegate to a NewRibbonFilterPolicy() instead of user
// supplied C functions.
Expand All @@ -3911,17 +3913,24 @@ rocksdb_filterpolicy_t* rocksdb_filterpolicy_create_ribbon_format(
static void DoNothing(void*) {}
};
Wrapper* wrapper = new Wrapper;
wrapper->rep_ = NewRibbonFilterPolicy(bloom_equivalent_bits_per_key);
wrapper->rep_ =
NewRibbonFilterPolicy(bloom_equivalent_bits_per_key, bloom_before_level);
wrapper->state_ = nullptr;
wrapper->delete_filter_ = nullptr;
wrapper->destructor_ = &Wrapper::DoNothing;
return wrapper;
}

rocksdb_filterpolicy_t* rocksdb_filterpolicy_create_ribbon(
int bloom_equivalent_bits_per_key) {
double bloom_equivalent_bits_per_key) {
return rocksdb_filterpolicy_create_ribbon_format(
bloom_equivalent_bits_per_key);
bloom_equivalent_bits_per_key, /*bloom_before_level = disabled*/ -1);
}

rocksdb_filterpolicy_t* rocksdb_filterpolicy_create_ribbon_hybrid(
double bloom_equivalent_bits_per_key, int bloom_before_level) {
return rocksdb_filterpolicy_create_ribbon_format(
bloom_equivalent_bits_per_key, bloom_before_level);
}

rocksdb_mergeoperator_t* rocksdb_mergeoperator_create(
Expand Down
20 changes: 12 additions & 8 deletions db/c_test.c
Original file line number Diff line number Diff line change
Expand Up @@ -1043,21 +1043,25 @@ int main(int argc, char** argv) {
}

StartPhase("filter");
for (run = 0; run <= 3; run++) {
// First run uses custom filter
// Second run uses old block-based bloom filter
// Third run uses full bloom filter
for (run = 0; run <= 4; run++) {
// run=0 uses custom filter
// run=1 uses old block-based bloom filter
// run=2 run uses full bloom filter
// run=3 uses Ribbon
// run=4 uses Ribbon-Bloom hybrid configuration
CheckNoError(err);
rocksdb_filterpolicy_t* policy;
if (run == 0) {
policy = rocksdb_filterpolicy_create(NULL, FilterDestroy, FilterCreate,
FilterKeyMatch, NULL, FilterName);
} else if (run == 1) {
policy = rocksdb_filterpolicy_create_bloom(8);
policy = rocksdb_filterpolicy_create_bloom(8.0);
} else if (run == 2) {
policy = rocksdb_filterpolicy_create_bloom_full(8);
policy = rocksdb_filterpolicy_create_bloom_full(8.0);
} else if (run == 3) {
policy = rocksdb_filterpolicy_create_ribbon(8.0);
} else {
policy = rocksdb_filterpolicy_create_ribbon(8);
policy = rocksdb_filterpolicy_create_ribbon_hybrid(8.0, 1);
}
rocksdb_block_based_options_set_filter_policy(table_options, policy);

Expand Down Expand Up @@ -1123,7 +1127,7 @@ int main(int argc, char** argv) {
} else if (run == 1) {
// Essentially a fingerprint of the block-based Bloom schema
CheckCondition(hits == 241);
} else if (run == 2) {
} else if (run == 2 || run == 4) {
// Essentially a fingerprint of full Bloom schema, format_version=5
CheckCondition(hits == 188);
} else {
Expand Down
2 changes: 1 addition & 1 deletion db_stress_tool/db_stress_common.h
Original file line number Diff line number Diff line change
Expand Up @@ -146,7 +146,7 @@ DECLARE_bool(enable_write_thread_adaptive_yield);
DECLARE_int32(reopen);
DECLARE_double(bloom_bits);
DECLARE_bool(use_block_based_filter);
DECLARE_bool(use_ribbon_filter);
DECLARE_int32(ribbon_starting_level);
DECLARE_bool(partition_filters);
DECLARE_bool(optimize_filters_for_memory);
DECLARE_int32(index_type);
Expand Down
7 changes: 5 additions & 2 deletions db_stress_tool/db_stress_gflags.cc
Original file line number Diff line number Diff line change
Expand Up @@ -419,8 +419,11 @@ DEFINE_bool(use_block_based_filter, false,
"use block based filter"
"instead of full filter for block based table");

DEFINE_bool(use_ribbon_filter, false,
"Use Ribbon filter instead of Bloom filter");
DEFINE_int32(
ribbon_starting_level, 999,
"Use Bloom filter on levels below specified and Ribbon beginning on level "
"specified. Flush is considered level -1. 999 or more -> always Bloom. 0 "
"-> Ribbon except Bloom for flush. -1 -> always Ribbon.");

DEFINE_bool(partition_filters, false,
"use partitioned filters "
Expand Down
22 changes: 12 additions & 10 deletions db_stress_tool/db_stress_test_base.cc
Original file line number Diff line number Diff line change
Expand Up @@ -31,19 +31,21 @@ std::shared_ptr<const FilterPolicy> CreateFilterPolicy() {
return BlockBasedTableOptions().filter_policy;
}
const FilterPolicy* new_policy;
if (FLAGS_use_ribbon_filter) {
// Old and new API should be same
if (std::random_device()() & 1) {
new_policy = NewExperimentalRibbonFilterPolicy(FLAGS_bloom_bits);
if (FLAGS_use_block_based_filter) {
if (FLAGS_ribbon_starting_level < 999) {
fprintf(
stderr,
"Cannot combine use_block_based_filter and ribbon_starting_level\n");
exit(1);
} else {
new_policy = NewRibbonFilterPolicy(FLAGS_bloom_bits);
}
} else {
if (FLAGS_use_block_based_filter) {
new_policy = NewBloomFilterPolicy(FLAGS_bloom_bits, true);
} else {
new_policy = NewBloomFilterPolicy(FLAGS_bloom_bits, false);
}
} else if (FLAGS_ribbon_starting_level >= 999) {
// Use Bloom API
new_policy = NewBloomFilterPolicy(FLAGS_bloom_bits, false);
} else {
new_policy = NewRibbonFilterPolicy(
FLAGS_bloom_bits, /* bloom_before_level */ FLAGS_ribbon_starting_level);
}
return std::shared_ptr<const FilterPolicy>(new_policy);
}
Expand Down
9 changes: 6 additions & 3 deletions include/rocksdb/c.h
Original file line number Diff line number Diff line change
Expand Up @@ -1599,11 +1599,14 @@ extern ROCKSDB_LIBRARY_API void rocksdb_filterpolicy_destroy(
rocksdb_filterpolicy_t*);

extern ROCKSDB_LIBRARY_API rocksdb_filterpolicy_t*
rocksdb_filterpolicy_create_bloom(int bits_per_key);
rocksdb_filterpolicy_create_bloom(double bits_per_key);
extern ROCKSDB_LIBRARY_API rocksdb_filterpolicy_t*
rocksdb_filterpolicy_create_bloom_full(int bits_per_key);
rocksdb_filterpolicy_create_bloom_full(double bits_per_key);
extern ROCKSDB_LIBRARY_API rocksdb_filterpolicy_t*
rocksdb_filterpolicy_create_ribbon(int bloom_equivalent_bits_per_key);
rocksdb_filterpolicy_create_ribbon(double bloom_equivalent_bits_per_key);
extern ROCKSDB_LIBRARY_API rocksdb_filterpolicy_t*
rocksdb_filterpolicy_create_ribbon_hybrid(double bloom_equivalent_bits_per_key,
int bloom_before_level);

/* Merge Operator */

Expand Down
20 changes: 17 additions & 3 deletions include/rocksdb/filter_policy.h
Original file line number Diff line number Diff line change
Expand Up @@ -250,6 +250,20 @@ extern const FilterPolicy* NewBloomFilterPolicy(
// you pass in 10 for bloom_equivalent_bits_per_key, you'll get the same
// 0.95% FP rate as Bloom filter but only using about 7 bits per key.
//
// The space savings of Ribbon filters makes sense for lower (higher
// numbered; larger; longer-lived) levels of LSM, whereas the speed of
// Bloom filters make sense for highest levels of LSM. Setting
// bloom_before_level allows for this design with Level and Universal
// compaction styles. For example, bloom_before_level=1 means that Bloom
// filters will be used in level 0, including flushes, and Ribbon
// filters elsewhere, including FIFO compaction and external SST files.
// For this option, memtable flushes are considered level -1 (so that
// flushes can be distinguished from intra-L0 compaction).
// bloom_before_level=0 (default) -> Generate Bloom filters only for
// flushes under Level and Universal compaction styles.
// bloom_before_level=-1 -> Always generate Ribbon filters (except in
// some extreme or exceptional cases).
//
// Ribbon filters are compatible with RocksDB >= 6.15.0. Earlier
// versions reading the data will behave as if no filter was used
// (degraded performance until compaction rebuilds filters). All
Expand All @@ -266,12 +280,12 @@ extern const FilterPolicy* NewBloomFilterPolicy(
// Also consider using optimize_filters_for_memory to save filter
// memory.
extern const FilterPolicy* NewRibbonFilterPolicy(
double bloom_equivalent_bits_per_key);
double bloom_equivalent_bits_per_key, int bloom_before_level = 0);

// Old name
// Old name and old default behavior
inline const FilterPolicy* NewExperimentalRibbonFilterPolicy(
double bloom_equivalent_bits_per_key) {
return NewRibbonFilterPolicy(bloom_equivalent_bits_per_key);
return NewRibbonFilterPolicy(bloom_equivalent_bits_per_key, -1);
}

} // namespace ROCKSDB_NAMESPACE
39 changes: 37 additions & 2 deletions options/options_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -932,14 +932,49 @@ TEST_F(OptionsTest, GetBlockBasedTableOptionsFromString) {
new_opt.cache_index_and_filter_blocks);
ASSERT_EQ(table_opt.filter_policy, new_opt.filter_policy);

// Ribbon filter policy
// Ribbon filter policy (no Bloom hybrid)
ASSERT_OK(GetBlockBasedTableOptionsFromString(
config_options, table_opt, "filter_policy=ribbonfilter:5.678;",
config_options, table_opt, "filter_policy=ribbonfilter:5.678:-1;",
&new_opt));
ASSERT_TRUE(new_opt.filter_policy != nullptr);
bfp = dynamic_cast<const BloomFilterPolicy*>(new_opt.filter_policy.get());
EXPECT_EQ(bfp->GetMillibitsPerKey(), 5678);
EXPECT_EQ(bfp->GetMode(), BloomFilterPolicy::kStandard128Ribbon);

// Ribbon filter policy (default Bloom hybrid)
ASSERT_OK(GetBlockBasedTableOptionsFromString(
config_options, table_opt, "filter_policy=ribbonfilter:6.789;",
&new_opt));
ASSERT_TRUE(new_opt.filter_policy != nullptr);
auto ltfp = dynamic_cast<const LevelThresholdFilterPolicy*>(
new_opt.filter_policy.get());
EXPECT_EQ(ltfp->TEST_GetStartingLevelForB(), 0);

bfp = dynamic_cast<const BloomFilterPolicy*>(ltfp->TEST_GetPolicyA());
EXPECT_EQ(bfp->GetMillibitsPerKey(), 6789);
EXPECT_EQ(bfp->GetMode(), BloomFilterPolicy::kFastLocalBloom);

bfp = dynamic_cast<const BloomFilterPolicy*>(ltfp->TEST_GetPolicyB());
EXPECT_EQ(bfp->GetMillibitsPerKey(), 6789);
EXPECT_EQ(bfp->GetMode(), BloomFilterPolicy::kStandard128Ribbon);

// Ribbon filter policy (custom Bloom hybrid)
ASSERT_OK(GetBlockBasedTableOptionsFromString(
config_options, table_opt, "filter_policy=ribbonfilter:6.789:5;",
&new_opt));
ASSERT_TRUE(new_opt.filter_policy != nullptr);
ltfp = dynamic_cast<const LevelThresholdFilterPolicy*>(
new_opt.filter_policy.get());
EXPECT_EQ(ltfp->TEST_GetStartingLevelForB(), 5);

bfp = dynamic_cast<const BloomFilterPolicy*>(ltfp->TEST_GetPolicyA());
EXPECT_EQ(bfp->GetMillibitsPerKey(), 6789);
EXPECT_EQ(bfp->GetMode(), BloomFilterPolicy::kFastLocalBloom);

bfp = dynamic_cast<const BloomFilterPolicy*>(ltfp->TEST_GetPolicyB());
EXPECT_EQ(bfp->GetMillibitsPerKey(), 6789);
EXPECT_EQ(bfp->GetMode(), BloomFilterPolicy::kStandard128Ribbon);

// Old name
ASSERT_OK(GetBlockBasedTableOptionsFromString(
config_options, table_opt, "filter_policy=experimental_ribbon:6.789;",
Expand Down
Loading

0 comments on commit 2a383f2

Please sign in to comment.