From ec4ebeff307ea163989d98c28c734a7406f97e77 Mon Sep 17 00:00:00 2001 From: Gang Liao Date: Sun, 17 Jul 2022 07:13:59 -0700 Subject: [PATCH] Support prepopulating/warming the blob cache (#10298) Summary: Many workloads have temporal locality, where recently written items are read back in a short period of time. When using remote file systems, this is inefficient since it involves network traffic and higher latencies. Because of this, we would like to support prepopulating the blob cache during flush. This task is a part of https://github.com/facebook/rocksdb/issues/10156 Pull Request resolved: https://github.com/facebook/rocksdb/pull/10298 Reviewed By: ltamasi Differential Revision: D37908743 Pulled By: gangliao fbshipit-source-id: 9feaed234bc719d38f0c02975c1ad19fa4bb37d1 --- HISTORY.md | 7 +- db/blob/blob_file_builder.cc | 80 ++++++++++-- db/blob/blob_file_builder.h | 13 +- db/blob/blob_file_builder_test.cc | 35 +++-- db/blob/blob_source.cc | 5 +- db/blob/db_blob_basic_test.cc | 120 ++++++++++++++++++ db/builder.cc | 8 +- db/c.cc | 9 ++ db/c_test.c | 3 + db/compaction/compaction_job.cc | 8 +- db_stress_tool/db_stress_common.h | 1 + db_stress_tool/db_stress_gflags.cc | 4 + db_stress_tool/db_stress_test_base.cc | 20 ++- include/rocksdb/advanced_options.h | 19 +++ include/rocksdb/c.h | 11 ++ include/rocksdb/utilities/ldb_cmd.h | 1 + java/CMakeLists.txt | 1 + java/rocksjni/options.cc | 53 ++++++++ java/rocksjni/portal.h | 34 +++++ .../org/rocksdb/AbstractMutableOptions.java | 14 +- ...edMutableColumnFamilyOptionsInterface.java | 23 ++++ .../java/org/rocksdb/ColumnFamilyOptions.java | 34 +++++ .../rocksdb/MutableColumnFamilyOptions.java | 14 +- java/src/main/java/org/rocksdb/Options.java | 14 ++ .../org/rocksdb/PrepopulateBlobCache.java | 117 +++++++++++++++++ .../java/org/rocksdb/BlobOptionsTest.java | 28 +++- .../MutableColumnFamilyOptionsTest.java | 6 +- .../test/java/org/rocksdb/OptionsTest.java | 12 ++ options/cf_options.cc | 9 +- options/cf_options.h | 3 + options/options.cc | 8 +- options/options_helper.cc | 6 + options/options_helper.h | 4 + options/options_settable_test.cc | 1 + options/options_test.cc | 4 + tools/benchmark.sh | 5 +- tools/db_bench_tool.cc | 28 +++- tools/db_bench_tool_test.cc | 1 + tools/db_crashtest.py | 1 + tools/ldb_cmd.cc | 20 +++ tools/run_blob_bench.sh | 4 + 41 files changed, 730 insertions(+), 58 deletions(-) create mode 100644 java/src/main/java/org/rocksdb/PrepopulateBlobCache.java diff --git a/HISTORY.md b/HISTORY.md index cb10fdd806..0ad4471d4c 100644 --- a/HISTORY.md +++ b/HISTORY.md @@ -8,8 +8,9 @@ * Provide support for ReadOptions.async_io with direct_io to improve Seek latency by using async IO to parallelize child iterator seek and doing asynchronous prefetching on sequential scans. * Added support for blob caching in order to cache frequently used blobs for BlobDB. * User can configure the new ColumnFamilyOptions `blob_cache` to enable/disable blob caching. - * Either sharing the backend cache with the block cache or using a completely separate cache is supported. - * A new abstraction interface called `BlobSource` for blob read logic gives all users access to blobs, whether they are in the blob cache, secondary cache, or (remote) storage. 
Blobs can be potentially read both while handling user reads (`Get`, `MultiGet`, or iterator) and during compaction (while dealing with compaction filters, Merges, or garbage collection) but eventually all blob reads go through `Version::GetBlob` or, for MultiGet, `Version::MultiGetBlob` (and then get dispatched to the interface -- `BlobSource`). + * Either sharing the backend cache with the block cache or using a completely separate cache is supported. + * A new abstraction interface called `BlobSource` for blob read logic gives all users access to blobs, whether they are in the blob cache, secondary cache, or (remote) storage. Blobs can be potentially read both while handling user reads (`Get`, `MultiGet`, or iterator) and during compaction (while dealing with compaction filters, Merges, or garbage collection) but eventually all blob reads go through `Version::GetBlob` or, for MultiGet, `Version::MultiGetBlob` (and then get dispatched to the interface -- `BlobSource`). + * Added `prepopulate_blob_cache` to ColumnFamilyOptions. If enabled, prepopulate warm/hot blobs which are already in memory into blob cache at the time of flush. On a flush, the blob that is in memory (in memtables) get flushed to the device. If using Direct IO, additional IO is incurred to read this blob back into memory again, which is avoided by enabling this option. This further helps if the workload exhibits high temporal locality, where most of the reads go to recently written data. This also helps in case of the remote file system since it involves network traffic and higher latencies. * Support using secondary cache with the blob cache. When creating a blob cache, the user can set a secondary blob cache by configuring `secondary_cache` in LRUCacheOptions. * Add experimental tiered compaction feature `AdvancedColumnFamilyOptions::preclude_last_level_data_seconds`, which makes sure the new data inserted within preclude_last_level_data_seconds won't be placed on cold tier (the feature is not complete). @@ -23,6 +24,8 @@ * When using block cache strict capacity limit (`LRUCache` with `strict_capacity_limit=true`), DB operations now fail with Status code `kAborted` subcode `kMemoryLimit` (`IsMemoryLimit()`) instead of `kIncomplete` (`IsIncomplete()`) when the capacity limit is reached, because Incomplete can mean other specific things for some operations. In more detail, `Cache::Insert()` now returns the updated Status code and this usually propagates through RocksDB to the user on failure. * NewClockCache calls temporarily return an LRUCache (with similar characteristics as the desired ClockCache). This is because ClockCache is being replaced by a new version (the old one had unknown bugs) but this is still under development. * Add two functions `int ReserveThreads(int threads_to_be_reserved)` and `int ReleaseThreads(threads_to_be_released)` into `Env` class. In the default implementation, both return 0. Newly added `xxxEnv` class that inherits `Env` should implement these two functions for thread reservation/releasing features. +* Add `rocksdb_options_get_prepopulate_blob_cache` and `rocksdb_options_set_prepopulate_blob_cache` to C API. +* Add `prepopulateBlobCache` and `setPrepopulateBlobCache` to Java API. ### Bug Fixes * Fix a bug in which backup/checkpoint can include a WAL deleted by RocksDB. 
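To illustrate the new option end to end, here is a minimal C++ sketch of how an application could opt into blob cache warming. The database path, cache capacity, and keys/values are illustrative assumptions, not part of this patch; only `Options::prepopulate_blob_cache`, `PrepopulateBlobCache::kFlushOnly`, and the existing `blob_cache` option come from the change itself.

```cpp
#include <cassert>
#include <string>

#include "rocksdb/cache.h"
#include "rocksdb/db.h"
#include "rocksdb/options.h"

using namespace ROCKSDB_NAMESPACE;

int main() {
  Options options;
  options.create_if_missing = true;

  // Integrated BlobDB: store values in blob files and give them an LRU cache.
  options.enable_blob_files = true;
  options.min_blob_size = 0;  // treat every value as a blob, for illustration
  options.blob_cache = NewLRUCache(64 << 20 /* capacity: 64 MiB */);

  // New in this patch: warm the blob cache with blobs written during flush.
  options.prepopulate_blob_cache = PrepopulateBlobCache::kFlushOnly;

  DB* db = nullptr;
  Status s = DB::Open(options, "/tmp/blob_cache_warming_demo", &db);
  assert(s.ok());

  s = db->Put(WriteOptions(), "key", "value");
  assert(s.ok());

  // The flush writes the blob file and, with kFlushOnly, also inserts the
  // blob into blob_cache.
  s = db->Flush(FlushOptions());
  assert(s.ok());

  std::string value;
  s = db->Get(ReadOptions(), "key", &value);  // hits the warmed blob cache
  assert(s.ok() && value == "value");

  delete db;
  return 0;
}
```

With `kFlushOnly`, each blob written by the flush is inserted into `blob_cache` at low priority, so the `Get()` right after the flush can be served from the cache instead of re-reading the freshly written blob file.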
diff --git a/db/blob/blob_file_builder.cc b/db/blob/blob_file_builder.cc index 48817984ab..0e6fa46aa6 100644 --- a/db/blob/blob_file_builder.cc +++ b/db/blob/blob_file_builder.cc @@ -32,9 +32,9 @@ BlobFileBuilder::BlobFileBuilder( VersionSet* versions, FileSystem* fs, const ImmutableOptions* immutable_options, const MutableCFOptions* mutable_cf_options, const FileOptions* file_options, - int job_id, uint32_t column_family_id, - const std::string& column_family_name, Env::IOPriority io_priority, - Env::WriteLifeTimeHint write_hint, + std::string db_id, std::string db_session_id, int job_id, + uint32_t column_family_id, const std::string& column_family_name, + Env::IOPriority io_priority, Env::WriteLifeTimeHint write_hint, const std::shared_ptr& io_tracer, BlobFileCompletionCallback* blob_callback, BlobFileCreationReason creation_reason, @@ -42,17 +42,18 @@ BlobFileBuilder::BlobFileBuilder( std::vector* blob_file_additions) : BlobFileBuilder([versions]() { return versions->NewFileNumber(); }, fs, immutable_options, mutable_cf_options, file_options, - job_id, column_family_id, column_family_name, io_priority, - write_hint, io_tracer, blob_callback, creation_reason, - blob_file_paths, blob_file_additions) {} + db_id, db_session_id, job_id, column_family_id, + column_family_name, io_priority, write_hint, io_tracer, + blob_callback, creation_reason, blob_file_paths, + blob_file_additions) {} BlobFileBuilder::BlobFileBuilder( std::function file_number_generator, FileSystem* fs, const ImmutableOptions* immutable_options, const MutableCFOptions* mutable_cf_options, const FileOptions* file_options, - int job_id, uint32_t column_family_id, - const std::string& column_family_name, Env::IOPriority io_priority, - Env::WriteLifeTimeHint write_hint, + std::string db_id, std::string db_session_id, int job_id, + uint32_t column_family_id, const std::string& column_family_name, + Env::IOPriority io_priority, Env::WriteLifeTimeHint write_hint, const std::shared_ptr& io_tracer, BlobFileCompletionCallback* blob_callback, BlobFileCreationReason creation_reason, @@ -64,7 +65,10 @@ BlobFileBuilder::BlobFileBuilder( min_blob_size_(mutable_cf_options->min_blob_size), blob_file_size_(mutable_cf_options->blob_file_size), blob_compression_type_(mutable_cf_options->blob_compression_type), + prepopulate_blob_cache_(mutable_cf_options->prepopulate_blob_cache), file_options_(file_options), + db_id_(std::move(db_id)), + db_session_id_(std::move(db_session_id)), job_id_(job_id), column_family_id_(column_family_id), column_family_name_(column_family_name), @@ -133,6 +137,16 @@ Status BlobFileBuilder::Add(const Slice& key, const Slice& value, } } + { + const Status s = + PutBlobIntoCacheIfNeeded(value, blob_file_number, blob_offset); + if (!s.ok()) { + ROCKS_LOG_WARN(immutable_options_->info_log, + "Failed to pre-populate the blob into blob cache: %s", + s.ToString().c_str()); + } + } + BlobIndex::EncodeBlob(blob_index, blob_file_number, blob_offset, blob.size(), blob_compression_type_); @@ -372,4 +386,52 @@ void BlobFileBuilder::Abandon(const Status& s) { blob_count_ = 0; blob_bytes_ = 0; } + +Status BlobFileBuilder::PutBlobIntoCacheIfNeeded(const Slice& blob, + uint64_t blob_file_number, + uint64_t blob_offset) const { + Status s = Status::OK(); + + auto blob_cache = immutable_options_->blob_cache; + auto statistics = immutable_options_->statistics.get(); + bool warm_cache = + prepopulate_blob_cache_ == PrepopulateBlobCache::kFlushOnly && + creation_reason_ == BlobFileCreationReason::kFlush; + + if (blob_cache && warm_cache) 
{ + // The blob file during flush is unknown to be exactly how big it is. + // Therefore, we set the file size to kMaxOffsetStandardEncoding. For any + // max_offset <= this value, the same encoding scheme is guaranteed. + const OffsetableCacheKey base_cache_key( + db_id_, db_session_id_, blob_file_number, + OffsetableCacheKey::kMaxOffsetStandardEncoding); + const CacheKey cache_key = base_cache_key.WithOffset(blob_offset); + const Slice key = cache_key.AsSlice(); + + const Cache::Priority priority = Cache::Priority::LOW; + + // Objects to be put into the cache have to be heap-allocated and + // self-contained, i.e. own their contents. The Cache has to be able to + // take unique ownership of them. Therefore, we copy the blob into a + // string directly, and insert that into the cache. + std::unique_ptr buf = std::make_unique(); + buf->assign(blob.data(), blob.size()); + + // TODO: support custom allocators and provide a better estimated memory + // usage using malloc_usable_size. + s = blob_cache->Insert(key, buf.get(), buf->size(), + &DeleteCacheEntry, + nullptr /* cache_handle */, priority); + if (s.ok()) { + RecordTick(statistics, BLOB_DB_CACHE_ADD); + RecordTick(statistics, BLOB_DB_CACHE_BYTES_WRITE, buf->size()); + buf.release(); + } else { + RecordTick(statistics, BLOB_DB_CACHE_ADD_FAILURES); + } + } + + return s; +} + } // namespace ROCKSDB_NAMESPACE diff --git a/db/blob/blob_file_builder.h b/db/blob/blob_file_builder.h index 745af20eb4..8e7aab502d 100644 --- a/db/blob/blob_file_builder.h +++ b/db/blob/blob_file_builder.h @@ -10,6 +10,7 @@ #include #include +#include "rocksdb/advanced_options.h" #include "rocksdb/compression_type.h" #include "rocksdb/env.h" #include "rocksdb/rocksdb_namespace.h" @@ -35,7 +36,8 @@ class BlobFileBuilder { BlobFileBuilder(VersionSet* versions, FileSystem* fs, const ImmutableOptions* immutable_options, const MutableCFOptions* mutable_cf_options, - const FileOptions* file_options, int job_id, + const FileOptions* file_options, std::string db_id, + std::string db_session_id, int job_id, uint32_t column_family_id, const std::string& column_family_name, Env::IOPriority io_priority, @@ -49,7 +51,8 @@ class BlobFileBuilder { BlobFileBuilder(std::function file_number_generator, FileSystem* fs, const ImmutableOptions* immutable_options, const MutableCFOptions* mutable_cf_options, - const FileOptions* file_options, int job_id, + const FileOptions* file_options, std::string db_id, + std::string db_session_id, int job_id, uint32_t column_family_id, const std::string& column_family_name, Env::IOPriority io_priority, @@ -78,13 +81,19 @@ class BlobFileBuilder { Status CloseBlobFile(); Status CloseBlobFileIfNeeded(); + Status PutBlobIntoCacheIfNeeded(const Slice& blob, uint64_t blob_file_number, + uint64_t blob_offset) const; + std::function file_number_generator_; FileSystem* fs_; const ImmutableOptions* immutable_options_; uint64_t min_blob_size_; uint64_t blob_file_size_; CompressionType blob_compression_type_; + PrepopulateBlobCache prepopulate_blob_cache_; const FileOptions* file_options_; + const std::string db_id_; + const std::string db_session_id_; int job_id_; uint32_t column_family_id_; std::string column_family_name_; diff --git a/db/blob/blob_file_builder_test.cc b/db/blob/blob_file_builder_test.cc index 4b30fcfbfb..6636819662 100644 --- a/db/blob/blob_file_builder_test.cc +++ b/db/blob/blob_file_builder_test.cc @@ -144,8 +144,9 @@ TEST_F(BlobFileBuilderTest, BuildAndCheckOneFile) { BlobFileBuilder builder( TestFileNumberGenerator(), fs_, 
&immutable_options, &mutable_cf_options, - &file_options_, job_id, column_family_id, column_family_name, io_priority, - write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, + &file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, + column_family_id, column_family_name, io_priority, write_hint, + nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); std::vector> expected_key_value_pairs( @@ -228,8 +229,9 @@ TEST_F(BlobFileBuilderTest, BuildAndCheckMultipleFiles) { BlobFileBuilder builder( TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, - &file_options_, job_id, column_family_id, column_family_name, io_priority, - write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, + &file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, + column_family_id, column_family_name, io_priority, write_hint, + nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); std::vector> expected_key_value_pairs( @@ -315,8 +317,9 @@ TEST_F(BlobFileBuilderTest, InlinedValues) { BlobFileBuilder builder( TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, - &file_options_, job_id, column_family_id, column_family_name, io_priority, - write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, + &file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, + column_family_id, column_family_name, io_priority, write_hint, + nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); for (size_t i = 0; i < number_of_blobs; ++i) { @@ -369,8 +372,9 @@ TEST_F(BlobFileBuilderTest, Compression) { BlobFileBuilder builder( TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, - &file_options_, job_id, column_family_id, column_family_name, io_priority, - write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, + &file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, + column_family_id, column_family_name, io_priority, write_hint, + nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); const std::string key("1"); @@ -452,8 +456,9 @@ TEST_F(BlobFileBuilderTest, CompressionError) { BlobFileBuilder builder( TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, - &file_options_, job_id, column_family_id, column_family_name, io_priority, - write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, + &file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, + column_family_id, column_family_name, io_priority, write_hint, + nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); SyncPoint::GetInstance()->SetCallBack("CompressData:TamperWithReturnValue", @@ -531,8 +536,9 @@ TEST_F(BlobFileBuilderTest, Checksum) { BlobFileBuilder builder( TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, - &file_options_, job_id, column_family_id, column_family_name, io_priority, - write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, + &file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, + column_family_id, column_family_name, io_priority, write_hint, + nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, BlobFileCreationReason::kFlush, 
&blob_file_paths, &blob_file_additions); const std::string key("1"); @@ -628,8 +634,9 @@ TEST_P(BlobFileBuilderIOErrorTest, IOError) { BlobFileBuilder builder( TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, - &file_options_, job_id, column_family_id, column_family_name, io_priority, - write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, + &file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, + column_family_id, column_family_name, io_priority, write_hint, + nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); SyncPoint::GetInstance()->SetCallBack(sync_point_, [this](void* arg) { diff --git a/db/blob/blob_source.cc b/db/blob/blob_source.cc index 70595ef06a..c02fae9df9 100644 --- a/db/blob/blob_source.cc +++ b/db/blob/blob_source.cc @@ -63,15 +63,16 @@ Status BlobSource::PutBlobIntoCache(const Slice& cache_key, // self-contained, i.e. own their contents. The Cache has to be able to take // unique ownership of them. Therefore, we copy the blob into a string // directly, and insert that into the cache. - std::string* buf = new std::string(); + std::unique_ptr buf = std::make_unique(); buf->assign(blob->data(), blob->size()); // TODO: support custom allocators and provide a better estimated memory // usage using malloc_usable_size. Cache::Handle* cache_handle = nullptr; - s = InsertEntryIntoCache(cache_key, buf, buf->size(), &cache_handle, + s = InsertEntryIntoCache(cache_key, buf.get(), buf->size(), &cache_handle, priority); if (s.ok()) { + buf.release(); assert(cache_handle != nullptr); *cached_blob = CacheHandleGuard(blob_cache_.get(), cache_handle); diff --git a/db/blob/db_blob_basic_test.cc b/db/blob/db_blob_basic_test.cc index 417e5c7399..18ab1ef7df 100644 --- a/db/blob/db_blob_basic_test.cc +++ b/db/blob/db_blob_basic_test.cc @@ -1432,6 +1432,126 @@ TEST_P(DBBlobBasicIOErrorTest, CompactionFilterReadBlob_IOError) { SyncPoint::GetInstance()->ClearAllCallBacks(); } +TEST_F(DBBlobBasicTest, WarmCacheWithBlobsDuringFlush) { + Options options = GetDefaultOptions(); + + LRUCacheOptions co; + co.capacity = 1 << 25; + co.num_shard_bits = 2; + co.metadata_charge_policy = kDontChargeCacheMetadata; + auto backing_cache = NewLRUCache(co); + + options.blob_cache = backing_cache; + + BlockBasedTableOptions block_based_options; + block_based_options.no_block_cache = false; + block_based_options.block_cache = backing_cache; + block_based_options.cache_index_and_filter_blocks = true; + options.table_factory.reset(NewBlockBasedTableFactory(block_based_options)); + + options.enable_blob_files = true; + options.create_if_missing = true; + options.disable_auto_compactions = true; + options.enable_blob_garbage_collection = true; + options.blob_garbage_collection_age_cutoff = 1.0; + options.prepopulate_blob_cache = PrepopulateBlobCache::kFlushOnly; + options.statistics = ROCKSDB_NAMESPACE::CreateDBStatistics(); + + DestroyAndReopen(options); + + constexpr size_t kNumBlobs = 10; + constexpr size_t kValueSize = 100; + + std::string value(kValueSize, 'a'); + + for (size_t i = 1; i <= kNumBlobs; i++) { + ASSERT_OK(Put(std::to_string(i), value)); + ASSERT_OK(Put(std::to_string(i + kNumBlobs), value)); // Add some overlap + ASSERT_OK(Flush()); + ASSERT_EQ(i * 2, options.statistics->getTickerCount(BLOB_DB_CACHE_ADD)); + ASSERT_EQ(value, Get(std::to_string(i))); + ASSERT_EQ(value, Get(std::to_string(i + kNumBlobs))); + ASSERT_EQ(0, 
options.statistics->getTickerCount(BLOB_DB_CACHE_MISS)); + ASSERT_EQ(i * 2, options.statistics->getTickerCount(BLOB_DB_CACHE_HIT)); + } + + // Verify compaction not counted + ASSERT_OK(db_->CompactRange(CompactRangeOptions(), /*begin=*/nullptr, + /*end=*/nullptr)); + EXPECT_EQ(kNumBlobs * 2, + options.statistics->getTickerCount(BLOB_DB_CACHE_ADD)); +} + +#ifndef ROCKSDB_LITE +TEST_F(DBBlobBasicTest, DynamicallyWarmCacheDuringFlush) { + Options options = GetDefaultOptions(); + + LRUCacheOptions co; + co.capacity = 1 << 25; + co.num_shard_bits = 2; + co.metadata_charge_policy = kDontChargeCacheMetadata; + auto backing_cache = NewLRUCache(co); + + options.blob_cache = backing_cache; + + BlockBasedTableOptions block_based_options; + block_based_options.no_block_cache = false; + block_based_options.block_cache = backing_cache; + block_based_options.cache_index_and_filter_blocks = true; + options.table_factory.reset(NewBlockBasedTableFactory(block_based_options)); + + options.enable_blob_files = true; + options.create_if_missing = true; + options.disable_auto_compactions = true; + options.enable_blob_garbage_collection = true; + options.blob_garbage_collection_age_cutoff = 1.0; + options.prepopulate_blob_cache = PrepopulateBlobCache::kFlushOnly; + options.statistics = ROCKSDB_NAMESPACE::CreateDBStatistics(); + + DestroyAndReopen(options); + + constexpr size_t kNumBlobs = 10; + constexpr size_t kValueSize = 100; + + std::string value(kValueSize, 'a'); + + for (size_t i = 1; i <= 5; i++) { + ASSERT_OK(Put(std::to_string(i), value)); + ASSERT_OK(Put(std::to_string(i + kNumBlobs), value)); // Add some overlap + ASSERT_OK(Flush()); + ASSERT_EQ(2, options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_ADD)); + + ASSERT_EQ(value, Get(std::to_string(i))); + ASSERT_EQ(value, Get(std::to_string(i + kNumBlobs))); + ASSERT_EQ(0, options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_ADD)); + ASSERT_EQ(0, + options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_MISS)); + ASSERT_EQ(2, options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_HIT)); + } + + ASSERT_OK(dbfull()->SetOptions({{"prepopulate_blob_cache", "kDisable"}})); + + for (size_t i = 6; i <= kNumBlobs; i++) { + ASSERT_OK(Put(std::to_string(i), value)); + ASSERT_OK(Put(std::to_string(i + kNumBlobs), value)); // Add some overlap + ASSERT_OK(Flush()); + ASSERT_EQ(0, options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_ADD)); + + ASSERT_EQ(value, Get(std::to_string(i))); + ASSERT_EQ(value, Get(std::to_string(i + kNumBlobs))); + ASSERT_EQ(2, options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_ADD)); + ASSERT_EQ(2, + options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_MISS)); + ASSERT_EQ(0, options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_HIT)); + } + + // Verify compaction not counted + ASSERT_OK(db_->CompactRange(CompactRangeOptions(), /*begin=*/nullptr, + /*end=*/nullptr)); + EXPECT_EQ(0, options.statistics->getTickerCount(BLOB_DB_CACHE_ADD)); +} +#endif // !ROCKSDB_LITE + } // namespace ROCKSDB_NAMESPACE int main(int argc, char** argv) { diff --git a/db/builder.cc b/db/builder.cc index 7d8244af90..03760ec910 100644 --- a/db/builder.cc +++ b/db/builder.cc @@ -188,10 +188,10 @@ Status BuildTable( blob_file_additions) ? 
new BlobFileBuilder( versions, fs, &ioptions, &mutable_cf_options, &file_options, - job_id, tboptions.column_family_id, - tboptions.column_family_name, io_priority, write_hint, - io_tracer, blob_callback, blob_creation_reason, - &blob_file_paths, blob_file_additions) + tboptions.db_id, tboptions.db_session_id, job_id, + tboptions.column_family_id, tboptions.column_family_name, + io_priority, write_hint, io_tracer, blob_callback, + blob_creation_reason, &blob_file_paths, blob_file_additions) : nullptr); const std::atomic kManualCompactionCanceledFalse{false}; diff --git a/db/c.cc b/db/c.cc index de5744ee4b..4ac5e57975 100644 --- a/db/c.cc +++ b/db/c.cc @@ -99,6 +99,7 @@ using ROCKSDB_NAMESPACE::Options; using ROCKSDB_NAMESPACE::PerfContext; using ROCKSDB_NAMESPACE::PerfLevel; using ROCKSDB_NAMESPACE::PinnableSlice; +using ROCKSDB_NAMESPACE::PrepopulateBlobCache; using ROCKSDB_NAMESPACE::RandomAccessFile; using ROCKSDB_NAMESPACE::Range; using ROCKSDB_NAMESPACE::RateLimiter; @@ -3140,6 +3141,14 @@ void rocksdb_options_set_blob_cache(rocksdb_options_t* opt, opt->rep.blob_cache = blob_cache->rep; } +void rocksdb_options_set_prepopulate_blob_cache(rocksdb_options_t* opt, int t) { + opt->rep.prepopulate_blob_cache = static_cast(t); +} + +int rocksdb_options_get_prepopulate_blob_cache(rocksdb_options_t* opt) { + return static_cast(opt->rep.prepopulate_blob_cache); +} + void rocksdb_options_set_num_levels(rocksdb_options_t* opt, int n) { opt->rep.num_levels = n; } diff --git a/db/c_test.c b/db/c_test.c index 343ce3cf3e..12d6fd1431 100644 --- a/db/c_test.c +++ b/db/c_test.c @@ -2068,6 +2068,9 @@ int main(int argc, char** argv) { rocksdb_options_set_blob_file_starting_level(o, 5); CheckCondition(5 == rocksdb_options_get_blob_file_starting_level(o)); + rocksdb_options_set_prepopulate_blob_cache(o, 1 /* flush only */); + CheckCondition(1 == rocksdb_options_get_prepopulate_blob_cache(o)); + // Create a copy that should be equal to the original. rocksdb_options_t* copy; copy = rocksdb_options_create_copy(o); diff --git a/db/compaction/compaction_job.cc b/db/compaction/compaction_job.cc index e5334cb99d..b914f5e9d6 100644 --- a/db/compaction/compaction_job.cc +++ b/db/compaction/compaction_job.cc @@ -999,10 +999,10 @@ void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) { ? 
new BlobFileBuilder( versions_, fs_.get(), sub_compact->compaction->immutable_options(), - mutable_cf_options, &file_options_, job_id_, cfd->GetID(), - cfd->GetName(), Env::IOPriority::IO_LOW, write_hint_, - io_tracer_, blob_callback_, BlobFileCreationReason::kCompaction, - &blob_file_paths, + mutable_cf_options, &file_options_, db_id_, db_session_id_, + job_id_, cfd->GetID(), cfd->GetName(), Env::IOPriority::IO_LOW, + write_hint_, io_tracer_, blob_callback_, + BlobFileCreationReason::kCompaction, &blob_file_paths, sub_compact->Current().GetBlobFileAdditionsPtr()) : nullptr); diff --git a/db_stress_tool/db_stress_common.h b/db_stress_tool/db_stress_common.h index b822518eb6..c445f1e9d1 100644 --- a/db_stress_tool/db_stress_common.h +++ b/db_stress_tool/db_stress_common.h @@ -272,6 +272,7 @@ DECLARE_bool(use_blob_cache); DECLARE_bool(use_shared_block_and_blob_cache); DECLARE_uint64(blob_cache_size); DECLARE_int32(blob_cache_numshardbits); +DECLARE_int32(prepopulate_blob_cache); DECLARE_int32(approximate_size_one_in); DECLARE_bool(sync_fault_injection); diff --git a/db_stress_tool/db_stress_gflags.cc b/db_stress_tool/db_stress_gflags.cc index db7837e204..5bfccd2cd0 100644 --- a/db_stress_tool/db_stress_gflags.cc +++ b/db_stress_tool/db_stress_gflags.cc @@ -474,6 +474,10 @@ DEFINE_int32(blob_cache_numshardbits, 6, "the block and blob caches are different " "(use_shared_block_and_blob_cache = false)."); +DEFINE_int32(prepopulate_blob_cache, 0, + "[Integrated BlobDB] Pre-populate hot/warm blobs in blob cache. 0 " + "to disable and 1 to insert during flush."); + static const bool FLAGS_subcompactions_dummy __attribute__((__unused__)) = RegisterFlagValidator(&FLAGS_subcompactions, &ValidateUint32Range); diff --git a/db_stress_tool/db_stress_test_base.cc b/db_stress_tool/db_stress_test_base.cc index ee9baddfd7..3847d6c8a0 100644 --- a/db_stress_tool/db_stress_test_base.cc +++ b/db_stress_tool/db_stress_test_base.cc @@ -270,6 +270,8 @@ bool StressTest::BuildOptionsTable() { std::vector{"0", "1M", "4M"}); options_tbl.emplace("blob_file_starting_level", std::vector{"0", "1", "2"}); + options_tbl.emplace("prepopulate_blob_cache", + std::vector{"kDisable", "kFlushOnly"}); } options_table_ = std::move(options_tbl); @@ -2401,9 +2403,12 @@ void StressTest::Open(SharedState* shared) { fprintf(stdout, "Integrated BlobDB: blob cache enabled, block and blob caches " "shared: %d, blob cache size %" PRIu64 - ", blob cache num shard bits: %d\n", + ", blob cache num shard bits: %d, blob cache prepopulated: %s\n", FLAGS_use_shared_block_and_blob_cache, FLAGS_blob_cache_size, - FLAGS_blob_cache_numshardbits); + FLAGS_blob_cache_numshardbits, + options_.prepopulate_blob_cache == PrepopulateBlobCache::kFlushOnly + ? 
"flush only" + : "disable"); } else { fprintf(stdout, "Integrated BlobDB: blob cache disabled\n"); } @@ -3043,6 +3048,17 @@ void InitializeOptionsFromFlags( exit(1); } } + switch (FLAGS_prepopulate_blob_cache) { + case 0: + options.prepopulate_blob_cache = PrepopulateBlobCache::kDisable; + break; + case 1: + options.prepopulate_blob_cache = PrepopulateBlobCache::kFlushOnly; + break; + default: + fprintf(stderr, "Unknown prepopulate blob cache mode\n"); + exit(1); + } } options.wal_compression = diff --git a/include/rocksdb/advanced_options.h b/include/rocksdb/advanced_options.h index 32598d04d8..cd2582e8a8 100644 --- a/include/rocksdb/advanced_options.h +++ b/include/rocksdb/advanced_options.h @@ -246,6 +246,11 @@ enum UpdateStatus { // Return status For inplace update callback UPDATED = 2, // No inplace update. Merged value set }; +enum class PrepopulateBlobCache : uint8_t { + kDisable = 0x0, // Disable prepopulate blob cache + kFlushOnly = 0x1, // Prepopulate blobs during flush only +}; + struct AdvancedColumnFamilyOptions { // The maximum number of write buffers that are built up in memory. // The default and the minimum number is 2, so that when 1 write buffer @@ -992,6 +997,20 @@ struct AdvancedColumnFamilyOptions { // Default: nullptr (disabled) std::shared_ptr blob_cache = nullptr; + // If enabled, prepopulate warm/hot blobs which are already in memory into + // blob cache at the time of flush. On a flush, the blob that is in memory (in + // memtables) get flushed to the device. If using Direct IO, additional IO is + // incurred to read this blob back into memory again, which is avoided by + // enabling this option. This further helps if the workload exhibits high + // temporal locality, where most of the reads go to recently written data. + // This also helps in case of the remote file system since it involves network + // traffic and higher latencies. 
+ // + // Default: disabled + // + // Dynamically changeable through the SetOptions() API + PrepopulateBlobCache prepopulate_blob_cache = PrepopulateBlobCache::kDisable; + // Create ColumnFamilyOptions with default values for all fields AdvancedColumnFamilyOptions(); // Create ColumnFamilyOptions from Options diff --git a/include/rocksdb/c.h b/include/rocksdb/c.h index 3c42f904e6..8cc51b57cb 100644 --- a/include/rocksdb/c.h +++ b/include/rocksdb/c.h @@ -1302,6 +1302,17 @@ extern ROCKSDB_LIBRARY_API int rocksdb_options_get_blob_file_starting_level( extern ROCKSDB_LIBRARY_API void rocksdb_options_set_blob_cache( rocksdb_options_t* opt, rocksdb_cache_t* blob_cache); +enum { + rocksdb_prepopulate_blob_disable = 0, + rocksdb_prepopulate_blob_flush_only = 1 +}; + +extern ROCKSDB_LIBRARY_API void rocksdb_options_set_prepopulate_blob_cache( + rocksdb_options_t* opt, int val); + +extern ROCKSDB_LIBRARY_API int rocksdb_options_get_prepopulate_blob_cache( + rocksdb_options_t* opt); + /* returns a pointer to a malloc()-ed, null terminated string */ extern ROCKSDB_LIBRARY_API char* rocksdb_options_statistics_get_string( rocksdb_options_t* opt); diff --git a/include/rocksdb/utilities/ldb_cmd.h b/include/rocksdb/utilities/ldb_cmd.h index cddb531e91..0076381925 100644 --- a/include/rocksdb/utilities/ldb_cmd.h +++ b/include/rocksdb/utilities/ldb_cmd.h @@ -70,6 +70,7 @@ class LDBCommand { static const std::string ARG_BLOB_GARBAGE_COLLECTION_FORCE_THRESHOLD; static const std::string ARG_BLOB_COMPACTION_READAHEAD_SIZE; static const std::string ARG_BLOB_FILE_STARTING_LEVEL; + static const std::string ARG_PREPOPULATE_BLOB_CACHE; static const std::string ARG_DECODE_BLOB_INDEX; static const std::string ARG_DUMP_UNCOMPRESSED_BLOBS; diff --git a/java/CMakeLists.txt b/java/CMakeLists.txt index 8eada17e88..5d62630fde 100644 --- a/java/CMakeLists.txt +++ b/java/CMakeLists.txt @@ -194,6 +194,7 @@ set(JAVA_MAIN_CLASSES src/main/java/org/rocksdb/OptionsUtil.java src/main/java/org/rocksdb/PersistentCache.java src/main/java/org/rocksdb/PlainTableConfig.java + src/main/java/org/rocksdb/PrepopulateBlobCache.java src/main/java/org/rocksdb/Priority.java src/main/java/org/rocksdb/Range.java src/main/java/org/rocksdb/RateLimiter.java diff --git a/java/rocksjni/options.cc b/java/rocksjni/options.cc index 6fc232c7f2..34eb900b32 100644 --- a/java/rocksjni/options.cc +++ b/java/rocksjni/options.cc @@ -3885,6 +3885,31 @@ jint Java_org_rocksdb_Options_blobFileStartingLevel(JNIEnv*, jobject, return static_cast(opts->blob_file_starting_level); } +/* + * Class: org_rocksdb_Options + * Method: setPrepopulateBlobCache + * Signature: (JB)V + */ +void Java_org_rocksdb_Options_setPrepopulateBlobCache( + JNIEnv*, jobject, jlong jhandle, jbyte jprepopulate_blob_cache_value) { + auto* opts = reinterpret_cast(jhandle); + opts->prepopulate_blob_cache = + ROCKSDB_NAMESPACE::PrepopulateBlobCacheJni::toCppPrepopulateBlobCache( + jprepopulate_blob_cache_value); +} + +/* + * Class: org_rocksdb_Options + * Method: prepopulateBlobCache + * Signature: (J)B + */ +jbyte Java_org_rocksdb_Options_prepopulateBlobCache(JNIEnv*, jobject, + jlong jhandle) { + auto* opts = reinterpret_cast(jhandle); + return ROCKSDB_NAMESPACE::PrepopulateBlobCacheJni::toJavaPrepopulateBlobCache( + opts->prepopulate_blob_cache); +} + ////////////////////////////////////////////////////////////////////////////// // ROCKSDB_NAMESPACE::ColumnFamilyOptions @@ -5717,6 +5742,34 @@ jint Java_org_rocksdb_ColumnFamilyOptions_blobFileStartingLevel(JNIEnv*, return 
static_cast(opts->blob_file_starting_level); } +/* + * Class: org_rocksdb_ColumnFamilyOptions + * Method: setPrepopulateBlobCache + * Signature: (JB)V + */ +void Java_org_rocksdb_ColumnFamilyOptions_setPrepopulateBlobCache( + JNIEnv*, jobject, jlong jhandle, jbyte jprepopulate_blob_cache_value) { + auto* opts = + reinterpret_cast(jhandle); + opts->prepopulate_blob_cache = + ROCKSDB_NAMESPACE::PrepopulateBlobCacheJni::toCppPrepopulateBlobCache( + jprepopulate_blob_cache_value); +} + +/* + * Class: org_rocksdb_ColumnFamilyOptions + * Method: prepopulateBlobCache + * Signature: (J)B + */ +jbyte Java_org_rocksdb_ColumnFamilyOptions_prepopulateBlobCache(JNIEnv*, + jobject, + jlong jhandle) { + auto* opts = + reinterpret_cast(jhandle); + return ROCKSDB_NAMESPACE::PrepopulateBlobCacheJni::toJavaPrepopulateBlobCache( + opts->prepopulate_blob_cache); +} + ///////////////////////////////////////////////////////////////////// // ROCKSDB_NAMESPACE::DBOptions diff --git a/java/rocksjni/portal.h b/java/rocksjni/portal.h index bf2135070d..b9fde668ed 100644 --- a/java/rocksjni/portal.h +++ b/java/rocksjni/portal.h @@ -7894,6 +7894,40 @@ class SanityLevelJni { } }; +// The portal class for org.rocksdb.PrepopulateBlobCache +class PrepopulateBlobCacheJni { + public: + // Returns the equivalent org.rocksdb.PrepopulateBlobCache for the provided + // C++ ROCKSDB_NAMESPACE::PrepopulateBlobCache enum + static jbyte toJavaPrepopulateBlobCache( + ROCKSDB_NAMESPACE::PrepopulateBlobCache prepopulate_blob_cache) { + switch (prepopulate_blob_cache) { + case ROCKSDB_NAMESPACE::PrepopulateBlobCache::kDisable: + return 0x0; + case ROCKSDB_NAMESPACE::PrepopulateBlobCache::kFlushOnly: + return 0x1; + default: + return 0x7f; // undefined + } + } + + // Returns the equivalent C++ ROCKSDB_NAMESPACE::PrepopulateBlobCache enum for + // the provided Java org.rocksdb.PrepopulateBlobCache + static ROCKSDB_NAMESPACE::PrepopulateBlobCache toCppPrepopulateBlobCache( + jbyte jprepopulate_blob_cache) { + switch (jprepopulate_blob_cache) { + case 0x0: + return ROCKSDB_NAMESPACE::PrepopulateBlobCache::kDisable; + case 0x1: + return ROCKSDB_NAMESPACE::PrepopulateBlobCache::kFlushOnly; + case 0x7F: + default: + // undefined/default + return ROCKSDB_NAMESPACE::PrepopulateBlobCache::kDisable; + } + } +}; + // The portal class for org.rocksdb.AbstractListener.EnabledEventCallback class EnabledEventCallbackJni { public: diff --git a/java/src/main/java/org/rocksdb/AbstractMutableOptions.java b/java/src/main/java/org/rocksdb/AbstractMutableOptions.java index 243b8040ab..7189272b88 100644 --- a/java/src/main/java/org/rocksdb/AbstractMutableOptions.java +++ b/java/src/main/java/org/rocksdb/AbstractMutableOptions.java @@ -341,8 +341,18 @@ private U fromOptionString(final OptionString.Entry option, final boolean ignore return setIntArray(key, value); case ENUM: - final CompressionType compressionType = CompressionType.getFromInternal(valueStr); - return setEnum(key, compressionType); + String optionName = key.name(); + if (optionName.equals("prepopulate_blob_cache")) { + final PrepopulateBlobCache prepopulateBlobCache = + PrepopulateBlobCache.getFromInternal(valueStr); + return setEnum(key, prepopulateBlobCache); + } else if (optionName.equals("compression") + || optionName.equals("blob_compression_type")) { + final CompressionType compressionType = CompressionType.getFromInternal(valueStr); + return setEnum(key, compressionType); + } else { + throw new IllegalArgumentException("Unknown enum type: " + key.name()); + } default: throw new 
IllegalStateException(key + " has unknown value type: " + key.getValueType()); diff --git a/java/src/main/java/org/rocksdb/AdvancedMutableColumnFamilyOptionsInterface.java b/java/src/main/java/org/rocksdb/AdvancedMutableColumnFamilyOptionsInterface.java index 928750446b..162d15d80b 100644 --- a/java/src/main/java/org/rocksdb/AdvancedMutableColumnFamilyOptionsInterface.java +++ b/java/src/main/java/org/rocksdb/AdvancedMutableColumnFamilyOptionsInterface.java @@ -801,6 +801,29 @@ T setReportBgIoStats( */ int blobFileStartingLevel(); + /** + * Set a certain prepopulate blob cache option. + * + * Default: 0 + * + * Dynamically changeable through + * {@link RocksDB#setOptions(ColumnFamilyHandle, MutableColumnFamilyOptions)}. + * + * @param prepopulateBlobCache the prepopulate blob cache option + * + * @return the reference to the current options. + */ + T setPrepopulateBlobCache(final PrepopulateBlobCache prepopulateBlobCache); + + /** + * Get the prepopulate blob cache option. + * + * Default: 0 + * + * @return the current prepopulate blob cache option. + */ + PrepopulateBlobCache prepopulateBlobCache(); + // // END options for blobs (integrated BlobDB) // diff --git a/java/src/main/java/org/rocksdb/ColumnFamilyOptions.java b/java/src/main/java/org/rocksdb/ColumnFamilyOptions.java index 433fbcf080..a642cb6fab 100644 --- a/java/src/main/java/org/rocksdb/ColumnFamilyOptions.java +++ b/java/src/main/java/org/rocksdb/ColumnFamilyOptions.java @@ -1280,6 +1280,37 @@ public int blobFileStartingLevel() { return blobFileStartingLevel(nativeHandle_); } + /** + * Set a certain prepopulate blob cache option. + * + * Default: 0 + * + * Dynamically changeable through + * {@link RocksDB#setOptions(ColumnFamilyHandle, MutableColumnFamilyOptions)}. + * + * @param prepopulateBlobCache the prepopulate blob cache option + * + * @return the reference to the current options. + */ + @Override + public ColumnFamilyOptions setPrepopulateBlobCache( + final PrepopulateBlobCache prepopulateBlobCache) { + setPrepopulateBlobCache(nativeHandle_, prepopulateBlobCache.getValue()); + return this; + } + + /** + * Get the prepopulate blob cache option. + * + * Default: 0 + * + * @return the current prepopulate blob cache option. + */ + @Override + public PrepopulateBlobCache prepopulateBlobCache() { + return PrepopulateBlobCache.getPrepopulateBlobCache(prepopulateBlobCache(nativeHandle_)); + } + // // END options for blobs (integrated BlobDB) // @@ -1488,6 +1519,9 @@ private native void setBlobCompactionReadaheadSize( private native void setBlobFileStartingLevel( final long nativeHandle_, final int blobFileStartingLevel); private native int blobFileStartingLevel(final long nativeHandle_); + private native void setPrepopulateBlobCache( + final long nativeHandle_, final byte prepopulateBlobCache); + private native byte prepopulateBlobCache(final long nativeHandle_); // instance variables // NOTE: If you add new member variables, please update the copy constructor above! 
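Since `prepopulate_blob_cache` is registered as a mutable column family option, it can also be flipped on a live DB without reopening it. The sketch below is a hedged C++ illustration using `DB::SetOptions()` with the `kFlushOnly`/`kDisable` string names registered in options_helper.cc (the same spellings the Java `MutableColumnFamilyOptions` parser and the stress test rely on); the helper function name and the assumption of an already open `db` handle are illustrative, not part of this patch.

```cpp
#include <cassert>

#include "rocksdb/db.h"

// Hypothetical helper: toggles blob cache warming on an already open DB.
// `db` is assumed to point at a database opened as in the earlier sketch.
void SetBlobCacheWarming(ROCKSDB_NAMESPACE::DB* db, bool enable) {
  const ROCKSDB_NAMESPACE::Status s = db->SetOptions(
      {{"prepopulate_blob_cache", enable ? "kFlushOnly" : "kDisable"}});
  assert(s.ok());
}
```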
diff --git a/java/src/main/java/org/rocksdb/MutableColumnFamilyOptions.java b/java/src/main/java/org/rocksdb/MutableColumnFamilyOptions.java index 8f55c52015..af28fa8ce7 100644 --- a/java/src/main/java/org/rocksdb/MutableColumnFamilyOptions.java +++ b/java/src/main/java/org/rocksdb/MutableColumnFamilyOptions.java @@ -123,7 +123,8 @@ public enum BlobOption implements MutableColumnFamilyOptionKey { blob_garbage_collection_age_cutoff(ValueType.DOUBLE), blob_garbage_collection_force_threshold(ValueType.DOUBLE), blob_compaction_readahead_size(ValueType.LONG), - blob_file_starting_level(ValueType.INT); + blob_file_starting_level(ValueType.INT), + prepopulate_blob_cache(ValueType.ENUM); private final ValueType valueType; BlobOption(final ValueType valueType) { @@ -607,5 +608,16 @@ public MutableColumnFamilyOptionsBuilder setBlobFileStartingLevel( public int blobFileStartingLevel() { return getInt(BlobOption.blob_file_starting_level); } + + @Override + public MutableColumnFamilyOptionsBuilder setPrepopulateBlobCache( + final PrepopulateBlobCache prepopulateBlobCache) { + return setEnum(BlobOption.prepopulate_blob_cache, prepopulateBlobCache); + } + + @Override + public PrepopulateBlobCache prepopulateBlobCache() { + return (PrepopulateBlobCache) getEnum(BlobOption.prepopulate_blob_cache); + } } } diff --git a/java/src/main/java/org/rocksdb/Options.java b/java/src/main/java/org/rocksdb/Options.java index f7e725f075..1f1e5507a5 100644 --- a/java/src/main/java/org/rocksdb/Options.java +++ b/java/src/main/java/org/rocksdb/Options.java @@ -2104,6 +2104,17 @@ public int blobFileStartingLevel() { return blobFileStartingLevel(nativeHandle_); } + @Override + public Options setPrepopulateBlobCache(final PrepopulateBlobCache prepopulateBlobCache) { + setPrepopulateBlobCache(nativeHandle_, prepopulateBlobCache.getValue()); + return this; + } + + @Override + public PrepopulateBlobCache prepopulateBlobCache() { + return PrepopulateBlobCache.getPrepopulateBlobCache(prepopulateBlobCache(nativeHandle_)); + } + // // END options for blobs (integrated BlobDB) // @@ -2541,6 +2552,9 @@ private native void setBlobCompactionReadaheadSize( private native void setBlobFileStartingLevel( final long nativeHandle_, final int blobFileStartingLevel); private native int blobFileStartingLevel(final long nativeHandle_); + private native void setPrepopulateBlobCache( + final long nativeHandle_, final byte prepopulateBlobCache); + private native byte prepopulateBlobCache(final long nativeHandle_); // instance variables // NOTE: If you add new member variables, please update the copy constructor above! diff --git a/java/src/main/java/org/rocksdb/PrepopulateBlobCache.java b/java/src/main/java/org/rocksdb/PrepopulateBlobCache.java new file mode 100644 index 0000000000..f1237aa7c9 --- /dev/null +++ b/java/src/main/java/org/rocksdb/PrepopulateBlobCache.java @@ -0,0 +1,117 @@ +// Copyright (c) Meta Platforms, Inc. and affiliates. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +/** + * Enum PrepopulateBlobCache + * + *

<p>Prepopulate warm/hot blobs which are already in memory into blob + * cache at the time of flush. On a flush, the blob that is in memory + * (in memtables) get flushed to the device. If using Direct IO, + * additional IO is incurred to read this blob back into memory again, + * which is avoided by enabling this option. This further helps if the + * workload exhibits high temporal locality, where most of the reads go + * to recently written data. This also helps in case of the remote file + * system since it involves network traffic and higher latencies.</p>
+ */ +public enum PrepopulateBlobCache { + PREPOPULATE_BLOB_DISABLE((byte) 0x0, "prepopulate_blob_disable", "kDisable"), + PREPOPULATE_BLOB_FLUSH_ONLY((byte) 0x1, "prepopulate_blob_flush_only", "kFlushOnly"); + + /** + *

<p>Get the PrepopulateBlobCache enumeration value by + * passing the library name to this method.</p>
+ * + *

<p>If library cannot be found the enumeration + * value {@code PREPOPULATE_BLOB_DISABLE} will be returned.</p>
+ * + * @param libraryName prepopulate blob cache library name. + * + * @return PrepopulateBlobCache instance. + */ + public static PrepopulateBlobCache getPrepopulateBlobCache(String libraryName) { + if (libraryName != null) { + for (PrepopulateBlobCache prepopulateBlobCache : PrepopulateBlobCache.values()) { + if (prepopulateBlobCache.getLibraryName() != null + && prepopulateBlobCache.getLibraryName().equals(libraryName)) { + return prepopulateBlobCache; + } + } + } + return PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE; + } + + /** + *

<p>Get the PrepopulateBlobCache enumeration value by + * passing the byte identifier to this method.</p>
+ * + * @param byteIdentifier of PrepopulateBlobCache. + * + * @return PrepopulateBlobCache instance. + * + * @throws IllegalArgumentException If PrepopulateBlobCache cannot be found for the + * provided byteIdentifier + */ + public static PrepopulateBlobCache getPrepopulateBlobCache(byte byteIdentifier) { + for (final PrepopulateBlobCache prepopulateBlobCache : PrepopulateBlobCache.values()) { + if (prepopulateBlobCache.getValue() == byteIdentifier) { + return prepopulateBlobCache; + } + } + + throw new IllegalArgumentException("Illegal value provided for PrepopulateBlobCache."); + } + + /** + *

<p>Get a PrepopulateBlobCache value based on the string key in the C++ options output. + * This gets used in support of getting options into Java from an options string, + * which is generated at the C++ level. + * </p>
+ * + * @param internalName the internal (C++) name by which the option is known. + * + * @return PrepopulateBlobCache instance (optional) + */ + static PrepopulateBlobCache getFromInternal(final String internalName) { + for (final PrepopulateBlobCache prepopulateBlobCache : PrepopulateBlobCache.values()) { + if (prepopulateBlobCache.internalName_.equals(internalName)) { + return prepopulateBlobCache; + } + } + + throw new IllegalArgumentException( + "Illegal internalName '" + internalName + " ' provided for PrepopulateBlobCache."); + } + + /** + *

<p>Returns the byte value of the enumerations value.</p>
+ * + * @return byte representation + */ + public byte getValue() { + return value_; + } + + /** + *

<p>Returns the library name of the prepopulate blob cache mode + * identified by the enumeration value.</p>
+ * + * @return library name + */ + public String getLibraryName() { + return libraryName_; + } + + PrepopulateBlobCache(final byte value, final String libraryName, final String internalName) { + value_ = value; + libraryName_ = libraryName; + internalName_ = internalName; + } + + private final byte value_; + private final String libraryName_; + private final String internalName_; +} diff --git a/java/src/test/java/org/rocksdb/BlobOptionsTest.java b/java/src/test/java/org/rocksdb/BlobOptionsTest.java index 7b94702da0..fe3d9b246a 100644 --- a/java/src/test/java/org/rocksdb/BlobOptionsTest.java +++ b/java/src/test/java/org/rocksdb/BlobOptionsTest.java @@ -79,6 +79,8 @@ public void blobOptions() { assertThat(options.blobGarbageCollectionAgeCutoff()).isEqualTo(0.25); assertThat(options.blobGarbageCollectionForceThreshold()).isEqualTo(1.0); assertThat(options.blobCompactionReadaheadSize()).isEqualTo(0); + assertThat(options.prepopulateBlobCache()) + .isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE); assertThat(options.setEnableBlobFiles(true)).isEqualTo(options); assertThat(options.setMinBlobSize(132768L)).isEqualTo(options); @@ -90,6 +92,8 @@ public void blobOptions() { assertThat(options.setBlobGarbageCollectionForceThreshold(0.80)).isEqualTo(options); assertThat(options.setBlobCompactionReadaheadSize(262144L)).isEqualTo(options); assertThat(options.setBlobFileStartingLevel(0)).isEqualTo(options); + assertThat(options.setPrepopulateBlobCache(PrepopulateBlobCache.PREPOPULATE_BLOB_FLUSH_ONLY)) + .isEqualTo(options); assertThat(options.enableBlobFiles()).isEqualTo(true); assertThat(options.minBlobSize()).isEqualTo(132768L); @@ -100,6 +104,8 @@ public void blobOptions() { assertThat(options.blobGarbageCollectionForceThreshold()).isEqualTo(0.80); assertThat(options.blobCompactionReadaheadSize()).isEqualTo(262144L); assertThat(options.blobFileStartingLevel()).isEqualTo(0); + assertThat(options.prepopulateBlobCache()) + .isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_FLUSH_ONLY); } } @@ -130,6 +136,9 @@ public void blobColumnFamilyOptions() { assertThat(columnFamilyOptions.setBlobCompactionReadaheadSize(262144L)) .isEqualTo(columnFamilyOptions); assertThat(columnFamilyOptions.setBlobFileStartingLevel(0)).isEqualTo(columnFamilyOptions); + assertThat(columnFamilyOptions.setPrepopulateBlobCache( + PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE)) + .isEqualTo(columnFamilyOptions); assertThat(columnFamilyOptions.enableBlobFiles()).isEqualTo(true); assertThat(columnFamilyOptions.minBlobSize()).isEqualTo(132768L); @@ -141,6 +150,8 @@ public void blobColumnFamilyOptions() { assertThat(columnFamilyOptions.blobGarbageCollectionForceThreshold()).isEqualTo(0.80); assertThat(columnFamilyOptions.blobCompactionReadaheadSize()).isEqualTo(262144L); assertThat(columnFamilyOptions.blobFileStartingLevel()).isEqualTo(0); + assertThat(columnFamilyOptions.prepopulateBlobCache()) + .isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE); } } @@ -156,7 +167,8 @@ public void blobMutableColumnFamilyOptionsBuilder() { .setBlobGarbageCollectionAgeCutoff(0.89) .setBlobGarbageCollectionForceThreshold(0.80) .setBlobCompactionReadaheadSize(262144) - .setBlobFileStartingLevel(1); + .setBlobFileStartingLevel(1) + .setPrepopulateBlobCache(PrepopulateBlobCache.PREPOPULATE_BLOB_FLUSH_ONLY); assertThat(builder.enableBlobFiles()).isEqualTo(true); assertThat(builder.minBlobSize()).isEqualTo(1024); @@ -167,6 +179,8 @@ public void blobMutableColumnFamilyOptionsBuilder() { 
assertThat(builder.blobGarbageCollectionForceThreshold()).isEqualTo(0.80); assertThat(builder.blobCompactionReadaheadSize()).isEqualTo(262144); assertThat(builder.blobFileStartingLevel()).isEqualTo(1); + assertThat(builder.prepopulateBlobCache()) + .isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_FLUSH_ONLY); builder.setEnableBlobFiles(false) .setMinBlobSize(4096) @@ -176,7 +190,8 @@ public void blobMutableColumnFamilyOptionsBuilder() { .setBlobGarbageCollectionAgeCutoff(0.91) .setBlobGarbageCollectionForceThreshold(0.96) .setBlobCompactionReadaheadSize(1024) - .setBlobFileStartingLevel(0); + .setBlobFileStartingLevel(0) + .setPrepopulateBlobCache(PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE); assertThat(builder.enableBlobFiles()).isEqualTo(false); assertThat(builder.minBlobSize()).isEqualTo(4096); @@ -187,16 +202,19 @@ public void blobMutableColumnFamilyOptionsBuilder() { assertThat(builder.blobGarbageCollectionForceThreshold()).isEqualTo(0.96); assertThat(builder.blobCompactionReadaheadSize()).isEqualTo(1024); assertThat(builder.blobFileStartingLevel()).isEqualTo(0); + assertThat(builder.prepopulateBlobCache()) + .isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE); final MutableColumnFamilyOptions options = builder.build(); assertThat(options.getKeys()) .isEqualTo(new String[] {"enable_blob_files", "min_blob_size", "blob_file_size", "blob_compression_type", "enable_blob_garbage_collection", "blob_garbage_collection_age_cutoff", "blob_garbage_collection_force_threshold", - "blob_compaction_readahead_size", "blob_file_starting_level"}); + "blob_compaction_readahead_size", "blob_file_starting_level", + "prepopulate_blob_cache"}); assertThat(options.getValues()) - .isEqualTo(new String[] { - "false", "4096", "2048", "LZ4_COMPRESSION", "false", "0.91", "0.96", "1024", "0"}); + .isEqualTo(new String[] {"false", "4096", "2048", "LZ4_COMPRESSION", "false", "0.91", + "0.96", "1024", "0", "PREPOPULATE_BLOB_DISABLE"}); } /** diff --git a/java/src/test/java/org/rocksdb/MutableColumnFamilyOptionsTest.java b/java/src/test/java/org/rocksdb/MutableColumnFamilyOptionsTest.java index 66b458a9cf..b2b2599a7f 100644 --- a/java/src/test/java/org/rocksdb/MutableColumnFamilyOptionsTest.java +++ b/java/src/test/java/org/rocksdb/MutableColumnFamilyOptionsTest.java @@ -96,8 +96,9 @@ public void mutableColumnFamilyOptions_parse() { public void mutableColumnFamilyOptions_parse_getOptions_output() { final String optionsString = "bottommost_compression=kDisableCompressionOption; sample_for_compression=0; " - + "blob_garbage_collection_age_cutoff=0.250000; blob_garbage_collection_force_threshold=0.800000; arena_block_size=1048576; enable_blob_garbage_collection=false; " - + "level0_stop_writes_trigger=36; min_blob_size=65536; blob_compaction_readahead_size=262144; blob_file_starting_level=5; " + + "blob_garbage_collection_age_cutoff=0.250000; blob_garbage_collection_force_threshold=0.800000;" + + "arena_block_size=1048576; enable_blob_garbage_collection=false; level0_stop_writes_trigger=36; min_blob_size=65536;" + + "blob_compaction_readahead_size=262144; blob_file_starting_level=5; prepopulate_blob_cache=kDisable;" + "compaction_options_universal={allow_trivial_move=false;stop_style=kCompactionStopStyleTotalSize;min_merge_width=2;" + "compression_size_percent=-1;max_size_amplification_percent=200;max_merge_width=4294967295;size_ratio=1;}; " + "target_file_size_base=67108864; max_bytes_for_level_base=268435456; memtable_whole_key_filtering=false; " @@ -133,6 +134,7 @@ public void 
mutableColumnFamilyOptions_parse_getOptions_output() { assertThat(cf.minBlobSize()).isEqualTo(65536); assertThat(cf.blobCompactionReadaheadSize()).isEqualTo(262144); assertThat(cf.blobFileStartingLevel()).isEqualTo(5); + assertThat(cf.prepopulateBlobCache()).isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE); assertThat(cf.targetFileSizeBase()).isEqualTo(67108864); assertThat(cf.maxBytesForLevelBase()).isEqualTo(268435456); assertThat(cf.softPendingCompactionBytesLimit()).isEqualTo(68719476736L); diff --git a/java/src/test/java/org/rocksdb/OptionsTest.java b/java/src/test/java/org/rocksdb/OptionsTest.java index 612b1bd7ca..129f1c39ae 100644 --- a/java/src/test/java/org/rocksdb/OptionsTest.java +++ b/java/src/test/java/org/rocksdb/OptionsTest.java @@ -1033,6 +1033,18 @@ public void compressionTypes() { } } + @Test + public void prepopulateBlobCache() { + try (final Options options = new Options()) { + for (final PrepopulateBlobCache prepopulateBlobCache : PrepopulateBlobCache.values()) { + options.setPrepopulateBlobCache(prepopulateBlobCache); + assertThat(options.prepopulateBlobCache()).isEqualTo(prepopulateBlobCache); + assertThat(PrepopulateBlobCache.valueOf("PREPOPULATE_BLOB_DISABLE")) + .isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE); + } + } + } + @Test public void compressionPerLevel() { try (final Options options = new Options()) { diff --git a/options/cf_options.cc b/options/cf_options.cc index 6fbc31151b..c8c45ecf8e 100644 --- a/options/cf_options.cc +++ b/options/cf_options.cc @@ -451,6 +451,10 @@ static std::unordered_map {offsetof(struct MutableCFOptions, blob_file_starting_level), OptionType::kInt, OptionVerificationType::kNormal, OptionTypeFlags::kMutable}}, + {"prepopulate_blob_cache", + OptionTypeInfo::Enum( + offsetof(struct MutableCFOptions, prepopulate_blob_cache), + &prepopulate_blob_cache_string_map, OptionTypeFlags::kMutable)}, {"sample_for_compression", {offsetof(struct MutableCFOptions, sample_for_compression), OptionType::kUInt64T, OptionVerificationType::kNormal, @@ -1097,7 +1101,10 @@ void MutableCFOptions::Dump(Logger* log) const { blob_compaction_readahead_size); ROCKS_LOG_INFO(log, " blob_file_starting_level: %d", blob_file_starting_level); - + ROCKS_LOG_INFO(log, " prepopulate_blob_cache: %s", + prepopulate_blob_cache == PrepopulateBlobCache::kFlushOnly + ? 
"flush only" + : "disable"); ROCKS_LOG_INFO(log, " bottommost_temperature: %d", static_cast(bottommost_temperature)); } diff --git a/options/cf_options.h b/options/cf_options.h index feba505229..ff4df0d7a1 100644 --- a/options/cf_options.h +++ b/options/cf_options.h @@ -147,6 +147,7 @@ struct MutableCFOptions { options.blob_garbage_collection_force_threshold), blob_compaction_readahead_size(options.blob_compaction_readahead_size), blob_file_starting_level(options.blob_file_starting_level), + prepopulate_blob_cache(options.prepopulate_blob_cache), max_sequential_skip_in_iterations( options.max_sequential_skip_in_iterations), check_flush_compaction_key_order( @@ -198,6 +199,7 @@ struct MutableCFOptions { blob_garbage_collection_force_threshold(0.0), blob_compaction_readahead_size(0), blob_file_starting_level(0), + prepopulate_blob_cache(PrepopulateBlobCache::kDisable), max_sequential_skip_in_iterations(0), check_flush_compaction_key_order(true), paranoid_file_checks(false), @@ -281,6 +283,7 @@ struct MutableCFOptions { double blob_garbage_collection_force_threshold; uint64_t blob_compaction_readahead_size; int blob_file_starting_level; + PrepopulateBlobCache prepopulate_blob_cache; // Misc options uint64_t max_sequential_skip_in_iterations; diff --git a/options/options.cc b/options/options.cc index b870bad286..6dff5e62f1 100644 --- a/options/options.cc +++ b/options/options.cc @@ -105,7 +105,8 @@ AdvancedColumnFamilyOptions::AdvancedColumnFamilyOptions(const Options& options) options.blob_garbage_collection_force_threshold), blob_compaction_readahead_size(options.blob_compaction_readahead_size), blob_file_starting_level(options.blob_file_starting_level), - blob_cache(options.blob_cache) { + blob_cache(options.blob_cache), + prepopulate_blob_cache(options.prepopulate_blob_cache) { assert(memtable_factory.get() != nullptr); if (max_bytes_for_level_multiplier_additional.size() < static_cast(num_levels)) { @@ -428,6 +429,11 @@ void ColumnFamilyOptions::Dump(Logger* log) const { blob_cache->Name()); ROCKS_LOG_HEADER(log, " blob_cache options: %s", blob_cache->GetPrintableOptions().c_str()); + ROCKS_LOG_HEADER( + log, " blob_cache prepopulated: %s", + prepopulate_blob_cache == PrepopulateBlobCache::kFlushOnly + ? 
"flush only" + : "disabled"); } ROCKS_LOG_HEADER(log, "Options.experimental_mempurge_threshold: %f", experimental_mempurge_threshold); diff --git a/options/options_helper.cc b/options/options_helper.cc index 280f59a4f5..0424ba3a5a 100644 --- a/options/options_helper.cc +++ b/options/options_helper.cc @@ -257,6 +257,7 @@ void UpdateColumnFamilyOptions(const MutableCFOptions& moptions, cf_opts->blob_compaction_readahead_size = moptions.blob_compaction_readahead_size; cf_opts->blob_file_starting_level = moptions.blob_file_starting_level; + cf_opts->prepopulate_blob_cache = moptions.prepopulate_blob_cache; // Misc options cf_opts->max_sequential_skip_in_iterations = @@ -850,6 +851,11 @@ std::unordered_map {"kWarm", Temperature::kWarm}, {"kCold", Temperature::kCold}}; +std::unordered_map + OptionsHelper::prepopulate_blob_cache_string_map = { + {"kDisable", PrepopulateBlobCache::kDisable}, + {"kFlushOnly", PrepopulateBlobCache::kFlushOnly}}; + Status OptionTypeInfo::NextToken(const std::string& opts, char delimiter, size_t pos, size_t* end, std::string* token) { while (pos < opts.size() && isspace(opts[pos])) { diff --git a/options/options_helper.h b/options/options_helper.h index 60b7dac49a..7c751fc252 100644 --- a/options/options_helper.h +++ b/options/options_helper.h @@ -82,6 +82,8 @@ struct OptionsHelper { static std::unordered_map checksum_type_string_map; static std::unordered_map compression_type_string_map; + static std::unordered_map + prepopulate_blob_cache_string_map; #ifndef ROCKSDB_LITE static std::unordered_map compaction_stop_style_string_map; @@ -113,6 +115,8 @@ static auto& compaction_style_string_map = static auto& compaction_pri_string_map = OptionsHelper::compaction_pri_string_map; static auto& temperature_string_map = OptionsHelper::temperature_string_map; +static auto& prepopulate_blob_cache_string_map = + OptionsHelper::prepopulate_blob_cache_string_map; #endif // !ROCKSDB_LITE } // namespace ROCKSDB_NAMESPACE diff --git a/options/options_settable_test.cc b/options/options_settable_test.cc index e4b932a448..86edbff416 100644 --- a/options/options_settable_test.cc +++ b/options/options_settable_test.cc @@ -526,6 +526,7 @@ TEST_F(OptionsSettableTest, ColumnFamilyOptionsAllFieldsSettable) { "blob_garbage_collection_force_threshold=0.75;" "blob_compaction_readahead_size=262144;" "blob_file_starting_level=1;" + "prepopulate_blob_cache=kDisable;" "bottommost_temperature=kWarm;" "preclude_last_level_data_seconds=86400;" "compaction_options_fifo={max_table_files_size=3;allow_" diff --git a/options/options_test.cc b/options/options_test.cc index 0eeaca484d..fbd078ba66 100644 --- a/options/options_test.cc +++ b/options/options_test.cc @@ -127,6 +127,7 @@ TEST_F(OptionsTest, GetOptionsFromMapTest) { {"blob_garbage_collection_force_threshold", "0.75"}, {"blob_compaction_readahead_size", "256K"}, {"blob_file_starting_level", "1"}, + {"prepopulate_blob_cache", "kDisable"}, {"bottommost_temperature", "kWarm"}, }; @@ -266,6 +267,7 @@ TEST_F(OptionsTest, GetOptionsFromMapTest) { ASSERT_EQ(new_cf_opt.blob_garbage_collection_force_threshold, 0.75); ASSERT_EQ(new_cf_opt.blob_compaction_readahead_size, 262144); ASSERT_EQ(new_cf_opt.blob_file_starting_level, 1); + ASSERT_EQ(new_cf_opt.prepopulate_blob_cache, PrepopulateBlobCache::kDisable); ASSERT_EQ(new_cf_opt.bottommost_temperature, Temperature::kWarm); cf_options_map["write_buffer_size"] = "hello"; @@ -2356,6 +2358,7 @@ TEST_F(OptionsOldApiTest, GetOptionsFromMapTest) { {"blob_garbage_collection_force_threshold", "0.75"}, 
{"blob_compaction_readahead_size", "256K"}, {"blob_file_starting_level", "1"}, + {"prepopulate_blob_cache", "kDisable"}, {"bottommost_temperature", "kWarm"}, }; @@ -2489,6 +2492,7 @@ TEST_F(OptionsOldApiTest, GetOptionsFromMapTest) { ASSERT_EQ(new_cf_opt.blob_garbage_collection_force_threshold, 0.75); ASSERT_EQ(new_cf_opt.blob_compaction_readahead_size, 262144); ASSERT_EQ(new_cf_opt.blob_file_starting_level, 1); + ASSERT_EQ(new_cf_opt.prepopulate_blob_cache, PrepopulateBlobCache::kDisable); ASSERT_EQ(new_cf_opt.bottommost_temperature, Temperature::kWarm); cf_options_map["write_buffer_size"] = "hello"; diff --git a/tools/benchmark.sh b/tools/benchmark.sh index 105d6f83ef..1773f9d6ef 100755 --- a/tools/benchmark.sh +++ b/tools/benchmark.sh @@ -95,6 +95,7 @@ function display_usage() { echo -e "\tUSE_SHARED_BLOCK_AND_BLOB_CACHE\t\t\tUse the same backing cache for block cache and blob cache (default: 1)" echo -e "\tBLOB_CACHE_SIZE\t\t\tSize of the blob cache (default: 16GB)" echo -e "\tBLOB_CACHE_NUMSHARDBITS\t\t\tNumber of shards for the blob cache is 2 ** blob_cache_numshardbits (default: 6)" + echo -e "\tPREPOPULATE_BLOB_CACHE\t\t\tPre-populate hot/warm blobs in blob cache (default: 0)" } if [ $# -lt 1 ]; then @@ -239,6 +240,7 @@ use_blob_cache=${USE_BLOB_CACHE:-1} use_shared_block_and_blob_cache=${USE_SHARED_BLOCK_AND_BLOB_CACHE:-1} blob_cache_size=${BLOB_CACHE_SIZE:-$(( 16 * $G ))} blob_cache_numshardbits=${BLOB_CACHE_NUMSHARDBITS:-6} +prepopulate_blob_cache=${PREPOPULATE_BLOB_CACHE:-0} const_params_base=" --db=$DB_DIR \ @@ -306,7 +308,8 @@ blob_const_params=" --use_shared_block_and_blob_cache=$use_shared_block_and_blob_cache \ --blob_cache_size=$blob_cache_size \ --blob_cache_numshardbits=$blob_cache_numshardbits \ - --undefok=use_blob_cache,use_shared_block_and_blob_cache,blob_cache_size,blob_cache_numshardbits \ + --prepopulate_blob_cache=$prepopulate_blob_cache \ + --undefok=use_blob_cache,use_shared_block_and_blob_cache,blob_cache_size,blob_cache_numshardbits,prepopulate_blob_cache \ " # TODO: diff --git a/tools/db_bench_tool.cc b/tools/db_bench_tool.cc index 61aa184d35..3df9fcb78b 100644 --- a/tools/db_bench_tool.cc +++ b/tools/db_bench_tool.cc @@ -1096,6 +1096,10 @@ DEFINE_int32(blob_cache_numshardbits, 6, "the block and blob caches are different " "(use_shared_block_and_blob_cache = false)."); +DEFINE_int32(prepopulate_blob_cache, 0, + "[Integrated BlobDB] Pre-populate hot/warm blobs in blob cache. 
0 " + "to disable and 1 to insert during flush."); + #ifndef ROCKSDB_LITE // Secondary DB instance Options @@ -4524,12 +4528,24 @@ class Benchmark { exit(1); } } - fprintf(stdout, - "Integrated BlobDB: blob cache enabled, block and blob caches " - "shared: %d, blob cache size %" PRIu64 - ", blob cache num shard bits: %d\n", - FLAGS_use_shared_block_and_blob_cache, FLAGS_blob_cache_size, - FLAGS_blob_cache_numshardbits); + switch (FLAGS_prepopulate_blob_cache) { + case 0: + options.prepopulate_blob_cache = PrepopulateBlobCache::kDisable; + break; + case 1: + options.prepopulate_blob_cache = PrepopulateBlobCache::kFlushOnly; + break; + default: + fprintf(stderr, "Unknown prepopulate blob cache mode\n"); + exit(1); + } + fprintf( + stdout, + "Integrated BlobDB: blob cache enabled, block and blob caches " + "shared: %d, blob cache size %" PRIu64 + ", blob cache num shard bits: %d, hot/warm blobs prepopulated: %d\n", + FLAGS_use_shared_block_and_blob_cache, FLAGS_blob_cache_size, + FLAGS_blob_cache_numshardbits, FLAGS_prepopulate_blob_cache); } else { fprintf(stdout, "Integrated BlobDB: blob cache disabled\n"); } diff --git a/tools/db_bench_tool_test.cc b/tools/db_bench_tool_test.cc index 9dc2bf277d..7d09c67a5c 100644 --- a/tools/db_bench_tool_test.cc +++ b/tools/db_bench_tool_test.cc @@ -276,6 +276,7 @@ const std::string options_file_content = R"OPTIONS_FILE( blob_garbage_collection_force_threshold=0.75 blob_compaction_readahead_size=262144 blob_file_starting_level=0 + prepopulate_blob_cache=kDisable; [TableOptions/BlockBasedTable "default"] format_version=0 diff --git a/tools/db_crashtest.py b/tools/db_crashtest.py index 56aa9380b5..dbb8bc9e9c 100644 --- a/tools/db_crashtest.py +++ b/tools/db_crashtest.py @@ -349,6 +349,7 @@ def is_direct_io_supported(dbname): "use_blob_cache": lambda: random.randint(0, 1), "use_shared_block_and_blob_cache": lambda: random.randint(0, 1), "blob_cache_size": lambda: random.choice([1048576, 2097152, 4194304, 8388608]), + "prepopulate_blob_cache": lambda: random.randint(0, 1), } ts_params = { diff --git a/tools/ldb_cmd.cc b/tools/ldb_cmd.cc index 25c4463ce0..01dac5ff97 100644 --- a/tools/ldb_cmd.cc +++ b/tools/ldb_cmd.cc @@ -102,6 +102,8 @@ const std::string LDBCommand::ARG_BLOB_COMPACTION_READAHEAD_SIZE = "blob_compaction_readahead_size"; const std::string LDBCommand::ARG_BLOB_FILE_STARTING_LEVEL = "blob_file_starting_level"; +const std::string LDBCommand::ARG_PREPOPULATE_BLOB_CACHE = + "prepopulate_blob_cache"; const std::string LDBCommand::ARG_DECODE_BLOB_INDEX = "decode_blob_index"; const std::string LDBCommand::ARG_DUMP_UNCOMPRESSED_BLOBS = "dump_uncompressed_blobs"; @@ -556,6 +558,7 @@ std::vector LDBCommand::BuildCmdLineOptions( ARG_BLOB_GARBAGE_COLLECTION_FORCE_THRESHOLD, ARG_BLOB_COMPACTION_READAHEAD_SIZE, ARG_BLOB_FILE_STARTING_LEVEL, + ARG_PREPOPULATE_BLOB_CACHE, ARG_IGNORE_UNKNOWN_OPTIONS, ARG_CF_NAME}; ret.insert(ret.end(), options.begin(), options.end()); @@ -833,6 +836,23 @@ void LDBCommand::OverrideBaseCFOptions(ColumnFamilyOptions* cf_opts) { } } + int prepopulate_blob_cache; + if (ParseIntOption(option_map_, ARG_PREPOPULATE_BLOB_CACHE, + prepopulate_blob_cache, exec_state_)) { + switch (prepopulate_blob_cache) { + case 0: + cf_opts->prepopulate_blob_cache = PrepopulateBlobCache::kDisable; + break; + case 1: + cf_opts->prepopulate_blob_cache = PrepopulateBlobCache::kFlushOnly; + break; + default: + exec_state_ = LDBCommandExecuteResult::Failed( + ARG_PREPOPULATE_BLOB_CACHE + + " must be 0 (disable) or 1 (flush only)."); + } + } + auto itr = 
option_map_.find(ARG_AUTO_COMPACTION); if (itr != option_map_.end()) { cf_opts->disable_auto_compactions = !StringToBool(itr->second); diff --git a/tools/run_blob_bench.sh b/tools/run_blob_bench.sh index 32b45717af..3755a9e56b 100755 --- a/tools/run_blob_bench.sh +++ b/tools/run_blob_bench.sh @@ -59,6 +59,7 @@ function display_usage() { echo -e "\tUSE_SHARED_BLOCK_AND_BLOB_CACHE\t\t\tUse the same backing cache for block cache and blob cache. (default: 1)" echo -e "\tBLOB_CACHE_SIZE\t\t\tSize of the blob cache (default: 16GB)" echo -e "\tBLOB_CACHE_NUMSHARDBITS\t\t\tNumber of shards for the blob cache is 2 ** blob_cache_numshardbits (default: 6)" + echo -e "\tPREPOPULATE_BLOB_CACHE\t\t\tPre-populate hot/warm blobs in blob cache (default: 0)" echo -e "\tTARGET_FILE_SIZE_BASE\t\tTarget SST file size for compactions (default: write buffer size, scaled down if blob files are enabled)" echo -e "\tMAX_BYTES_FOR_LEVEL_BASE\tMaximum size for the base level (default: 8 * target SST file size)" } @@ -123,6 +124,7 @@ use_blob_cache=${USE_BLOB_CACHE:-1} use_shared_block_and_blob_cache=${USE_SHARED_BLOCK_AND_BLOB_CACHE:-1} blob_cache_size=${BLOB_CACHE_SIZE:-$((16 * G))} blob_cache_numshardbits=${BLOB_CACHE_NUMSHARDBITS:-6} +prepopulate_blob_cache=${PREPOPULATE_BLOB_CACHE:-0} if [ "$enable_blob_files" == "1" ]; then target_file_size_base=${TARGET_FILE_SIZE_BASE:-$((32 * write_buffer_size / value_size))} @@ -157,6 +159,7 @@ echo -e "Blob cache enabled:\t\t\t$use_blob_cache" echo -e "Blob cache and block cache shared:\t\t\t$use_shared_block_and_blob_cache" echo -e "Blob cache size:\t\t$blob_cache_size" echo -e "Blob cache number of shard bits:\t\t$blob_cache_numshardbits" +echo -e "Blob cache prepopulated:\t\t\t$prepopulate_blob_cache" echo -e "Target SST file size:\t\t\t$target_file_size_base" echo -e "Maximum size of base level:\t\t$max_bytes_for_level_base" echo "=================================================================" @@ -187,6 +190,7 @@ PARAMS="\ --use_shared_block_and_blob_cache=$use_shared_block_and_blob_cache \ --blob_cache_size=$blob_cache_size \ --blob_cache_numshardbits=$blob_cache_numshardbits \ + --prepopulate_blob_cache=$prepopulate_blob_cache \ --write_buffer_size=$write_buffer_size \ --target_file_size_base=$target_file_size_base \ --max_bytes_for_level_base=$max_bytes_for_level_base"
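
Finally, a hedged end-to-end sketch of the workload this feature targets: write a value large enough to be stored as a blob, flush it, and read it back immediately, at which point the blob should already be resident in the blob cache when prepopulate_blob_cache is kFlushOnly. The path, sizes, and key are illustrative:

    #include <cassert>
    #include <string>
    #include "rocksdb/cache.h"
    #include "rocksdb/db.h"
    #include "rocksdb/options.h"

    int main() {
      ROCKSDB_NAMESPACE::Options options;
      options.create_if_missing = true;
      options.enable_blob_files = true;
      options.min_blob_size = 1024;  // values >= 1 KiB are stored as blobs
      options.blob_cache = ROCKSDB_NAMESPACE::NewLRUCache(64 << 20);
      options.prepopulate_blob_cache =
          ROCKSDB_NAMESPACE::PrepopulateBlobCache::kFlushOnly;

      ROCKSDB_NAMESPACE::DB* db = nullptr;
      ROCKSDB_NAMESPACE::Status s = ROCKSDB_NAMESPACE::DB::Open(
          options, "/tmp/prepopulate_blob_cache_e2e", &db);
      assert(s.ok());

      const std::string blob(4096, 'x');  // large enough to be a blob
      s = db->Put(ROCKSDB_NAMESPACE::WriteOptions(), "hot_key", blob);
      assert(s.ok());
      // Flush writes the blob file; with kFlushOnly the blob cache is warmed here.
      s = db->Flush(ROCKSDB_NAMESPACE::FlushOptions());
      assert(s.ok());

      std::string value;
      s = db->Get(ROCKSDB_NAMESPACE::ReadOptions(), "hot_key", &value);
      assert(s.ok() && value == blob);

      delete db;
      return 0;
    }
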