Skip to content

Commit

Permalink
New stat rocksdb.{cf|db}-write-stall-stats exposed in a structural way (
Browse files Browse the repository at this point in the history
facebook#11300)

Summary:
**Context/Summary:**
Users are interested in figuring out what has caused write stall.
- Refactor write stall related stats from property `kCFStats` into its own db property `rocksdb.cf-write-stall-stats` as a map or string. For now, this only contains count of different combination of (CF-scope `WriteStallCause`) + (`WriteStallCondition`)
- Add new `WriteStallCause::kWriteBufferManagerLimit` to reflect write stall caused by write buffer manager
- Add new `rocksdb.db-write-stall-stats`. For now, this only contains `WriteStallCause::kWriteBufferManagerLimit` + `WriteStallCondition::kStopped`

- Expose functions in new class `WriteStallStatsMapKeys` for examining the above two properties returned as map
- Misc: rename/comment some write stall InternalStats for clarity

Pull Request resolved: facebook#11300

Test Plan:
- New UT
- Stress test
`python3 tools/db_crashtest.py blackbox --simple --get_property_one_in=1`
- Perf test: Both converge very slowly at similar rates but post-change has higher average ops/sec than pre-change even though they are run at the same time.
```
./db_bench -seed=1679014417652004 -db=/dev/shm/testdb/ -statistics=false -benchmarks="fillseq[-X60]" -key_size=32 -value_size=512 -num=100000 -db_write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3
```
pre-change:
```
fillseq [AVG 15 runs] : 1176 (± 732) ops/sec;    0.6 (± 0.4) MB/sec
fillseq      :    1052.671 micros/op 949 ops/sec 105.267 seconds 100000 operations;    0.5 MB/s
fillseq [AVG 16 runs] : 1162 (± 685) ops/sec;    0.6 (± 0.4) MB/sec
fillseq      :    1387.330 micros/op 720 ops/sec 138.733 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 17 runs] : 1136 (± 646) ops/sec;    0.6 (± 0.3) MB/sec
fillseq      :    1232.011 micros/op 811 ops/sec 123.201 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 18 runs] : 1118 (± 610) ops/sec;    0.6 (± 0.3) MB/sec
fillseq      :    1282.567 micros/op 779 ops/sec 128.257 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 19 runs] : 1100 (± 578) ops/sec;    0.6 (± 0.3) MB/sec
fillseq      :    1914.336 micros/op 522 ops/sec 191.434 seconds 100000 operations;    0.3 MB/s
fillseq [AVG 20 runs] : 1071 (± 551) ops/sec;    0.6 (± 0.3) MB/sec
fillseq      :    1227.510 micros/op 814 ops/sec 122.751 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 21 runs] : 1059 (± 525) ops/sec;    0.5 (± 0.3) MB/sec
```
post-change:
```
fillseq [AVG 15 runs] : 1226 (± 732) ops/sec;    0.6 (± 0.4) MB/sec
fillseq      :    1323.825 micros/op 755 ops/sec 132.383 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 16 runs] : 1196 (± 687) ops/sec;    0.6 (± 0.4) MB/sec
fillseq      :    1223.905 micros/op 817 ops/sec 122.391 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 17 runs] : 1174 (± 647) ops/sec;    0.6 (± 0.3) MB/sec
fillseq      :    1168.996 micros/op 855 ops/sec 116.900 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 18 runs] : 1156 (± 611) ops/sec;    0.6 (± 0.3) MB/sec
fillseq      :    1348.729 micros/op 741 ops/sec 134.873 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 19 runs] : 1134 (± 579) ops/sec;    0.6 (± 0.3) MB/sec
fillseq      :    1196.887 micros/op 835 ops/sec 119.689 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 20 runs] : 1119 (± 550) ops/sec;    0.6 (± 0.3) MB/sec
fillseq      :    1193.697 micros/op 837 ops/sec 119.370 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 21 runs] : 1106 (± 524) ops/sec;    0.6 (± 0.3) MB/sec
```

Reviewed By: ajkr

Differential Revision: D44159541

Pulled By: hx235

fbshipit-source-id: 8d29efb70001fdc52d34535eeb3364fc3e71e40b
  • Loading branch information
hx235 authored and facebook-github-bot committed Mar 18, 2023
1 parent 204fcff commit cb58477
Show file tree
Hide file tree
Showing 17 changed files with 667 additions and 97 deletions.
1 change: 1 addition & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -748,6 +748,7 @@ set(SOURCES
db/write_batch.cc
db/write_batch_base.cc
db/write_controller.cc
db/write_stall_stats.cc
db/write_thread.cc
env/composite_env.cc
env/env.cc
Expand Down
1 change: 1 addition & 0 deletions HISTORY.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
* Added a new PerfContext counter `internal_merge_point_lookup_count` which tracks the number of Merge operands applied while serving point lookup queries.
* Add new statistics rocksdb.table.open.prefetch.tail.read.bytes, rocksdb.table.open.prefetch.tail.{miss|hit}
* Add support for SecondaryCache with HyperClockCache (`HyperClockCacheOptions` inherits `secondary_cache` option from `ShardedCacheOptions`)
* Add new db properties `rocksdb.cf-write-stall-stats`, `rocksdb.db-write-stall-stats`and APIs to examine them in a structured way. In particular, users of `GetMapProperty()` with property `kCFWriteStallStats`/`kDBWriteStallStats` can now use the functions in `WriteStallStatsMapKeys` to find stats in the map.

### Public API Changes
* Changed various functions and features in `Cache` that are mostly relevant to custom implementations or wrappers. Especially, asychronous lookup functionality is moved from `Lookup()` to a new `StartAsyncLookup()` function.
Expand Down
2 changes: 2 additions & 0 deletions TARGETS
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,7 @@ cpp_library_wrapper(name="rocksdb_lib", srcs=[
"db/write_batch.cc",
"db/write_batch_base.cc",
"db/write_controller.cc",
"db/write_stall_stats.cc",
"db/write_thread.cc",
"env/composite_env.cc",
"env/env.cc",
Expand Down Expand Up @@ -451,6 +452,7 @@ cpp_library_wrapper(name="rocksdb_whole_archive_lib", srcs=[
"db/write_batch.cc",
"db/write_batch_base.cc",
"db/write_controller.cc",
"db/write_stall_stats.cc",
"db/write_thread.cc",
"env/composite_env.cc",
"env/env.cc",
Expand Down
15 changes: 8 additions & 7 deletions db/column_family.cc
Original file line number Diff line number Diff line change
Expand Up @@ -869,7 +869,7 @@ int GetL0ThresholdSpeedupCompaction(int level0_file_num_compaction_trigger,
}
} // anonymous namespace

std::pair<WriteStallCondition, ColumnFamilyData::WriteStallCause>
std::pair<WriteStallCondition, WriteStallCause>
ColumnFamilyData::GetWriteStallConditionAndCause(
int num_unflushed_memtables, int num_l0_files,
uint64_t num_compaction_needed_bytes,
Expand Down Expand Up @@ -942,7 +942,8 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
internal_stats_->AddCFStats(InternalStats::L0_FILE_COUNT_LIMIT_STOPS, 1);
if (compaction_picker_->IsLevel0CompactionInProgress()) {
internal_stats_->AddCFStats(
InternalStats::LOCKED_L0_FILE_COUNT_LIMIT_STOPS, 1);
InternalStats::L0_FILE_COUNT_LIMIT_STOPS_WITH_ONGOING_COMPACTION,
1);
}
ROCKS_LOG_WARN(ioptions_.logger,
"[%s] Stopping writes because we have %d level-0 files",
Expand All @@ -963,7 +964,7 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
SetupDelay(write_controller, compaction_needed_bytes,
prev_compaction_needed_bytes_, was_stopped,
mutable_cf_options.disable_auto_compactions);
internal_stats_->AddCFStats(InternalStats::MEMTABLE_LIMIT_SLOWDOWNS, 1);
internal_stats_->AddCFStats(InternalStats::MEMTABLE_LIMIT_DELAYS, 1);
ROCKS_LOG_WARN(
ioptions_.logger,
"[%s] Stalling writes because we have %d immutable memtables "
Expand All @@ -981,11 +982,11 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
SetupDelay(write_controller, compaction_needed_bytes,
prev_compaction_needed_bytes_, was_stopped || near_stop,
mutable_cf_options.disable_auto_compactions);
internal_stats_->AddCFStats(InternalStats::L0_FILE_COUNT_LIMIT_SLOWDOWNS,
1);
internal_stats_->AddCFStats(InternalStats::L0_FILE_COUNT_LIMIT_DELAYS, 1);
if (compaction_picker_->IsLevel0CompactionInProgress()) {
internal_stats_->AddCFStats(
InternalStats::LOCKED_L0_FILE_COUNT_LIMIT_SLOWDOWNS, 1);
InternalStats::L0_FILE_COUNT_LIMIT_DELAYS_WITH_ONGOING_COMPACTION,
1);
}
ROCKS_LOG_WARN(ioptions_.logger,
"[%s] Stalling writes because we have %d level-0 files "
Expand All @@ -1011,7 +1012,7 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
prev_compaction_needed_bytes_, was_stopped || near_stop,
mutable_cf_options.disable_auto_compactions);
internal_stats_->AddCFStats(
InternalStats::PENDING_COMPACTION_BYTES_LIMIT_SLOWDOWNS, 1);
InternalStats::PENDING_COMPACTION_BYTES_LIMIT_DELAYS, 1);
ROCKS_LOG_WARN(
ioptions_.logger,
"[%s] Stalling writes because of estimated pending compaction "
Expand Down
6 changes: 0 additions & 6 deletions db/column_family.h
Original file line number Diff line number Diff line change
Expand Up @@ -462,12 +462,6 @@ class ColumnFamilyData {
bool queued_for_flush() { return queued_for_flush_; }
bool queued_for_compaction() { return queued_for_compaction_; }

enum class WriteStallCause {
kNone,
kMemtableLimit,
kL0FileCountLimit,
kPendingCompactionBytes,
};
static std::pair<WriteStallCondition, WriteStallCause>
GetWriteStallConditionAndCause(
int num_unflushed_memtables, int num_l0_files,
Expand Down
3 changes: 3 additions & 0 deletions db/db_impl/db_impl_write.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1211,6 +1211,9 @@ Status DBImpl::PreprocessWrite(const WriteOptions& write_options,
// exceeded at this point so no new write (including current one) will go
// through until memory usage is decreased.
if (UNLIKELY(status.ok() && write_buffer_manager_->ShouldStall())) {
default_cf_internal_stats_->AddDBStats(
InternalStats::kIntStatsWriteBufferManagerLimitStopsCounts, 1,
true /* concurrent */);
if (write_options.no_slowdown) {
status = Status::Incomplete("Write stall");
} else {
Expand Down
175 changes: 175 additions & 0 deletions db/db_properties_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
#include <string>

#include "db/db_test_util.h"
#include "db/write_stall_stats.h"
#include "options/cf_options.h"
#include "port/stack_trace.h"
#include "rocksdb/listener.h"
Expand Down Expand Up @@ -2106,6 +2107,180 @@ TEST_F(DBPropertiesTest, GetMapPropertyBlockCacheEntryStats) {
ASSERT_EQ(3 * kNumCacheEntryRoles + 4, values.size());
}

TEST_F(DBPropertiesTest, WriteStallStatsSanityCheck) {
for (uint32_t i = 0; i < static_cast<uint32_t>(WriteStallCause::kNone); ++i) {
std::string str = kWriteStallCauseToHyphenString[i];
ASSERT_TRUE(!str.empty())
<< "Please ensure mapping from `WriteStallCause` to "
"`kWriteStallCauseToHyphenString` is complete";
WriteStallCause cause = static_cast<WriteStallCause>(i);
if (cause == WriteStallCause::kCFScopeWriteStallCauseEnumMax ||
cause == WriteStallCause::kDBScopeWriteStallCauseEnumMax) {
ASSERT_EQ(str, kInvalidWriteStallCauseHyphenString)
<< "Please ensure order in `kWriteStallCauseToHyphenString` is "
"consistent with `WriteStallCause`";
}
}

for (uint32_t i = 0; i < static_cast<uint32_t>(WriteStallCondition::kNormal);
++i) {
std::string str = kWriteStallConditionToHyphenString[i];
ASSERT_TRUE(!str.empty())
<< "Please ensure mapping from `WriteStallCondition` to "
"`kWriteStallConditionToHyphenString` is complete";
}

for (uint32_t i = 0; i < static_cast<uint32_t>(WriteStallCause::kNone); ++i) {
for (uint32_t j = 0;
j < static_cast<uint32_t>(WriteStallCondition::kNormal); ++j) {
WriteStallCause cause = static_cast<WriteStallCause>(i);
WriteStallCondition condition = static_cast<WriteStallCondition>(j);

if (isCFScopeWriteStallCause(cause)) {
ASSERT_TRUE(InternalCFStat(cause, condition) !=
InternalStats::INTERNAL_CF_STATS_ENUM_MAX)
<< "Please ensure the combination of WriteStallCause(" +
std::to_string(static_cast<uint32_t>(cause)) +
") + WriteStallCondition(" +
std::to_string(static_cast<uint32_t>(condition)) +
") is correctly mapped to a valid `InternalStats` or bypass "
"its check in this test";
} else if (isDBScopeWriteStallCause(cause)) {
InternalStats::InternalDBStatsType internal_db_stat =
InternalDBStat(cause, condition);
if (internal_db_stat == InternalStats::kIntStatsNumMax) {
ASSERT_TRUE(cause == WriteStallCause::kWriteBufferManagerLimit &&
condition == WriteStallCondition::kDelayed)
<< "Please ensure the combination of WriteStallCause(" +
std::to_string(static_cast<uint32_t>(cause)) +
") + WriteStallCondition(" +
std::to_string(static_cast<uint32_t>(condition)) +
") is correctly mapped to a valid `InternalStats` or "
"bypass its check in this test";
}
} else if (cause != WriteStallCause::kCFScopeWriteStallCauseEnumMax &&
cause != WriteStallCause::kDBScopeWriteStallCauseEnumMax) {
ASSERT_TRUE(false) << "Please ensure the WriteStallCause(" +
std::to_string(static_cast<uint32_t>(cause)) +
") is either CF-scope or DB-scope write "
"stall cause in enum `WriteStallCause`";
}
}
}
}
TEST_F(DBPropertiesTest, GetMapPropertyWriteStallStats) {
Options options = CurrentOptions();
CreateAndReopenWithCF({"heavy_write_cf"}, options);

for (auto test_cause : {WriteStallCause::kWriteBufferManagerLimit,
WriteStallCause::kMemtableLimit}) {
if (test_cause == WriteStallCause::kWriteBufferManagerLimit) {
options.write_buffer_manager.reset(
new WriteBufferManager(100000, nullptr, true));
} else if (test_cause == WriteStallCause::kMemtableLimit) {
options.max_write_buffer_number = 2;
options.disable_auto_compactions = true;
}
ReopenWithColumnFamilies({"default", "heavy_write_cf"}, options);

// Assert initial write stall stats are all 0
std::map<std::string, std::string> db_values;
ASSERT_TRUE(dbfull()->GetMapProperty(DB::Properties::kDBWriteStallStats,
&db_values));
ASSERT_EQ(std::stoi(db_values[WriteStallStatsMapKeys::CauseConditionCount(
WriteStallCause::kWriteBufferManagerLimit,
WriteStallCondition::kStopped)]),
0);

for (int cf = 0; cf <= 1; ++cf) {
std::map<std::string, std::string> cf_values;
ASSERT_TRUE(dbfull()->GetMapProperty(
handles_[cf], DB::Properties::kCFWriteStallStats, &cf_values));
ASSERT_EQ(std::stoi(cf_values[WriteStallStatsMapKeys::TotalStops()]), 0);
ASSERT_EQ(std::stoi(cf_values[WriteStallStatsMapKeys::TotalDelays()]), 0);
}

// Pause flush thread to help coerce write stall
std::unique_ptr<test::SleepingBackgroundTask> sleeping_task(
new test::SleepingBackgroundTask());
env_->SetBackgroundThreads(1, Env::HIGH);
env_->Schedule(&test::SleepingBackgroundTask::DoSleepTask,
sleeping_task.get(), Env::Priority::HIGH);
sleeping_task->WaitUntilSleeping();

// Coerce write stall
if (test_cause == WriteStallCause::kWriteBufferManagerLimit) {
ASSERT_OK(dbfull()->Put(
WriteOptions(), handles_[1], Key(1),
DummyString(options.write_buffer_manager->buffer_size())));

WriteOptions wo;
wo.no_slowdown = true;
Status s = dbfull()->Put(
wo, handles_[1], Key(2),
DummyString(options.write_buffer_manager->buffer_size()));
ASSERT_TRUE(s.IsIncomplete());
ASSERT_TRUE(s.ToString().find("Write stall") != std::string::npos);
} else if (test_cause == WriteStallCause::kMemtableLimit) {
FlushOptions fo;
fo.allow_write_stall = true;
fo.wait = false;

ASSERT_OK(
dbfull()->Put(WriteOptions(), handles_[1], Key(1), DummyString(1)));
ASSERT_OK(dbfull()->Flush(fo, handles_[1]));

ASSERT_OK(
dbfull()->Put(WriteOptions(), handles_[1], Key(2), DummyString(1)));
ASSERT_OK(dbfull()->Flush(fo, handles_[1]));
}

if (test_cause == WriteStallCause::kWriteBufferManagerLimit) {
db_values.clear();
EXPECT_TRUE(dbfull()->GetMapProperty(DB::Properties::kDBWriteStallStats,
&db_values));
EXPECT_EQ(std::stoi(db_values[WriteStallStatsMapKeys::CauseConditionCount(
WriteStallCause::kWriteBufferManagerLimit,
WriteStallCondition::kStopped)]),
1);
// `WriteStallCause::kWriteBufferManagerLimit` should not result in any
// CF-scope write stall stats changes
for (int cf = 0; cf <= 1; ++cf) {
std::map<std::string, std::string> cf_values;
EXPECT_TRUE(dbfull()->GetMapProperty(
handles_[cf], DB::Properties::kCFWriteStallStats, &cf_values));
EXPECT_EQ(std::stoi(cf_values[WriteStallStatsMapKeys::TotalStops()]),
0);
EXPECT_EQ(std::stoi(cf_values[WriteStallStatsMapKeys::TotalDelays()]),
0);
}
} else if (test_cause == WriteStallCause::kMemtableLimit) {
for (int cf = 0; cf <= 1; ++cf) {
std::map<std::string, std::string> cf_values;
EXPECT_TRUE(dbfull()->GetMapProperty(
handles_[cf], DB::Properties::kCFWriteStallStats, &cf_values));
EXPECT_EQ(std::stoi(cf_values[WriteStallStatsMapKeys::TotalStops()]),
cf == 1 ? 1 : 0);
EXPECT_EQ(
std::stoi(cf_values[WriteStallStatsMapKeys::CauseConditionCount(
WriteStallCause::kMemtableLimit,
WriteStallCondition::kStopped)]),
cf == 1 ? 1 : 0);
EXPECT_EQ(std::stoi(cf_values[WriteStallStatsMapKeys::TotalDelays()]),
0);
EXPECT_EQ(
std::stoi(cf_values[WriteStallStatsMapKeys::CauseConditionCount(
WriteStallCause::kMemtableLimit,
WriteStallCondition::kDelayed)]),
0);
}
}

sleeping_task->WakeUp();
sleeping_task->WaitUntilDone();
}
}

namespace {
std::string PopMetaIndexKey(InternalIterator* meta_iter) {
Status s = meta_iter->status();
Expand Down
Loading

0 comments on commit cb58477

Please sign in to comment.