Implement XXH3 block checksum type (#9069)
Summary:
XXH3 is the latest hash function in the xxHash family, and it is extremely
fast on large data, easily faster than crc32c on almost any x86_64 hardware.
In integrating this hash function, I have handled the compression type byte
in a non-standard way to avoid using the streaming API (which would add
extra data movement and pull the hash function's considerable code
complexity into the active working set). This approach got a thumbs-up from
Yann Collet, the xxHash author.
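To illustrate the non-standard handling, a hedged sketch only (the real
implementation is ComputeBlockTrailer in the diff below, and its combine step
is ModifyChecksumForCompressionType): hash the block contents with the
one-shot XXH3_64bits() entry point, then fold the one-byte compression type
into the result separately, instead of pushing both through an XXH3 streaming
state.

    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    #define XXH_STATIC_LINKING_ONLY  // XXH3 was still an "experimental" API
    #include "xxhash.h"

    // Hedged sketch: one-shot XXH3 over the block, then a cheap combine of
    // the 1-byte compression type. The combine below (re-hashing 5 bytes) is
    // only a placeholder for RocksDB's ModifyChecksumForCompressionType.
    uint32_t BlockChecksumSketch(const char* data, std::size_t len, char type) {
      uint32_t checksum = static_cast<uint32_t>(XXH3_64bits(data, len));
      char buf[5];
      std::memcpy(buf, &checksum, sizeof(checksum));
      buf[4] = type;
      return static_cast<uint32_t>(XXH3_64bits(buf, sizeof(buf)));
    }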

Existing functionality change:
* reject bad ChecksumType in options with InvalidArgument

This change was split off from facebook/rocksdb#9058 because context-aware
checksums are likely to be handled through a different configuration
mechanism than ChecksumType.

Pull Request resolved: facebook/rocksdb#9069

Test Plan:
Tests updated and substantially expanded. Unit tests now check that we
don't accidentally change the values generated by the checksum algorithms
(a "schema test") and that we properly handle invalid/unrecognized checksum
types in options or in a file footer.
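As a rough sketch of what "schema test" means here (hedged: the pinned
constant below is a placeholder, not one of the real values in the updated
test files):

    #include <gtest/gtest.h>

    #include <cstdint>
    #include <string>

    #include "util/crc32c.h"

    // Pin the exact output of a checksum function on a fixed input, so an
    // accidental change to the generated values fails the test.
    TEST(ChecksumSchemaSketch, CRC32cUnchanged) {
      const std::string data = "A standard input string";
      const uint32_t kPinnedValue = 0x12345678;  // placeholder value only
      ASSERT_EQ(kPinnedValue,
                ROCKSDB_NAMESPACE::crc32c::Value(data.data(), data.size()));
    }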

DBTestBase::ChangeOptions (etc.) updated from two configurations to one
configuration that changes the ChecksumType away from the default CRC32c.
The point of this test code is to detect possible interactions among
features, and the likelihood that some bad interaction would be caught by
including checksum configurations beyond XXH3 and CRC32c, yet missed by the
stress/crash test, is extremely low.

Stress/crash test also updated (a manual run long enough to confirm that it
accepts the new checksum type). db_bench also updated for microbenchmarking
checksums.

### Performance microbenchmark (PORTABLE=0 DEBUG_LEVEL=0, Broadwell processor)

    ./db_bench -benchmarks=crc32c,xxhash,xxhash64,xxh3,crc32c,xxhash,xxhash64,xxh3,crc32c,xxhash,xxhash64,xxh3
    crc32c       :       0.200 micros/op 5005220 ops/sec; 19551.6 MB/s (4096 per op)
    xxhash       :       0.807 micros/op 1238408 ops/sec; 4837.5 MB/s (4096 per op)
    xxhash64     :       0.421 micros/op 2376514 ops/sec; 9283.3 MB/s (4096 per op)
    xxh3         :       0.171 micros/op 5858391 ops/sec; 22884.3 MB/s (4096 per op)
    crc32c       :       0.206 micros/op 4859566 ops/sec; 18982.7 MB/s (4096 per op)
    xxhash       :       0.793 micros/op 1260850 ops/sec; 4925.2 MB/s (4096 per op)
    xxhash64     :       0.410 micros/op 2439182 ops/sec; 9528.1 MB/s (4096 per op)
    xxh3         :       0.161 micros/op 6202872 ops/sec; 24230.0 MB/s (4096 per op)
    crc32c       :       0.203 micros/op 4924686 ops/sec; 19237.1 MB/s (4096 per op)
    xxhash       :       0.839 micros/op 1192388 ops/sec; 4657.8 MB/s (4096 per op)
    xxhash64     :       0.424 micros/op 2357391 ops/sec; 9208.6 MB/s (4096 per op)
    xxh3         :       0.162 micros/op 6182678 ops/sec; 24151.1 MB/s (4096 per op)

As you can see, especially once warmed up, xxh3 is fastest.

### Performance macrobenchmark (PORTABLE=0 DEBUG_LEVEL=0, Broadwell processor)

Test

    for I in `seq 1 50`; do for CHK in 0 1 2 3 4; do TEST_TMPDIR=/dev/shm/rocksdb$CHK ./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=30000000 -checksum_type=$CHK 2>&1 | grep 'micros/op' | tee -a results-$CHK & done; wait; done

Results (ops/sec)

    for FILE in results*; do echo -n "$FILE "; awk '{ s += $5; c++; } END { print 1.0 * s / c; }' < $FILE; done

    results-0 252118 # kNoChecksum
    results-1 251588 # kCRC32c
    results-2 251863 # kxxHash
    results-3 252016 # kxxHash64
    results-4 252038 # kXXH3

The spread across checksum types is about 0.2%, within run-to-run noise, so
the choice of block checksum has no measurable effect on this write workload.

Reviewed By: mrambacher

Differential Revision: D31905249

Pulled By: pdillinger

fbshipit-source-id: cb9b998ebe2523fc7c400eedf62124a78bf4b4d1
pdillinger authored and facebook-github-bot committed Oct 29, 2021
1 parent f24c39a commit a7d4bea
Showing 16 changed files with 304 additions and 120 deletions.
3 changes: 3 additions & 0 deletions HISTORY.md
@@ -1,5 +1,8 @@
 # Rocksdb Change Log
 ## Unreleased
+### New Features
+* Added new ChecksumType kXXH3 which is faster than kCRC32c on almost all x86\_64 hardware.
+
 ### Bug Fixes
 * Prevent a `CompactRange()` with `CompactRangeOptions::change_level == true` from possibly causing corruption to the LSM state (overlapping files within a level) when run in parallel with another manual compaction. Note that setting `force_consistency_checks == true` (the default) would cause the DB to enter read-only mode in this scenario and return `Status::Corruption`, rather than committing any corruption.
21 changes: 14 additions & 7 deletions db/db_basic_test.cc
@@ -10,6 +10,7 @@
 #include <cstring>

 #include "db/db_test_util.h"
+#include "options/options_helper.h"
 #include "port/stack_trace.h"
 #include "rocksdb/flush_block_policy.h"
 #include "rocksdb/merge_operator.h"
@@ -974,13 +975,14 @@ TEST_F(DBBasicTest, MultiGetEmpty) {
 TEST_F(DBBasicTest, ChecksumTest) {
   BlockBasedTableOptions table_options;
   Options options = CurrentOptions();
-  // change when new checksum type added
-  int max_checksum = static_cast<int>(kxxHash64);
   const int kNumPerFile = 2;

+  const auto algs = GetSupportedChecksums();
+  const int algs_size = static_cast<int>(algs.size());
+
   // generate one table with each type of checksum
-  for (int i = 0; i <= max_checksum; ++i) {
-    table_options.checksum = static_cast<ChecksumType>(i);
+  for (int i = 0; i < algs_size; ++i) {
+    table_options.checksum = algs[i];
     options.table_factory.reset(NewBlockBasedTableFactory(table_options));
     Reopen(options);
     for (int j = 0; j < kNumPerFile; ++j) {
@@ -990,15 +992,20 @@ TEST_F(DBBasicTest, ChecksumTest) {
   }

   // with each valid checksum type setting...
-  for (int i = 0; i <= max_checksum; ++i) {
-    table_options.checksum = static_cast<ChecksumType>(i);
+  for (int i = 0; i < algs_size; ++i) {
+    table_options.checksum = algs[i];
     options.table_factory.reset(NewBlockBasedTableFactory(table_options));
     Reopen(options);
     // verify every type of checksum (should be regardless of that setting)
-    for (int j = 0; j < (max_checksum + 1) * kNumPerFile; ++j) {
+    for (int j = 0; j < algs_size * kNumPerFile; ++j) {
       ASSERT_EQ(Key(j), Get(Key(j)));
     }
   }
+
+  // Now test invalid checksum type
+  table_options.checksum = static_cast<ChecksumType>(123);
+  options.table_factory.reset(NewBlockBasedTableFactory(table_options));
+  ASSERT_TRUE(TryReopen(options).IsInvalidArgument());
 }

 // On Windows you can have either memory mapped file or a file
8 changes: 2 additions & 6 deletions db/db_test_util.cc
@@ -475,12 +475,8 @@ Options DBTestBase::GetOptions(
     case kInfiniteMaxOpenFiles:
       options.max_open_files = -1;
       break;
-    case kxxHashChecksum: {
-      table_options.checksum = kxxHash;
-      break;
-    }
-    case kxxHash64Checksum: {
-      table_options.checksum = kxxHash64;
+    case kXXH3Checksum: {
+      table_options.checksum = kXXH3;
       break;
     }
     case kFIFOCompaction: {
3 changes: 1 addition & 2 deletions db/db_test_util.h
@@ -854,7 +854,7 @@ class DBTestBase : public testing::Test {
     kUniversalCompactionMultiLevel = 20,
     kCompressedBlockCache = 21,
     kInfiniteMaxOpenFiles = 22,
-    kxxHashChecksum = 23,
+    kXXH3Checksum = 23,
     kFIFOCompaction = 24,
     kOptimizeFiltersForHits = 25,
     kRowCache = 26,
@@ -869,7 +869,6 @@
     kBlockBasedTableWithPartitionedIndexFormat4,
     kPartitionedFilterWithNewTableReaderForCompactions,
     kUniversalSubcompactions,
-    kxxHash64Checksum,
     kUnorderedWrite,
     // This must be the last line
     kEnd,
6 changes: 3 additions & 3 deletions db/external_sst_file_test.cc
@@ -10,6 +10,7 @@
 #include "db/db_test_util.h"
 #include "db/dbformat.h"
 #include "file/filename.h"
+#include "options/options_helper.h"
 #include "port/port.h"
 #include "port/stack_trace.h"
 #include "rocksdb/sst_file_reader.h"
@@ -2383,10 +2384,9 @@ TEST_F(ExternalSSTFileTest, IngestFileWrittenWithCompressionDictionary) {

 // Very slow, not worth the cost to run regularly
 TEST_F(ExternalSSTFileTest, DISABLED_HugeBlockChecksum) {
-  int max_checksum = static_cast<int>(kxxHash64);
-  for (int i = 0; i <= max_checksum; ++i) {
+  for (auto t : GetSupportedChecksums()) {
     BlockBasedTableOptions table_options;
-    table_options.checksum = static_cast<ChecksumType>(i);
+    table_options.checksum = t;
     Options options = CurrentOptions();
     options.table_factory.reset(NewBlockBasedTableFactory(table_options));
1 change: 1 addition & 0 deletions include/rocksdb/table.h
@@ -49,6 +49,7 @@ enum ChecksumType : char {
   kCRC32c = 0x1,
   kxxHash = 0x2,
   kxxHash64 = 0x3,
+  kXXH3 = 0x4,  // Supported since RocksDB 6.27
 };

 // `PinningTier` is used to specify which tier of block-based tables should
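For application code, opting into the new type is the usual
BlockBasedTableOptions pattern (a minimal sketch; the same pattern appears in
the updated tests). Note the version comment above: files written with kXXH3
cannot be read by RocksDB releases that predate it.

    #include "rocksdb/options.h"
    #include "rocksdb/table.h"

    // Sketch: build Options that write SST blocks with XXH3 checksums.
    ROCKSDB_NAMESPACE::Options MakeXXH3Options() {
      ROCKSDB_NAMESPACE::BlockBasedTableOptions table_options;
      table_options.checksum = ROCKSDB_NAMESPACE::kXXH3;  // new in this change
      ROCKSDB_NAMESPACE::Options options;
      options.table_factory.reset(
          ROCKSDB_NAMESPACE::NewBlockBasedTableFactory(table_options));
      return options;
    }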
28 changes: 21 additions & 7 deletions options/options_helper.cc
@@ -7,6 +7,7 @@
 #include <cassert>
 #include <cctype>
 #include <cstdlib>
+#include <set>
 #include <unordered_set>
 #include <vector>
@@ -329,7 +330,8 @@ std::unordered_map<std::string, ChecksumType>
     OptionsHelper::checksum_type_string_map = {{"kNoChecksum", kNoChecksum},
                                                {"kCRC32c", kCRC32c},
                                                {"kxxHash", kxxHash},
-                                               {"kxxHash64", kxxHash64}};
+                                               {"kxxHash64", kxxHash64},
+                                               {"kXXH3", kXXH3}};

 std::unordered_map<std::string, CompressionType>
     OptionsHelper::compression_type_string_map = {
@@ -345,25 +347,37 @@ std::unordered_map<std::string, CompressionType>
         {"kDisableCompressionOption", kDisableCompressionOption}};

 std::vector<CompressionType> GetSupportedCompressions() {
-  std::vector<CompressionType> supported_compressions;
+  // std::set internally to deduplicate potential name aliases
+  std::set<CompressionType> supported_compressions;
   for (const auto& comp_to_name : OptionsHelper::compression_type_string_map) {
     CompressionType t = comp_to_name.second;
     if (t != kDisableCompressionOption && CompressionTypeSupported(t)) {
-      supported_compressions.push_back(t);
+      supported_compressions.insert(t);
     }
   }
-  return supported_compressions;
+  return std::vector<CompressionType>(supported_compressions.begin(),
+                                      supported_compressions.end());
 }

 std::vector<CompressionType> GetSupportedDictCompressions() {
-  std::vector<CompressionType> dict_compression_types;
+  std::set<CompressionType> dict_compression_types;
   for (const auto& comp_to_name : OptionsHelper::compression_type_string_map) {
     CompressionType t = comp_to_name.second;
     if (t != kDisableCompressionOption && DictCompressionTypeSupported(t)) {
-      dict_compression_types.push_back(t);
+      dict_compression_types.insert(t);
     }
   }
-  return dict_compression_types;
+  return std::vector<CompressionType>(dict_compression_types.begin(),
+                                      dict_compression_types.end());
 }

+std::vector<ChecksumType> GetSupportedChecksums() {
+  std::set<ChecksumType> checksum_types;
+  for (const auto& e : OptionsHelper::checksum_type_string_map) {
+    checksum_types.insert(e.second);
+  }
+  return std::vector<ChecksumType>(checksum_types.begin(),
+                                   checksum_types.end());
+}
+
 #ifndef ROCKSDB_LITE
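The new helper mirrors GetSupportedCompressions(); the updated tests use it to
iterate every supported type instead of hard-coding a maximum enum value. A
sketch of that usage:

    #include <vector>

    #include "options/options_helper.h"
    #include "rocksdb/table.h"

    // Sketch: one table configuration per supported checksum type, with no
    // assumption that the enum values are dense or what the current max is.
    std::vector<ROCKSDB_NAMESPACE::BlockBasedTableOptions> AllChecksumConfigs() {
      std::vector<ROCKSDB_NAMESPACE::BlockBasedTableOptions> configs;
      for (ROCKSDB_NAMESPACE::ChecksumType t :
           ROCKSDB_NAMESPACE::GetSupportedChecksums()) {
        ROCKSDB_NAMESPACE::BlockBasedTableOptions opts;
        opts.checksum = t;
        configs.push_back(opts);
      }
      return configs;
    }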
2 changes: 2 additions & 0 deletions options/options_helper.h
@@ -28,6 +28,8 @@ std::vector<CompressionType> GetSupportedCompressions();

 std::vector<CompressionType> GetSupportedDictCompressions();

+std::vector<ChecksumType> GetSupportedChecksums();
+
 // Checks that the combination of DBOptions and ColumnFamilyOptions are valid
 Status ValidateOptions(const DBOptions& db_opts,
                        const ColumnFamilyOptions& cf_opts);
100 changes: 59 additions & 41 deletions table/block_based/block_based_table_builder.cc
@@ -1207,6 +1207,60 @@ void BlockBasedTableBuilder::CompressAndVerifyBlock(
   }
 }

+void BlockBasedTableBuilder::ComputeBlockTrailer(
+    const Slice& block_contents, CompressionType compression_type,
+    ChecksumType checksum_type, std::array<char, kBlockTrailerSize>* trailer) {
+  (*trailer)[0] = compression_type;
+  uint32_t checksum = 0;
+  switch (checksum_type) {
+    case kNoChecksum:
+      break;
+    case kCRC32c: {
+      uint32_t crc =
+          crc32c::Value(block_contents.data(), block_contents.size());
+      // Extend to cover compression type
+      crc = crc32c::Extend(crc, trailer->data(), 1);
+      checksum = crc32c::Mask(crc);
+      break;
+    }
+    case kxxHash: {
+      XXH32_state_t* const state = XXH32_createState();
+      XXH32_reset(state, 0);
+      XXH32_update(state, block_contents.data(), block_contents.size());
+      // Extend to cover compression type
+      XXH32_update(state, trailer->data(), 1);
+      checksum = XXH32_digest(state);
+      XXH32_freeState(state);
+      break;
+    }
+    case kxxHash64: {
+      XXH64_state_t* const state = XXH64_createState();
+      XXH64_reset(state, 0);
+      XXH64_update(state, block_contents.data(), block_contents.size());
+      // Extend to cover compression type
+      XXH64_update(state, trailer->data(), 1);
+      checksum = Lower32of64(XXH64_digest(state));
+      XXH64_freeState(state);
+      break;
+    }
+    case kXXH3: {
+      // XXH3 is a complicated hash function that is extremely fast on
+      // contiguous input, but that makes its streaming support rather
+      // complex. It is worth custom handling of the last byte (`type`)
+      // in order to avoid allocating a large state object and bringing
+      // that code complexity into CPU working set.
+      checksum = Lower32of64(
+          XXH3_64bits(block_contents.data(), block_contents.size()));
+      checksum = ModifyChecksumForCompressionType(checksum, compression_type);
+      break;
+    }
+    default:
+      assert(false);
+      break;
+  }
+  EncodeFixed32(trailer->data() + 1, checksum);
+}
+
 void BlockBasedTableBuilder::WriteRawBlock(const Slice& block_contents,
                                            CompressionType type,
                                            BlockHandle* handle,
@@ -1223,50 +1277,14 @@ void BlockBasedTableBuilder::WriteRawBlock(const Slice& block_contents,
   assert(io_status().ok());
   io_s = r->file->Append(block_contents);
   if (io_s.ok()) {
-    char trailer[kBlockTrailerSize];
-    trailer[0] = type;
-    uint32_t checksum = 0;
-    switch (r->table_options.checksum) {
-      case kNoChecksum:
-        break;
-      case kCRC32c: {
-        uint32_t crc =
-            crc32c::Value(block_contents.data(), block_contents.size());
-        // Extend to cover compression type
-        crc = crc32c::Extend(crc, trailer, 1);
-        checksum = crc32c::Mask(crc);
-        break;
-      }
-      case kxxHash: {
-        XXH32_state_t* const state = XXH32_createState();
-        XXH32_reset(state, 0);
-        XXH32_update(state, block_contents.data(), block_contents.size());
-        // Extend to cover compression type
-        XXH32_update(state, trailer, 1);
-        checksum = XXH32_digest(state);
-        XXH32_freeState(state);
-        break;
-      }
-      case kxxHash64: {
-        XXH64_state_t* const state = XXH64_createState();
-        XXH64_reset(state, 0);
-        XXH64_update(state, block_contents.data(), block_contents.size());
-        // Extend to cover compression type
-        XXH64_update(state, trailer, 1);
-        checksum = Lower32of64(XXH64_digest(state));
-        XXH64_freeState(state);
-        break;
-      }
-      default:
-        assert(false);
-        break;
-    }
-    EncodeFixed32(trailer + 1, checksum);
+    std::array<char, kBlockTrailerSize> trailer;
+    ComputeBlockTrailer(block_contents, type, r->table_options.checksum,
+                        &trailer);
    assert(io_s.ok());
    TEST_SYNC_POINT_CALLBACK(
        "BlockBasedTableBuilder::WriteRawBlock:TamperWithChecksum",
-        static_cast<char*>(trailer));
-    io_s = r->file->Append(Slice(trailer, kBlockTrailerSize));
+        trailer.data());
+    io_s = r->file->Append(Slice(trailer.data(), trailer.size()));
     if (io_s.ok()) {
       assert(s.ok());
       bool warm_cache;
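For context, the trailer computed above is RocksDB's fixed 5-byte block
trailer (kBlockTrailerSize): one compression-type byte followed by a
little-endian 32-bit checksum that covers the block contents plus that type
byte. A minimal decoding sketch:

    #include <cstdint>
    #include <cstring>

    // Sketch: decode the 5-byte trailer that ComputeBlockTrailer writes.
    // trailer[0] is the CompressionType byte; trailer[1..4] hold the checksum
    // written by EncodeFixed32 (little-endian).
    struct DecodedTrailer {
      char compression_type;
      uint32_t checksum;
    };

    DecodedTrailer DecodeTrailer(const char* trailer /* 5 bytes */) {
      DecodedTrailer d;
      d.compression_type = trailer[0];
      std::memcpy(&d.checksum, trailer + 1, sizeof(d.checksum));
      // NOTE: assumes a little-endian host, matching EncodeFixed32's layout.
      return d;
    }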
9 changes: 8 additions & 1 deletion table/block_based/block_based_table_builder.h
@@ -10,6 +10,7 @@
 #pragma once
 #include <stdint.h>

+#include <array>
 #include <limits>
 #include <string>
 #include <utility>
@@ -20,6 +21,7 @@
 #include "rocksdb/listener.h"
 #include "rocksdb/options.h"
 #include "rocksdb/status.h"
+#include "rocksdb/table.h"
 #include "table/meta_blocks.h"
 #include "table/table_builder.h"
 #include "util/compression.h"
@@ -98,6 +100,12 @@ class BlockBasedTableBuilder : public TableBuilder {
   // Get file checksum function name
   const char* GetFileChecksumFuncName() const override;

+  // Computes and populates block trailer for a block
+  static void ComputeBlockTrailer(const Slice& block_contents,
+                                  CompressionType compression_type,
+                                  ChecksumType checksum_type,
+                                  std::array<char, kBlockTrailerSize>* trailer);
+
  private:
   bool ok() const { return status().ok(); }
@@ -117,7 +125,6 @@
                      BlockType block_type);
   // Directly write data to the file.
   void WriteRawBlock(const Slice& data, CompressionType, BlockHandle* handle,
-
                      BlockType block_type, const Slice* raw_data = nullptr);

   void SetupCacheKeyPrefix(const TableBuilderOptions& tbo);
10 changes: 10 additions & 0 deletions table/block_based/block_based_table_factory.cc
@@ -16,11 +16,13 @@
 #include <string>

 #include "logging/logging.h"
+#include "options/options_helper.h"
 #include "port/port.h"
 #include "rocksdb/cache.h"
 #include "rocksdb/convenience.h"
 #include "rocksdb/filter_policy.h"
 #include "rocksdb/flush_block_policy.h"
+#include "rocksdb/rocksdb_namespace.h"
 #include "rocksdb/utilities/options_type.h"
 #include "table/block_based/block_based_table_builder.h"
 #include "table/block_based/block_based_table_reader.h"
@@ -564,6 +566,14 @@ Status BlockBasedTableFactory::ValidateOptions(
         "max_successive_merges larger than 0 is currently inconsistent with "
         "unordered_write");
   }
+  std::string garbage;
+  if (!SerializeEnum<ChecksumType>(checksum_type_string_map,
+                                   table_options_.checksum, &garbage)) {
+    return Status::InvalidArgument(
+        "Unrecognized ChecksumType for checksum: " +
+        ROCKSDB_NAMESPACE::ToString(
+            static_cast<uint32_t>(table_options_.checksum)));
+  }
   return TableFactory::ValidateOptions(db_opts, cf_opts);
 }
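End to end, the new validation means a bogus checksum type now fails cleanly
at open time instead of silently writing files. A hedged sketch of what the
updated ChecksumTest exercises:

    #include <cassert>

    #include "rocksdb/db.h"
    #include "rocksdb/options.h"
    #include "rocksdb/table.h"

    // Sketch: an out-of-range ChecksumType is rejected with InvalidArgument.
    void RejectBogusChecksumType() {
      ROCKSDB_NAMESPACE::BlockBasedTableOptions table_options;
      table_options.checksum =
          static_cast<ROCKSDB_NAMESPACE::ChecksumType>(123);  // undefined value
      ROCKSDB_NAMESPACE::Options options;
      options.create_if_missing = true;
      options.table_factory.reset(
          ROCKSDB_NAMESPACE::NewBlockBasedTableFactory(table_options));
      ROCKSDB_NAMESPACE::DB* db = nullptr;
      ROCKSDB_NAMESPACE::Status s =
          ROCKSDB_NAMESPACE::DB::Open(options, "/tmp/checksum_demo", &db);
      assert(s.IsInvalidArgument());  // rejected by ValidateOptions at open
    }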