Releases: facebook/rocksdb
RocksDB 9.11.2 Release
Rocksdb Change Log
NOTE: Entries for the next release do not go here. Follow the instructions in unreleased_history/README.txt.
9.11.2 (03/29/2025)
Bump patch version to fix a mistake in the previous 9.11 release tag
9.11.1 (02/19/2025)
New Features
- Added the ability to plug in a custom table reader implementation. See include/rocksdb/external_table_reader.h for more details.
9.11.0 (01/17/2025)
New Features
- Introduce CancelAwaitingJobs() in the CompactionService interface, which allows users to implement cancellation of running remote compactions from the primary instance.
- Experimental feature: RocksDB now supports defining secondary indices, which are automatically maintained by the storage engine. Secondary indices provide a new customization point: applications can provide their own by implementing the new SecondaryIndex interface. See the SecondaryIndex API comments for more details. Note: this feature is currently only available in conjunction with write-committed pessimistic transactions, and Merge is not yet supported.
- Provide a new option track_and_verify_wals to track and verify various information about WAL during WAL recovery. This is intended to be a better replacement for track_and_verify_wals_in_manifest.
Public API Changes
- Add io_buffer_size to BackupEngineOptions to enable optimal configuration of IO size.
- Clean up all the references to random_access_max_buffer_size, related rules, and all the client wrappers. This option was officially deprecated in 5.4.0.
- Add file_ingestion_nanos and file_ingestion_blocking_live_writes_nanos in PerfContext to observe file ingestions.
- Offer new DB::Open and variants that use std::unique_ptr<DB>* output parameters, and deprecate the old versions that use DB** output parameters.
- The DB::DeleteFile API is officially deprecated.
Behavior Changes
- For leveled compaction, manual compaction (CompactRange()) will be stricter about keeping compaction size under max_compaction_bytes. This prevents overly large compactions in some cases (#13306).
- Experimental tiering options preclude_last_level_data_seconds and preserve_internal_time_seconds are now mutable with SetOptions(). Some changes to the handling of these features along with long-lived snapshots and range deletes made this possible.
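The entry above does not say how CompactRange() divides an oversized manual compaction. Purely to illustrate the constraint, here is a toy greedy grouping in standard C++ that keeps each group of input files, taken in key order, at or under a byte budget. GroupByMaxCompactionBytes is a hypothetical helper, not a RocksDB function; RocksDB's real picking logic is more involved (for example, it must also account for overlapping files in the output level).

```cpp
#include <cstdint>
#include <vector>

// Toy model (not RocksDB code): split input file sizes, already in key
// order, into consecutive groups whose total stays at or under
// max_compaction_bytes. A single file larger than the limit still forms its
// own group, because files are not split in this sketch.
std::vector<std::vector<uint64_t>> GroupByMaxCompactionBytes(
    const std::vector<uint64_t>& file_sizes, uint64_t max_compaction_bytes) {
  std::vector<std::vector<uint64_t>> groups;
  std::vector<uint64_t> current;
  uint64_t current_bytes = 0;
  for (uint64_t size : file_sizes) {
    // Close the current group if adding this file would exceed the budget.
    if (!current.empty() && current_bytes + size > max_compaction_bytes) {
      groups.push_back(current);
      current.clear();
      current_bytes = 0;
    }
    current.push_back(size);
    current_bytes += size;
  }
  if (!current.empty()) groups.push_back(current);
  return groups;
}
```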
Bug Fixes
- Fix a longstanding major bug with SetOptions() in which setting changes can be quietly reverted.
RocksDB 10.0.1 Release
10.0.1 (03/05/2025)
Public API Changes
- Add an unordered map of name/value pairs, ReadOptions::property_bag, to pass opaque options through to an external table when creating an Iterator.
- Introduced CompactionServiceJobStatus::kAborted to allow handling of the aborted scenario in the Schedule(), Wait(), or OnInstallation() APIs in Remote Compactions.
- Added a column family option disallow_memtable_writes to safely fail any attempts to write to a non-default column family. This can be used for column families that are ingest only.
10.0.0 (02/21/2025)
New Features
- Introduced a new auto_refresh_iterator_with_snapshot opt-in knob that (when enabled) will periodically release obsolete memory and storage resources for as long as the iterator is making progress and its supplied read_options.snapshot was initialized with a non-nullptr value.
- Added the ability to plug in a custom table reader implementation. See include/rocksdb/external_table_reader.h for more details.
- Experimental feature: RocksDB now supports FAISS inverted file based indices via the secondary indexing framework. Applications can use FAISS secondary indices to automatically quantize embeddings and perform K-nearest-neighbors similarity searches. See FaissIVFIndex and SecondaryIndex for more details. Note: the FAISS integration currently requires using the BUCK build.
- Add a new DB property num_running_compaction_sorted_runs that tracks the number of sorted runs being processed by currently running compactions.
- Experimental feature: added support for simple secondary indices that index the specified column as-is. See SimpleSecondaryIndex and SecondaryIndex for more details.
- Added new TransactionDBOptions::txn_commit_bypass_memtable_threshold, which enables optimized transaction commit (see TransactionOptions::commit_bypass_memtable) when the transaction size exceeds a configured threshold.
Public API Changes
- Updated the query API of the experimental secondary indexing feature by removing the earlier SecondaryIndex::NewIterator virtual and adding a SecondaryIndexIterator class that can be utilized by applications to find the primary keys for a given search target.
- Added back the ability to leverage the primary key when building secondary index entries. This involved changes to the signatures of SecondaryIndex::GetSecondary{KeyPrefix,Value} as well as the addition of a new method SecondaryIndex::FinalizeSecondaryKeyPrefix. See the API comments for more details.
- Minimum supported version of ZSTD is now 1.4.0, for code simplification. Obsolete CompressionType kZSTDNotFinalCompression is also removed.
Behavior Changes
- VerifyBackup in verify_with_checksum=true mode will now evaluate checksums in parallel. As a result, unlike the original implementation, the API won't bail out on the very first corruption / mismatch and instead will iterate over all the backup files, logging success / degree_of_failure for each.
- Reversed the order of updates to the same key in WriteBatchWithIndex. This means that if there are multiple updates to the same key, the most recent update is ordered first. This affects the output of WBWIIterator. When WriteBatchWithIndex is created with overwrite_key=true, this affects the output only if Merge is used (#13387).
- Added support for Merge operations in transactions using the option TransactionOptions::commit_bypass_memtable.
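To make the WriteBatchWithIndex ordering change concrete, here is a toy index in standard C++ that stores the updates for each key newest first, the order the entry above describes for WBWIIterator. ToyBatchIndex and its methods are hypothetical and do not mirror the real WriteBatchWithIndex API; the "PUT:" / "MERGE:" string tags are only a stand-in for operation types.

```cpp
#include <deque>
#include <map>
#include <string>
#include <vector>

// Toy model (not RocksDB's WriteBatchWithIndex): maps each key to the
// updates written to it in the batch. New updates are pushed to the front,
// so reading one key's updates yields the most recent update first.
class ToyBatchIndex {
 public:
  void Put(const std::string& key, const std::string& value) {
    index_[key].push_front("PUT:" + value);
  }
  void Merge(const std::string& key, const std::string& operand) {
    index_[key].push_front("MERGE:" + operand);
  }
  // Returns the updates recorded for `key`, newest first.
  std::vector<std::string> UpdatesFor(const std::string& key) const {
    auto it = index_.find(key);
    if (it == index_.end()) return {};
    return std::vector<std::string>(it->second.begin(), it->second.end());
  }

 private:
  std::map<std::string, std::deque<std::string>> index_;
};
```

With the pre-10.0.0 ordering, a Put followed by a Merge on the same key would have been visited Put first; in this model, and per the entry, the Merge now comes first.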
Bug Fixes
- Fixed GetMergeOperands() API in ReadOnlyDB and SecondaryDB.
- Fix a bug in GetMergeOperands() that can return an incorrect status (MergeInProgress) and an incorrect number of merge operands. This can happen when GetMergeOperandsOptions::continue_cb is set, both active and immutable memtables have merge operands, and the callback stops the lookup at the immutable memtable.
RocksDB 9.10.0 Release
9.10.0 (2024-12-12)
New Features
- Introduce TransactionOptions::commit_bypass_memtable to enable transaction commit to bypass memtable insertions. This can be beneficial for transactions with many operations, as it reduces commit time that is mostly spent on memtable insertion.
Public API Changes
- Deprecated Remote Compaction APIs (StartV2, WaitForCompleteV2) are completely removed from the codebase
Behavior Changes
- DB::KeyMayExist() now follows its function comment, which means the value parameter can be null, and it will be set only if value_found is passed in.
Bug Fixes
- Fix the issue where compaction incorrectly drops a key when there is a snapshot with a sequence number of zero.
- Honor ConfigOptions.ignore_unknown_options in ParseStruct()
Performance Improvements
- Enable reuse of file system allocated buffer for synchronous prefetching.
- In buffered IO mode, try to align writes on power of 2 if checksum handoff is not enabled for the file type being written.
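The entry does not spell out the alignment policy. Assuming it means choosing write lengths that are multiples of a power-of-two size (such as a page size), the rounding reduces to a single bitmask, sketched below. AlignedWriteLength and IsPowerOfTwo are hypothetical helpers, not RocksDB functions.

```cpp
#include <cstdint>

// True if `x` is a non-zero power of two (exactly one bit set).
constexpr bool IsPowerOfTwo(uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Hypothetical helper: how many of `buffered` bytes can be written so the
// write length is a multiple of `alignment`. Clearing the low bits with a
// mask rounds down to a multiple, which only works for power-of-two
// alignments; otherwise the whole buffer is returned unchanged.
uint64_t AlignedWriteLength(uint64_t buffered, uint64_t alignment) {
  if (!IsPowerOfTwo(alignment)) return buffered;
  return buffered & ~(alignment - 1);
}
```

Any remainder below the alignment would, under this assumption, stay buffered until more data arrives.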
RocksDB release 9.9.3
9.9.3 (2024-12-03)
Performance Improvements
- In buffered IO mode, try to align writes on power of 2 if checksum handoff is not enabled for the file type being written.
9.9.2 (2024-11-22)
Bug Fixes
- Honor ConfigOptions.ignore_unknown_options in ParseStruct()
9.9.1 (2024-11-30)
Behavior Changes
- Updates the hidden hook RocksDbThreadYieldAndCheckAbort() to support MySQL aborting long-running queries.
9.9.0 (2024-11-18)
New Features
- Multi-Column-Family-Iterator (CoalescingIterator/AttributeGroupIterator) is no longer marked as experimental
- Adds a new table property "rocksdb.newest.key.time" which records the unix timestamp of the newest key. Uses this table property for FIFO TTL and temperature change compaction.
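A toy sketch of why a per-file newest key time helps FIFO TTL: a whole file can be dropped only once even its newest key is older than the TTL. ToySstFile and FilesExpiredByTtl are hypothetical; in RocksDB the time would come from the "rocksdb.newest.key.time" table property, and the real FIFO picker applies further rules not modeled here.

```cpp
#include <cstdint>
#include <vector>

struct ToySstFile {
  uint64_t number;           // file number
  uint64_t newest_key_time;  // unix seconds of the newest key in the file
};

// Toy model (not RocksDB code): returns the numbers of files whose newest
// key is older than `ttl_seconds`, so every key in them has expired.
std::vector<uint64_t> FilesExpiredByTtl(const std::vector<ToySstFile>& files,
                                        uint64_t now, uint64_t ttl_seconds) {
  std::vector<uint64_t> expired;
  for (const auto& f : files) {
    // A timestamp in the future (clock skew) is never treated as expired.
    if (f.newest_key_time <= now && now - f.newest_key_time > ttl_seconds) {
      expired.push_back(f.number);
    }
  }
  return expired;
}
```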
Public API Changes
- Added a new API Transaction::GetAttributeGroupIterator that can be used to create a multi-column-family attribute group iterator over the specified column families, including the data from both the transaction and the underlying database. This API is currently supported for optimistic and write-committed pessimistic transactions.
- Added a new API Transaction::GetCoalescingIterator that can be used to create a multi-column-family coalescing iterator over the specified column families, including the data from both the transaction and the underlying database. This API is currently supported for optimistic and write-committed pessimistic transactions.
Behavior Changes
- BaseDeltaIterator now honors the read option allow_unprepared_value.
Bug Fixes
- BaseDeltaIterator now calls PrepareValue on the base iterator in case it has been created with the allow_unprepared_value read option set. Earlier, such base iterators could lead to incorrect values being exposed from BaseDeltaIterator.
- Fix a leak of obsolete blob files left open until DB::Close(). This bug was introduced in version 9.4.0.
- Fix missing cases of corruption retry during DB open and read API processing.
- Fix a bug for transaction db with 2pc where an old WAL may be retained longer than needed (#13127).
- Fix leaks of some open SST files (until DB::Close()) that are written but never become live due to various failures. (We now have a check for such leaks with no outstanding issues.)
- Fix a bug for replaying WALs for WriteCommitted transaction DB when its user-defined timestamps setting is toggled on/off between DB sessions.
Performance Improvements
- Fix regression in issue #12038 due to Options::compaction_readahead_size greater than max_sectors_kb (i.e., the largest I/O size that the OS issues to a block device, as defined in Linux).
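A sketch of the kind of sanitization this fix implies: cap the readahead at the device's max_sectors_kb, which Linux exposes under /sys/block/<device>/queue/max_sectors_kb. ReadMaxSectorsKb and ClampReadahead are hypothetical helpers; how RocksDB discovers and applies the limit may differ.

```cpp
#include <algorithm>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper: reads max_sectors_kb for a Linux block device.
// Returns 0 if the sysfs file is missing or unreadable (e.g. non-Linux).
uint64_t ReadMaxSectorsKb(const std::string& device) {
  std::ifstream in("/sys/block/" + device + "/queue/max_sectors_kb");
  uint64_t kb = 0;
  if (!(in >> kb)) return 0;
  return kb;
}

// Caps a readahead size (bytes) at the largest I/O the OS will issue.
// A limit of 0 means "unknown", so the readahead is left unchanged.
uint64_t ClampReadahead(uint64_t readahead_bytes, uint64_t max_sectors_kb) {
  if (max_sectors_kb == 0) return readahead_bytes;
  return std::min(readahead_bytes, max_sectors_kb * 1024);
}
```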
RocksDB 9.8.4
9.8.4 (2024-11-18)
Behavior Changes
- When Remote Compaction is enabled, do not purge OPTIONS file immediately by DeleteObsoleteOptionsFiles() after SetOptions(). Rely on PurgeObsoleteFiles() to clean up obsolete OPTIONS file after each compaction.
9.8.3 (2024-11-12)
Bug Fixes
- Fix missing cases of corruption retry during DB open and read API processing.
9.8.2 (2024-11-06)
Public API Changes
- Added a new API Transaction::GetAttributeGroupIterator that can be used to create a multi-column-family attribute group iterator over the specified column families, including the data from both the transaction and the underlying database. This API is currently supported for optimistic and write-committed pessimistic transactions.
Behavior Changes
- BaseDeltaIterator now honors the read option allow_unprepared_value.
Bug Fixes
- BaseDeltaIterator now calls PrepareValue on the base iterator in case it has been created with the allow_unprepared_value read option set. Earlier, such base iterators could lead to incorrect values being exposed from BaseDeltaIterator.
- Fix a bug for replaying WALs for WriteCommitted transaction DB when its user-defined timestamps setting is toggled on/off between DB sessions.
9.8.1 (2024-10-31)
Bug Fixes
- Fix a leak of obsolete blob files left open until DB::Close(). This bug was introduced in version 9.4.0.
9.8.0 (2024-10-25)
New Features
- All non-block_cache options in BlockBasedTableOptions are now mutable with DB::SetOptions(). See also Bug Fixes below.
- When using iterators with BlobDB, it is now possible to load large values on an on-demand basis, i.e. only if they are actually needed by the application. This can save I/O in use cases where the values associated with certain keys are not needed. For more details, see the new read option allow_unprepared_value and the iterator API PrepareValue.
- Add a new file ingestion option IngestExternalFileOptions::fill_cache to support not adding blocks from ingested files into block cache during file ingestion.
- The option allow_unprepared_value is now also supported for multi-column-family iterators (i.e. CoalescingIterator and AttributeGroupIterator).
- When a file with just one range deletion (standalone range deletion file) is ingested via bulk loading, it will be marked for compaction. During compaction, this type of file can be used to directly filter out some input files that are not protected by any snapshots and are completely deleted by the standalone range deletion file.
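The on-demand loading entry above can be illustrated with a toy iterator in standard C++ that holds a loader per entry and only runs it when PrepareValue() is called. ToyLazyIterator is hypothetical and only borrows the PrepareValue name from the entry; it is not the RocksDB Iterator interface, where this behavior is enabled through the allow_unprepared_value read option.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy model (not the RocksDB Iterator API): each entry carries a loader
// instead of its value. value() is only valid after PrepareValue() runs the
// loader, so values the application never asks for are never read.
class ToyLazyIterator {
 public:
  using Loader = std::function<std::string()>;
  explicit ToyLazyIterator(std::vector<std::pair<std::string, Loader>> entries)
      : entries_(std::move(entries)) {}

  bool Valid() const { return pos_ < entries_.size(); }
  void Next() {
    ++pos_;
    value_.reset();  // the next entry's value is not loaded yet
  }
  const std::string& key() const { return entries_[pos_].first; }

  // Loads the current value on first call; always succeeds in this toy.
  bool PrepareValue() {
    if (!value_) {
      value_ = entries_[pos_].second();
      ++loads_;
    }
    return true;
  }
  const std::string& value() const { return *value_; }  // PrepareValue() first
  size_t loads() const { return loads_; }  // number of loader invocations

 private:
  std::vector<std::pair<std::string, Loader>> entries_;
  size_t pos_ = 0;
  std::optional<std::string> value_;
  size_t loads_ = 0;
};
```

Scanning several keys while preparing only the one that is needed performs a single load, which is the I/O saving the entry describes for large BlobDB values.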
Behavior Changes
- During file ingestion, level assignment for overlapping files is done in multiple batches, so that they can potentially be assigned to lower levels rather than always landing on L0.
- The OPTIONS file to be loaded by the remote worker is now preserved so that it does not get purged by the primary host. This uses a technique similar to the one that keeps new SST files from getting purged: min_options_file_numbers_ is tracked like pending_outputs_ is tracked.
- Trim readahead_size during scans so that data blocks containing keys that are not in the same prefix as the seek key in Seek() are not prefetched when ReadOptions::auto_readahead_size=true (the default value) and ReadOptions::prefix_same_as_start = true.
- Levels for external files are assigned in the same way for universal compaction and leveled compaction. The old behavior tends to assign files to L0, while the new behavior assigns the files to the lowest level possible.
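The level-assignment entries above can be sketched with a toy model. Because ingested data is newer than everything already in the DB (ignoring ingest_behind), a file must sit above every level that holds an overlapping key, so the deepest legal level is the one just above the first overlap. ToyRange, Overlaps, and PickIngestionLevel are hypothetical; real ingestion also handles batches of overlapping input files, running compactions, and other constraints this sketch ignores.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct ToyRange {
  std::string smallest, largest;  // inclusive user-key bounds
};

// True if the two inclusive key ranges share at least one key.
bool Overlaps(const ToyRange& a, const ToyRange& b) {
  return !(a.largest < b.smallest || b.largest < a.smallest);
}

// Toy model (not RocksDB code): walk down from L0 and return the deepest
// level reached before the first level containing an overlapping file.
// levels[i] holds the key ranges of the files currently in level i.
int PickIngestionLevel(const ToyRange& file,
                       const std::vector<std::vector<ToyRange>>& levels) {
  int target = 0;
  for (size_t level = 0; level < levels.size(); ++level) {
    for (const auto& existing : levels[level]) {
      if (Overlaps(file, existing)) return target;  // stay above the overlap
    }
    target = static_cast<int>(level);  // this level is free of overlaps
  }
  return target;
}
```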
Bug Fixes
- Fix a longstanding race condition in SetOptions for block_based_table_factory options. The fix has some subtle behavior changes because of copying and replacing the TableFactory on a change with SetOptions, including requiring an Iterator::Refresh() for an existing Iterator to use the latest options.
- Fix undercounting of allocated memory in the compressed secondary cache due to looking at the compressed block size rather than the actual memory allocated, which could be larger due to internal fragmentation.
- GetApproximateMemTableStats() could return disastrously bad estimates 5-25% of the time. The function has been re-engineered to return much better estimates with similar CPU cost.
- Skip insertion of compressed blocks in the secondary cache if the lowest_used_cache_tier DB option is kVolatileTier.
- Fix an issue in level compaction where a small CF with small compaction debt can cause the DB to allow parallel compactions. (#13054)
- Several DB option settings could be lost through GetOptionsFromString(), possibly elsewhere as well. Affected options, now fixed: background_close_inactive_wals, write_dbid_to_manifest, write_identity_file, prefix_seek_opt_in_only.
RocksDB 9.7.4
9.7.4 (2024-10-31)
Bug Fixes
- Fix a leak of obsolete blob files left open until DB::Close(). This bug was introduced in version 9.4.0.
9.7.3 (2024-10-16)
Behavior Changes
- The OPTIONS file to be loaded by the remote worker is now preserved so that it does not get purged by the primary host. This uses a technique similar to the one that keeps new SST files from getting purged: min_options_file_numbers_ is tracked like pending_outputs_ is tracked.
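The pending_outputs_-style protection mentioned above can be modeled as a set of pinned file numbers: any file whose number is at least the smallest pinned number is kept. ToyOptionsFilePurger is hypothetical and only illustrates the idea; it is not how RocksDB's internal code is structured.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Toy model (not RocksDB code): while a remote compaction uses an OPTIONS
// file, its number is pinned. Obsolete files are purgeable only if their
// number is below the smallest pinned number.
class ToyOptionsFilePurger {
 public:
  void Pin(uint64_t n) { pinned_.insert(n); }
  void Unpin(uint64_t n) {
    auto it = pinned_.find(n);
    if (it != pinned_.end()) pinned_.erase(it);  // erase a single pin
  }
  // Returns the subset of `obsolete` that is safe to delete right now.
  std::vector<uint64_t> Purgeable(const std::vector<uint64_t>& obsolete) const {
    std::vector<uint64_t> out;
    for (uint64_t n : obsolete) {
      if (pinned_.empty() || n < *pinned_.begin()) out.push_back(n);
    }
    return out;
  }

 private:
  std::multiset<uint64_t> pinned_;  // multiset: one file may be pinned twice
};
```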
9.7.2 (2024-10-08)
Bug Fixes
- Fix a bug for surfacing write unix time: Iterator::GetProperty("rocksdb.iterator.write-time") for non-L0 files.
9.7.1 (2024-09-26)
Bug Fixes
- Several DB option settings could be lost through GetOptionsFromString(), possibly elsewhere as well. Affected options, now fixed: background_close_inactive_wals, write_dbid_to_manifest, write_identity_file, prefix_seek_opt_in_only.
- Fix undercounting of allocated memory in the compressed secondary cache due to looking at the compressed block size rather than the actual memory allocated, which could be larger due to internal fragmentation.
- Skip insertion of compressed blocks in the secondary cache if the lowest_used_cache_tier DB option is kVolatileTier.
9.7.0 (2024-09-20)
New Features
- Make Cache a customizable class that can be instantiated by the object registry.
- Add new option prefix_seek_opt_in_only that makes iterators generally safer when you might set a prefix_extractor. When prefix_seek_opt_in_only=true, which is expected to be the future default, prefix seek is only used when prefix_same_as_start or auto_prefix_mode are set. Also, prefix_same_as_start and auto_prefix_mode now allow prefix filtering even with total_order_seek=true.
- Add a new table property "rocksdb.key.largest.seqno" which records the largest sequence number of all keys in the file. It is verified to be zero during SST file ingestion.
Behavior Changes
- Changed the semantics of the BlobDB configuration option blob_garbage_collection_force_threshold to define a threshold for the overall garbage ratio of all blob files currently eligible for garbage collection (according to blob_garbage_collection_age_cutoff). This can provide better control over space amplification at the cost of slightly higher write amplification.
- Set write_dbid_to_manifest=true by default. This means the DB ID will now be preserved through backups, checkpoints, etc. by default. Also add the write_identity_file option, which can be set to false for anticipated future behavior.
- In FIFO compaction, compactions for changing file temperature (configured by option file_temperature_age_thresholds) will compact one file at a time, instead of merging multiple eligible files together (#13018).
- Support ingesting DB-generated files using hard links, i.e. IngestExternalFileOptions::move_files/link_files and IngestExternalFileOptions::allow_db_generated_files.
- Add a new file ingestion option IngestExternalFileOptions::link_files to hard link input files and preserve the original file links after ingestion.
- DB::Close now untracks files in SstFileManager, making available any space used by them. Prior to this change, they would be orphaned until the DB is re-opened.
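A toy reading of the new blob_garbage_collection_force_threshold semantics: sum the total and garbage bytes over all blob files that the age cutoff makes eligible, and force garbage collection when the combined garbage ratio reaches the threshold. ToyBlobFile and ShouldForceBlobGc are hypothetical, and rounding age_cutoff times the file count down to a whole number of files is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ToyBlobFile {
  uint64_t total_bytes;    // size of the blob file
  uint64_t garbage_bytes;  // bytes no longer referenced
};

// Toy model (not RocksDB code). `files` is ordered oldest first. The oldest
// `age_cutoff` fraction of files is eligible (rounded down), and GC is
// forced when the garbage ratio of all eligible files taken together
// reaches `force_threshold`.
bool ShouldForceBlobGc(const std::vector<ToyBlobFile>& files,
                       double age_cutoff, double force_threshold) {
  if (age_cutoff <= 0.0) return false;
  const size_t eligible =
      static_cast<size_t>(std::min(age_cutoff, 1.0) * files.size());
  uint64_t total = 0;
  uint64_t garbage = 0;
  for (size_t i = 0; i < eligible; ++i) {
    total += files[i].total_bytes;
    garbage += files[i].garbage_bytes;
  }
  if (total == 0) return false;  // nothing eligible
  return static_cast<double>(garbage) / static_cast<double>(total) >=
         force_threshold;
}
```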
Bug Fixes
- Fix a bug in CompactRange() where result files may not be compacted in any future compaction. This can only happen when users configure CompactRangeOptions::change_level to true and the change level step of manual compaction fails (#13009).
- Fix handling of dynamic change of prefix_extractor with memtable prefix filter. Previously, prefix seek could mix different prefix interpretations between memtable and SST files. Now the latest prefix_extractor at the time of iterator creation or refresh is respected.
- Fix a bug with manual_wal_flush and auto error recovery from WAL failure that may cause CFs to be inconsistent (#12995). The fix will set potential WAL write failure as a fatal error when manual_wal_flush is true, and disables auto error recovery from these errors.
RocksDB 9.6.2
9.6.2 (10/31/2024)
Bug Fixes
- Fix a leak of obsolete blob files left open until DB::Close(). This bug was introduced in version 9.4.0.
9.6.1 (08/24/2024)
Bug Fixes
- Fix correctness of MultiGet across column families with user timestamp.
9.6.0 (08/19/2024)
New Features
- Best-efforts recovery supports recovering to an incomplete Version with a clean seqno cut that presents a valid point-in-time view from the user's perspective, if versioning history doesn't include atomic flush.
- New option BlockBasedTableOptions::decouple_partitioned_filters should improve efficiency in serving read queries because filter and index partitions can consistently target the configured metadata_block_size. This option is currently opt-in.
- Introduce a new mutable CF option paranoid_memory_checks. It enables additional validation on data integrity during reads/scanning. Currently, the skip-list-based memtable will validate key ordering during lookups and scans.
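The ordering check that paranoid_memory_checks adds for the skip list memtable amounts to verifying that each key seen during a lookup or scan is strictly greater than the previous one. The toy function below shows that check over plain strings with bytewise ordering; FindOrderingViolation is hypothetical, and a real memtable compares internal keys with the configured comparator.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy model (not RocksDB code): returns the index of the first key that is
// not strictly greater than its predecessor, or std::nullopt if the
// sequence is correctly ordered.
std::optional<size_t> FindOrderingViolation(
    const std::vector<std::string>& keys) {
  for (size_t i = 1; i < keys.size(); ++i) {
    if (!(keys[i - 1] < keys[i])) return i;  // out of order or duplicate
  }
  return std::nullopt;
}
```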
Public API Changes
- Add ticker stats to count file read retries due to checksum mismatch
- Adds optional installation callback function for remote compaction
Behavior Changes
- There may be less intra-L0 compaction triggered by total L0 size being too small. We now use compensated file size (tombstones are assigned some value size) when calculating L0 size and reduce the threshold for L0 size limit. This is to avoid accumulating too much data/tombstones in L0.
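A toy illustration of compensated L0 size: each tombstone is charged an assumed value size on top of the file's on-disk size, so a tombstone-heavy L0 looks larger. ToyL0File, CompensatedL0Size, and the flat per-tombstone charge are hypothetical; this entry does not give RocksDB's actual compensation formula.

```cpp
#include <cstdint>
#include <vector>

struct ToyL0File {
  uint64_t file_size;       // on-disk bytes
  uint64_t num_tombstones;  // deletion entries in the file
};

// Toy model (not RocksDB's formula): sum of file sizes plus a flat
// `assumed_value_size` charge for every tombstone.
uint64_t CompensatedL0Size(const std::vector<ToyL0File>& files,
                           uint64_t assumed_value_size) {
  uint64_t total = 0;
  for (const auto& f : files) {
    total += f.file_size + f.num_tombstones * assumed_value_size;
  }
  return total;
}
```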
Bug Fixes
- Make DestroyDB support slow deletion when it's configured in SstFileManager. The slow deletion is subject to the configured rate_bytes_per_sec, but not subject to the max_trash_db_ratio.
- Fixed a bug where we set unprep_seqs_ even when WriteImpl() fails. This was caught by stress test write fault injection in WriteImpl(). This may have incorrectly caused iterator creation failure for unvalidated writes or returned a wrong result for WriteUnpreparedTxn::GetUnpreparedSequenceNumbers().
- Fixed a bug where a successful write right after error recovery for the last failed write finishes causes duplicate WAL entries.
- Fixed a data race involving the background error status in unordered_write mode.
- Fix a bug where file snapshot functions like backup and checkpoint may attempt to copy a non-existing manifest file. #12882
- Fix a bug where per kv checksum corruption may be ignored in MultiGet().
- Fix a race condition in pessimistic transactions that could allow multiple transactions with the same name to be registered simultaneously, resulting in a crash or other unpredictable behavior.
RocksDB 9.7.3
9.7.3 (2024-10-16)
Behavior Changes
- The OPTIONS file to be loaded by the remote worker is now preserved so that it does not get purged by the primary host. This uses a technique similar to the one that keeps new SST files from getting purged: min_options_file_numbers_ is tracked like pending_outputs_ is tracked.
9.7.2 (2024-10-08)
Bug Fixes
- Fix a bug for surfacing write unix time: Iterator::GetProperty("rocksdb.iterator.write-time") for non-L0 files.
9.7.1 (2024-09-26)
Bug Fixes
- Several DB option settings could be lost through GetOptionsFromString(), possibly elsewhere as well. Affected options, now fixed: background_close_inactive_wals, write_dbid_to_manifest, write_identity_file, prefix_seek_opt_in_only.
- Fix undercounting of allocated memory in the compressed secondary cache due to looking at the compressed block size rather than the actual memory allocated, which could be larger due to internal fragmentation.
- Skip insertion of compressed blocks in the secondary cache if the lowest_used_cache_tier DB option is kVolatileTier.
9.7.0 (2024-09-20)
New Features
- Make Cache a customizable class that can be instantiated by the object registry.
- Add new option prefix_seek_opt_in_only that makes iterators generally safer when you might set a prefix_extractor. When prefix_seek_opt_in_only=true, which is expected to be the future default, prefix seek is only used when prefix_same_as_start or auto_prefix_mode are set. Also, prefix_same_as_start and auto_prefix_mode now allow prefix filtering even with total_order_seek=true.
- Add a new table property "rocksdb.key.largest.seqno" which records the largest sequence number of all keys in the file. It is verified to be zero during SST file ingestion.
Behavior Changes
- Changed the semantics of the BlobDB configuration option blob_garbage_collection_force_threshold to define a threshold for the overall garbage ratio of all blob files currently eligible for garbage collection (according to blob_garbage_collection_age_cutoff). This can provide better control over space amplification at the cost of slightly higher write amplification.
- Set write_dbid_to_manifest=true by default. This means the DB ID will now be preserved through backups, checkpoints, etc. by default. Also add the write_identity_file option, which can be set to false for anticipated future behavior.
- In FIFO compaction, compactions for changing file temperature (configured by option file_temperature_age_thresholds) will compact one file at a time, instead of merging multiple eligible files together (#13018).
- Support ingesting DB-generated files using hard links, i.e. IngestExternalFileOptions::move_files/link_files and IngestExternalFileOptions::allow_db_generated_files.
- Add a new file ingestion option IngestExternalFileOptions::link_files to hard link input files and preserve the original file links after ingestion.
- DB::Close now untracks files in SstFileManager, making available any space used by them. Prior to this change, they would be orphaned until the DB is re-opened.
Bug Fixes
- Fix a bug in CompactRange() where result files may not be compacted in any future compaction. This can only happen when users configure CompactRangeOptions::change_level to true and the change level step of manual compaction fails (#13009).
- Fix handling of dynamic change of prefix_extractor with memtable prefix filter. Previously, prefix seek could mix different prefix interpretations between memtable and SST files. Now the latest prefix_extractor at the time of iterator creation or refresh is respected.
- Fix a bug with manual_wal_flush and auto error recovery from WAL failure that may cause CFs to be inconsistent (#12995). The fix will set potential WAL write failure as a fatal error when manual_wal_flush is true, and disables auto error recovery from these errors.
RocksDB 9.6.1
9.6.1 (2024-08-24)
Bug Fixes
- Fix correctness of MultiGet across column families with user timestamp.
9.6.0 (2024-08-19)
New Features
- Best-efforts recovery supports recovering to an incomplete Version with a clean seqno cut that presents a valid point-in-time view from the user's perspective, if versioning history doesn't include atomic flush.
- New option BlockBasedTableOptions::decouple_partitioned_filters should improve efficiency in serving read queries because filter and index partitions can consistently target the configured metadata_block_size. This option is currently opt-in.
- Introduce a new mutable CF option paranoid_memory_checks. It enables additional validation on data integrity during reads/scanning. Currently, the skip-list-based memtable will validate key ordering during lookups and scans.
Public API Changes
- Add ticker stats to count file read retries due to checksum mismatch
- Adds optional installation callback function for remote compaction
Behavior Changes
- There may be less intra-L0 compaction triggered by total L0 size being too small. We now use compensated file size (tombstones are assigned some value size) when calculating L0 size and reduce the threshold for L0 size limit. This is to avoid accumulating too much data/tombstones in L0.
Bug Fixes
- Make DestroyDB support slow deletion when it's configured in SstFileManager. The slow deletion is subject to the configured rate_bytes_per_sec, but not subject to the max_trash_db_ratio.
- Fixed a bug where we set unprep_seqs_ even when WriteImpl() fails. This was caught by stress test write fault injection in WriteImpl(). This may have incorrectly caused iterator creation failure for unvalidated writes or returned a wrong result for WriteUnpreparedTxn::GetUnpreparedSequenceNumbers().
- Fixed a bug where a successful write right after error recovery for the last failed write finishes causes duplicate WAL entries.
- Fixed a data race involving the background error status in unordered_write mode.
- Fix a bug where file snapshot functions like backup and checkpoint may attempt to copy a non-existing manifest file. #12882
- Fix a bug where per kv checksum corruption may be ignored in MultiGet().
- Fix a race condition in pessimistic transactions that could allow multiple transactions with the same name to be registered simultaneously, resulting in a crash or other unpredictable behavior.
RocksDB 9.5.2
9.5.2 (2024-08-13)
Bug Fixes
- Fix a race condition in pessimistic transactions that could allow multiple transactions with the same name to be registered simultaneously, resulting in a crash or other unpredictable behavior.
Public API Changes
- Add ticker stats to count file read retries due to checksum mismatch
9.5.1 (2024-08-02)
Bug Fixes
- Make DestroyDB support slow deletion when it's configured in SstFileManager. The slow deletion is subject to the configured rate_bytes_per_sec, but not subject to the max_trash_db_ratio.
9.5.0 (2024-07-19)
Public API Changes
- Introduced new C API function rocksdb_writebatch_iterate_cf for column family-aware iteration over the contents of a WriteBatch
- Add support to ingest SST files generated by a DB instead of SstFileWriter. This can be enabled with the experimental option IngestExternalFileOptions::allow_db_generated_files.
Behavior Changes
- When calculating total log size for the log_size_for_flush argument in the CreateCheckpoint API, the size of the archived log will not be included, to avoid an unnecessary flush.
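A toy version of the checkpoint flush decision, assuming the documented meaning of log_size_for_flush in CreateCheckpoint() (0 means always flush; otherwise flush once the total WAL size reaches the threshold) and applying the 9.5.0 change that archived WALs are not counted. ToyWalFile and CheckpointShouldFlush are hypothetical.

```cpp
#include <cstdint>
#include <vector>

struct ToyWalFile {
  uint64_t size_bytes;
  bool archived;  // already moved to the WAL archive
};

// Toy model (not RocksDB code): decide whether a checkpoint should flush
// memtables, counting only live (non-archived) WALs toward the threshold.
bool CheckpointShouldFlush(const std::vector<ToyWalFile>& wals,
                           uint64_t log_size_for_flush) {
  if (log_size_for_flush == 0) return true;  // 0 means always flush
  uint64_t live_bytes = 0;
  for (const auto& w : wals) {
    if (!w.archived) live_bytes += w.size_bytes;
  }
  return live_bytes >= log_size_for_flush;
}
```

Counting a large archived WAL would push the total over the threshold even when the live WALs are small, which is the unnecessary flush the entry describes.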
Bug Fixes
- Fix a major bug in which an iterator using prefix filtering and SeekForPrev might miss data when the DB is using whole_key_filtering=false and partition_filters=true.
- Fixed a bug where OnErrorRecoveryBegin() is not called before auto recovery starts.
- Fixed a bug where an event listener reads ErrorHandler's bg_error_ member without holding the db mutex (#12803).
member without holding db mutex(#12803). - Fixed a bug in handling MANIFEST write error that caused the latest valid MANIFEST file to get deleted, resulting in the DB being unopenable.
- Fixed a race between error recovery due to manifest sync or write failure and external SST file ingestion. Both attempt to write a new manifest file, which causes an assertion failure.
Performance Improvements
- Fix an issue where compactions were opening table files and reading table properties while holding db mutex_.
- Reduce unnecessary filesystem queries and DB mutex acquires in creating backups and checkpoints.