MyRocks data dictionary format
MyRocks manages a lot of internal information, such as the mapping between index IDs and column family IDs, in what we call the data dictionary. MyRocks stores all data dictionary entries in a dedicated RocksDB column family named __system__, which we call the System Column Family (System CF). The System CF is separate from the column families used by applications. For debugging purposes, MyRocks provides information_schema tables that print data dictionary entries.
Here are some concepts that help in understanding the MyRocks data dictionary.
- Column Family ID: The ID of a column family in RocksDB. Each MyRocks index belongs to exactly one column family, and multiple indexes can belong to the same column family, so the mapping between column families and indexes is 1:N. The column family name can be specified through the index COMMENT clause.
- Index ID: An internal, auto-generated ID inside MyRocks. A new index ID is assigned whenever a new index is created. Index IDs are incremented sequentially and never reused across different indexes. This means you cannot create more than 2^32 indexes in total within the same MyRocks instance.
- Global Index ID: Column Family ID + Index ID.
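As a minimal sketch of how a global index ID could be composed, the snippet below concatenates a column family ID and an index ID into 8 bytes. It assumes each ID is a 4-byte unsigned integer stored big-endian; the byte order and the function names are assumptions for illustration, not taken from the MyRocks source.

```python
import struct

def pack_global_index_id(cf_id: int, index_id: int) -> bytes:
    """Concatenate a column family ID and an index ID into 8 bytes.

    Assumption: both IDs are 4-byte unsigned integers in big-endian order.
    """
    return struct.pack(">II", cf_id, index_id)

def unpack_global_index_id(buf: bytes) -> tuple:
    """Split an 8-byte global index ID back into (cf_id, index_id)."""
    if len(buf) != 8:
        raise ValueError("a global index ID is expected to be 8 bytes")
    return struct.unpack(">II", buf)

gid = pack_global_index_id(2, 260)
print(gid.hex())                    # 0000000200000104
print(unpack_global_index_id(gid))  # (2, 260)
```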
- Table Name => internal index id mappings
key: Rdb_key_def::DDL_ENTRY_INDEX_START_NUMBER(0x1) + dbname.tablename
value: version + {global_index_id}*n_indexes_of_the_table
This dictionary entry is updated whenever an index definition changes, that is, when a table or index is added or dropped. Version is the internal data dictionary version (currently hard-coded as 0x1) and uses 2 bytes. The dictionary ID is 4 bytes, and each global index ID is 8 bytes.
- Index information
key: Rdb_key_def::INDEX_INFO(0x2) + global_index_id
value: version, index_type, key_value_format_version
A row is inserted when a new index is created. When an index is dropped, the matching row is removed. index_type is 1 byte and is currently used to differentiate primary keys from secondary keys. key_value_format_version is 2 bytes; the version number is increased when the format changes, which makes it easier to maintain compatibility.
- CF id => CF flags
key: Rdb_key_def::CF_DEFINITION(0x3) + cf_id
value: version, {is_reverse_cf, is_auto_cf, is_per_partition_cf}
A row is inserted when a new column family is created. When a column family is dropped, the matching row is removed. cf_flags is 4 bytes in total; currently only three bits are used.
- Binlog entry (updated at commit)
key: Rdb_key_def::BINLOG_INFO_INDEX_NUMBER (0x4)
value: version, {binlog_name,binlog_pos,binlog_gtid}
There is at most one record for this dictionary entry, and it is updated at transaction commit (binlog commit). If binary logging is disabled, this entry is not updated. Binlog name and binlog GTID are each encoded with a two-byte length and are not null-terminated. Binlog pos is 4 bytes.
- Ongoing drop index entry
key: Rdb_key_def::DDL_DROP_INDEX_ONGOING(0x5) + global_index_id
value: version
This data dictionary entry was introduced to support the "Fast drop/truncate table" feature in MyRocks. When dropping a table (and its indexes), MyRocks adds the target indexes to this data dictionary, so the client gets a reply very quickly, without waiting for the drop to complete. In the background, MyRocks schedules a compaction filter that periodically checks rows; once all rows associated with an index have been removed, it deletes the index ID from this data dictionary.
- Index Statistics
key: Rdb_key_def::INDEX_STATISTICS(0x6) + global_index_id
value: version, {materialized PropertiesCollector::IndexStats}
This data dictionary entry is added, updated, or deleted whenever index statistics need to change.
- Current maximum index id
key: Rdb_key_def::CURRENT_MAX_INDEX_ID(0x7)
value: version, current max index id
This data dictionary entry is updated when a new index is created.
- Ongoing create index entry
key: Rdb_key_def::DDL_CREATE_INDEX_ONGOING(0x8) + global_index_id
value: version
This data dictionary entry was introduced to support "Fast secondary index creation" in MyRocks. While an index is being created, this entry is present; it is removed once index creation is complete. Its primary use is during crash recovery: on startup, if any partially created index is found, its entries are removed from RocksDB.
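The key layouts listed above can be summarized with a small parser that reads the dictionary ID and then decodes whatever follows it. This is only a sketch: it assumes the 4-byte dictionary ID, the 4-byte column family ID, and the 4-byte index ID are all big-endian, and the `DictType` enum and `parse_dictionary_key` function are hypothetical names, not MyRocks APIs.

```python
import struct
from enum import IntEnum

class DictType(IntEnum):
    """Dictionary IDs as listed in this document."""
    DDL_ENTRY_INDEX_START_NUMBER = 0x1
    INDEX_INFO = 0x2
    CF_DEFINITION = 0x3
    BINLOG_INFO_INDEX_NUMBER = 0x4
    DDL_DROP_INDEX_ONGOING = 0x5
    INDEX_STATISTICS = 0x6
    CURRENT_MAX_INDEX_ID = 0x7
    DDL_CREATE_INDEX_ONGOING = 0x8

# Entry types whose key is the dictionary ID followed by a global index ID.
GLOBAL_INDEX_ID_TYPES = {
    DictType.INDEX_INFO,
    DictType.DDL_DROP_INDEX_ONGOING,
    DictType.INDEX_STATISTICS,
    DictType.DDL_CREATE_INDEX_ONGOING,
}

def parse_dictionary_key(key: bytes) -> dict:
    """Decode a System CF key (assumed big-endian) into its parts."""
    if len(key) < 4:
        raise ValueError("key is shorter than the 4-byte dictionary ID")
    (tag,) = struct.unpack_from(">I", key, 0)
    dict_type = DictType(tag)  # raises ValueError for an unknown ID
    rest = key[4:]
    result = {"type": dict_type}
    if dict_type is DictType.DDL_ENTRY_INDEX_START_NUMBER:
        result["table"] = rest.decode("utf-8")  # dbname.tablename
    elif dict_type in GLOBAL_INDEX_ID_TYPES:
        result["cf_id"], result["index_id"] = struct.unpack(">II", rest)
    elif dict_type is DictType.CF_DEFINITION:
        (result["cf_id"],) = struct.unpack(">I", rest)
    elif rest:
        raise ValueError("unexpected bytes after the dictionary ID")
    return result

print(parse_dictionary_key(struct.pack(">I", 0x1) + b"test.t1"))
print(parse_dictionary_key(struct.pack(">III", 0x5, 2, 260)))
```

For actual debugging, the information_schema tables mentioned at the top of this page are the supported way to inspect these entries; the parser only illustrates the layout.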
Data dictionary operations are atomic inside RocksDB. For example, creating a table with two indexes requires three Put calls, and MyRocks issues them in a single WriteBatch.
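The all-or-nothing behavior can be illustrated with a toy in-memory model. The sketch below is not the RocksDB API: `ToyWriteBatch` and `ToyStore` are hypothetical stand-ins. It also assumes that the three Puts are the table entry plus one index information entry per index, that integers are big-endian, and that the index_type values 1 and 2 are placeholders.

```python
import struct

DDL_ENTRY_INDEX_START_NUMBER = 0x1
INDEX_INFO = 0x2
DICT_VERSION = 0x1  # 2-byte data dictionary version

def ddl_entry(db_table, global_ids):
    """Key/value pair for the table name => index id mapping."""
    key = struct.pack(">I", DDL_ENTRY_INDEX_START_NUMBER) + db_table.encode("utf-8")
    value = struct.pack(">H", DICT_VERSION)
    for cf_id, index_id in global_ids:
        value += struct.pack(">II", cf_id, index_id)  # 8-byte global index id
    return key, value

def index_info_entry(cf_id, index_id, index_type, kv_format_version):
    """Key/value pair for one index information row."""
    key = struct.pack(">III", INDEX_INFO, cf_id, index_id)
    # version (2 bytes), index_type (1 byte), key_value_format_version (2 bytes)
    value = struct.pack(">HBH", DICT_VERSION, index_type, kv_format_version)
    return key, value

class ToyWriteBatch:
    """Hypothetical stand-in for a write batch: it only collects Puts."""
    def __init__(self):
        self.ops = []

    def put(self, key, value):
        self.ops.append((key, value))

class ToyStore:
    """Hypothetical in-memory store that applies a batch all-or-nothing."""
    def __init__(self):
        self.data = {}

    def write(self, batch):
        staged = dict(self.data)  # work on a copy
        for key, value in batch.ops:
            if not isinstance(key, bytes) or not isinstance(value, bytes):
                raise TypeError("keys and values must be bytes")
            staged[key] = value
        self.data = staged  # publish only if every Put succeeded

def create_table_batch(db_table, indexes):
    """indexes: list of (cf_id, index_id, index_type, kv_format_version)."""
    batch = ToyWriteBatch()
    batch.put(*ddl_entry(db_table, [(cf, idx) for cf, idx, _, _ in indexes]))
    for cf, idx, index_type, kv_version in indexes:
        batch.put(*index_info_entry(cf, idx, index_type, kv_version))
    return batch

store = ToyStore()
batch = create_table_batch("test.t1", [(2, 260, 1, 1), (2, 261, 2, 1)])
store.write(batch)
print(len(batch.ops), len(store.data))  # 3 3
```

Because the store swaps in the staged copy only after every Put has been applied, a failure partway through leaves the dictionary unchanged, which is the property a single WriteBatch provides here.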