Description
Description
Implement a readonly B-Tree SST index instead of the current flat one.
This improvement opens us the possibility of not always needing to keep SST's indexes in memory. We will be able to store a large amount of "cold" data without dozens of compute nodes to load needed parts of indexes on demand. Also enhances stability of Shared Cache and removes (actually reduces) passive bytes segment.
As we write a SST during the compaction process, its indexes will be readonly. However, the B-Tree structure minimizes needed disk I/O operations.
Also we will extend the current index format with additional information about data size and erased row count which will allow us to navigate easily and count statistics without the need of fully loading the whole index into memory.
Internal design doc.
Steps
1. Support simple flat index loading in all the usages
Instead of loading all SST's indexes on a tablet start, do it on the first usage. Before that loading keep memory free.
- Add and implement an index
IIndexIter
interface that loads index on the first usage 66a3bbb - Support index loading in
TRunIt
66a3bbb - Support index loading from
IPages
implementations aab08a6 10fe953 - Support index loading in
TCharge
55d8ae0 - Support index loading in
TKeysLoader
0f2a2ff bac30c7 - Support index loading in
BuildStats
287b4bb a18f545 - Support index loading in
TForward
5ad1dcc BTreeIndex Bugfix load indexes in scan #657 - Support index loading in
TDump
cc271cd - Remove the rest direct usages of index fields from
TPart
16b0f8d - Do not count flat index pages in transaction usage e4592d6
- Do not load index on tablet start bcae19f
2. Implement readonly B-Tree index
Implement B-Tree data structure, build and write them with an SST, implement searches.
- Implement B-Tree node class with serialization and deserialization to binary pages format cf4d301
- Implement B-Tree builder that accepts data pages and writes B-Tree nodes online 523950c
- Integrate B-Tree builder with
TPartWriter
1ccc850 - Optimize B-Tree node format for groups where all row records have the same size 4c94e44 ebe7a9b
- Implement search by row id 402da95
- Implement search by key 1eda4dd
- Implement next and prev methods that can handle page faults 2f7d8b6
- Integrate
TRunIt
with a new B-Tree index iterator (TPartBtreeIndexIt
) 2f7d8b6 - Add
EnableLocalDBBtreeIndex
setting ba6b7d6 - Integrate
TKeysLoader
with a new B-Tree index iterator 897ada7
3. Implement Precharge over readonly B-Tree index
Implement Precharge that works over tree structured index instead of the old flat one.
- Add
ICharge
interface and extractChargeRange
methods e7f64cc BTreeIndex Precharge Simplification #682 - Charge main pages within given row id range BTreeIndex Precharge RowId #754
- Charge main pages within given key range BTreeIndex Charge Keys #1102
- Charge groups pages BTreeIndex Charge Groups #1224
- Treat items limit BTreeIndex Charge Items Limit #1338
- Treats bytes limit BTreeIndex Charge Bytes Limit #1489
- Charge history pages BTreeIndex Charge History #1593
- Bugfix partially-loaded tree BTreeIndex Bugfix partially-loaded tree #1674
- Check that Iterator and Charge behaviour are consistent (all needed pages are precharged) BTreeIndex Test Charge Iter consistency #1869
4. Implement Build Stats over readonly B-Tree index
Implement Build Stats that works over tree structured index instead of the old flat one. We expect that it will no longer need to fully load index and only use upper-level B-Tree index nodes.
- Implement basic version BTreeIndex Stats iterator interface #1949 BTreeIndex Stats Load index nodes #2136
- Improve it not to load all index nodes BTreeIndex Stats Visit top nodes only #2177
5. Implement Scan over readonly B-Tree index
Implement Scan that works over over tree structured index.
- Implement Scan over B-Tree index basic version BTreeIndex Scan Implement basic version #2260
- Forward-load B-Tree index in Scan BTreeIndex Forwad load #3710
6. Make fixes
Manually check suspicious places and fix what is broken with B-Tree index.
- Fix sticky pages logic after table compaction dda2617
- Keep B-Tree index in memory after compactions BTreeIndex Keep B-TreeIndex in cache after compactions #1165
- Fix B-Tree index iterator behaviour with corner pages BTreeIndex Iterator Remove single run optimization #995
- Bugfix
TSchemeShard::Clear
method BTreeIndex Bugfix SchemeShard Clear #655 - Count flat or B-Tree index to index size depending on
EnableLocalDBBtreeIndex
setting BTreeIndex Iterator benchmark #653 BTreeIndex Bugfix index size #671 - Add a possibility to turn off Flat index writing Add
EnableLocalDBFlatIndex
setting #2546 - Remove flat index usage from most of tests Fix all the test with
EnableLocalDBFlatIndex = false
EnableLocalDBBtreeIndex = true
#2547 - Fix
Cache line head don't want to do fetch but should
verify Forward cache bugfix index pages queue verify #3134 -
Limit leaf B-Tree index nodes small enough for skipping them in BuildStatsTemporary use bigger index resolution to avoid full B-Tree index loading BTreeIndex Split Flush method, use bigger resolution #3182 - Check B-Tree index format in loader #4969
- 24-1 Prevent from using B-Tree index with incorrect configuration #4925
- B-Tree keys histogram is empty or invalid in some rare cases #15235
- VERIFY AdvanceNextPage(): requirement !queue.empty() failed #18151
6. Test
Verify that everything works together not dramatically worse than before. Make tests on a slice.
- Benchmark
TRunIt
over B-Tree index BTreeIndex Iterator benchmark #653 - Benchmark Charge over B-Tree index BTreeIndex Test Charge Iter consistency #1869
- Check that
EnableLocalDBBtreeIndex
andEnableLocalDBFlatIndex
settings work - Check large table with a small amount of Shared Cache
- Check YCSB benchmark
- Check TPC-C benchmark
- Check that our changes do not effect Hive balancing (in mem size)
7. Make optimizations
- Preload indexes during part switches on followers Preload indexes on followers #4744
- ❓ Do not allocate in CreateIndexIter Forward cache simplify #2514 (comment)
8. Release
- Wait for a new release tag
- Enable
EnableLocalDBBtreeIndex
setting on pre-prod clusters (both indexes will be produced) (for some databases first) - Wait until a new release is installed and won't be rolled back on prod clusters
- Enable
EnableLocalDBBtreeIndex
setting on prod clusters (for some databases first) - Enable
EnableLocalDBBtreeIndex
setting by default - Forcibly compact idle shards to build B-Tree index #7718
- Disable
EnableLocalDBFlatIndex
setting
Metadata
Metadata
Assignees
Type
Projects
Status