Skip to content

👑 Readonly B-Tree SST index #1483

Open
@kunga

Description

@kunga

Description

Implement a readonly B-Tree SST index instead of the current flat one.

This improvement opens us the possibility of not always needing to keep SST's indexes in memory. We will be able to store a large amount of "cold" data without dozens of compute nodes to load needed parts of indexes on demand. Also enhances stability of Shared Cache and removes (actually reduces) passive bytes segment.

As we write a SST during the compaction process, its indexes will be readonly. However, the B-Tree structure minimizes needed disk I/O operations.

Also we will extend the current index format with additional information about data size and erased row count which will allow us to navigate easily and count statistics without the need of fully loading the whole index into memory.

Internal design doc.

Steps

1. Support simple flat index loading in all the usages

Instead of loading all SST's indexes on a tablet start, do it on the first usage. Before that loading keep memory free.

2. Implement readonly B-Tree index

Implement B-Tree data structure, build and write them with an SST, implement searches.

  • Implement B-Tree node class with serialization and deserialization to binary pages format cf4d301
  • Implement B-Tree builder that accepts data pages and writes B-Tree nodes online 523950c
  • Integrate B-Tree builder with TPartWriter 1ccc850
  • Optimize B-Tree node format for groups where all row records have the same size 4c94e44 ebe7a9b
  • Implement search by row id 402da95
  • Implement search by key 1eda4dd
  • Implement next and prev methods that can handle page faults 2f7d8b6
  • Integrate TRunIt with a new B-Tree index iterator (TPartBtreeIndexIt) 2f7d8b6
  • Add EnableLocalDBBtreeIndex setting ba6b7d6
  • Integrate TKeysLoader with a new B-Tree index iterator 897ada7

3. Implement Precharge over readonly B-Tree index

Implement Precharge that works over tree structured index instead of the old flat one.

4. Implement Build Stats over readonly B-Tree index

Implement Build Stats that works over tree structured index instead of the old flat one. We expect that it will no longer need to fully load index and only use upper-level B-Tree index nodes.

5. Implement Scan over readonly B-Tree index

Implement Scan that works over over tree structured index.

6. Make fixes

Manually check suspicious places and fix what is broken with B-Tree index.

6. Test

Verify that everything works together not dramatically worse than before. Make tests on a slice.

7. Make optimizations

8. Release

  • Wait for a new release tag
  • Enable EnableLocalDBBtreeIndex setting on pre-prod clusters (both indexes will be produced) (for some databases first)
  • Wait until a new release is installed and won't be rolled back on prod clusters
  • Enable EnableLocalDBBtreeIndex setting on prod clusters (for some databases first)
  • Enable EnableLocalDBBtreeIndex setting by default
  • Forcibly compact idle shards to build B-Tree index  #7718
  • Disable EnableLocalDBFlatIndex setting

Metadata

Metadata

Assignees

Labels

area/datashardIssues related to datashard tablets (relational table partitions)epic

Type

No type

Projects

Status

In Progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions