@touch-of-grey
Contributor

Based on Jack's draft, this separates out the scan reader part so it can be added first, without the FTS and vector index. @jackye1995 can you take a look?

touch-of-grey and others added 2 commits January 26, 2026 19:03
This introduces an LSM (Log-Structured Merge) scanner that enables consistent
reads across multiple data sources:
- Base table (merged data, generation=0)
- Flushed MemTables (persisted, generation=1,2,...)
- Active MemTable (in-memory, highest generation)

Key components:
- LsmScanner: High-level API for LSM reads with deduplication
- LsmDataSourceCollector: Collects data sources from base table and regions
- LsmScanPlanner: Builds execution plan with Union + Dedup
- DeduplicateExec: Deduplicates by PK, keeping highest generation
- GenerationTagExec: Adds _gen and _rowaddr columns for dedup ordering
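
As an illustration of the dedup rule described in this list, here is a minimal std-only sketch, not the actual `DeduplicateExec`. The `TaggedRow` struct and `dedup_latest` function are hypothetical names; `generation` and `row_addr` stand in for the `_gen` and `_rowaddr` columns, and breaking ties within a generation by the higher row address is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical row shape after generation tagging.
#[derive(Debug, Clone, PartialEq)]
struct TaggedRow {
    pk: u64,
    value: String,
    generation: u64, // stands in for `_gen`; 0 = base table
    row_addr: u64,   // stands in for `_rowaddr`
}

/// Keep one row per primary key: the one with the highest
/// (generation, row_addr) pair. Output is sorted by pk for determinism.
fn dedup_latest(rows: Vec<TaggedRow>) -> Vec<TaggedRow> {
    let mut latest: HashMap<u64, TaggedRow> = HashMap::new();
    for row in rows {
        // Assumption: within one generation, a higher row address is newer.
        let replace = match latest.get(&row.pk) {
            Some(cur) => (row.generation, row.row_addr) > (cur.generation, cur.row_addr),
            None => true,
        };
        if replace {
            latest.insert(row.pk, row);
        }
    }
    let mut out: Vec<TaggedRow> = latest.into_values().collect();
    out.sort_by_key(|r| r.pk);
    out
}
```

Feeding it the union of base-table, flushed, and active rows returns a single row per key, taken from the newest source that contains that key.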

Also includes:
- mem_wal_read benchmark with DATASET_PREFIX support for S3 testing
- active_memtable_ref() method on RegionWriter for LSM integration
- Documentation fixes for generation numbering (unsigned, base=0)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions bot added the enhancement (New feature or request) label on Jan 29, 2026
@jackye1995
Contributor

I have been thinking about this for the past 2 days. The part where we have to read each MemTable and then reverse the result feels really inefficient to me. I think I have a good way to solve it now:

When we scan MemTable, everything is in memory, reverse scan is fine. So when we flush MemTable, we should read the whole BatchStore in reverse order. This means the indexes also need to reverse the row position mapping, so the new row position is length_of_batch_store - current_position - 1. By doing so, all the flushed MemTables are ordered from newest to oldest, not oldest to newest, so we can do the K-way merge much more efficiently.

What do you think?
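
One way to see why newest-to-oldest ordering helps is the sketch below. It is a simple sequential "first seen wins" scan, not the K-way merge itself, and it assumes each source is already ordered newest row first. The `Row` alias and `scan_first_seen_wins` are hypothetical names, not part of the draft.

```rust
use std::collections::HashSet;

/// Hypothetical row: (primary key, payload).
type Row = (u64, &'static str);

/// `generations_newest_first[0]` is the newest source (for example the active
/// MemTable) and the last entry is the base table. Each inner Vec is assumed
/// to be ordered newest row first, as a reversed flush would produce.
fn scan_first_seen_wins(generations_newest_first: &[Vec<Row>]) -> Vec<Row> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for generation in generations_newest_first {
        for &(pk, value) in generation {
            // `insert` returns true only the first time a key is added,
            // so the newest version of each key is the one emitted.
            if seen.insert(pk) {
                out.push((pk, value));
            }
        }
    }
    out
}
```

Under that ordering assumption, no source has to be read to the end and reversed before the newest version of a key is known.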

@touch-of-grey
Contributor Author

Makes sense! Let me try updating it based on the current draft.

touch-of-grey and others added 2 commits January 29, 2026 17:18
When flushing MemTable to disk, write data in reverse order (newest to
oldest) so flushed generations are pre-sorted for K-way merge during
LSM scan. This eliminates the need to reverse data during reads.

Key changes:
- BatchStore: add to_vec_reversed() that reverses batch order and rows
- MemTable: add scan_batches_reversed() returning (batches, total_rows)
- Flush: use reversed batches and pass total_rows to index creation
- BTree index: add to_training_batches_reversed() with mapped positions
- IVF-PQ index: add to_partition_batches_reversed() with mapped positions

Row position mapping formula: flushed_pos = total_rows - original_pos - 1
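
The mapping formula can be checked with a small std-only sketch. Batches are modelled here as `Vec<Vec<T>>` rather than Arrow record batches, and `flushed_pos` and `reverse_batches` are hypothetical helpers, not the `to_vec_reversed()` implementation.

```rust
/// Map a row position in the in-memory order to its position after a
/// reversed flush: total_rows - original_pos - 1.
/// Returns None when original_pos is out of range.
fn flushed_pos(total_rows: usize, original_pos: usize) -> Option<usize> {
    total_rows.checked_sub(original_pos)?.checked_sub(1)
}

/// Reverse the batch order and the rows inside each batch, and report the
/// total row count that the position mapping needs.
fn reverse_batches<T: Clone>(batches: &[Vec<T>]) -> (Vec<Vec<T>>, usize) {
    let total_rows: usize = batches.iter().map(|b| b.len()).sum();
    let reversed: Vec<Vec<T>> = batches
        .iter()
        .rev()
        .map(|b| b.iter().rev().cloned().collect())
        .collect();
    (reversed, total_rows)
}
```

For two batches holding rows 0..=1 and 2..=4, the reversed layout is rows 4, 3, 2 followed by rows 1, 0, and a row originally at position `p` ends up at position `5 - p - 1`.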

Co-Authored-By: Jack Ye <yezhaoqin@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

When flushing MemTable to disk, write FTS index files directly from the
in-memory FTS index without re-tokenizing the documents. This avoids
duplicate tokenization work during flush.

Key changes:
- FtsMemIndex: add to_index_builder_reversed() that exports index data
  with reversed row positions for proper LSM ordering
- InnerBuilder: add set_tokens/set_docs/set_posting_lists setters
- InvertedIndexParams: add has_positions() getter
- Flush: create_fts_indexes() now uses direct flush from in-memory data
  and properly commits index metadata to dataset manifest

Row position mapping formula: flushed_pos = total_rows - original_pos - 1
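
As a rough illustration of exporting index data with reversed row positions and without re-tokenizing, the sketch below remaps an in-memory posting-list map. The `PostingLists` alias and `remap_postings` are hypothetical and do not reflect the real `FtsMemIndex` or `InnerBuilder` layout. Keeping each list ascending after remapping is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory inverted index: token -> ascending row positions.
type PostingLists = BTreeMap<String, Vec<usize>>;

/// Rewrite every row position as total_rows - pos - 1 without touching the
/// tokens. Returns None if any position is out of range.
fn remap_postings(postings: &PostingLists, total_rows: usize) -> Option<PostingLists> {
    let mut out = PostingLists::new();
    for (token, positions) in postings {
        let mut mapped = Vec::with_capacity(positions.len());
        // Walk the ascending list backwards so the mapped list is ascending too.
        for &pos in positions.iter().rev() {
            mapped.push(total_rows.checked_sub(pos)?.checked_sub(1)?);
        }
        out.insert(token.clone(), mapped);
    }
    Some(out)
}
```

Only the row positions change; the token dictionary is reused as-is, which is where the savings over re-tokenizing would come from.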

Co-Authored-By: Jack Ye <yezhaoqin@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Change `to_vec_reversed()` to return `Result` instead of panicking
  on Arrow take kernel or RecordBatch creation errors
- Replace `expect()` calls in `to_index_builder_reversed()` with
  proper `Error::io` returns for defensive error handling
- Update callers to propagate errors appropriately
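
The pattern behind these changes, returning an error instead of panicking and letting callers propagate it, can be sketched with std only. The real code works with Arrow kernels and `Error::io`; the `Batch` and `ReverseError` types and both functions below are hypothetical stand-ins.

```rust
/// Hypothetical error type standing in for the crate's real error.
#[derive(Debug, PartialEq)]
enum ReverseError {
    ColumnLengthMismatch { column: usize, expected: usize, actual: usize },
}

/// Hypothetical batch: a list of equal-length integer columns.
#[derive(Debug, PartialEq)]
struct Batch {
    columns: Vec<Vec<i64>>,
}

/// Reverse the rows of one batch, returning an error (not panicking)
/// when the batch is malformed.
fn reverse_batch(batch: &Batch) -> Result<Batch, ReverseError> {
    let expected = batch.columns.first().map_or(0, |c| c.len());
    let mut columns: Vec<Vec<i64>> = Vec::with_capacity(batch.columns.len());
    for (column, col) in batch.columns.iter().enumerate() {
        if col.len() != expected {
            return Err(ReverseError::ColumnLengthMismatch {
                column,
                expected,
                actual: col.len(),
            });
        }
        columns.push(col.iter().rev().copied().collect());
    }
    Ok(Batch { columns })
}

/// Caller side: collecting into Result stops at the first error and
/// hands it back to the caller.
fn reverse_all(batches: &[Batch]) -> Result<Vec<Batch>, ReverseError> {
    batches.iter().rev().map(reverse_batch).collect()
}
```

A malformed batch then surfaces as an `Err` that the flush path can report, rather than aborting the writer.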

Co-Authored-By: Jack Ye <yezhaoqin@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>