4 changes: 2 additions & 2 deletions docs/src/format/table/mem_wal.md
@@ -465,7 +465,7 @@ Readers **MUST** merge results from multiple data sources (base table, flushed M

When the same primary key exists in multiple sources, the reader must keep only the newest version based on:

-1. **Generation number** (`_gen`): Higher generation wins. The base table has generation -1, MemTables have positive integers starting from 1.
+1. **Generation number** (`_gen`): Higher generation wins. The base table has generation 0, MemTables have positive integers starting from 1.
2. **Row address** (`_rowaddr`): Within the same generation, higher row address wins (later writes within a batch overwrite earlier ones).

The ordering for "newest" is: highest `_gen` first, then highest `_rowaddr`.
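
The deduplication rule described in this hunk can be sketched as follows. This is a minimal illustration, not code from the Lance codebase: the `Row` struct, its field names, and `dedup_newest` are hypothetical, and it assumes the updated numbering (base table at generation 0, MemTables from 1), so an unsigned integer is enough for `_gen`.

```rust
use std::collections::BTreeMap;

/// Hypothetical row shape, for illustration only (not a Lance type).
#[derive(Debug, Clone, PartialEq)]
struct Row {
    pk: String,      // primary key
    generation: u64, // `_gen`: assumed 0 for the base table, 1.. for MemTables
    rowaddr: u64,    // `_rowaddr`
    value: i64,      // payload
}

/// Keep one row per primary key: highest `_gen` wins,
/// ties are broken by the highest `_rowaddr`.
fn dedup_newest(rows: Vec<Row>) -> Vec<Row> {
    let mut newest: BTreeMap<String, Row> = BTreeMap::new();
    for row in rows {
        // Tuple comparison is lexicographic: generation first, then row address.
        let replace = match newest.get(&row.pk) {
            Some(cur) => (row.generation, row.rowaddr) > (cur.generation, cur.rowaddr),
            None => true,
        };
        if replace {
            newest.insert(row.pk.clone(), row);
        }
    }
    // BTreeMap iterates in key order, so the output is sorted by primary key.
    newest.into_values().collect()
}
```

Comparing `(generation, rowaddr)` as a tuple keeps the two-level ordering in one expression; a real reader would likely apply the same comparison while merging sorted streams rather than collecting all rows in memory.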
@@ -506,7 +506,7 @@ Datasets come from:
2. flushed MemTables (persisted but not yet merged)
3. optionally in-memory MemTables (if accessible).

-Each dataset is tagged with a generation number: -1 for the base table, and positive integers for MemTable generations.
+Each dataset is tagged with a generation number: 0 for the base table, and positive integers for MemTable generations.
Within a region, the generation number determines data freshness, with higher numbers representing newer data.
Rows from different regions do not need deduplication since each primary key maps to exactly one region.

15 changes: 15 additions & 0 deletions rust/lance-index/src/scalar/inverted/builder.rs
@@ -410,6 +410,21 @@ impl InnerBuilder {
         self.id
     }

+    /// Set the token set for this builder.
+    pub fn set_tokens(&mut self, tokens: TokenSet) {
+        self.tokens = tokens;
+    }
+
+    /// Set the document set for this builder.
+    pub fn set_docs(&mut self, docs: DocSet) {
+        self.docs = docs;
+    }
+
+    /// Set the posting lists for this builder.
+    pub fn set_posting_lists(&mut self, posting_lists: Vec<PostingListBuilder>) {
+        self.posting_lists = posting_lists;
+    }
+
     pub async fn remap(&mut self, mapping: &HashMap<u64, Option<u64>>) -> Result<()> {
         // for the docs, we need to remove the rows that are removed from the doc set,
         // and update the row ids of the rows that are updated
5 changes: 5 additions & 0 deletions rust/lance-index/src/scalar/inverted/tokenizer.rs
@@ -223,6 +223,11 @@ impl InvertedIndexParams {
         self
     }

+    /// Get whether positions are stored in this index.
+    pub fn has_positions(&self) -> bool {
+        self.with_position
+    }
+
     pub fn max_token_length(mut self, max_token_length: Option<usize>) -> Self {
         self.max_token_length = max_token_length;
         self
4 changes: 4 additions & 0 deletions rust/lance/Cargo.toml
@@ -183,5 +183,9 @@ harness = false
name = "memtable_read"
harness = false

+[[bench]]
+name = "mem_wal_read"
+harness = false
+
[lints]
workspace = true