feat!: introduce fragment reuse index to defer compaction index remap#3847
feat!: introduce fragment reuse index to defer compaction index remap#3847jackye1995 merged 7 commits intolance-format:mainfrom
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3847 +/- ##
========================================
Coverage 78.29% 78.29%
========================================
Files 281 283 +2
Lines 108112 108783 +671
Branches 108112 108783 +671
========================================
+ Hits 84641 85167 +526
- Misses 20126 20254 +128
- Partials 3345 3362 +17
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
rust/lance-index/src/reuse.rs
Outdated
| pub async fn save_row_id_map( | ||
| index_store: Arc<dyn IndexStore>, | ||
| dataset_version: &u64, | ||
| row_id_map: &HashMap<u64, Option<u64>>, |
There was a problem hiding this comment.
This structure feels a bit unfortunate given that the common case is mapping sorted contiguous ranges to other sorted contiguous ranges. Have you considered re-using some of the U64Segment or RowIdSequence machinery?
There was a problem hiding this comment.
Yeah I used HashMap becuase these row ID maps come from the RewriteResult of optimize.rewrite_files, at that point it's already a hashmap and the sorting information is already lost. But I can make this storage more efficient with the data structure you mentioned, and then make the upstream changes in optimize based on what we arrive at here.
rust/lance-index/src/reuse.rs
Outdated
| pub async fn load_row_id_map( | ||
| file_path: &str, | ||
| store: Arc<dyn IndexStore>, | ||
| ) -> Result<HashMap<u64, Option<u64>>> { |
There was a problem hiding this comment.
If we are going to use this within the conflict resolution code path, I think we should aggressive in limiting IO requests. You can add a unit test that measures the number of read and write IO requests using StatsHolder. Here is an example test that does this:
Might need to store the file size of the index file in the metadata so that you can skip the additional head request when opening the index file.
protos/table.proto
Outdated
| // version of the dataset corresponds to the dataset_version at the time the index adds this version entry | ||
| uint64 dataset_version = 1; |
There was a problem hiding this comment.
If we want to use this for conflict resolution, we should probably add this index in the same commit as compaction.
There was a problem hiding this comment.
Yes that is the goal, this should be the same dataset version as the compaction
417f95a to
35f58ec
Compare
|
looks like this breaks backwards compatibility, will do the version bump while the CI is running |
|
Given #3882 is also breaking, I will wait to see if this PR meets the next release, if not I will bump the versions. |
protos/table.proto
Outdated
| // a roaring treemap of the changed row addresses | ||
| bytes changed_row_addrs = 2; |
There was a problem hiding this comment.
I feel like this could be explained more, like how it is used. I feel like a roaring treemap acts as a set, but we likely want a mapping?
There was a problem hiding this comment.
Or is the intended use "if you have row addrs in here, then use the old fragments instead of the new fragments while scanning / taking"?
There was a problem hiding this comment.
the changed_row_addrs + old fragment ids + new fragment ids can be used to together recover the full old address to new address mapping, and that is used for separated remap (another PR) and use for scanning/taking at read path (another PR). I can add the details there.
| use snafu::location; | ||
| use std::{any::Any, collections::HashMap, sync::Arc}; | ||
|
|
||
| pub const FRAG_REUSE_INDEX_NAME: &str = "__lance_frag_reuse"; |
There was a problem hiding this comment.
Are we going to ignore this index type in the in list_indices() output?
There was a problem hiding this comment.
No, because we still need to be able to discover this index.
There was a problem hiding this comment.
Yeah, I'm just thinking about whether it will confuse users. We should be careful when we bring this up to lancedb. I think it might error if list_indices() returns an index type it doesn't recognize.
There was a problem hiding this comment.
I see let me create an issue to track this
| .fields | ||
| .iter() | ||
| .all(|field_id| field_ids.contains(field_id)) | ||
| || existing_index.name == FRAG_REUSE_INDEX_NAME |
There was a problem hiding this comment.
Does this mean the frag reuse indices will never go away? How long will we retain them?
There was a problem hiding this comment.
after all the indexes are remapped it can go away, so depending on the reindexing process. But given we want to continuously compact and reindex at the same time, this intermediate state might always exist.
| // TODO: Can we merge these two caches into one for uniform memory management? | ||
| scalar_cache: Arc<Cache<String, Arc<dyn ScalarIndex>>>, | ||
| vector_cache: Arc<Cache<String, Arc<dyn VectorIndex>>>, | ||
| frag_reuse_cache: Arc<Cache<String, Arc<FragReuseIndex>>>, |
There was a problem hiding this comment.
🤣 😭 I will consolidate these soon...
There was a problem hiding this comment.
yeah I saw the TODO so was thinking it could be done together, currently I am just keeping things consistent...
rust/lance/src/index/frag_reuse.rs
Outdated
| let pb_sequence = InlineContent::decode(data)?; | ||
| FragReuseIndexDetails::try_from(pb_sequence) |
There was a problem hiding this comment.
Are we double caching this data? Won't it also be stored in the index cache?
There was a problem hiding this comment.
yeah you are right, the content will be in the index cache anyway so this is not necessary. Let me remove
| // When combined with the old fragment IDs and new fragment IDs, | ||
| // it can recover the full mapping of old row addresses to either new row addresses or deleted. | ||
| // this mapping can then be used to remap indexes or satisfy index queries for the new unindexed fragments. |
There was a problem hiding this comment.
Does this assume that no re-ordering happens?
There was a problem hiding this comment.
yes the order for which the old and new fragments are stored must preserve what happened at the rewrite time. That's actually a good point, the comment below should be updated, since structures like roaring bitmap cannot be used to store this, only a list. Let me update the comments.
| rechunk_stable_row_ids(dataset.as_ref(), &mut new_fragments, &fragments).await?; | ||
|
|
||
| HashMap::new() | ||
| if options.defer_index_remap { |
There was a problem hiding this comment.
Hello, It seems that there will be a empty FragReuseIndex when use_stable_row_id && defer_index_remap.
Why we need the empty FragReuseIndex?
There was a problem hiding this comment.
Hi @ztorchan I would suggest you raise a new issue to file a problem you have encountered. This work was done before the stable row id feature. And recently we have changed some logic for realizing the update-stable row ID.
There was a problem hiding this comment.
Yeah let's discuss it separately. But overall I think maybe we should use a separated flag here as use_stable_row_id_for_index. Because that's what this flag is doing here, and it assumes either you enable that flag and you don't need to remap, or you enable the other flag then you defer the remap.
BREAKING CHANGE: Signature of
dataset::optimize::commit_compactionhas renamedoptionsasremap_options, and takes additionalCompactionOptionsasoptions. Thedataset::transaction::Operation::Rewriteadds a new fieldfrag_reuse_index: Option<Index>that holds optional fragment reuse index metadata if present.This PR introduces a new index, fragment reuse index, to store compaction row address information after compactions. The index will be created when compaction has option defer_index_remap enabled.
The index is named as "fragment reuse" since the ultimate goal is to allow query to reuse old indexed fragments in replacement of new unindexed fragments for index queries. An alternative was to name it a "remap index", but that name was too confusing as the term "remap" is also a verb with functions like
remap_indexalready in the codebase.Closes #3833 #3834