
feat(format): track invalidated_fragments for index segments #5441

Draft
wjones127 wants to merge 6 commits into lance-format:main from wjones127:feat/track-invalidated-fragments



@wjones127 wjones127 commented Dec 9, 2025

For operations like DataReplacement, we need to start tracking within the index segments whether data in the index should be considered invalidated. This is slightly different from deleted, because the rows are still there in the same fragment; they have just been replaced in a new data file.

This PR adds a field invalidated_fragments to IndexMetadata, which is another fragment bitmap. The purpose of fragment_bitmap is also clarified, in particular that it should be considered immutable.
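
To illustrate how the two bitmaps relate, here is a minimal sketch, not the code in this PR. It assumes a stand-in struct (`IndexSegmentSketch`, a hypothetical name) and uses `BTreeSet<u32>` in place of the real bitmap type so it has no dependencies. The only behavior taken from the PR is that the effective set is `fragment_bitmap` minus `invalidated_fragments`.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the index metadata discussed in this PR.
/// A BTreeSet<u32> replaces the real bitmap type to keep the sketch
/// dependency-free.
#[derive(Debug, Clone, Default)]
pub struct IndexSegmentSketch {
    /// Fragments covered when the segment was built; treated as immutable.
    pub fragment_bitmap: BTreeSet<u32>,
    /// Fragments whose indexed data has since been replaced.
    pub invalidated_fragments: BTreeSet<u32>,
}

impl IndexSegmentSketch {
    /// Fragments the segment can still answer for: covered minus invalidated.
    pub fn effective_fragment_bitmap(&self) -> BTreeSet<u32> {
        self.fragment_bitmap
            .difference(&self.invalidated_fragments)
            .copied()
            .collect()
    }
}

/// Small usage example: fragment 1 was replaced after the index was built.
pub fn example_effective() -> Vec<u32> {
    let segment = IndexSegmentSketch {
        fragment_bitmap: BTreeSet::from([0, 1, 2]),
        invalidated_fragments: BTreeSet::from([1]),
    };
    segment.effective_fragment_bitmap().into_iter().collect()
}
```

In this sketch the rows of fragment 1 still exist, so nothing is marked deleted; the segment simply stops claiming coverage of that fragment while `fragment_bitmap` itself stays unchanged.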

Discussion here: #5453

Closes #5322, fixes #5321

@github-actions github-actions bot added the enhancement New feature or request label Dec 9, 2025
wjones127 and others added 4 commits December 8, 2025 20:31
- Fix effective_fragment_bitmap() to subtract invalidated_fragments
- Rename prune_updated_fields_from_indices to invalidate_updated_fields_in_indices
- Add fragment IDs to invalidated_fragments instead of removing from fragment_bitmap
- Add invalidation logic for DataReplacement operations
- Fix TODOs in frag_reuse.rs and remapping.rs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

codecov bot commented Dec 10, 2025

Codecov Report

❌ Patch coverage is 94.24084% with 11 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| rust/lance-table/src/format/index.rs | 95.18% | 3 Missing and 1 partial ⚠️ |
| rust/lance/src/dataset/write/merge_insert.rs | 93.93% | 4 Missing ⚠️ |
| rust/lance/src/dataset/optimize/remapping.rs | 50.00% | 3 Missing ⚠️ |


@wjones127 wjones127 changed the title feat: track invalidated fragments for index segments feat: track invalidated_fragments for index segments Dec 10, 2025
@wjones127 wjones127 changed the title feat: track invalidated_fragments for index segments feat(format): track invalidated_fragments for index segments Dec 10, 2025
- test_optimize_indices_resets_invalidated_fragments: verifies merged
  indices have empty invalidated_fragments
- test_invalidated_fragments_lifecycle: verifies accumulation of
  invalidated fragments across multiple DataReplacement operations

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
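
The lifecycle those two tests describe (invalidated fragments accumulate across DataReplacement operations, and a merged index starts with none) can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `SegmentSketch`, `invalidate`, and `merge_sketch` are hypothetical names, `BTreeSet<u32>` stands in for the bitmap type, and both the rule that only covered fragments are recorded and the way the merged coverage is computed are guesses made for the example.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for one index segment's fragment tracking.
#[derive(Debug, Clone, Default)]
pub struct SegmentSketch {
    pub fragment_bitmap: BTreeSet<u32>,
    pub invalidated_fragments: BTreeSet<u32>,
}

impl SegmentSketch {
    /// Record fragments whose data was replaced. fragment_bitmap is left
    /// untouched; IDs are added to invalidated_fragments instead.
    /// Assumption: fragments the segment never covered are ignored.
    pub fn invalidate(&mut self, replaced: &[u32]) {
        for id in replaced {
            if self.fragment_bitmap.contains(id) {
                self.invalidated_fragments.insert(*id);
            }
        }
    }
}

/// Hypothetical merge: the new segment covers the still-valid fragments of
/// the inputs plus the re-indexed ones, and starts with nothing invalidated.
pub fn merge_sketch(segments: &[SegmentSketch], reindexed: &[u32]) -> SegmentSketch {
    let mut fragment_bitmap: BTreeSet<u32> = reindexed.iter().copied().collect();
    for s in segments {
        fragment_bitmap.extend(s.fragment_bitmap.difference(&s.invalidated_fragments).copied());
    }
    SegmentSketch {
        fragment_bitmap,
        invalidated_fragments: BTreeSet::new(),
    }
}

/// Two replacements accumulate, then a merge resets the invalidated set.
pub fn example_lifecycle() -> (Vec<u32>, Vec<u32>, bool) {
    let mut seg = SegmentSketch {
        fragment_bitmap: BTreeSet::from([0, 1, 2]),
        ..Default::default()
    };
    seg.invalidate(&[1]); // first replacement
    seg.invalidate(&[2, 7]); // second replacement; 7 is not covered, so it is skipped
    let accumulated: Vec<u32> = seg.invalidated_fragments.iter().copied().collect();

    let merged = merge_sketch(&[seg], &[1, 2]);
    let covered: Vec<u32> = merged.fragment_bitmap.iter().copied().collect();
    let reset = merged.invalidated_fragments.is_empty();
    (accumulated, covered, reset)
}
```

Keeping `fragment_bitmap` fixed and recording invalidation separately means the original coverage of a segment remains recoverable, which matches the immutability note in the PR description.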

Labels

enhancement New feature or request table format

Development

Successfully merging this pull request may close these issues.

Format: add new current_fragment_bitmap field to IndexMetadata
Getting non-existent fragment error in merge_insert

1 participant