Completed Inline Table Maintenance #6005
Replies: 2 comments 2 replies
Agree that we should have a commit hook in Lance. Wondering if we should build this ability by providing a commitTable spec in

I think the term "table service" mentioned in this topic might be slightly misleading. IMO, a table maintenance service is a distinct layer in the data ecosystem; at the very least, it should sit above the metadata service.

As I understand it, the main idea is to complete the commit hook and provide the ability to run compaction and optimizing_indices after commit. However, I'm a bit concerned about the workload implications in a production environment. For cleanup, the workload is primarily metadata scanning and garbage collection, which is relatively lightweight. Index optimization and fragment compaction, by contrast, involve substantial data processing, and I don't think it's a good idea to execute them directly within a commit operator.

I think an ideal solution for inline optimization might be to provide dedicated optimizing_indices and compaction operators that trigger asynchronously after commit in each engine. This would likely be a separate effort. What do you think?
Hi, thank you for sharing this! Before discussing the technical details, can you demonstrate how our users will use the new API and what the difference is?
Completed Inline Table Maintenance
Background
Currently, Lance supports several kinds of table maintenance, including cleanup, optimize_indices (creating a delta index or merging indices), and compaction. To maintain a healthy Lance table, users typically need to:
Overall, this places a heavy burden on users: they have to manage several maintenance tasks, and the dependencies between those tasks are complex.
For cleanup, Lance already supports a cleanup hook that runs after a commit. We can extend this idea by encapsulating the table maintenance operations in a TableMaintainer. After a commit completes, the corresponding operations are executed according to a configured strategy, which reduces the user's burden and lowers the barrier to use.
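To make the idea concrete, here is a minimal sketch of a post-commit hook driven by a strategy. Everything in it is an assumption for illustration: `PostCommitHook`, `MaintenanceStrategy`, `MaintenanceAction`, and the fields on `TableMaintainer` are hypothetical names, not existing Lance APIs, and the actions are only recorded rather than executed. The order in which actions run is also just the order they are listed in this post.

```rust
/// Hypothetical maintenance actions, named after the operations in this post.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MaintenanceAction {
    Cleanup,
    OptimizeIndices,
    Compaction,
}

/// Hypothetical strategy: which actions may run inline after a commit.
#[derive(Debug, Clone, Copy, Default)]
pub struct MaintenanceStrategy {
    pub cleanup: bool,
    pub optimize_indices: bool,
    pub compaction: bool,
}

/// Hypothetical hook trait (an assumption, not an existing Lance trait).
pub trait PostCommitHook {
    /// Called once after a successful user commit; returns the actions it ran.
    fn after_commit(&mut self, committed_version: u64) -> Vec<MaintenanceAction>;
}

/// Sketch of a maintainer that only records what it would run.
pub struct TableMaintainer {
    strategy: MaintenanceStrategy,
    /// (version, action) pairs, standing in for real maintenance work.
    pub log: Vec<(u64, MaintenanceAction)>,
}

impl TableMaintainer {
    pub fn new(strategy: MaintenanceStrategy) -> Self {
        Self { strategy, log: Vec::new() }
    }
}

impl PostCommitHook for TableMaintainer {
    fn after_commit(&mut self, committed_version: u64) -> Vec<MaintenanceAction> {
        // Illustrative order only; the real execution order is a separate decision.
        let candidates = [
            (self.strategy.cleanup, MaintenanceAction::Cleanup),
            (self.strategy.optimize_indices, MaintenanceAction::OptimizeIndices),
            (self.strategy.compaction, MaintenanceAction::Compaction),
        ];
        let mut ran = Vec::new();
        for &(enabled, action) in candidates.iter() {
            if enabled {
                self.log.push((committed_version, action));
                ran.push(action);
            }
        }
        ran
    }
}
```

With a strategy that enables only cleanup and compaction, calling `after_commit(7)` would record those two actions against version 7 and skip index optimization.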
Summary
Introduce a unified TableMaintainer abstraction to run table maintenance actions inline after a successful user commit, including:
Provide a degree of automatic inference to determine which table maintenance operations should be triggered and which should be skipped.
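As a rough illustration of what such inference could look like, the sketch below decides which operations to trigger by comparing table statistics against thresholds. The statistics, the thresholds, and every name (`TableStats`, `Thresholds`, `PlannedOp`, `plan_maintenance`) are assumptions made up for this example; the proposal does not specify the actual inputs or rules.

```rust
/// Hypothetical table statistics; the real inputs are not specified in this post.
#[derive(Debug, Clone, Copy)]
pub struct TableStats {
    pub small_fragments: usize,
    pub unindexed_rows: u64,
    pub old_versions: usize,
}

/// Hypothetical limits above which an operation is considered worthwhile.
#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub max_small_fragments: usize,
    pub max_unindexed_rows: u64,
    pub max_old_versions: usize,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PlannedOp {
    Cleanup,
    OptimizeIndices,
    Compaction,
}

/// Returns the operations whose statistic exceeds its limit; all others are skipped.
pub fn plan_maintenance(stats: &TableStats, limits: &Thresholds) -> Vec<PlannedOp> {
    let mut plan = Vec::new();
    if stats.old_versions > limits.max_old_versions {
        plan.push(PlannedOp::Cleanup);
    }
    if stats.unindexed_rows > limits.max_unindexed_rows {
        plan.push(PlannedOp::OptimizeIndices);
    }
    if stats.small_fragments > limits.max_small_fragments {
        plan.push(PlannedOp::Compaction);
    }
    plan
}
```

A simple threshold check like this keeps the decision cheap enough to evaluate on every commit; anything smarter (cost models, rate limiting) would be a design choice beyond this sketch.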
Goals
Covered Table maintenance operations
1. Configuration
Add rust/lance/src/dataset/table_tablemaintainer.rs (or similar):
2. TableMaintainer
3. Runner: TableMaintenanceRunner
Responsibilities
Execution Order (Decision Complete)
Failure Policy Semantics
4. Hook Point & Recursion Guard
4.1 Centralize post-commit behavior in Dataset::apply_commit
Refactor so low-level commit code only commits manifests and returns (no auto-cleanup in io/commit.rs).
Dataset::apply_commit:
4.2 Recursion Guard
Introduce internal options:
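The reason a guard is needed: maintenance operations such as compaction produce commits of their own, and without a guard those commits would trigger maintenance again. The sketch below shows one way this could work using a mock dataset. The option name `skip_inline_maintenance`, the `InternalCommitOptions` struct, and `MockDataset` are hypothetical and are not the options this proposal actually introduces.

```rust
/// Hypothetical internal options (assumption; the real option names may differ).
#[derive(Debug, Clone, Copy, Default)]
pub struct InternalCommitOptions {
    /// Set by maintenance code so its own commits do not re-trigger maintenance.
    pub skip_inline_maintenance: bool,
}

/// Mock stand-in for a dataset; only tracks a version and a run counter.
#[derive(Debug)]
pub struct MockDataset {
    pub version: u64,
    pub maintenance_runs: u32,
}

impl MockDataset {
    /// Commits a new version, then runs maintenance unless the guard is set.
    pub fn apply_commit(&mut self, opts: InternalCommitOptions) {
        self.version += 1;
        if opts.skip_inline_maintenance {
            return; // guard: this commit came from maintenance itself
        }
        self.run_maintenance();
    }

    /// Maintenance commits its result with the guard enabled.
    fn run_maintenance(&mut self) {
        self.maintenance_runs += 1;
        self.apply_commit(InternalCommitOptions {
            skip_inline_maintenance: true,
        });
    }
}
```

In this mock, one user commit yields exactly one maintenance run and one follow-up maintenance commit, rather than an unbounded chain.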