---
In general I like the idea. You list one of the goals as "support time travel". However, if a manifest has been archived, we can't check it out, right? How do archive files support time travel? I do like the idea of keeping the archive around for auditing and debugging a dataset. I think you are missing a message for
---
Overall I like the idea. I think we all want something like this, and I agree that doing it separately is better than making it a system index that could produce new versions. For existing system indexes, we want the related operation to check for conflicts with existing operations, so doing them as part of the table manifest makes sense. Here the case is the reverse: we don't want to interfere with table manifest versioning further.

But just for the purpose of time travel, I didn't fully get the perf benefit. The archived file would be larger, and we will have to read those files, so the performance is not necessarily better than doing a binary search (or multiway search if we want to be even faster and fully leverage object store bandwidth).

Just to brainstorm here, 2 approaches I could think of to accelerate the time travel:

Alternative 1: oldest_version_hint.json

When cleanup runs, write a
The problem with this approach is that if we keep many tags and branches, the range has holes and could be unevenly distributed, causing search to be inefficient.

Alternative 2: checkpoint.binpb

When cleanup runs, directly write a map of
Then for any version higher than the checkpoint file's latest tracked version, we can do binary search if time travel falls out of the range. I feel we probably need this to be truly performant?
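The binary-search idea above can be sketched as follows, assuming we have a sorted list of commit timestamps and a parallel list of version numbers (the names and data shapes here are illustrative, not Lance internals):

```python
import bisect

def find_version_at(timestamps, versions, target_ts):
    """Return the latest version whose timestamp is <= target_ts.

    timestamps: sorted list of commit timestamps, one per version
    versions:   parallel list of version numbers
    """
    i = bisect.bisect_right(timestamps, target_ts)
    if i == 0:
        return None  # target predates the dataset
    return versions[i - 1]

# Example: three versions committed at times 10, 20, 30
print(find_version_at([10, 20, 30], [1, 2, 3], 25))  # -> 2
```

In a real object store each probe would be one manifest read, so the cost is O(log N) reads rather than O(N).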
---
I've modified

TL;DR

The Version Archive feature improves
However, the current block only handles the case where

Test Methodology

Test Scenario

We measure the time to list all versions of a dataset under two conditions:
For each test we use 3 seconds to warm up, then run 10 samples and take the average value.

Test Variables
Test Environment
Results

Detailed Results

S3 Benchmark Results
Local Benchmark Results
Performance Speedup Summary

The following charts show the performance improvement (speedup) when using Version Archive:

S3 Storage
Local Storage
Response Time Comparison

S3 Storage
Local Storage
Analysis

Key Findings
Why It Works

Without Version Archive:
With Version Archive:
---
I recall that delta-rs has a similar feature, where evolutions go from JSON to PB to Parquet. I suggest we write a Lance file as our checkpoint instead of using a custom file format or PB. This way, we can even query the Lance file directly, which unlocks more possibilities.
---
@wjones127 curious what you think about this proposal. We've discussed this topic in other threads.
---
Hi @majin1102, could you please update the proposal once more to reflect the latest status? I think we have agreed to name this feature "version checkpoint" instead of "archive"?
---
Motivation
`dataset.versions()`), where N = number of versions, which would significantly impact memory and response time if the manifests are big enough.

Goal:
- `versions()` performance: make `dataset.versions()` highly performant with consistent latency

There's also a related API proposal #5885, and the motivation has been raised in that thread.
Overall design
Layout
`_checkpoint/{version:020}.lance`

Spec
The checkpoint is stored as a Lance file containing an Arrow RecordBatch with the following schema:
Metadata (stored in Lance file metadata):
- `lance:checkpoint:dataset_created`: Dataset creation timestamp (inherited, never changes)
- `lance:checkpoint:created_at`: Checkpoint creation timestamp

Key Design Decisions
- `{version:020}.lance` naming for easy lookup

Configuration
- `enabled`
- `max_entries`
- `max_checkpoint_files`

Trigger Flow
`archive_versions()` detailed flow:
Key Points:
Why Not a System Index?
We chose NOT to implement this as a system index for these core reasons:
Avoid polluting dataset version lineage: A system index would require creating a new manifest version each time checkpoint is updated. Since checkpoint happens during cleanup (not during normal writes), this would create unnecessary versions in the dataset's version history, polluting the timeline.
Self-contained files: Checkpoint files are standalone Lance files stored in the `_checkpoint/` directory. They don't need to be tied to any specific manifest version and can be managed independently.

No index infrastructure dependencies: System indices require complex infrastructure (index metadata, cleanup, migration, etc.). Checkpoint files are simple, self-describing files that can be inspected and debugged without any special tools.
The key insight is: VersionCheckpoint is metadata about versions, not data to be indexed for queries. It doesn't need to be versioned alongside the data - it provides a historical view that transcends individual dataset versions.
Graceful Degradation
If the latest checkpoint file fails to load (corruption, partial write, etc.):
This ensures partial writes or corruption don't break the system - users simply get slightly older checkpoint data.
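The fallback loop above can be sketched as follows, assuming checkpoint file paths sorted newest-first and a reader function that raises on corruption (both names are illustrative):

```python
def load_latest_checkpoint(paths, read_fn):
    """Try checkpoint files newest-first, skipping any that fail to load."""
    for path in paths:
        try:
            return read_fn(path)
        except Exception:
            continue  # corrupted or partially written; fall back to an older file
    return None  # no usable checkpoint; caller scans manifests directly

# In-memory stand-in for storage: the newest file is corrupt, the older one is fine.
files = {"0002.lance": None, "0001.lance": {"max_version": 41}}

def fake_read(path):
    data = files[path]
    if data is None:
        raise IOError("corrupt checkpoint")
    return data

result = load_latest_checkpoint(["0002.lance", "0001.lance"], fake_read)
print(result)  # -> {'max_version': 41}
```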
Storage Overhead
This is negligible compared to typical dataset sizes and provides significant query performance benefits.
Performance Overhead
Checkpoint runs as part of cleanup, which already scans all manifests. Since checkpoint can reuse that scan, the additional cost is minimal:
Implementation Details
Transaction Properties Size Limit
Transaction properties are stored as JSON strings. To prevent performance issues:
- Oversized values are replaced with `{"_truncated":true,"_size":<original_size>,"_data":<truncated_json>}`
- The truncated payload is kept under the `transaction_properties` key for debugging

Code Structure
Key methods:
- `load_or_new()`: Load latest checkpoint or create empty one
- `add_summaries()`: Add new version summaries
- `flush()`: Write checkpoint to storage
- `cleanup_old_checkpoints()`: Prune old checkpoint files
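A minimal sketch of this surface, combining the key methods above with the truncation rule from the Transaction Properties Size Limit section. All signatures, the size limit, and the summary shape are assumptions; only the method names come from the proposal:

```python
import json

MAX_PROPS_BYTES = 64  # assumed size limit, kept small here for demonstration

def encode_properties(props):
    """Serialize transaction properties, truncating oversized payloads."""
    raw = json.dumps(props, separators=(",", ":"))
    if len(raw.encode("utf-8")) <= MAX_PROPS_BYTES:
        return raw
    return json.dumps(
        {"_truncated": True,
         "_size": len(raw.encode("utf-8")),
         "_data": raw[:MAX_PROPS_BYTES]},  # keep a debuggable prefix
        separators=(",", ":"),
    )

class VersionCheckpoint:
    """Sketch of the checkpoint manager's surface; bodies are placeholders."""

    def __init__(self):
        self.summaries = []

    @classmethod
    def load_or_new(cls):
        # Load the latest checkpoint file, or start empty if none exists
        return cls()

    def add_summaries(self, summaries):
        for s in summaries:
            s = dict(s)
            if "transaction_properties" in s:
                s["transaction_properties"] = encode_properties(s["transaction_properties"])
            self.summaries.append(s)

    def flush(self):
        # Would write a new {version:020}.lance file; here just report the entry count
        return len(self.summaries)

    def cleanup_old_checkpoints(self, keep=3):
        # Would prune checkpoint files beyond `keep`; no-op in this sketch
        pass

cp = VersionCheckpoint.load_or_new()
cp.add_summaries([
    {"version": 1, "transaction_properties": {"blob": "x" * 200}},
    {"version": 2},
])
```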