Skip to content

feat(rust): introduce snapshot api#5886

Draft
majin1102 wants to merge 1 commit intolance-format:mainfrom
majin1102:snapshot
Draft

feat(rust): introduce snapshot api#5886
majin1102 wants to merge 1 commit intolance-format:mainfrom
majin1102:snapshot

Conversation

@majin1102
Copy link
Contributor

Implement #5885

@majin1102 majin1102 marked this pull request as draft February 4, 2026 16:00
@github-actions github-actions bot added the enhancement New feature or request label Feb 4, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Feb 4, 2026

Code Review: feat(rust): introduce snapshot api

Summary

This PR adds a new Snapshot and Snapshots API for accessing dataset version history with lazy loading and timestamp-based filtering capabilities.


P0 Issues (Must Fix)

1. Timestamp Semantics Issue - last_modified vs created_at

The last_modified field from object store metadata represents when the manifest file was last modified on disk, not when the version was created. For newly committed manifests, the code uses Utc::now() which is the commit time, but for existing manifests read from storage, it uses the file's last_modified time. These can differ if:

  • The file was copied/moved
  • The file system was restored from backup
  • Object store replication delays

This inconsistency could cause within_timestamp() and earlier_than_timestamp() to return unexpected results.

Recommendation: Consider using the manifest's internal timestamp() field (already available via Manifest::timestamp()) for reliable version timestamps, or clearly document this limitation in the API.


P1 Issues (Should Fix)

1. Inefficient Repeated Listing in Filtering Methods

Methods like within(), earlier_than(), and later_than() each call list().await? separately, which does a full manifest listing every time. If a user calls multiple filtering methods, they'll hit the object store multiple times unnecessarily.

// Each of these does a separate listing
let a = snapshots.within(1, 4).await?;
let b = snapshots.earlier_than(3, 2).await?;

Recommendation: Consider caching the manifest list within the Snapshots struct, or document that users should call list() once and filter locally for multiple queries.

2. later_than Semantics Mismatch with Documentation

The method later_than(version, limit) is documented as "Get snapshots later than the given version" but the implementation uses >= version:

.filter(|s| s.version_number() >= version)  // includes the version itself

This is inconsistent with earlier_than which correctly uses < version. The same issue exists in later_than_timestamp.


Minor Suggestions (Non-blocking)

  • The Snapshot and Snapshots structs could benefit from documentation comments explaining their purpose and usage patterns
  • Consider making ManifestLocation visibility consistent (it's now exposed through the public Snapshot::manifest_location field)

Testing

Good test coverage overall. Tests appropriately use in-memory storage and cover edge cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant