-
Notifications
You must be signed in to change notification settings - Fork 0
[Phase 1.2] Define Storage trait and LocalStorage implementation (#11) #30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add the core storage abstraction layer: - Define `Storage` trait with synchronous API for all backends - Define `StorageError` enum covering all failure modes - Define `ObjectMetadata` struct for object information - Define `SeekableStorage` extension trait for backends supporting seek - Implement `LocalStorage` as reference implementation - Add stub `OssStorage` (to be implemented in #13) - All storage types feature-gated under `cloud-storage` Testing: - 235 library tests pass with cloud-storage feature - LocalStorage fully tested with unit tests - All trait methods have documentation Acceptance Criteria (from #11): - [x] Create src/storage/mod.rs module file - [x] Define StorageError enum covering all failure modes - [x] Define ObjectMetadata struct with size, last_modified, content_type - [x] Define synchronous Storage trait with methods - [x] Define SeekableStorage extension trait - [x] Add module to src/lib.rs exports - [x] Write documentation for all public types Part of #9 - Distributed Roboflow with Alibaba Cloud (OSS + ACK) Depends on #10 (cloud-storage dependencies)
Add core dependencies for storage abstraction layer: - object_store: S3/OSS client library with AWS features - tokio: Async runtime for blocking cloud operations - url: URL parsing for storage schemes (oss://, s3://) - bytes: Efficient byte handling for I/O operations All dependencies are optional and gated under the new `cloud-storage` feature flag to avoid pulling in transitive dependencies for users who don't need cloud storage support. Part of #9 - Distributed Roboflow with Alibaba Cloud
Greptile OverviewGreptile SummaryThis PR establishes a foundational storage abstraction layer for cloud storage support, with a complete LocalStorage implementation and OSS stub. Key Changes:
Issues Found:
The implementation is well-structured with clear separation of concerns, comprehensive error handling, and thorough documentation. LocalStorage is production-ready with extensive test coverage. Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Client
participant Storage as Storage Trait
participant LocalStorage
participant FileSystem as File System
Note over Client,FileSystem: Write Operation
Client->>Storage: writer("data/file.txt")
Storage->>LocalStorage: writer("data/file.txt")
LocalStorage->>LocalStorage: full_path() → /root/data/file.txt
LocalStorage->>LocalStorage: ensure_parent() → create /root/data/
LocalStorage->>FileSystem: File::create()
FileSystem-->>LocalStorage: File handle
LocalStorage->>LocalStorage: wrap in BufWriter
LocalStorage-->>Client: Box<dyn Write>
Client->>Client: write_all(data)
Client->>Client: flush()
Note over Client,FileSystem: Read Operation
Client->>Storage: reader("data/file.txt")
Storage->>LocalStorage: reader("data/file.txt")
LocalStorage->>LocalStorage: full_path() → /root/data/file.txt
LocalStorage->>FileSystem: File::open()
alt File exists
FileSystem-->>LocalStorage: File handle
LocalStorage->>LocalStorage: wrap in BufReader
LocalStorage-->>Client: Box<dyn Read>
Client->>Client: read_to_end()
else File not found
FileSystem-->>LocalStorage: NotFound error
LocalStorage-->>Client: StorageError::NotFound
end
Note over Client,FileSystem: Seekable Read Operation
Client->>Storage: seekable_reader("data/file.txt")
Storage->>LocalStorage: seekable_reader("data/file.txt")
LocalStorage->>FileSystem: File::open()
FileSystem-->>LocalStorage: File handle
LocalStorage->>LocalStorage: wrap in BufReader (supports Seek)
LocalStorage-->>Client: Box<dyn SeekRead>
Client->>Client: seek(SeekFrom::Start(100))
Client->>Client: read()
Note over Client,FileSystem: Metadata Operations
Client->>Storage: metadata("data/file.txt")
Storage->>LocalStorage: metadata("data/file.txt")
LocalStorage->>FileSystem: fs::metadata()
FileSystem-->>LocalStorage: Metadata
LocalStorage->>LocalStorage: build ObjectMetadata
LocalStorage-->>Client: ObjectMetadata{size, last_modified, ...}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5 files reviewed, 2 comments
Additional Comments (1)
Prompt To Fix With AIThis is a comment left during a code review.
Path: Cargo.toml
Line: 94:118
Comment:
The `cloud-storage` feature is referenced throughout the code but is not defined in `Cargo.toml`. This will cause compilation failures when users try to use the feature.
```suggestion
[features]
dataset-hdf5 = ["dep:hdf5"]
dataset-parquet = ["dep:polars"]
dataset-depth = ["dep:png"]
dataset-all = ["dataset-hdf5", "dataset-parquet", "dataset-depth"]
# Python bindings via PyO3 (extension-module mode to prevent libpython linking)
python = ["pyo3", "robocodec/python"]
# Python extension module (alias for python, since extension-module is now default)
extension-module = ["python"]
# GPU compression (experimental)
# Enables GPU-accelerated compression via nvCOMP
# Requires: NVIDIA GPU, CUDA toolkit, nvCOMP library (Linux)
gpu = []
# Use jemalloc as global allocator on Linux (better for concurrent workloads)
# On macOS, the default allocator is already excellent and jemalloc is not used
jemalloc = ["dep:tikv-jemallocator", "robocodec/jemalloc"]
# CLI support for binaries (profiler, etc.)
cli = ["dep:clap"]
# Profiling support for profiler binary (flamegraph generation)
profiling = ["dep:pprof", "cli"]
# CPU-aware WindowLog detection using CPUID (x86_64 only)
cpuid = ["dep:raw-cpuid"]
# io_uring support for Linux (high-performance async I/O)
# Requires: Linux 5.6+ kernel
io-uring-io = ["dep:io-uring"]
# Cloud storage support (Alibaba OSS, Amazon S3)
cloud-storage = []
```
How can I resolve this? If you propose a fix, please make it concise. |
This completes Phase 1 of the storage abstraction: ## Changes - Add URL parsing (src/storage/url.rs): - StorageUrl enum with Local, S3, Oss variants - Support for file://, s3://, oss:// schemes - Query parameter parsing for endpoint override - Comprehensive tests - Add StorageFactory (src/storage/factory.rs): - StorageConfig with credential management - Environment variable loading (OSS/AWS keys) - Automatic backend selection from URL - Credential chain: env vars → config - Make Storage trait dyn-compatible: - Changed all impl AsRef<Path> to &Path - Now supports Arc<dyn Storage> trait objects - Update LocalStorage for dyn-compatibility - Update OssStorage stub for dyn-compatibility - Export new types in lib.rs ## Testing - 261 library tests pass with cloud-storage feature - All URL parsing variants tested - StorageFactory credential handling tested Part of #9 - Distributed Roboflow with Alibaba Cloud (OSS + ACK) Depends on #11 (Storage trait), #10 (dependencies)
Summary
Define the core storage abstraction layer with URL parsing and factory pattern.
Changes
Storage Trait and Error Types (#11)
Storagetrait with synchronous API for all backendsStorageErrorenum covering all failure modesObjectMetadatastruct for object informationSeekableStorageextension trait for seekable backends&Pathinstead ofimpl AsRef<Path>)LocalStorage Backend (#23)
LocalStorageimplementingStorageandSeekableStorageURL Parsing (#24)
StorageUrlenum with Local, S3, Oss variantsfile://,s3://,oss://schemesStorageFactory (#25)
StorageConfigwith credential managementStorageFactoryfor automatic backend selection from URLOssStorage Stub
OssStoragestruct with credentials configurationTesting
Closes #11, #23, #24, #25
Part of #9 - Distributed Roboflow with Alibaba Cloud (OSS + ACK)