Skip to content

Conversation

@zhexuany
Copy link
Contributor

@zhexuany zhexuany commented Jan 30, 2026

Summary

Define the core storage abstraction layer with URL parsing and factory pattern.

Changes

Storage Trait and Error Types (#11)

  • Storage trait with synchronous API for all backends
  • StorageError enum covering all failure modes
  • ObjectMetadata struct for object information
  • SeekableStorage extension trait for seekable backends
  • Dyn-compatible design (uses &Path instead of impl AsRef<Path>)

LocalStorage Backend (#23)

  • LocalStorage implementing Storage and SeekableStorage
  • Comprehensive unit tests (11 tests)
  • Buffered I/O with BufReader/BufWriter

URL Parsing (#24)

  • StorageUrl enum with Local, S3, Oss variants
  • Support for file://, s3://, oss:// schemes
  • Query parameter parsing for endpoint override
  • FromStr implementation for all URL types

StorageFactory (#25)

  • StorageConfig with credential management
  • Environment variable loading (OSS_ACCESS_KEY_ID, etc.)
  • StorageFactory for automatic backend selection from URL
  • AWS credential fallback support

OssStorage Stub

Testing

  • 261 library tests pass with cloud-storage feature
  • All URL parsing variants tested
  • StorageFactory credential handling tested

Closes #11, #23, #24, #25

Part of #9 - Distributed Roboflow with Alibaba Cloud (OSS + ACK)

Add the core storage abstraction layer:

- Define `Storage` trait with synchronous API for all backends
- Define `StorageError` enum covering all failure modes
- Define `ObjectMetadata` struct for object information
- Define `SeekableStorage` extension trait for backends supporting seek
- Implement `LocalStorage` as reference implementation
- Add stub `OssStorage` (to be implemented in #13)
- All storage types feature-gated under `cloud-storage`

Testing:
- 235 library tests pass with cloud-storage feature
- LocalStorage fully tested with unit tests
- All trait methods have documentation

Acceptance Criteria (from #11):
- [x] Create src/storage/mod.rs module file
- [x] Define StorageError enum covering all failure modes
- [x] Define ObjectMetadata struct with size, last_modified, content_type
- [x] Define synchronous Storage trait with methods
- [x] Define SeekableStorage extension trait
- [x] Add module to src/lib.rs exports
- [x] Write documentation for all public types

Part of #9 - Distributed Roboflow with Alibaba Cloud (OSS + ACK)
Depends on #10 (cloud-storage dependencies)
Add core dependencies for storage abstraction layer:

- object_store: S3/OSS client library with AWS features
- tokio: Async runtime for blocking cloud operations
- url: URL parsing for storage schemes (oss://, s3://)
- bytes: Efficient byte handling for I/O operations

All dependencies are optional and gated under the new `cloud-storage`
feature flag to avoid pulling in transitive dependencies for users
who don't need cloud storage support.

Part of #9 - Distributed Roboflow with Alibaba Cloud
@greptile-apps
Copy link

greptile-apps bot commented Jan 30, 2026

Greptile Overview

Greptile Summary

This PR establishes a foundational storage abstraction layer for cloud storage support, with a complete LocalStorage implementation and OSS stub.

Key Changes:

  • Defined Storage trait with synchronous API covering all essential operations (read, write, exists, metadata, list, delete, copy, create_dir)
  • Added SeekableStorage extension trait for backends supporting seek operations
  • Implemented comprehensive LocalStorage backend with 11 unit tests covering all operations
  • Created OssStorage stub with proper structure for future implementation
  • Added feature-gated exports in src/lib.rs

Issues Found:

  • Missing cloud-storage feature definition in Cargo.toml - this will cause compilation failures when users try to enable the feature
  • Minor code style issue with error handling in LocalStorage::writer

The implementation is well-structured with clear separation of concerns, comprehensive error handling, and thorough documentation. LocalStorage is production-ready with extensive test coverage.

Confidence Score: 4/5

  • Safe to merge after fixing the missing cloud-storage feature flag in Cargo.toml
  • The implementation is solid with comprehensive tests and documentation. The main issue is the missing feature flag in Cargo.toml which will cause compilation failures. LocalStorage is production-ready, and the trait design is well-thought-out.
  • Cargo.toml requires the cloud-storage feature definition before merge

Important Files Changed

Filename Overview
src/lib.rs Added storage module exports behind cloud-storage feature flag
src/storage/mod.rs Well-designed storage traits with comprehensive error handling and documentation
src/storage/local.rs Complete LocalStorage implementation with thorough unit tests (11 tests)
src/storage/oss.rs Stub implementation with proper structure for future OSS integration
Cargo.toml Missing cloud-storage feature flag definition (critical for compilation)

Sequence Diagram

sequenceDiagram
    participant Client
    participant Storage as Storage Trait
    participant LocalStorage
    participant FileSystem as File System
    
    Note over Client,FileSystem: Write Operation
    Client->>Storage: writer("data/file.txt")
    Storage->>LocalStorage: writer("data/file.txt")
    LocalStorage->>LocalStorage: full_path() → /root/data/file.txt
    LocalStorage->>LocalStorage: ensure_parent() → create /root/data/
    LocalStorage->>FileSystem: File::create()
    FileSystem-->>LocalStorage: File handle
    LocalStorage->>LocalStorage: wrap in BufWriter
    LocalStorage-->>Client: Box<dyn Write>
    Client->>Client: write_all(data)
    Client->>Client: flush()
    
    Note over Client,FileSystem: Read Operation
    Client->>Storage: reader("data/file.txt")
    Storage->>LocalStorage: reader("data/file.txt")
    LocalStorage->>LocalStorage: full_path() → /root/data/file.txt
    LocalStorage->>FileSystem: File::open()
    alt File exists
        FileSystem-->>LocalStorage: File handle
        LocalStorage->>LocalStorage: wrap in BufReader
        LocalStorage-->>Client: Box<dyn Read>
        Client->>Client: read_to_end()
    else File not found
        FileSystem-->>LocalStorage: NotFound error
        LocalStorage-->>Client: StorageError::NotFound
    end
    
    Note over Client,FileSystem: Seekable Read Operation
    Client->>Storage: seekable_reader("data/file.txt")
    Storage->>LocalStorage: seekable_reader("data/file.txt")
    LocalStorage->>FileSystem: File::open()
    FileSystem-->>LocalStorage: File handle
    LocalStorage->>LocalStorage: wrap in BufReader (supports Seek)
    LocalStorage-->>Client: Box<dyn SeekRead>
    Client->>Client: seek(SeekFrom::Start(100))
    Client->>Client: read()
    
    Note over Client,FileSystem: Metadata Operations
    Client->>Storage: metadata("data/file.txt")
    Storage->>LocalStorage: metadata("data/file.txt")
    LocalStorage->>FileSystem: fs::metadata()
    FileSystem-->>LocalStorage: Metadata
    LocalStorage->>LocalStorage: build ObjectMetadata
    LocalStorage-->>Client: ObjectMetadata{size, last_modified, ...}

Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link

greptile-apps bot commented Jan 30, 2026

Additional Comments (1)

Cargo.toml
The cloud-storage feature is referenced throughout the code but is not defined in Cargo.toml. This will cause compilation failures when users try to use the feature.

[features]
dataset-hdf5 = ["dep:hdf5"]
dataset-parquet = ["dep:polars"]
dataset-depth = ["dep:png"]
dataset-all = ["dataset-hdf5", "dataset-parquet", "dataset-depth"]
# Python bindings via PyO3 (extension-module mode to prevent libpython linking)
python = ["pyo3", "robocodec/python"]
# Python extension module (alias for python, since extension-module is now default)
extension-module = ["python"]
# GPU compression (experimental)
# Enables GPU-accelerated compression via nvCOMP
# Requires: NVIDIA GPU, CUDA toolkit, nvCOMP library (Linux)
gpu = []
# Use jemalloc as global allocator on Linux (better for concurrent workloads)
# On macOS, the default allocator is already excellent and jemalloc is not used
jemalloc = ["dep:tikv-jemallocator", "robocodec/jemalloc"]
# CLI support for binaries (profiler, etc.)
cli = ["dep:clap"]
# Profiling support for profiler binary (flamegraph generation)
profiling = ["dep:pprof", "cli"]
# CPU-aware WindowLog detection using CPUID (x86_64 only)
cpuid = ["dep:raw-cpuid"]
# io_uring support for Linux (high-performance async I/O)
# Requires: Linux 5.6+ kernel
io-uring-io = ["dep:io-uring"]
# Cloud storage support (Alibaba OSS, Amazon S3)
cloud-storage = []
Prompt To Fix With AI
This is a comment left during a code review.
Path: Cargo.toml
Line: 94:118

Comment:
The `cloud-storage` feature is referenced throughout the code but is not defined in `Cargo.toml`. This will cause compilation failures when users try to use the feature.

```suggestion
[features]
dataset-hdf5 = ["dep:hdf5"]
dataset-parquet = ["dep:polars"]
dataset-depth = ["dep:png"]
dataset-all = ["dataset-hdf5", "dataset-parquet", "dataset-depth"]
# Python bindings via PyO3 (extension-module mode to prevent libpython linking)
python = ["pyo3", "robocodec/python"]
# Python extension module (alias for python, since extension-module is now default)
extension-module = ["python"]
# GPU compression (experimental)
# Enables GPU-accelerated compression via nvCOMP
# Requires: NVIDIA GPU, CUDA toolkit, nvCOMP library (Linux)
gpu = []
# Use jemalloc as global allocator on Linux (better for concurrent workloads)
# On macOS, the default allocator is already excellent and jemalloc is not used
jemalloc = ["dep:tikv-jemallocator", "robocodec/jemalloc"]
# CLI support for binaries (profiler, etc.)
cli = ["dep:clap"]
# Profiling support for profiler binary (flamegraph generation)
profiling = ["dep:pprof", "cli"]
# CPU-aware WindowLog detection using CPUID (x86_64 only)
cpuid = ["dep:raw-cpuid"]
# io_uring support for Linux (high-performance async I/O)
# Requires: Linux 5.6+ kernel
io-uring-io = ["dep:io-uring"]
# Cloud storage support (Alibaba OSS, Amazon S3)
cloud-storage = []
```

How can I resolve this? If you propose a fix, please make it concise.

This completes Phase 1 of the storage abstraction:

## Changes

- Add URL parsing (src/storage/url.rs):
  - StorageUrl enum with Local, S3, Oss variants
  - Support for file://, s3://, oss:// schemes
  - Query parameter parsing for endpoint override
  - Comprehensive tests

- Add StorageFactory (src/storage/factory.rs):
  - StorageConfig with credential management
  - Environment variable loading (OSS/AWS keys)
  - Automatic backend selection from URL
  - Credential chain: env vars → config

- Make Storage trait dyn-compatible:
  - Changed all impl AsRef<Path> to &Path
  - Now supports Arc<dyn Storage> trait objects

- Update LocalStorage for dyn-compatibility
- Update OssStorage stub for dyn-compatibility
- Export new types in lib.rs

## Testing

- 261 library tests pass with cloud-storage feature
- All URL parsing variants tested
- StorageFactory credential handling tested

Part of #9 - Distributed Roboflow with Alibaba Cloud (OSS + ACK)
Depends on #11 (Storage trait), #10 (dependencies)
@zhexuany zhexuany changed the title [Phase 1.2] Define Storage trait and LocalStorage implementation (#11) feat: add storage abstraction layer with URL parsing Jan 30, 2026
@zhexuany zhexuany merged commit 8ff7ec4 into main Jan 30, 2026
16 checks passed
@zhexuany zhexuany deleted the feature/issue-11 branch January 30, 2026 20:46
@zhexuany zhexuany changed the title feat: add storage abstraction layer with URL parsing [Phase 1.2] Define Storage trait and LocalStorage implementation (#11) Jan 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Phase 1.2] Define Storage trait and error types

1 participant