Closed
Labels
- area/distributed: Distributed coordination and TiKV
- priority/critical: Must be done first, blocks other work
- size/M: Medium: 3-5 days
- type/feature: New feature or functionality
Description
Summary
Add TiKV client integration and define the key-value schema for distributed coordination. This is the foundation for all distributed features.
Parent Epic
- [Epic] Distributed Roboflow with Alibaba Cloud (OSS + ACK) #9 Distributed Roboflow with TiKV Coordination
Dependencies
- None (foundation issue)
- Enables: [Phase 1.1] Add core dependencies for storage abstraction (#10) #29, [Phase 1.2] Define Storage trait and LocalStorage implementation (#11) #30, chore: update .gitignore #31, [Phase 2.1] Implement OSS/S3 backend using object_store #32, [Phase 5] Frame-level checkpoint with TiKV and multipart resume #19
TiKV Schema Design
Key Patterns
| Key | Value | Purpose |
|---|---|---|
| `/jobs/{file_hash}` | JobRecord | Job definition and status |
| `/locks/{resource}` | LockRecord | Distributed locks with TTL |
| `/state/{file_hash}` | CheckpointState | Frame-level checkpoint |
| `/heartbeat/{pod_id}` | HeartbeatRecord | Worker liveness |
| `/system/scanner_lock` | LockRecord | Scanner leadership |
JobRecord Schema
struct JobRecord {
    id: String,             // UUID
    source_key: String,     // S3/OSS object key
    source_bucket: String,
    source_size: u64,
    status: JobStatus,      // Pending | Processing | Completed | Failed | Dead
    owner: Option<String>,  // Pod ID when Processing
    attempts: u32,
    max_attempts: u32,
    created_at: DateTime<Utc>,
    updated_at: DateTime<Utc>,
    error: Option<String>,
    output_prefix: String,
    config_hash: String,
}
Tasks
4.1.1 Add Dependencies
- Add `tikv-client` under feature flag `distributed`
- Add feature to Cargo.toml
4.1.2 Create Module Structure
- `src/distributed/mod.rs`
- `src/distributed/tikv/` submodule
4.1.3 Define Schema Types
- JobRecord, JobStatus, LockRecord, HeartbeatRecord
- Serde serialization support
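As a rough illustration of how `JobStatus` might interact with the `attempts` and `max_attempts` fields of `JobRecord`, here is a minimal sketch. The helper names `is_terminal` and `after_failure` are hypothetical, and the rule that a job becomes `Dead` once its attempts are exhausted is an assumption, not something this issue specifies. Serde derives are left out so the sketch compiles with the standard library alone.

```rust
/// Job lifecycle states, as listed in the JobRecord schema comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Pending,
    Processing,
    Completed,
    Failed,
    Dead,
}

impl JobStatus {
    /// Hypothetical helper: states from which no further work is expected.
    /// Assumption: `Failed` jobs may be retried, `Dead` jobs may not.
    pub fn is_terminal(self) -> bool {
        matches!(self, JobStatus::Completed | JobStatus::Dead)
    }

    /// Hypothetical helper: status to record after a failed attempt.
    /// `attempts` counts attempts made so far, including the one that failed.
    /// Assumption: a job becomes `Dead` once `attempts` reaches `max_attempts`.
    pub fn after_failure(attempts: u32, max_attempts: u32) -> JobStatus {
        if attempts >= max_attempts {
            JobStatus::Dead
        } else {
            JobStatus::Failed
        }
    }
}
```

Keeping this rule next to the enum would put the retry policy in one place instead of scattering it across workers.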
4.1.4 Implement Key Builders
- Consistent key prefix: `/roboflow/v1/`
- Static methods for each key type
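The key builders could look roughly like the sketch below. The `Keys` type and its method names are hypothetical. The sketch also assumes that the `/roboflow/v1/` prefix is prepended to each pattern from the Key Patterns table, which the issue implies but does not state outright.

```rust
/// Hypothetical namespace for key construction; not an existing type.
pub struct Keys;

impl Keys {
    /// Shared prefix from task 4.1.4, stored without the trailing slash.
    pub const PREFIX: &'static str = "/roboflow/v1";

    /// Key for a job record: /roboflow/v1/jobs/{file_hash}
    pub fn job(file_hash: &str) -> String {
        format!("{}/jobs/{}", Self::PREFIX, file_hash)
    }

    /// Key for a distributed lock: /roboflow/v1/locks/{resource}
    pub fn lock(resource: &str) -> String {
        format!("{}/locks/{}", Self::PREFIX, resource)
    }

    /// Key for a frame-level checkpoint: /roboflow/v1/state/{file_hash}
    pub fn state(file_hash: &str) -> String {
        format!("{}/state/{}", Self::PREFIX, file_hash)
    }

    /// Key for a worker heartbeat: /roboflow/v1/heartbeat/{pod_id}
    pub fn heartbeat(pod_id: &str) -> String {
        format!("{}/heartbeat/{}", Self::PREFIX, pod_id)
    }

    /// Key for the scanner leadership lock.
    pub fn scanner_lock() -> String {
        format!("{}/system/scanner_lock", Self::PREFIX)
    }

    /// Prefix usable as the start of a scan over all job records.
    pub fn jobs_prefix() -> String {
        format!("{}/jobs/", Self::PREFIX)
    }
}
```

Routing every key through one type keeps the version segment (`v1`) changeable in a single place if the schema is later revised.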
4.1.5 Implement TiKV Client Wrapper
- Connection pooling
- Basic CRUD: get, put, delete, scan
- Batch operations: batch_get, batch_put
4.1.6 Implement Transactional Operations
- CAS (Compare-And-Swap) for atomic updates
- Multi-key transactions
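To make the CAS semantics concrete, the sketch below models compare-and-swap over an in-memory map. It is not TiKV code: `MemStore` and `CasError` are hypothetical stand-ins, and a real implementation would have to perform the read and the conditional write atomically on the cluster, for example inside a transaction.

```rust
use std::collections::HashMap;

/// Hypothetical error returned when the stored value differs from the expected one.
#[derive(Debug, PartialEq)]
pub enum CasError {
    Conflict { actual: Option<Vec<u8>> },
}

/// In-memory stand-in for a key-value store, used only to illustrate CAS.
#[derive(Default)]
pub struct MemStore {
    data: HashMap<Vec<u8>, Vec<u8>>,
}

impl MemStore {
    /// Returns the current value for `key`, if any.
    pub fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.data.get(key).map(|v| v.as_slice())
    }

    /// Writes `new` only if the current value equals `expected`.
    /// `expected == None` means "the key must not exist yet".
    pub fn compare_and_swap(
        &mut self,
        key: &[u8],
        expected: Option<&[u8]>,
        new: &[u8],
    ) -> Result<(), CasError> {
        let current = self.data.get(key).map(|v| v.as_slice());
        if current != expected {
            // Another writer got there first; report what is actually stored.
            return Err(CasError::Conflict {
                actual: current.map(|v| v.to_vec()),
            });
        }
        self.data.insert(key.to_vec(), new.to_vec());
        Ok(())
    }
}
```

With these semantics, two workers racing to move the same job from Pending to Processing cannot both succeed: the second call sees a value other than the one it expected and receives `Conflict`.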
4.1.7 Add Configuration
- TikvConfig with pd_endpoints, timeouts
- Environment variable support: TIKV_PD_ENDPOINTS
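One possible shape for `TikvConfig` is sketched below. The field names other than `pd_endpoints`, the default timeout values, and the comma-separated format for `TIKV_PD_ENDPOINTS` are all assumptions made for illustration; the issue names only the struct, the endpoint list, "timeouts", and the variable.

```rust
use std::time::Duration;

/// Sketch of the configuration type from task 4.1.7.
#[derive(Debug, Clone, PartialEq)]
pub struct TikvConfig {
    pub pd_endpoints: Vec<String>,
    /// Assumed field; the issue only says "timeouts".
    pub connect_timeout: Duration,
    /// Assumed field; the issue only says "timeouts".
    pub request_timeout: Duration,
}

impl TikvConfig {
    /// Parses a comma-separated endpoint list (assumed format),
    /// trimming whitespace and skipping empty entries.
    pub fn from_endpoints_str(raw: &str) -> Result<Self, String> {
        let pd_endpoints: Vec<String> = raw
            .split(',')
            .map(str::trim)
            .filter(|s| !s.is_empty())
            .map(String::from)
            .collect();
        if pd_endpoints.is_empty() {
            return Err("TIKV_PD_ENDPOINTS contains no endpoints".to_string());
        }
        Ok(Self {
            pd_endpoints,
            // Placeholder defaults, chosen arbitrarily for the sketch.
            connect_timeout: Duration::from_secs(5),
            request_timeout: Duration::from_secs(30),
        })
    }

    /// Reads TIKV_PD_ENDPOINTS from the process environment.
    pub fn from_env() -> Result<Self, String> {
        let raw = std::env::var("TIKV_PD_ENDPOINTS")
            .map_err(|e| format!("TIKV_PD_ENDPOINTS: {e}"))?;
        Self::from_endpoints_str(&raw)
    }
}
```

Splitting the parsing from the environment lookup lets unit tests cover the parsing without mutating process-wide state.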
4.1.8 Error Handling
- TikvError enum with retry logic
- Map to RoboflowError
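A minimal sketch of the error type and retry wrapper follows. The variants, the choice of which ones count as retryable, and the `with_retry` helper are assumptions. `RoboflowError` is shown as a one-variant stand-in because its real definition is not part of this issue. Backoff between attempts is omitted for brevity.

```rust
use std::fmt;

/// Hypothetical error variants; the real set depends on the client wrapper.
#[derive(Debug, Clone, PartialEq)]
pub enum TikvError {
    Connection(String),
    Timeout,
    Conflict,
    NotFound(String),
    Serialization(String),
}

impl TikvError {
    /// Assumption: transient failures and write conflicts are worth retrying.
    pub fn is_retryable(&self) -> bool {
        matches!(
            self,
            TikvError::Connection(_) | TikvError::Timeout | TikvError::Conflict
        )
    }
}

impl fmt::Display for TikvError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TikvError::Connection(msg) => write!(f, "connection error: {msg}"),
            TikvError::Timeout => write!(f, "request timed out"),
            TikvError::Conflict => write!(f, "write conflict"),
            TikvError::NotFound(key) => write!(f, "key not found: {key}"),
            TikvError::Serialization(msg) => write!(f, "serialization error: {msg}"),
        }
    }
}

/// Stand-in for the crate-wide error type; its real shape is not given here.
#[derive(Debug, PartialEq)]
pub enum RoboflowError {
    Distributed(String),
}

impl From<TikvError> for RoboflowError {
    fn from(e: TikvError) -> Self {
        RoboflowError::Distributed(e.to_string())
    }
}

/// Runs `op` until it succeeds, fails with a non-retryable error,
/// or has been called `max_attempts` times (always at least once).
pub fn with_retry<T>(
    max_attempts: u32,
    mut op: impl FnMut() -> Result<T, TikvError>,
) -> Result<T, TikvError> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) if e.is_retryable() && attempt < max_attempts => attempt += 1,
            Err(e) => return Err(e),
        }
    }
}
```

The `From` impl lets callers use `?` to convert a `TikvError` into the crate-wide error at module boundaries.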
Acceptance Criteria
- Feature flag `distributed` compiles
- Schema types defined with serde
- TikvClient connects to cluster
- CRUD and batch operations work
- Transactional CAS works
- Configuration via env vars
- Unit tests pass
Files to Create
- `src/distributed/mod.rs`
- `src/distributed/tikv/mod.rs`
- `src/distributed/tikv/client.rs`
- `src/distributed/tikv/schema.rs`
- `src/distributed/tikv/error.rs`
- `src/distributed/tikv/config.rs`