[Phase 4.1] Add TiKV client and define distributed schema #40

@zhexuany

Summary

Add TiKV client integration and define the key-value schema for distributed coordination. This is the foundation for all distributed features.

Parent Epic

Dependencies

TiKV Schema Design

Key Patterns

Key                     Value             Purpose
/jobs/{file_hash}       JobRecord         Job definition and status
/locks/{resource}       LockRecord        Distributed locks with TTL
/state/{file_hash}      CheckpointState   Frame-level checkpoint
/heartbeat/{pod_id}     HeartbeatRecord   Worker liveness
/system/scanner_lock    LockRecord        Scanner leadership

JobRecord Schema

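use chrono::{DateTime, Utc};           // DateTime<Utc> needs chrono's "serde" feature to (de)serialize
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
struct JobRecord {
    id: String,                    // UUID
    source_key: String,            // S3/OSS object key
    source_bucket: String,         // Bucket that holds source_key
    source_size: u64,
    status: JobStatus,             // Pending | Processing | Completed | Failed | Dead
    owner: Option<String>,         // Pod ID when Processing
    attempts: u32,                 // Attempts made so far
    max_attempts: u32,             // Upper bound on attempts
    created_at: DateTime<Utc>,
    updated_at: DateTime<Utc>,
    error: Option<String>,         // Last error message, if any
    output_prefix: String,
    config_hash: String,
}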
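
The status comment on JobRecord, together with attempts and max_attempts, suggests a retry-bounded lifecycle. The sketch below shows one way that could look. The transition rule (a job becomes Dead once attempts reaches max_attempts) and the helper names is_terminal and status_after_failure are assumptions, not something this issue specifies. Serde derives are left out so the snippet compiles with the standard library alone.

```rust
/// Job lifecycle states, mirroring the status comment on `JobRecord`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Pending,
    Processing,
    Completed,
    Failed,
    Dead,
}

impl JobStatus {
    /// Assumed terminal states: no further transition is expected.
    pub fn is_terminal(self) -> bool {
        matches!(self, JobStatus::Completed | JobStatus::Dead)
    }
}

/// Hypothetical helper: pick the status to record after a failed attempt.
/// `attempts` counts attempts made so far, including the one that just failed.
pub fn status_after_failure(attempts: u32, max_attempts: u32) -> JobStatus {
    if attempts >= max_attempts {
        JobStatus::Dead // retries exhausted
    } else {
        JobStatus::Failed // still eligible for another attempt
    }
}
```

Under this assumption, a Failed job can be picked up again while a Dead job needs manual intervention.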

Tasks

4.1.1 Add Dependencies

  • Add tikv-client as an optional dependency behind the distributed feature flag
  • Declare the distributed feature in Cargo.toml

4.1.2 Create Module Structure

  • src/distributed/mod.rs
  • src/distributed/tikv/ submodule

4.1.3 Define Schema Types

  • JobRecord, JobStatus, LockRecord, HeartbeatRecord
  • Serde serialization support

4.1.4 Implement Key Builders

  • Consistent key prefix: /roboflow/v1/
  • Static methods for each key type
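
A minimal sketch of the key builders, assuming the patterns from the Key Patterns table are appended to the /roboflow/v1/ prefix. The type name Keys and its method names are hypothetical.

```rust
/// Namespace prefix from task 4.1.4 (stored without the trailing slash).
const KEY_PREFIX: &str = "/roboflow/v1";

/// Hypothetical key-builder type with one associated function per key pattern.
pub struct Keys;

impl Keys {
    pub fn job(file_hash: &str) -> String {
        format!("{}/jobs/{}", KEY_PREFIX, file_hash)
    }

    pub fn lock(resource: &str) -> String {
        format!("{}/locks/{}", KEY_PREFIX, resource)
    }

    pub fn state(file_hash: &str) -> String {
        format!("{}/state/{}", KEY_PREFIX, file_hash)
    }

    pub fn heartbeat(pod_id: &str) -> String {
        format!("{}/heartbeat/{}", KEY_PREFIX, pod_id)
    }

    pub fn scanner_lock() -> String {
        format!("{}/system/scanner_lock", KEY_PREFIX)
    }

    /// Prefix shared by every job key, useful as the start of a scan.
    pub fn jobs_prefix() -> String {
        format!("{}/jobs/", KEY_PREFIX)
    }
}
```

Keeping all key construction in one place means a later version bump (for example to a v2 prefix) touches a single constant.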

4.1.5 Implement TiKV Client Wrapper

  • Connection pooling
  • Basic CRUD: get, put, delete, scan
  • Batch operations: batch_get, batch_put
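
The operation list above could be captured as a small storage trait, which also makes unit testing possible without a cluster. The sketch below is synchronous and backed by an in-memory map. The names KvStore and MemoryStore are hypothetical, and none of this is the tikv-client API (the real client is expected to be asynchronous); it only illustrates the shape of the wrapper.

```rust
use std::collections::BTreeMap;

/// Hypothetical storage trait covering the operations listed in 4.1.5.
pub trait KvStore {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn put(&mut self, key: &str, value: Vec<u8>);
    fn delete(&mut self, key: &str);
    /// Return up to `limit` pairs whose key starts with `prefix`, in key order.
    fn scan(&self, prefix: &str, limit: usize) -> Vec<(String, Vec<u8>)>;

    /// Fetch several keys; missing keys are skipped.
    fn batch_get(&self, keys: &[&str]) -> Vec<(String, Vec<u8>)> {
        keys.iter()
            .filter_map(|&k| self.get(k).map(|v| (k.to_string(), v)))
            .collect()
    }

    /// Write several pairs one after another (not atomic in this sketch).
    fn batch_put(&mut self, pairs: Vec<(String, Vec<u8>)>) {
        for (k, v) in pairs {
            self.put(&k, v);
        }
    }
}

/// In-memory stand-in used for tests; a sorted map makes prefix scans simple.
#[derive(Default)]
pub struct MemoryStore {
    data: BTreeMap<String, Vec<u8>>,
}

impl KvStore for MemoryStore {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.data.get(key).cloned()
    }

    fn put(&mut self, key: &str, value: Vec<u8>) {
        self.data.insert(key.to_string(), value);
    }

    fn delete(&mut self, key: &str) {
        self.data.remove(key);
    }

    fn scan(&self, prefix: &str, limit: usize) -> Vec<(String, Vec<u8>)> {
        self.data
            .range(prefix.to_string()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .take(limit)
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}
```

A TiKV-backed implementation of the same trait would then sit behind the distributed feature flag, while tests run against MemoryStore.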

4.1.6 Implement Transactional Operations

  • CAS (Compare-And-Swap) for atomic updates
  • Multi-key transactions
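
To make the CAS semantics concrete, here is a sketch against an in-memory map. The store type, the method name compare_and_swap, and the convention that an expected value of None means "key must not exist" are all assumptions. In a real cluster the read, compare, and write would have to run inside a single TiKV transaction or use the store's own atomic primitive.

```rust
use std::collections::HashMap;

/// In-memory stand-in for a transactional store (hypothetical).
#[derive(Default)]
pub struct MemoryStore {
    data: HashMap<String, Vec<u8>>,
}

impl MemoryStore {
    pub fn get(&self, key: &str) -> Option<&[u8]> {
        self.data.get(key).map(|v| v.as_slice())
    }

    /// Write `new` only if the current value equals `expected`.
    /// `expected == None` means the key must not exist yet.
    /// Returns true when the swap happened, false when the precondition failed.
    pub fn compare_and_swap(
        &mut self,
        key: &str,
        expected: Option<&[u8]>,
        new: Vec<u8>,
    ) -> bool {
        let current = self.data.get(key).map(|v| v.as_slice());
        if current == expected {
            self.data.insert(key.to_string(), new);
            true
        } else {
            false
        }
    }
}
```

With this primitive, two workers that both try to move the same job from Pending to Processing cannot both succeed: the second call sees a value that no longer matches its expectation and returns false.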

4.1.7 Add Configuration

  • TikvConfig with pd_endpoints, timeouts
  • Environment variable support: TIKV_PD_ENDPOINTS
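
One possible shape for the configuration type is sketched below. Only TikvConfig, pd_endpoints, and TIKV_PD_ENDPOINTS come from this issue. The comma-separated format, the two timeout field names, and their default values are assumptions.

```rust
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
pub struct TikvConfig {
    pub pd_endpoints: Vec<String>,
    pub connect_timeout: Duration, // assumed field
    pub request_timeout: Duration, // assumed field
}

impl TikvConfig {
    /// Parse a comma-separated endpoint list (the format is an assumption).
    /// Surrounding whitespace and empty entries are ignored.
    pub fn from_endpoints_str(raw: &str) -> Result<Self, String> {
        let pd_endpoints: Vec<String> = raw
            .split(',')
            .map(str::trim)
            .filter(|s| !s.is_empty())
            .map(String::from)
            .collect();
        if pd_endpoints.is_empty() {
            return Err("TIKV_PD_ENDPOINTS contains no endpoints".to_string());
        }
        Ok(Self {
            pd_endpoints,
            connect_timeout: Duration::from_secs(5),  // placeholder default
            request_timeout: Duration::from_secs(30), // placeholder default
        })
    }

    /// Read TIKV_PD_ENDPOINTS from the process environment.
    pub fn from_env() -> Result<Self, String> {
        let raw = std::env::var("TIKV_PD_ENDPOINTS")
            .map_err(|e| format!("TIKV_PD_ENDPOINTS: {}", e))?;
        Self::from_endpoints_str(&raw)
    }
}
```

Splitting the parsing from the environment lookup keeps the parser testable without mutating process-wide state.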

4.1.8 Error Handling

  • TikvError enum with retry logic
  • Map to RoboflowError
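
The error type and retry logic might look like the sketch below. The variant names, the choice of which variants count as retryable, the with_retry helper, and the exponential backoff policy are all assumptions. RoboflowError here is a stand-in with a single made-up variant, since its real definition is not shown in this issue.

```rust
use std::thread::sleep;
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
pub enum TikvError {
    Connection(String),
    Timeout,
    Conflict, // e.g. a CAS or transaction conflict
    Serialization(String),
}

impl TikvError {
    /// Assumed policy: transient failures are retryable, data errors are not.
    pub fn is_retryable(&self) -> bool {
        matches!(
            self,
            TikvError::Connection(_) | TikvError::Timeout | TikvError::Conflict
        )
    }
}

/// Stand-in for the crate-wide error type (hypothetical variant).
#[derive(Debug)]
pub enum RoboflowError {
    Distributed(String),
}

impl From<TikvError> for RoboflowError {
    fn from(e: TikvError) -> Self {
        RoboflowError::Distributed(format!("{:?}", e))
    }
}

/// Run `op` up to `max_attempts` times, sleeping base, 2*base, 4*base, ...
/// between retryable failures. Non-retryable errors are returned immediately.
pub fn with_retry<T>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, TikvError>,
) -> Result<T, TikvError> {
    let mut attempt: u32 = 1;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) if e.is_retryable() && attempt < max_attempts => {
                let factor = 2u32.saturating_pow(attempt - 1);
                sleep(base_delay.saturating_mul(factor));
                attempt += 1;
            }
            Err(e) => return Err(e),
        }
    }
}
```

Because of the From impl, callers that return Result<_, RoboflowError> can apply the ? operator directly to a TikvError result.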

Acceptance Criteria

  • Crate compiles with the distributed feature flag enabled
  • Schema types defined with serde
  • TikvClient connects to cluster
  • CRUD and batch operations work
  • Transactional CAS works
  • Configuration via env vars
  • Unit tests pass

Files to Create

  • src/distributed/mod.rs
  • src/distributed/tikv/mod.rs
  • src/distributed/tikv/client.rs
  • src/distributed/tikv/schema.rs
  • src/distributed/tikv/error.rs
  • src/distributed/tikv/config.rs

Metadata

Labels

  • area/distributed: Distributed coordination and TiKV
  • priority/critical: Must be done first, blocks other work
  • size/M: Medium, 3-5 days
  • type/feature: New feature or functionality
