Closed
Labels
- area/distributed - Distributed coordination and TiKV
- area/pipeline - Pipeline processing
- priority/high - High priority
- size/L - Large: 1-2 weeks
- type/feature - New feature or functionality
Description
Summary
Integrate the existing pipeline with checkpoint hooks for distributed mode. Add distributed context and checkpoint triggers at stage boundaries.
Status: PARTIALLY COMPLETE ✅ 🚧
✅ Completed (via #72, #73)
- DistributedPipelineContext equivalent - `Worker.process_job()` now integrates with TiKV
- Source stage hook - `ProgressCallback` with byte offset tracking (via `messages_processed`)
- Parser stage hook - Frame index tracking via frame counting
- Checkpoint trigger - Configurable by frame count and time interval
- Resume from checkpoint - Worker checks for existing checkpoints on job start
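The configurable checkpoint trigger above can be sketched as follows. This is an illustrative example only; the class and parameter names (`CheckpointTrigger`, `frame_interval`, `time_interval_s`) are assumptions, not the project's actual API.

```python
import time


class CheckpointTrigger:
    """Fire a checkpoint every N frames or every T seconds, whichever
    comes first. Hypothetical sketch of the trigger described above."""

    def __init__(self, frame_interval: int = 500, time_interval_s: float = 30.0):
        self.frame_interval = frame_interval
        self.time_interval_s = time_interval_s
        self._last_frame = 0
        self._last_time = time.monotonic()

    def should_checkpoint(self, frame_idx: int) -> bool:
        now = time.monotonic()
        if (frame_idx - self._last_frame >= self.frame_interval
                or now - self._last_time >= self.time_interval_s):
            # Reset both counters so the two conditions stay in sync.
            self._last_frame = frame_idx
            self._last_time = now
            return True
        return False
```

Resetting both the frame and time baselines on every fire keeps the two conditions from double-triggering back to back.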
🚧 Remaining Enhancements (Optional)
These are optional refinements to the current implementation:
- Processor stage hook (video encoder state)
- Current: Checkpoints track frame progress
- Enhancement: Track video encoder state for GPU encoder resume
- Priority: Low - only needed if using GPU encoders ([Phase 8] Add NVENC GPU video encoding [PARALLEL ⏳] #49)
- Uploader stage hook (detailed multipart state)
- Current: Upload coordinator tracks completion at episode level
- Enhancement: Track in-progress multipart upload IDs for part-by-part resume
- Priority: Medium - useful for large video uploads
- Note: See TODO in `WorkerCheckpointCallback` about episode-level tracking
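The proposed part-by-part resume state could look roughly like the sketch below. The field names (`upload_id`, `completed_parts`) follow S3-style multipart conventions and are assumptions, not the current checkpoint schema.

```python
from dataclasses import dataclass, field


@dataclass
class MultipartUploadState:
    """Hypothetical per-object multipart resume state, as proposed in
    the enhancement above (not the existing episode-level tracking)."""

    upload_id: str                  # multipart upload session ID
    key: str                        # destination object key
    # part_number -> etag for parts already uploaded and acknowledged
    completed_parts: dict = field(default_factory=dict)

    def next_part_number(self) -> int:
        """Parts are 1-indexed; resume at the first missing part."""
        n = 1
        while n in self.completed_parts:
            n += 1
        return n
```

On resume, a worker would list the already-completed parts from the checkpoint and continue uploading from `next_part_number()` instead of restarting the whole object.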
Recommendation
This issue can be closed with the understanding that:
- Frame-level checkpointing works ([Phase 1] Add checkpoint save during pipeline processing #73)
- Resume from checkpoint works
- Upload state tracking is simplified (completion-based, not multipart-detailed)
If detailed multipart upload resume is needed, create a new issue specifically for that feature.
Original Issue Content Below
Integrate the existing pipeline with checkpoint hooks for distributed mode. Add distributed context and checkpoint triggers at stage boundaries.
Pipeline Context
Add DistributedPipelineContext to carry:
- TiKV client reference
- Job ID and file hash
- Pod ID
- Checkpoint manager
- Initial checkpoint state (for resume)
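A minimal sketch of the context described above, assuming a plain dataclass shape (the issue notes the shipped implementation ended up inside `Worker.process_job()` rather than a standalone class; these field types are illustrative):

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DistributedPipelineContext:
    """Carries the distributed-mode state listed above through the
    pipeline. Field types are assumptions for illustration."""

    tikv_client: Any                          # TiKV client reference
    job_id: str
    file_hash: str
    pod_id: str
    checkpoint_manager: Any
    initial_checkpoint: Optional[dict] = None  # populated when resuming
```

Leaving `initial_checkpoint` as `None` distinguishes a fresh job from a resume without a separate flag.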
Checkpoint Hooks
Inject at stage boundaries:
Source → [checkpoint: byte_offset] → Parser
Parser → [checkpoint: frame_idx] → Processor
Processor → [checkpoint: encoder_state] → Uploader
Uploader → [checkpoint: parts[]] → Complete
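One way to inject a hook at a stage boundary is a generator wrapper like the sketch below. The function names (`with_checkpoint_hook`, `make_state`) and the `checkpoint_manager.save()` signature are hypothetical, not the project's actual wrapper:

```python
def with_checkpoint_hook(stage, make_state, ctx, trigger):
    """Wrap a pipeline stage so checkpoints are written at its output
    boundary. `stage` is a generator over items; `make_state(item)`
    extracts the checkpoint payload for that boundary (byte_offset,
    frame_idx, encoder_state, ...). Illustrative sketch only."""

    def wrapped(items):
        for i, item in enumerate(stage(items)):
            if trigger.should_checkpoint(i):
                # Persist progress before handing the item downstream.
                ctx.checkpoint_manager.save(ctx.job_id, make_state(item))
            yield item

    return wrapped
```

Because the wrapper only observes items flowing between stages, the stages themselves stay unaware of distributed mode, which keeps single-node and distributed pipelines on the same code path.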
Acceptance Criteria
- DistributedPipelineContext defined
- Pipeline wrapper with hooks
- Source stage tracks byte offset
- Parser stage tracks frame index
- Processor stage tracks encoder state (optional)
- Uploader stage tracks parts (optional)
- Checkpoint trigger at intervals
- Resume works from any checkpoint
- Completion handler works
- No data loss or duplication
- Integration test: interrupt and resume (basic test passes)