fix(dataset): reload episodes metadata before batch video encoding #2379
base: main
Conversation
When using `--resume` with batch encoding enabled, the in-memory `self.meta.episodes` dataset was not being updated with newly recorded episodes. This caused an `IndexError` when trying to access episode metadata during batch encoding.

The issue occurred because:
1. New episodes were saved to parquet files
2. `self.meta.total_episodes` was updated
3. But `self.meta.episodes` (HF Dataset) remained stale
4. Batch encoding tried to access episodes beyond the original size

This fix reloads the episodes metadata at the start of `_batch_save_episode_video()` to ensure all episode data is available.

Fixes `IndexError: Invalid key: X is out of bounds for size Y`

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
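For orientation, a minimal sketch of the shape of this change (not the actual diff): the `load_episodes` helper and its argument are assumptions taken from the commit messages further down, and the real method signature in lerobot may differ.

```python
def _batch_save_episode_video(self, start_episode: int, end_episode: int) -> None:
    # Reload the episodes metadata from disk so that episodes recorded after
    # --resume are visible in memory; without this, self.meta.episodes is the
    # stale HF Dataset loaded at startup, and indexing past its original
    # length raises IndexError.
    self.meta.episodes = load_episodes(self.root)  # assumed helper/argument

    for ep_idx in range(start_episode, end_episode):
        episode = self.meta.episodes[ep_idx]  # previously: IndexError on resume
        ...  # encode this episode's videos (unchanged)
```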
Pull Request Overview
This PR fixes an IndexError that occurs when resuming dataset recording with batch video encoding enabled. The issue was caused by stale episode metadata not being synchronized with newly recorded episodes when using the --resume flag.
- Adds a metadata reload operation before batch video encoding
- Ensures `self.meta.episodes` contains all available episodes before accessing them by index
- Follows the existing pattern used elsewhere in the codebase for metadata synchronization
…h encoding

The previous fix to reload episodes before batch encoding was incomplete. When batch encoding is triggered, the metadata buffer may not have been flushed to disk yet, causing `load_episodes()` to fail with a NoneType error.

This fix ensures that:
1. Metadata buffer is flushed to disk before attempting to reload
2. Episodes are reloaded to get the latest metadata
3. Batch encoding can proceed with complete episode information

Fixes: `TypeError: 'NoneType' object is not subscriptable`

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…encoding

The ParquetWriter buffers data and only writes complete files when closed. Simply flushing the buffer is not sufficient: the file remains incomplete and cannot be read by PyArrow, resulting in "Parquet magic bytes not found in footer".

This fix:
1. Calls `_close_writer()` instead of `_flush_metadata_buffer()`
2. Ensures the ParquetWriter is properly closed and data is fully written
3. A new writer will be created on the next metadata write operation

Fixes: `pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer`

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
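The footer issue is reproducible with PyArrow alone; the standalone snippet below (not lerobot code) illustrates why flushing is not enough and the file only becomes readable after `close()`:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"episode_index": [0, 1, 2]})
writer = pq.ParquetWriter("episodes.parquet", table.schema)
writer.write_table(table)

# At this point row-group data may already be on disk, but the footer (and the
# trailing "PAR1" magic bytes) is only written when the writer is closed.
try:
    pq.read_table("episodes.parquet")
except pa.lib.ArrowInvalid as exc:
    print(exc)  # e.g. "Parquet magic bytes not found in footer..."

writer.close()
print(pq.read_table("episodes.parquet").num_rows)  # 3 — readable after close()
```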
When using batch encoding, temporary images must be preserved until batch encoding completes. The previous code deleted images immediately after each episode, causing FileNotFoundError when batch encoding tried to access them.

This fix:
1. Skip image deletion in `save_episode()` when using batch encoding
2. Delete images after each episode's video is encoded in batch mode
3. Ensures images are available for batch encoding while cleaning up afterward

Fixes: `FileNotFoundError: No images found in .../episode-000000`

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
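A hedged sketch of the conditional this commit describes; method and attribute names are illustrative, not the actual lerobot API, and note that a later commit below defers the in-loop cleanup further:

```python
# Illustrative sketch only.
def save_episode(self) -> None:
    ...
    if self.batch_encoding_size == 1:
        # Immediate per-episode encoding: the temporary frames are no longer
        # needed once this episode's video has been written.
        self._delete_episode_images(self.episode_buffer["episode_index"])
    # With batch encoding (> 1), keep the frames on disk; as of this commit
    # they are removed per episode inside _batch_save_episode_video() after
    # that episode's video is encoded.
```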
The resume logic in _save_episode_video() was checking only for the
existence of episodes, not whether those episodes had video metadata.
In batch encoding scenarios:
1. Episodes 0-9 are recorded with metadata (no video metadata yet)
2. Batch encoding starts and reloads episodes
3. _save_episode_video(video_key, 0) is called
4. episode_index == 0, so it enters the first-episode branch
5. self.meta.episodes exists and has length > 0
6. Code tries to access videos/{video_key}/chunk_index
7. KeyError: this key doesn't exist yet (videos not encoded)
This fix adds a check to verify that video metadata actually exists
before treating it as a resume case. This prevents KeyError when
batch encoding a new dataset or episodes without prior video metadata.
Fixes: KeyError: 'videos/observation.images.overhead/chunk_index'
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
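A sketch of the guard this commit describes; the surrounding structure of `_save_episode_video()` and the exact metadata layout are assumptions based on the steps listed above:

```python
# Sketch, not the actual lerobot implementation.
if episode_index == 0:
    chunk_key = f"videos/{video_key}/chunk_index"
    has_video_metadata = (
        self.meta.episodes is not None
        and len(self.meta.episodes) > 0
        and chunk_key in self.meta.episodes.column_names
    )
    if has_video_metadata:
        # Resume case: continue from the chunk/file indices stored in metadata.
        chunk_idx = self.meta.episodes[0][chunk_key]
    else:
        # New dataset (episodes recorded but no videos encoded yet): start the
        # chunk/file indices at 0 instead of raising KeyError.
        chunk_idx = 0
```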
Two critical fixes for batch encoding reliability:

1. **Defer image cleanup to after successful batch encoding**

   Problem: Images were deleted inside the `_batch_save_episode_video()` loop, but if an exception occurred after encoding (e.g., during parquet save), the retry in `VideoEncodingManager.__exit__` would fail with FileNotFoundError.

   Solution: Move image cleanup to `save_episode()` and `VideoEncodingManager.__exit__`, ensuring cleanup happens only after the entire batch encoding succeeds. This allows retries to access the images if needed.

2. **Add null check for video metadata values**

   Problem: Checking only for key existence wasn't sufficient: the key can exist in the parquet schema but have NULL values, causing `TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'`.

   Solution: Add explicit check that video metadata values are not None before treating as resume case.

Fixes:
- FileNotFoundError: No images found during batch encoding retry
- TypeError in `update_chunk_file_indices` with NoneType

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
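Extending the sketch above, the value check this commit adds might look like the following (again a sketch under the same assumed names):

```python
# The column may exist in the parquet schema but hold NULL for episodes whose
# videos were never encoded, so check the stored value as well as the key.
chunk_idx = None
if has_video_metadata:
    chunk_idx = self.meta.episodes[0][chunk_key]
is_resume = chunk_idx is not None  # only then advance the chunk/file indices
```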
Summary
Fixes an `IndexError` that occurs when using `--resume` with `--dataset.video_encoding_batch_size > 1` during dataset recording.

Problem
When resuming recording with batch encoding enabled, the in-memory `self.meta.episodes` dataset was not being updated with newly recorded episodes. This caused an `IndexError` when trying to access episode metadata during batch encoding. The issue occurred because:
1. New episodes were saved to parquet files
2. `self.meta.total_episodes` was updated
3. But `self.meta.episodes` (HF Dataset) remained stale
4. Batch encoding tried to access episodes beyond the original size

Solution
Reload the episodes metadata at the start of `_batch_save_episode_video()` to ensure all episode data is available before accessing episode indices.

This follows the same pattern already used in line 1199 where episodes are reloaded when switching to a new chunk/file.
Test Plan
The fix should be tested by:
- Resuming recording with `--resume=true` and a larger `--dataset.video_encoding_batch_size`

Example command that previously failed:
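The original command is not preserved in this extract; an illustrative invocation built only from the flags named in this PR might look like the following, where the entry point, robot type, repo id, and episode/batch counts are placeholders rather than the author's actual values:

```bash
lerobot-record \
  --robot.type=so101_follower \
  --dataset.repo_id=<user>/<dataset_name> \
  --dataset.num_episodes=20 \
  --dataset.video_encoding_batch_size=10 \
  --resume=true
```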
🤖 Generated with Claude Code