LeRobot v3 dataloader #12071
Pull request overview
This PR adds support for LeRobot v3 datasets to the Rerun dataloader. LeRobot v3 introduces changes to the dataset format, including the use of Parquet files for episode metadata and tasks (instead of JSONL files in v2), and feature-specific chunk/file indices for videos that allow multiple episodes to share video files more efficiently. The implementation follows the established patterns from the v2 dataloader.
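To make the feature-specific chunk/file indices concrete, here is a minimal Python sketch (not the PR's Rust implementation). The path template, the `VideoLocation` type, and the sample metadata are assumptions for illustration only; a real dataset declares its own layout in its metadata.

```python
from dataclasses import dataclass

# Assumed path template, for illustration only.
VIDEO_PATH_TEMPLATE = "videos/{video_key}/chunk-{chunk_index:03d}/file-{file_index:03d}.mp4"


@dataclass(frozen=True)
class VideoLocation:
    """Hypothetical per-episode, per-feature pointer into the video files."""

    chunk_index: int
    file_index: int


def resolve_video_path(video_key: str, loc: VideoLocation) -> str:
    """Build the relative video path for one feature of one episode."""
    return VIDEO_PATH_TEMPLATE.format(
        video_key=video_key,
        chunk_index=loc.chunk_index,
        file_index=loc.file_index,
    )


def group_episodes_by_file(episodes: dict, video_key: str) -> dict:
    """Map each video file to the episodes stored in it."""
    groups: dict = {}
    for episode, locations in sorted(episodes.items()):
        path = resolve_video_path(video_key, locations[video_key])
        groups.setdefault(path, []).append(episode)
    return groups


# Hypothetical metadata: episodes 0 and 1 share a file, episode 2 uses the next one.
EPISODES = {
    0: {"observation.images.top": VideoLocation(0, 0)},
    1: {"observation.images.top": VideoLocation(0, 0)},
    2: {"observation.images.top": VideoLocation(0, 1)},
}
```

Because several episodes can resolve to the same file, a loader in this style also needs a per-episode time range to know which part of the file belongs to which episode.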
Key Changes
- Implements v3 dataset loading with episode data caching for improved performance
- Adds video timestamp-based filtering to extract precise video segments per episode
- Includes feature-specific file metadata tracking for more flexible video/image organization
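The timestamp-based filtering can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not the code in `datasetv3.rs`: the function name, the half-open `[from_ts, to_ts)` interval, and the sorted list of sample timestamps are all hypothetical.

```python
from bisect import bisect_left


def select_episode_samples(sample_times: list, from_ts: float, to_ts: float) -> range:
    """Return the indices of samples with from_ts <= t < to_ts.

    Assumes `sample_times` is sorted in ascending order, so two binary
    searches are enough to find the episode's slice of a shared video file.
    """
    start = bisect_left(sample_times, from_ts)
    end = bisect_left(sample_times, to_ts)
    return range(start, end)


# Hypothetical shared file with six samples spaced 0.25 s apart.
SAMPLE_TIMES = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25]
```

With these values, `select_episode_samples(SAMPLE_TIMES, 0.5, 1.0)` selects indices 2 and 3. A real implementation may also have to account for keyframe boundaries when an episode starts mid-stream.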
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `tests/python/release_checklist/check_lerobot_v3_dataloader.py` | New test file for validating v3 dataset loading, consistent with v2 test structure |
| `crates/store/re_data_loader/src/loader_lerobot.rs` | Updated to support v3 datasets by removing the "unsupported" error and adding v3 loading function |
| `crates/store/re_data_loader/src/lerobot/mod.rs` | Updated module documentation to reflect v2 and v3 support |
| `crates/store/re_data_loader/src/lerobot/datasetv3.rs` | Complete v3 implementation with episode caching, video filtering, and Parquet-based metadata loading |
| `crates/store/re_data_loader/Cargo.toml` | Adds `re_video` dependency required for video processing |
| `Cargo.lock` | Updated lock file with new dependency |
| .sample_data_in_stream_format(&chunk) | ||
| .with_context(|| { | ||
| format!( | ||
| "Failed to convert sample {sample_idx} for feature '{observation}' to the expected codec stream format" | ||
| ) | ||
| })?; |
Copilot AI, Dec 8, 2025
Inconsistent indentation: the method call chain should be properly indented. The `.sample_data_in_stream_format(&chunk)` call on line 505 should align with the start of the chain, and `video` on line 504 should be indented to match standard Rust formatting conventions.
MichaelGrupp left a comment
Let's use pathlib instead of os.path
| import os | ||
| from argparse import Namespace |
| import os | |
| from argparse import Namespace | |
| from argparse import Namespace | |
| from pathlib import Path |
| # That means the `recording_id` needs to be set to "episode_0", otherwise the LeRobot dataloader | ||
| # will create a new recording for episode 0, instead of merging it into the existing recording. | ||
| # If you don't set it, you'll end up with 4 recordings, an empty one and the 3 episodes. | ||
| rec = rr.script_setup(args, f"{os.path.basename(__file__)}", recording_id="episode_0") |
This f-string interpolation looks unnecessary. A nicer, more modern way with pathlib is:
| rec = rr.script_setup(args, f"{os.path.basename(__file__)}", recording_id="episode_0") | |
| rec = rr.script_setup(args, Path(__file__).name, recording_id="episode_0") |
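As a quick check that the suggestion preserves behavior, this small sketch compares the two spellings. The `script` path is a hypothetical stand-in for `__file__`, and the helper names are made up for this example.

```python
import os
from pathlib import Path


def name_with_os_path(script: str) -> str:
    """Original spelling: the f-string wraps a value that is already a str."""
    return f"{os.path.basename(script)}"


def name_with_pathlib(script: str) -> str:
    """Suggested spelling: `Path.name` is the final path component as a str."""
    return Path(script).name


# Hypothetical stand-in for `__file__`.
script = "/tmp/release_checklist/check_lerobot_v3_dataloader.py"
```

Both helpers return a plain `str`, so the value passed to `rr.script_setup` keeps the same type.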
| rec = rr.script_setup(args, f"{os.path.basename(__file__)}", recording_id="episode_0") | ||
| # load dataset from huggingface | ||
| dataset_path = os.path.dirname(__file__) + "/.datasets/v30_apple_storage" |
| dataset_path = os.path.dirname(__file__) + "/.datasets/v30_apple_storage" | |
| dataset_path = Path(__file__).parent / ".datasets/v30_apple_storage" |
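One thing worth checking with this change: the original expression yields a `str`, while the `/` operator yields a `Path`. The sketch below shows the difference using a hypothetical stand-in for `__file__`. Whether the code that consumes `dataset_path` accepts a `Path` is not shown in this excerpt, so the `str(...)` conversion is an assumption to verify, not a requirement.

```python
import os
from pathlib import Path


def dataset_path_with_os_path(script: str) -> str:
    """Original spelling: string concatenation with a hard-coded separator."""
    return os.path.dirname(script) + "/.datasets/v30_apple_storage"


def dataset_path_with_pathlib(script: str) -> Path:
    """Suggested spelling: `/` joins path segments and returns a Path object."""
    return Path(script).parent / ".datasets/v30_apple_storage"


# Hypothetical stand-in for `__file__`.
script = "/tmp/release_checklist/check_lerobot_v3_dataloader.py"

old_path = dataset_path_with_os_path(script)
new_path = dataset_path_with_pathlib(script)

# Same location, different types; convert if a downstream call needs a str.
as_text = str(new_path)
```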
| # NOTE: This dataloader works by creating a new recording for each episode. | ||
| # So that means we need to log the README to each recording. | ||
| for i in range(3): | ||
| rec = rr.script_setup(args, f"{os.path.basename(__file__)}", recording_id=f"episode_{i}") |
| rec = rr.script_setup(args, f"{os.path.basename(__file__)}", recording_id=f"episode_{i}") | |
| rec = rr.script_setup(args, Path(__file__).name, recording_id=f"episode_{i}") |
What
This adds support for LeRobot v3.0 datasets.