@oxkitsune oxkitsune commented Dec 3, 2025

What

This adds support for LeRobot v3.0 datasets.

@oxkitsune oxkitsune added 📺 re_viewer affects re_viewer itself include in changelog labels Dec 3, 2025

github-actions bot commented Dec 3, 2025

Web viewer built successfully.

Result Commit Link Manifest
17ec891 https://rerun.io/viewer/pr/12071 +nightly +main

View image diff on kitdiff.

Note: This comment is updated whenever you push a commit.

@oxkitsune oxkitsune mentioned this pull request Dec 3, 2025
3 tasks
@oxkitsune oxkitsune added do-not-merge Do not merge this PR feat-dataloader Everything related to data loaders labels Dec 3, 2025
@oxkitsune oxkitsune force-pushed the gijs/lerobot-datasetv2-refactor branch from 565efa4 to 424b394 Compare December 4, 2025 15:22
@oxkitsune oxkitsune force-pushed the gijs/lerobot-datasetv3.0 branch from e66b915 to 700430e Compare December 4, 2025 15:57
Base automatically changed from gijs/lerobot-datasetv2-refactor to main December 4, 2025 16:55
@oxkitsune oxkitsune force-pushed the gijs/lerobot-datasetv3.0 branch from 700430e to 0fa3f7b Compare December 4, 2025 17:04
@oxkitsune oxkitsune removed the do-not-merge Do not merge this PR label Dec 5, 2025
@oxkitsune oxkitsune force-pushed the gijs/lerobot-datasetv3.0 branch from 0fa3f7b to c7a0085 Compare December 5, 2025 11:15
@oxkitsune oxkitsune requested a review from Copilot December 8, 2025 13:21

Copilot AI left a comment

Pull request overview

This PR adds support for LeRobot v3 datasets to the Rerun dataloader. LeRobot v3 introduces changes to the dataset format, including the use of Parquet files for episode metadata and tasks (instead of JSONL files in v2), and feature-specific chunk/file indices for videos that allow multiple episodes to share video files more efficiently. The implementation follows the established patterns from the v2 dataloader.
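As a rough illustration of the feature-specific chunk/file indices described above, the following Python sketch resolves a video path per feature and shows two episodes pointing at the same file. The path template, the `FeatureFileLocation` type, and the feature name are hypothetical and chosen only for this example; the actual layout is defined by the dataset's own metadata, and the PR's implementation is in Rust.

```python
from dataclasses import dataclass

# Hypothetical template, for illustration only. A real dataset declares its
# own layout in its metadata; do not rely on this exact string.
VIDEO_PATH_TEMPLATE = "videos/{feature}/chunk-{chunk_index:03d}/file-{file_index:03d}.mp4"


@dataclass(frozen=True)
class FeatureFileLocation:
    """Where one feature's video for one episode is stored (assumed shape)."""

    chunk_index: int
    file_index: int


def video_path(feature: str, loc: FeatureFileLocation) -> str:
    """Build the relative video path for a feature from its chunk/file indices."""
    return VIDEO_PATH_TEMPLATE.format(
        feature=feature, chunk_index=loc.chunk_index, file_index=loc.file_index
    )


FEATURE = "observation.images.front"  # hypothetical feature name

# Episodes 0 and 1 share a file; episode 2 lives in the next file.
episode_locations = {
    0: FeatureFileLocation(chunk_index=0, file_index=0),
    1: FeatureFileLocation(chunk_index=0, file_index=0),
    2: FeatureFileLocation(chunk_index=0, file_index=1),
}

paths = {ep: video_path(FEATURE, loc) for ep, loc in episode_locations.items()}
```

Because several episodes can map to one file, a loader also needs a per-episode time range inside that file to tell the episodes apart.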

Key Changes

  • Implements v3 dataset loading with episode data caching for improved performance
  • Adds video timestamp-based filtering to extract precise video segments per episode
  • Includes feature-specific file metadata tracking for more flexible video/image organization
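The timestamp-based filtering mentioned in the list above can be sketched in a few lines of Python. This is not the PR's Rust code: the function name, the half-open interval `[start, end)`, and the sample timestamps are assumptions made for illustration, and a real decoder may additionally need to begin at a preceding keyframe, which this sketch ignores.

```python
from bisect import bisect_left


def samples_in_episode(sample_times: list[float], start: float, end: float) -> range:
    """Return the indices of samples with start <= timestamp < end.

    `sample_times` must be sorted in ascending order.
    """
    lo = bisect_left(sample_times, start)  # first sample at or after `start`
    hi = bisect_left(sample_times, end)    # first sample at or after `end`
    return range(lo, hi)


# One shared video file holding two episodes back to back (made-up timestamps).
sample_times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]

episode_0 = list(samples_in_episode(sample_times, 0.0, 1.5))  # [0, 1, 2]
episode_1 = list(samples_in_episode(sample_times, 1.5, 3.0))  # [3, 4, 5]
```

Using a half-open interval means a sample on the boundary belongs to exactly one episode.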

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/python/release_checklist/check_lerobot_v3_dataloader.py | New test file for validating v3 dataset loading, consistent with v2 test structure |
| crates/store/re_data_loader/src/loader_lerobot.rs | Updated to support v3 datasets by removing the "unsupported" error and adding v3 loading function |
| crates/store/re_data_loader/src/lerobot/mod.rs | Updated module documentation to reflect v2 and v3 support |
| crates/store/re_data_loader/src/lerobot/datasetv3.rs | Complete v3 implementation with episode caching, video filtering, and Parquet-based metadata loading |
| crates/store/re_data_loader/Cargo.toml | Adds re_video dependency required for video processing |
| Cargo.lock | Updated lock file with new dependency |

Comment on lines +505 to +510
.sample_data_in_stream_format(&chunk)
.with_context(|| {
format!(
"Failed to convert sample {sample_idx} for feature '{observation}' to the expected codec stream format"
)
})?;

Copilot AI Dec 8, 2025

Inconsistent indentation: the method call chain should be properly indented. The .sample_data_in_stream_format(&chunk) call on line 505 should align with the start of the chain, and video on line 504 should be indented to match standard Rust formatting conventions.

Suggested change
.sample_data_in_stream_format(&chunk)
.with_context(|| {
format!(
"Failed to convert sample {sample_idx} for feature '{observation}' to the expected codec stream format"
)
})?;
.sample_data_in_stream_format(&chunk)
.with_context(|| {
format!(
"Failed to convert sample {sample_idx} for feature '{observation}' to the expected codec stream format"
)
})?;

@MichaelGrupp MichaelGrupp left a comment

Let's use pathlib instead of os.path

Comment on lines 3 to 4
import os
from argparse import Namespace

Suggested change
- import os
- from argparse import Namespace
+ from argparse import Namespace
+ from pathlib import Path

# That means the `recording_id` needs to be set to "episode_0", otherwise the LeRobot dataloader
# will create a new recording for episode 0, instead of merging it into the existing recording.
# If you don't set it, you'll end up with 4 recordings, an empty one and the 3 episodes.
rec = rr.script_setup(args, f"{os.path.basename(__file__)}", recording_id="episode_0")

This f-string interpolation looks unnecessary. A nicer, more modern way with pathlib is:

Suggested change
- rec = rr.script_setup(args, f"{os.path.basename(__file__)}", recording_id="episode_0")
+ rec = rr.script_setup(args, Path(__file__).name, recording_id="episode_0")
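For reference, a small standalone check of why the suggestion is safe: `os.path.basename` already returns a `str`, so the f-string wrapper adds nothing, and `Path.name` yields the same value. The `/repo` prefix is a made-up location, and `PurePosixPath` is used here only so the sketch behaves the same on every platform; the test script itself would use `Path(__file__)`.

```python
import os
from pathlib import PurePosixPath

# Hypothetical absolute location of the test script ("/repo" is made up).
script = "/repo/tests/python/release_checklist/check_lerobot_v3_dataloader.py"

old_name = f"{os.path.basename(script)}"  # original spelling: redundant f-string
plain_name = os.path.basename(script)     # same value without the f-string
new_name = PurePosixPath(script).name     # pathlib spelling from the suggestion

assert old_name == plain_name == new_name
```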

rec = rr.script_setup(args, f"{os.path.basename(__file__)}", recording_id="episode_0")

# load dataset from huggingface
dataset_path = os.path.dirname(__file__) + "/.datasets/v30_apple_storage"

Suggested change
- dataset_path = os.path.dirname(__file__) + "/.datasets/v30_apple_storage"
+ dataset_path = Path(__file__).parent / ".datasets/v30_apple_storage"
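One thing to keep in mind with this change: the original expression produces a `str`, while the pathlib version produces a path object. The sketch below, using a made-up `/repo` prefix and `PurePosixPath` for platform-independent behaviour, shows that both point at the same location and that `os.fspath` (or `str`) recovers a plain string if a downstream call requires one. Whether the code consuming `dataset_path` accepts path objects is not shown in this thread, so that part is an open assumption.

```python
import os
import posixpath
from pathlib import PurePosixPath

# Hypothetical absolute location of the test script ("/repo" is made up).
script = "/repo/tests/python/release_checklist/check_lerobot_v3_dataloader.py"

# Original approach: string concatenation, result is a str.
old_path = posixpath.dirname(script) + "/.datasets/v30_apple_storage"

# Suggested approach: the `/` operator joins path segments, result is a path object.
new_path = PurePosixPath(script).parent / ".datasets/v30_apple_storage"

# Same location; convert explicitly where a plain string is required.
assert os.fspath(new_path) == old_path
```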

# NOTE: This dataloader works by creating a new recording for each episode.
# So that means we need to log the README to each recording.
for i in range(3):
    rec = rr.script_setup(args, f"{os.path.basename(__file__)}", recording_id=f"episode_{i}")

Suggested change
-     rec = rr.script_setup(args, f"{os.path.basename(__file__)}", recording_id=f"episode_{i}")
+     rec = rr.script_setup(args, Path(__file__).name, recording_id=f"episode_{i}")

@oxkitsune oxkitsune force-pushed the gijs/lerobot-datasetv3.0 branch from c787f09 to 28e096b Compare December 12, 2025 08:56

Labels

feat-dataloader Everything related to data loaders include in changelog 📺 re_viewer affects re_viewer itself

Development

Successfully merging this pull request may close these issues.

Can't load lerobot v3.0 datasets

3 participants