LeRobot v3 dataloader #12071

oxkitsune · 2025-12-03T11:12:23Z

Related

This PR builds on "Refactor LeRobot v2 dataloader" #12066.

What

This adds support for LeRobot v3.0 datasets.

github-actions · 2025-12-03T11:12:46Z

Web viewer built successfully.

Result	Commit	Link	Manifest
✅	`17ec891`	https://rerun.io/viewer/pr/12071	`+nightly` `+main`

View image diff on kitdiff.

Note: This comment is updated whenever you push a commit.

Copilot

Pull request overview

This PR adds support for LeRobot v3 datasets to the Rerun dataloader. LeRobot v3 introduces changes to the dataset format, including the use of Parquet files for episode metadata and tasks (instead of JSONL files in v2), and feature-specific chunk/file indices for videos that allow multiple episodes to share video files more efficiently. The implementation follows the established patterns from the v2 dataloader.
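To illustrate the feature-specific chunk/file indices mentioned in the overview, here is a minimal Python sketch of how such indices might map to a video file on disk, and how several episodes can end up in the same file. The path template, the `video_path` helper, the feature name, and the example episode rows are assumptions made for illustration; none of them are taken from this PR or from the Rust implementation.

```python
from collections import defaultdict
from pathlib import Path

# Assumed path template for illustration; check the dataset's own metadata for the real one.
VIDEO_PATH_TEMPLATE = "videos/{video_key}/chunk-{chunk_index:03d}/file-{file_index:03d}.mp4"


def video_path(root: Path, video_key: str, chunk_index: int, file_index: int) -> Path:
    """Resolve a feature-specific (chunk, file) index pair to a video file path."""
    relative = VIDEO_PATH_TEMPLATE.format(
        video_key=video_key, chunk_index=chunk_index, file_index=file_index
    )
    return root / relative


# Hypothetical episode metadata rows: episodes 0 and 1 share one video file.
episodes = [
    {"episode_index": 0, "chunk_index": 0, "file_index": 0},
    {"episode_index": 1, "chunk_index": 0, "file_index": 0},
    {"episode_index": 2, "chunk_index": 0, "file_index": 1},
]

# Group episodes by the video file they live in.
episodes_per_file = defaultdict(list)
for row in episodes:
    path = video_path(
        Path("dataset"), "observation.images.front", row["chunk_index"], row["file_index"]
    )
    episodes_per_file[path.as_posix()].append(row["episode_index"])

for path, episode_indices in episodes_per_file.items():
    print(path, episode_indices)
```

Because more than one episode can point at the same file, a loader along these lines would need a per-episode time range to know which part of the video belongs to which episode.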

Key Changes

Implements v3 dataset loading with episode data caching for improved performance
Adds video timestamp-based filtering to extract precise video segments per episode
Includes feature-specific file metadata tracking for more flexible video/image organization
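The timestamp-based filtering listed above can be sketched in Python as follows. This is only an illustration of the idea, assuming each episode carries a start and end timestamp into a shared video file; the `Sample` type, the `samples_for_episode` helper, and the half-open interval are hypothetical choices, not the PR's Rust code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """A video sample with its presentation timestamp (hypothetical type)."""

    index: int
    timestamp_s: float


def samples_for_episode(samples: list[Sample], start_s: float, end_s: float) -> list[Sample]:
    """Keep only the samples whose timestamp falls in [start_s, end_s)."""
    return [s for s in samples if start_s <= s.timestamp_s < end_s]


# Eight samples at 0.0, 0.5, ..., 3.5 seconds in one shared video file.
samples = [Sample(index=i, timestamp_s=i * 0.5) for i in range(8)]

# Two back-to-back episodes stored in that file (assumed boundaries).
episode_0 = samples_for_episode(samples, 0.0, 2.0)
episode_1 = samples_for_episode(samples, 2.0, 4.0)

print([s.index for s in episode_0])
print([s.index for s in episode_1])
```

Using a half-open interval means a sample that sits exactly on the boundary is assigned to the following episode only, so no frame is picked up by two episodes.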

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.


File	Description
`tests/python/release_checklist/check_lerobot_v3_dataloader.py`	New test file for validating v3 dataset loading, consistent with v2 test structure
`crates/store/re_data_loader/src/loader_lerobot.rs`	Updated to support v3 datasets by removing the "unsupported" error and adding v3 loading function
`crates/store/re_data_loader/src/lerobot/mod.rs`	Updated module documentation to reflect v2 and v3 support
`crates/store/re_data_loader/src/lerobot/datasetv3.rs`	Complete v3 implementation with episode caching, video filtering, and Parquet-based metadata loading
`crates/store/re_data_loader/Cargo.toml`	Adds `re_video` dependency required for video processing
`Cargo.lock`	Updated lock file with new dependency


Copilot · 2025-12-08T13:27:39Z

crates/store/re_data_loader/src/lerobot/datasetv3.rs

+            .sample_data_in_stream_format(&chunk)
+            .with_context(|| {
+                format!(
+                    "Failed to convert sample {sample_idx} for feature '{observation}' to the expected codec stream format"
+                )
+            })?;


Inconsistent indentation: the method call chain should be properly indented. The .sample_data_in_stream_format(&chunk) call on line 505 should align with the start of the chain, and video on line 504 should be indented to match standard Rust formatting conventions.

Suggested change

-.sample_data_in_stream_format(&chunk)
-.with_context(|| {
-format!(
-"Failed to convert sample {sample_idx} for feature '{observation}' to the expected codec stream format"
-)
-})?;
+.sample_data_in_stream_format(&chunk)
+.with_context(|| {
+format!(
+"Failed to convert sample {sample_idx} for feature '{observation}' to the expected codec stream format"
+)
+})?;

crates/store/re_data_loader/src/lerobot/datasetv3.rs

MichaelGrupp

Let's use pathlib instead of os.path

MichaelGrupp · 2025-12-11T20:25:39Z

tests/python/release_checklist/check_lerobot_v3_dataloader.py

+import os
+from argparse import Namespace


Suggested change

-import os
-from argparse import Namespace
+from argparse import Namespace
+from pathlib import Path

MichaelGrupp · 2025-12-11T20:31:45Z

tests/python/release_checklist/check_lerobot_v3_dataloader.py

+    # That means the `recording_id` needs to be set to "episode_0", otherwise the LeRobot dataloader
+    # will create a new recording for episode 0, instead of merging it into the existing recording.
+    # If you don't set it, you'll end up with 4 recordings, an empty one and the 3 episodes.
+    rec = rr.script_setup(args, f"{os.path.basename(__file__)}", recording_id="episode_0")


This f-string interpolation looks unnecessary. A nicer, more modern way with pathlib is:

Suggested change

-    rec = rr.script_setup(args, f"{os.path.basename(__file__)}", recording_id="episode_0")
+    rec = rr.script_setup(args, Path(__file__).name, recording_id="episode_0")

MichaelGrupp · 2025-12-11T20:34:28Z

tests/python/release_checklist/check_lerobot_v3_dataloader.py

+    rec = rr.script_setup(args, f"{os.path.basename(__file__)}", recording_id="episode_0")
+
+    # load dataset from huggingface
+    dataset_path = os.path.dirname(__file__) + "/.datasets/v30_apple_storage"


Suggested change

-    dataset_path = os.path.dirname(__file__) + "/.datasets/v30_apple_storage"
+    dataset_path = Path(__file__).parent / ".datasets/v30_apple_storage"
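As a quick check of the `pathlib` suggestions in this review, the following sketch compares the `os.path` and `pathlib` spellings side by side. The `script` string is a stand-in for `__file__` and is an assumption made so the snippet runs anywhere.

```python
import os
from pathlib import Path

# Stand-in for __file__ (hypothetical location of the test script).
script = "/repo/tests/python/release_checklist/check_lerobot_v3_dataloader.py"

# File name: os.path.basename vs. Path.name
old_name = os.path.basename(script)
new_name = Path(script).name

# Dataset directory next to the script: string concatenation vs. the `/` operator
old_dataset = os.path.dirname(script) + "/.datasets/v30_apple_storage"
new_dataset = Path(script).parent / ".datasets/v30_apple_storage"

print(old_name == new_name)
print(new_dataset.as_posix() == old_dataset)
```

One difference to keep in mind: `new_dataset` is a `Path` object rather than a `str`, so any callee that strictly requires a string needs `str(new_dataset)`.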

MichaelGrupp · 2025-12-11T20:36:48Z

tests/python/release_checklist/check_lerobot_v3_dataloader.py

+    # NOTE: This dataloader works by creating a new recording for each episode.
+    # So that means we need to log the README to each recording.
+    for i in range(3):
+        rec = rr.script_setup(args, f"{os.path.basename(__file__)}", recording_id=f"episode_{i}")


Suggested change

-        rec = rr.script_setup(args, f"{os.path.basename(__file__)}", recording_id=f"episode_{i}")
+        rec = rr.script_setup(args, Path(__file__).name, recording_id=f"episode_{i}")

oxkitsune added the `📺 re_viewer` and `include in changelog` labels Dec 3, 2025

oxkitsune mentioned this pull request Dec 3, 2025

Support LeRobotDataset v3.0 #11931

Closed

3 tasks

oxkitsune added the `do-not-merge` and `feat-dataloader` labels Dec 3, 2025

oxkitsune force-pushed the gijs/lerobot-datasetv2-refactor branch from 565efa4 to 424b394 Compare December 4, 2025 15:22

oxkitsune force-pushed the gijs/lerobot-datasetv3.0 branch from e66b915 to 700430e Compare December 4, 2025 15:57

Base automatically changed from gijs/lerobot-datasetv2-refactor to main December 4, 2025 16:55

oxkitsune force-pushed the gijs/lerobot-datasetv3.0 branch from 700430e to 0fa3f7b Compare December 4, 2025 17:04

oxkitsune removed the `do-not-merge` label Dec 5, 2025

oxkitsune force-pushed the gijs/lerobot-datasetv3.0 branch from 0fa3f7b to c7a0085 Compare December 5, 2025 11:15

oxkitsune requested a review from Copilot December 8, 2025 13:21

Copilot started reviewing on behalf of oxkitsune December 8, 2025 13:22 View session

Copilot AI reviewed Dec 8, 2025

MichaelGrupp requested changes Dec 11, 2025

oxkitsune added 8 commits December 12, 2025 09:51

LeRobot v3 dataloader (b2d392e)
no more extra file loading (f6e8890)
remove println (024a79c)
prevent frame from next video (bf8ecf1)
clean up episode data loading (477152d)
code org (de1a19b)
nits (c38178e)
use pathlib api (28e096b)

oxkitsune force-pushed the gijs/lerobot-datasetv3.0 branch from c787f09 to 28e096b Compare December 12, 2025 08:56

re_sdk_types (17ec891)

3 participants
