LeRobotDataset v2.1 #711

Merged
merged 20 commits into from
Feb 25, 2025

aliberts
Collaborator

@aliberts aliberts commented Feb 10, 2025

What this does

This PR aims to improve the usability of LeRobotDataset. We increase CODEBASE_VERSION from v2.0 to v2.1, since the changes are backward compatible with v2.0.

What do I need to do?

Simply run this script on your dataset to update the stats:

python lerobot/common/datasets/v21/convert_dataset_v20_to_v21.py \
    --repo-id=repo/id

This will:

  • Generate per-episode stats and write them to episodes_stats.jsonl.
  • Check consistency between these new stats and the old ones.
  • Remove the deprecated stats.json.
  • Update codebase_version in info.json.
  • Push this new version to the hub on the main branch and tag it with v2.1.
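
As a rough illustration of the first and fourth steps above, here is a minimal, standard-library-only sketch. It is not the conversion script itself: the helper names (`episode_stats`, `write_episodes_stats`, `bump_codebase_version`) and the record layout (`episode_index` plus a `stats` mapping) are assumptions made for the example, and features are treated as flat lists of scalars.

```python
import json
import math
from pathlib import Path


def episode_stats(values):
    """Summarize one feature of one episode (population std)."""
    count = len(values)
    mean = sum(values) / count
    var = sum((v - mean) ** 2 for v in values) / count
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "std": math.sqrt(var),
        "count": count,
    }


def write_episodes_stats(meta_dir, episodes):
    """Write one JSON line per episode to <meta_dir>/episodes_stats.jsonl.

    `episodes` maps episode index -> {feature name -> list of scalar values}.
    The record layout is an assumption for illustration.
    """
    path = Path(meta_dir) / "episodes_stats.jsonl"
    with path.open("w") as f:
        for ep_idx in sorted(episodes):
            stats = {name: episode_stats(vals) for name, vals in episodes[ep_idx].items()}
            f.write(json.dumps({"episode_index": ep_idx, "stats": stats}) + "\n")
    return path


def bump_codebase_version(meta_dir, version="v2.1"):
    """Set codebase_version in <meta_dir>/info.json, keeping all other keys."""
    info_path = Path(meta_dir) / "info.json"
    info = json.loads(info_path.read_text())
    info["codebase_version"] = version
    info_path.write_text(json.dumps(info, indent=4))
    return info
```

Writing one line per episode means stats for a new episode can be appended without recomputing anything for existing episodes.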

Changes

  • Replaces the global stats.json with per-episode stats stored in episodes_stats.jsonl. Per-episode stats are then aggregated over the selected episodes when the dataset is initialized. Stats computation is much faster thanks to image subsampling. See: Per-episode stats #521
dataset_root/
  ├── data
  ├── meta
  │   ├── episodes.jsonl
+ │   ├── episodes_stats.jsonl
  │   ├── info.json
- │   ├── stats.json
  │   └── tasks.jsonl
  └── videos
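
To make the aggregation step concrete, here is a hedged sketch of how per-episode stats could be combined over a selection of episodes. It assumes each episode stores `min`, `max`, `mean`, `std` (population) and `count` for a scalar feature; the function name `aggregate_stats` and its signature are hypothetical and not taken from the PR. It uses the standard count-weighted mean and pooled-variance identity, which may differ in detail from what the library does.

```python
import math


def aggregate_stats(episode_stats, selected=None):
    """Combine per-episode stats into stats over the selected episodes.

    `episode_stats` maps episode index -> dict with keys
    "min", "max", "mean", "std", "count" (assumed layout).
    `selected` is an iterable of episode indices, or None for all episodes.
    """
    indices = sorted(episode_stats) if selected is None else list(selected)
    chosen = [episode_stats[i] for i in indices]

    total = sum(s["count"] for s in chosen)
    # Count-weighted mean of the per-episode means.
    mean = sum(s["mean"] * s["count"] for s in chosen) / total
    # Pooled variance: within-episode variance plus the squared offset
    # of each episode mean from the overall mean, weighted by count.
    var = sum(
        s["count"] * (s["std"] ** 2 + (s["mean"] - mean) ** 2) for s in chosen
    ) / total

    return {
        "min": min(s["min"] for s in chosen),
        "max": max(s["max"] for s in chosen),
        "mean": mean,
        "std": math.sqrt(var),
        "count": total,
    }
```

The same comparison can serve as a consistency check: aggregating over all episodes should reproduce the old global values up to floating-point tolerance, provided the per-episode stats were computed on the same frames (subsampling images would introduce small differences).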

TODOs in later PRs

  • Use standard hf_dataset.set_format("torch") instead of custom hf_dataset.set_transform(hf_transform_to_torch)
  • Multi dataset, features mapping
  • Update visualization for multi-task episodes

How it was tested

  • Improves test_datasets
  • Adds test_compute_stats

@aliberts aliberts marked this pull request as ready for review February 19, 2025 15:03
@aliberts aliberts added enhancement Suggestions for new features or improvements dataset Issues regarding data inputs, processing, or datasets labels Feb 20, 2025
@Cadene Cadene self-requested a review February 25, 2025 09:47
Collaborator

@Cadene Cadene left a comment

Wonderful! Thanks

Comment on lines -381 to -382
Note: If you didn't push your dataset yet, add `--control.local_files_only=true`.

Collaborator

Nice catch ;)

@aliberts aliberts merged commit 3354d91 into main Feb 25, 2025
5 checks passed
@aliberts aliberts deleted the user/aliberts/2025_02_10_dataset_v2.1 branch February 25, 2025 14:27
JIy3AHKO pushed a commit to vertix/lerobot that referenced this pull request Feb 27, 2025
Co-authored-by: Remi <remi.cadene@huggingface.co>
Co-authored-by: Remi Cadene <re.cadene@gmail.com>
@johnMinelli

I suppose the following script is no longer the way to publish a dataset with its stats?

stats = compute_stats(lerobot_dataset, batch_size, num_workers)

@imstevenpmwork
Collaborator

@johnMinelli Please refer to: #881
