lokiledev commented Oct 31, 2025

What this does

Combines several optimizations to image writing and video encoding in order to reduce the time it takes to save an episode.

  1. Call the ffmpeg command directly instead of using the PyAV wrapper.
    This is the most impactful change. The current implementation loads each image sequentially, which is an I/O bottleneck, then submits frames to PyAV. The fix calls the ffmpeg command directly, delegating image loading to ffmpeg, which handles I/O and encoding efficiently.
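The idea can be sketched as follows: instead of decoding PNGs in Python and pushing frames through PyAV, hand ffmpeg an image-sequence input pattern and let it do all the reading and encoding in one process. This is a minimal illustration, not the PR's actual implementation; the frame-naming pattern `frame_%06d.png` is an assumption for the sketch.

```python
import subprocess
from pathlib import Path

def build_ffmpeg_cmd(imgs_dir: Path, out_path: Path, fps: int = 30) -> list[str]:
    """Build an ffmpeg command that reads the frames itself.

    No Python-side image loading happens: ffmpeg handles the sequential
    PNG reads, decoding, and encoding internally.
    """
    return [
        "ffmpeg",
        "-y",                                    # overwrite output if it exists
        "-framerate", str(fps),
        "-i", str(imgs_dir / "frame_%06d.png"),  # image-sequence input pattern (assumed naming)
        "-c:v", "libsvtav1",                     # SVT-AV1 encoder
        str(out_path),
    ]

cmd = build_ffmpeg_cmd(Path("/tmp/frames"), Path("/tmp/episode_0.mp4"))
# subprocess.run(cmd, check=True, capture_output=True)  # uncomment to actually encode
```

The key difference from the PyAV path is that the whole pipeline (I/O, decode, encode) stays inside a single ffmpeg process, so Python never touches the pixel data.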

  2. Set the image compression level to 0 (instead of 1).
    This builds on refactor(datasets): add compress_level parameter to write_image() and set it to 1 #2135.
    Since SSDs are really fast now, it's faster to dump uncompressed images to disk than to spend CPU time compressing them; I measured a shorter episode-save time with this change. The tradeoff is using more temporary storage while recording an episode.
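PNG's `compress_level` maps to zlib's compression level, and level 0 emits stored (uncompressed) blocks. A quick zlib comparison, using synthetic data rather than real image bytes, illustrates the CPU-vs-size tradeoff the PR is making:

```python
import time
import zlib

# ~1 MB of synthetic payload standing in for raw image bytes.
payload = bytes(range(256)) * 4096

for level in (0, 1):
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    print(f"level={level}: {len(compressed)} bytes in {elapsed * 1000:.2f} ms")

# Level 0 output is slightly LARGER than the input (stored-block framing
# adds a few bytes per block), but the compress call does essentially no
# work, so disk writes are no longer CPU-bound.
```

On a fast SSD the extra bytes cost less time to write than level-1 compression costs to compute, which is exactly the episode-save speedup measured above.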

  3. Use the fastest encoder preset and enable the maximum level of parallelism in the AV1 params.

  • Use preset=13. The tradeoffs are:
    • Less CPU is used to encode.
    • Decoding is faster, which is great for the dataloader: 11.26s -> 8.42s.
    • The file size is a little bigger: 9.93 MB -> 11.30 MB.
    • Compression quality is a bit lower: PSNR 40.40 -> 39.83.
  • Enable the maximum level of parallelism: lp=6.
    This potentially uses more RAM but exploits all available CPU cores during encoding.
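For reference, these settings would surface in the ffmpeg invocation roughly as below. This is a sketch based on ffmpeg's libsvtav1 encoder options (`-preset` and the `-svtav1-params` pass-through), not a copy of the PR's code; double-check the exact flag names against your ffmpeg build.

```python
def svt_av1_args(preset: int = 13, lp: int = 6) -> list[str]:
    """ffmpeg arguments for SVT-AV1 speed/parallelism settings.

    preset: 0 (slowest, best compression) .. 13 (fastest).
    lp: logical processors the encoder may use.
    Defaults mirror the values chosen in this PR.
    """
    return [
        "-c:v", "libsvtav1",
        "-preset", str(preset),         # fastest preset, per the PR
        "-svtav1-params", f"lp={lp}",   # colon-separated key=value pass-through options
    ]

args = svt_av1_args()
```

These arguments would be spliced into the ffmpeg command between the input and output paths.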

How it was tested

Wrote an external benchmark that saves dummy action/observation data with real source images and measures:

  • duration of dataset.save_episode()
  • image compression quality (psnr & ssim)
  • dataloader time (random access and sequential)
  • dataset size
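Of these metrics, PSNR is simple enough to show inline. The benchmark presumably uses a library implementation; this is just the standard formula, 10 · log10(MAX² / MSE), in pure Python:

```python
import math

def psnr(ref: list[int], test: list[int], max_val: int = 255) -> float:
    """Peak signal-to-noise ratio between two equal-length pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_val**2 / mse)

# Two 4-pixel "images" differing by 1 in every pixel -> MSE = 1:
print(round(psnr([10, 20, 30, 40], [11, 21, 31, 41]), 2))  # → 48.13
```

The ~40 dB figures reported below correspond to a mean squared error of roughly 6 per 8-bit pixel value, i.e. visually near-lossless.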

I didn't run the existing video benchmarks with this PR; if someone could double-check, that would be great.

I also wrote this to optimize offline processing, so it uses as many CPU cores as possible during encoding.
It might impact real-time recording.
However, the current design already delays encoding until after the end of the episode, so I consider it acceptable to use all available resources there to speed up the encoding.

It is more efficient at doing I/O and exploiting all CPU cores.
Benchmark results:
step_time: 0.73s
save_time: 18.58s
psnr: 40.51
ssim: 0.98
dataset_size_mb: 10.82
random_access_time: 12.64s
sequential_access_time: 10.22s
Images are stored in a temporary folder, and SSDs are now really fast, so it's fine to trade image size for CPU time.

Benchmark results:

step_time: 0.62s
save_time: 13.31s
psnr: 40.51
ssim: 0.98
dataset_size_mb: 10.82
random_access_time: 10.53s
sequential_access_time: 7.92s
It makes the file a little bigger, but encoding is a bit faster, dataloading actually improves, and the quality is still good.

Benchmark result:
step_time: 0.75s
save_time: 11.39s
psnr: 39.83
ssim: 0.98
dataset_size_mb: 11.30
random_access_time: 11.91s
sequential_access_time: 8.82s
Copilot AI review requested due to automatic review settings October 31, 2025 13:23

Copilot AI (Contributor) left a comment
Pull Request Overview

This PR refactors video encoding to use direct ffmpeg subprocess calls instead of PyAV bindings, removes an unused PIL import, updates logging statements to use %s formatting, and changes the default PNG compression level from 1 to 0 for faster image writing.

  • Replaces PyAV-based video encoding with subprocess-based ffmpeg commands
  • Removes unused PIL import and updates logging to use % formatting
  • Changes default PNG compression level from 1 to 0

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Reviewed files:

  • src/lerobot/datasets/video_utils.py — Refactored encode_video_frames to use subprocess ffmpeg commands instead of PyAV, removed the unused PIL import, and updated logging statements to use % formatting.
  • src/lerobot/datasets/image_writer.py — Changed the default compress_level parameter from 1 to 0 in the write_image function.
Comments suppressed due to low confidence (1)

src/lerobot/datasets/image_writer.py:83

  • Documentation is outdated. The docstring states the default value is 1, but the parameter default has been changed to 0 on line 71. Update the docstring to read "Defaults to 0."
            image, as used by PIL.Image.save(). Defaults to 1.


lokiledev and others added 2 commits October 31, 2025 14:27
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Loik Le Devehat <lokiledev@gmail.com>
lokiledev (Author) commented:
Note: I tested the fast-decode option but didn't find any speedup in the dataloader (neither random access nor sequential).
It could be removed from the API.
