Implement Real-Time Action Chunking (RTC) for SmolVLA #1521

Draft
ben-z wants to merge 64 commits into main

Conversation

ben-z
Contributor

@ben-z ben-z commented Jul 17, 2025

What this does

This PR implements Real-Time Action Chunking (RTC) for SmolVLA. This greatly improves the smoothness of async inference compared to the existing methods.

This PR builds on top of #1514 (which builds on #1486, which in turn builds on #1480), all of which are experimental themselves. I believe we can polish these PRs one by one. Please review those PRs first!

New SmolVLA parameters:

  • inference_enable_rtc: Whether to enable RTC
  • inference_rtc_d: Inference delay in ticks. If unset (-1), it is determined automatically from the round-trip inference latency. This sets the number of steps covered by the hard mask.
  • inference_rtc_soft_mask_length: Length of the soft mask (which blends the new chunk with the old chunk) in ticks. If unset (-1), it is determined automatically from the completion progress of the previous chunk. This parameter is not configurable in the paper (it is always determined automatically there), but I observed that shrinking the soft mask to a small proportion of the chunk size (e.g. 15 for a chunk size of 50) improves responsiveness without compromising smoothness. See the mask sketch after this list.
  • inference_rtc_beta: Maximum guidance weight. See the paper for details. I set it to 5, which the paper's authors found to work well.
  • inference_rtc_debug: Enables debug printing, at the cost of a slower denoising process.
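
To make the masking concrete, here is a minimal sketch of how the hard and soft masks could be laid out over a chunk, using the values mentioned above (soft mask of 15 for a chunk size of 50). This only illustrates the mask shape and a naive blend; in RTC proper (and in this PR) the mask drives the guided flow-matching denoising, with the guidance weight capped by inference_rtc_beta, rather than being applied as a post-hoc average. The helper name is made up for the example.

import torch

def build_rtc_blend_weights(chunk_size: int, d: int, soft_len: int) -> torch.Tensor:
    # Steps [0, d): hard mask (weight 1.0). These actions are already being
    # executed while the new chunk is generated, so they must follow the old chunk.
    # Steps [d, d + soft_len): soft mask, decaying linearly from 1 to 0.
    # Remaining steps: weight 0.0, i.e. take the new chunk as-is.
    w = torch.zeros(chunk_size)
    w[:d] = 1.0
    if soft_len > 0 and d < chunk_size:
        n = min(soft_len, chunk_size - d)
        w[d : d + n] = torch.linspace(1.0, 0.0, n + 2)[1:-1]
    return w

# Toy usage: blend the leftover part of the previous chunk into the new one.
chunk_size, d, soft_len = 50, 15, 15
prev_chunk = torch.randn(chunk_size, 6)  # actions left over from the previous chunk
new_chunk = torch.randn(chunk_size, 6)   # freshly generated chunk
w = build_rtc_blend_weights(chunk_size, d, soft_len).unsqueeze(-1)
blended = w * prev_chunk + (1 - w) * new_chunk  # first d steps equal prev_chunk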

TODOs:

  • Finalize the async inference interface to support policy parameters (set once at startup) and runtime parameters (which can change for each inference, e.g. min/mean/max round-trip inference latency). The client could send stats like round-trip latency to the server, or the server could infer them from timestamps. See the sketch after this list.
  • Currently, this PR probably breaks the async inference pipeline of other models (e.g. ACT, pi0) due to the need to pass additional runtime parameters for RTC. This can be fixed with a small refactor.
  • There are many layers of function calls between the client and the function that executes the policy. This was fine when we didn't have any runtime parameters; now that we do, we may want to refactor the policy server or the gRPC interface to better support this use case.
  • For some reason, the dt parameter in the SmolVLA model is negative, and the denoising function is written slightly differently from the RTC paper. We will need to revisit the SmolVLA paper to decide which convention to adopt.
  • Add demo videos
  • Add tests

How it was tested

  1. Tested the core denoising function manually to make sure the implementation behaves correctly (e.g. hard/soft masking behave as expected). This was done during development, simply by running the SmolVLA model. A rough automated version of this check is sketched after this list.
  2. Tested the async functionality with and without RTC. There's a pretty clear difference between the two.
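
For the "Add tests" TODO, an automated version of check 1 could look roughly like the following. It only asserts the masking invariant on a naive blend; a real test would call the PR's denoising function instead.

import torch

def test_hard_mask_preserves_executing_actions():
    # With inference delay d, the first d actions of the blended chunk must match
    # the previous chunk exactly, since they are executed while the new chunk is
    # still being generated; the remaining steps should come from the new chunk.
    chunk_size, d = 50, 15
    prev_chunk = torch.randn(chunk_size, 6)
    new_chunk = torch.randn(chunk_size, 6)
    w = torch.zeros(chunk_size, 1)
    w[:d] = 1.0
    blended = w * prev_chunk + (1 - w) * new_chunk
    assert torch.allclose(blended[:d], prev_chunk[:d])
    assert torch.allclose(blended[d:], new_chunk[d:])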

https://x.com/un1c0rnioz/status/1946128982262579460

RTC.vs.forward.mixing.mov

The implementation adds one configurable parameter on top of the paper, inference_rtc_soft_mask_length, which makes the soft masking horizon configurable and alleviates this issue:

RTC.end_s.75.vs.end_s.s.stuck.mov

How to checkout & try? (for the reviewer)

This PR is under development, so please read the code first! The instructions for running async inference mostly apply, but this PR contains some interface changes to allow policy configuration passthrough. Here's the command I'm testing with, for reference:

HF_USER=$(huggingface-cli whoami | head -n 1)
echo "Hugging Face user: $HF_USER"
python lerobot/scripts/server/robot_client.py  \
  --server_address=127.0.0.1:18080 \
  --robot.type=so101_follower \
  --robot.port=$F1_PORT \
  --robot.cameras="${CAMERA_CONFIG}" \
  --robot.id=f1 \
  --policy.path=${HF_USER}/smolvla_so101_die_mat4_b64_lr5e-4_cs100_nas100_robo_200000 \
  --task="Grasp the die and put it on the mat." \
  --policy.device=cuda \
  --policy.compile_model=true \
  --actions_per_chunk=100 \
  --chunk_size_threshold=1.0 \
  --aggregate_fn_name=latest_only \
  --debug_visualize_queue_size=true \
  --policy.inference_enable_rtc=true \
  --policy.inference_rtc_d=15 \
  --policy.inference_rtc_soft_mask_length=15

@helper2424
Contributor

Niice - RTC is a cool thing.

@pkooij pkooij added the policies Items related to robot policies label Jul 17, 2025
helper2424 and others added 19 commits July 18, 2025 15:12
Co-authored-by: Ben Zhang <ben.zhang@uwaterloo.ca>
It's easy to get stuck in an infinite update loop, since the check for whether we have hit the chunk threshold is at the end of the control loop and the queue merge logic is at the beginning of the control loop.

One way to fix this is to move the queue update logic and the observation gathering step to the policy client process.
… chunk size

For some reason, 1MiB chunks send faster than 2MiB chunks when sending ~2.5MiB of data.
ben-z and others added 2 commits July 20, 2025 04:11
Traceback (most recent call last):
  File "/Users/ben/Projects/lerobot/src/lerobot/scripts/server/robot_client.py", line 80, in <module>
    from lerobot.transport.utils import grpc_channel_options, send_bytes_in_chunks
  File "/Users/ben/Projects/lerobot/src/lerobot/transport/utils.py", line 75, in <module>
    def receive_bytes_in_chunks(iterator, queue: Queue | None, shutdown_event: Event, log_prefix: str = ""):  # ruff: noqa
TypeError: unsupported operand type(s) for |: 'method' and 'NoneType'

https://www.reddit.com/r/learnpython/comments/1jaar9e/comment/mhltq2t/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
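
For context on the traceback above: multiprocessing.Queue is a bound factory method of the default context, not a class, so evaluating the annotation Queue | None at import time attempts method | None and fails (this is what the linked Reddit thread explains). Below is a minimal sketch of two common workarounds; the actual fix on this branch may differ, and threading.Event is used here only to keep the sketch self-contained.

from multiprocessing import Queue
print(type(Queue))  # <class 'method'>: a context factory, not a type

# Workaround 1: defer annotation evaluation (PEP 563) so the union is never
# executed at import time. This must be the first statement of the module:
#   from __future__ import annotations

# Workaround 2: annotate with the real class from multiprocessing.queues
# (the `X | None` syntax still requires Python 3.10+ at runtime).
from multiprocessing.queues import Queue as MPQueue
from threading import Event

def receive_bytes_in_chunks(iterator, queue: MPQueue | None, shutdown_event: Event, log_prefix: str = ""):
    ...
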
@ben-z
Contributor Author

ben-z commented Jul 20, 2025

Rebased this PR onto #1480. Once that’s merged, the diff here should shrink. I made some changes to the policy server to support RTC; while we could split those out, some are interleaved with RTC updates in the same commits, so keeping them here might be simpler. If we go with this approach, we can close #1486 and #1514.

@atharva-18

Hey, do we have any plans to implement RTC for diffusion as well?

@helper2424
Contributor

@atharva-18 Hey, I am doing another PR with RTC for pi0 & SmolVLA here: #1698.

@ben-z did a great job. I want to use this PR and introduce RTC before changing the network architecture, because it requires extending the Proto interfaces in advance. (Of course, I will add him as co-author everywhere.)

Regarding the original question: RTC originally works only with flow matching. I am not 100% familiar with diffusion at the low level, but I am not sure the implementation will be straightforward. It will probably require another technique to interpolate between chunks.
