Implement Real-Time Action Chunking (RTC) for SmolVLA #1521
base: main
Conversation
Nice - RTC is a cool thing!
for more information, see https://pre-commit.ci
Co-authored-by: Ben Zhang <ben.zhang@uwaterloo.ca>
It's easy to get stuck in an infinite update loop, since the check for whether we have hit the chunk threshold is at the end of the control loop and the queue merge logic is at the beginning of the control loop. One way to fix this is to move the queue update logic and the observation gathering step to the policy client process
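The suggested fix can be sketched as a reordered loop. This is a hypothetical illustration, not the lerobot API (`run_ticks`, the one-tick chunk latency, and the threshold semantics are all assumptions): merging any pending chunk *before* the threshold check means a fresh chunk is counted before we decide to request another one, which bounds the number of inference requests.

```python
def run_ticks(n_ticks: int, chunk_len: int, threshold: int) -> int:
    """Simulate a control loop that merges pending chunks before the
    threshold check. Returns the number of inference requests issued.
    Hypothetical sketch; names do not match the lerobot codebase."""
    action_queue: list[int] = []
    pending_chunk: list[int] | None = None  # chunk "in flight", arrives next tick
    requests = 0
    for t in range(n_ticks):
        # 1. Merge the newly received chunk first.
        if pending_chunk is not None:
            action_queue.extend(pending_chunk)
            pending_chunk = None
        # 2. The threshold check runs only after the merge, so a freshly
        #    merged chunk prevents an immediate re-request.
        if len(action_queue) <= threshold:
            requests += 1
            pending_chunk = list(range(t, t + chunk_len))  # simulated chunk
        # 3. Execute one action per tick.
        if action_queue:
            action_queue.pop(0)
    return requests
```

With the check at the end of the loop and the merge at the beginning (the current ordering), the short-queue condition can stay true across ticks and re-trigger inference indefinitely; this ordering avoids that.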
…speed up inference
… chunk size For some reason, 1MiB chunks send faster than 2MiB chunks when sending ~2.5MiB of data.
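For context, the chunk splitting itself is simple; a minimal sketch (the helper name is an assumption, and the real gRPC wrapper in `lerobot.transport.utils` additionally wraps each chunk in a proto message):

```python
CHUNK_SIZE = 1024 * 1024  # 1 MiB; empirically faster than 2 MiB for ~2.5 MiB payloads

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a serialized payload into fixed-size chunks for streaming.
    Illustrative stand-in for what a send_bytes_in_chunks-style helper yields."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
```

A ~2.5 MiB payload therefore goes out as three messages at 1 MiB chunk size, versus two at 2 MiB.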
```
Traceback (most recent call last):
  File "/Users/ben/Projects/lerobot/src/lerobot/scripts/server/robot_client.py", line 80, in <module>
    from lerobot.transport.utils import grpc_channel_options, send_bytes_in_chunks
  File "/Users/ben/Projects/lerobot/src/lerobot/transport/utils.py", line 75, in <module>
    def receive_bytes_in_chunks(iterator, queue: Queue | None, shutdown_event: Event, log_prefix: str = ""):  # ruff: noqa
TypeError: unsupported operand type(s) for |: 'method' and 'NoneType'
```
https://www.reddit.com/r/learnpython/comments/1jaar9e/comment/mhltq2t/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
…lect_action interface
Rebased this PR onto #1480. Once that’s merged, the diff here should shrink. I made some changes to the policy server to support RTC; while we could split those out, some are interleaved with RTC updates in the same commits, so keeping them here might be simpler. If we go with this approach, we can close #1486 and #1514.
Hey, do we have any plans to implement RTC for diffusion as well?
@atharva-18 Hey, I am working on another PR with RTC for pi0 & SmolVLA here: #1698. @ben-z did a great job - I want to use this PR and introduce RTC before changing the network architecture, because it requires extending the proto interfaces in advance. (Of course, I will add him as co-author everywhere.) Regarding the original question: RTC originally works only with flow matching. I am not 100% familiar with diffusion at the low level, but I am not sure the implementation will be straightforward. It will probably require another technique to interpolate between chunks.
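To illustrate why flow matching lends itself to RTC-style chunk merging: the sampler integrates a learned velocity field over several steps, so a guidance term can pull the first `d` actions toward the frozen previous chunk at every integration step. The following is a toy stand-in for the paper's guidance rule, not the actual algorithm (the Euler integrator, the clamped guidance weight, and all names are illustrative assumptions):

```python
def guided_denoise(model_velocity, x, prev_prefix, d, beta, n_steps=10):
    """Toy Euler sampler for a flow-matching policy with prefix guidance.
    model_velocity(x, t) returns the learned velocity; the first `d` entries
    of x are nudged toward prev_prefix at each step, with guidance weight
    capped by beta. Simplified sketch, not the RTC paper's exact update."""
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = model_velocity(x, t)
        for i in range(d):
            # Velocity that would land x[i] exactly on the frozen action.
            target_v = (prev_prefix[i] - x[i]) / max(1.0 - t, dt)
            v[i] = v[i] + min(beta, 1.0) * (target_v - v[i])
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

With a diffusion policy there is no single deterministic velocity field to steer this way, which is why porting RTC there likely needs a different interpolation technique.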
What this does
This PR implements Real-Time Action Chunking (RTC) for SmolVLA. This greatly improves the smoothness of async inference compared to the existing methods.
This PR builds on top of #1514 (which builds on #1486, which builds on #1480), which is experimental itself. But I believe we will figure out a way to polish these PRs one-by-one. Please review those PRs first!
New SmolVLA parameters:
- `inference_enable_rtc`: Whether to enable RTC.
- `inference_rtc_d`: Inference delay in ticks. If unset (-1), automatically determined from the round-trip inference delay. This sets the number of steps covered by the hard mask.
- `inference_rtc_soft_mask_length`: The length of the soft mask (which blends the new chunk with the old chunk) in ticks. If unset (-1), automatically determined from the completion progress of the previous chunk. This parameter is not configurable in the paper (it is always determined automatically), but I observed that shrinking the soft mask to a small proportion of the chunk size (e.g. 15 for a chunk size of 50) improves responsiveness without compromising smoothness.
- `inference_rtc_beta`: Maximum guidance weight. See the paper for details. I set it to 5, which the paper's authors found to work well.
- `inference_rtc_debug`: Adds debug printing, at the cost of a slower denoising process.

TODOs:
- The `dt` parameter in the SmolVLA model is negative, and the denoising function is written slightly differently from the RTC paper. I will need to revisit the SmolVLA paper to see which convention we should adopt.

How it was tested
https://x.com/un1c0rnioz/status/1946128982262579460
RTC.vs.forward.mixing.mov
The implementation has one additional configurable parameter on top of the paper, `inference_rtc_soft_mask_length`, which makes the soft masking horizon configurable and alleviates this issue:
RTC.end_s.75.vs.end_s.s.stuck.mov
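For intuition, the hard/soft masking can be sketched as a per-step weight schedule (a simplified illustration with hypothetical names; the real RTC update applies the mask inside the denoising process rather than as a post-hoc blend):

```python
def rtc_blend_weights(chunk_size: int, d: int, soft_len: int) -> list[float]:
    """Per-step weights for blending the previous chunk into the new one.
    The first `d` steps (the inference delay) keep the old chunk verbatim
    (weight 1.0), the next `soft_len` steps decay linearly toward the new
    chunk, and the remainder uses the new chunk only."""
    weights = []
    for i in range(chunk_size):
        if i < d:
            weights.append(1.0)                  # hard mask: frozen prefix
        elif i < d + soft_len:
            frac = (i - d + 1) / (soft_len + 1)  # linear decay toward 0
            weights.append(1.0 - frac)
        else:
            weights.append(0.0)                  # new chunk only
    return weights

def blend_chunks(old: list[float], new: list[float], weights: list[float]) -> list[float]:
    """Convex combination of the old and new chunk, element-wise."""
    return [w * o + (1.0 - w) * n for o, n, w in zip(old, new, weights)]
```

A short soft mask (e.g. `soft_len=15` for `chunk_size=50`) hands control to the new chunk sooner, which is the responsiveness/smoothness trade-off described above.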
How to check out & try? (for the reviewer)
This PR is under development; please read the code first! The instructions for running async inference mostly apply, but this PR contains some interface changes to allow policy configuration passthrough. Here's the command I'm testing with, for reference: