🍀 CLOVER

The official implementation of our NeurIPS 2024 paper:
Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation

Qingwen Bu, Jia Zeng, Li Chen, Yanchao Yang, Guyue Zhou, Junchi Yan, Ping Luo, Heming Cui, Yi Ma and Hongyang Li

📜 Preprint: arXiv:2409.09016

📬 If you have any questions, please feel free to contact: Qingwen Bu ( qwbu01@sjtu.edu.cn )

The full code and checkpoint release is coming soon. Please stay tuned. 🦾

🔥 Highlight

  • 🍀 CLOVER employs a text-conditioned video diffusion model to generate visual plans as reference inputs; these sub-goals then guide the feedback-driven policy, which generates actions under an error-measurement strategy.

  • Owing to its closed-loop design, CLOVER is robust to visual distractions and object variations:

  • This closed-loop mechanism lets the system reach desired states accurately and reliably, thereby facilitating the execution of long-horizon tasks:

(Demo video: cook-fish.mp4)
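The closed-loop idea described above can be illustrated with a minimal, self-contained sketch. Everything below (the function names, the linear "planner", the scalar state, the threshold) is a hypothetical placeholder, not CLOVER's actual implementation: a planner proposes a sequence of sub-goals, the policy steps toward the current sub-goal, and an error measurement decides when to advance to the next one.

```python
# Minimal sketch of closed-loop control with generative sub-goals.
# All names and values here are illustrative stand-ins, not CLOVER's code.

def plan_subgoals(instruction, start, goal, n=4):
    """Stand-in for the video diffusion planner: n interpolated sub-goal states."""
    return [start + (goal - start) * (i + 1) / n for i in range(n)]

def policy_step(state, subgoal, gain=0.5):
    """Stand-in for the feedback-driven policy: move a fraction toward the sub-goal."""
    return state + gain * (subgoal - state)

def error(state, subgoal):
    """Error measurement between the current state and the active sub-goal."""
    return abs(subgoal - state)

def closed_loop_control(instruction, start, goal, tol=0.05, max_steps=100):
    subgoals = plan_subgoals(instruction, start, goal)
    state, idx = start, 0
    for _ in range(max_steps):
        if error(state, subgoals[idx]) < tol:  # sub-goal reached: advance
            idx += 1
            if idx == len(subgoals):
                return state                   # all sub-goals achieved
        state = policy_step(state, subgoals[idx])
    return state

final = closed_loop_control("move the block", start=0.0, goal=1.0)
```

The point of the sketch is the feedback structure: the policy never tracks the final goal directly, only the current sub-goal, and replanning progress is gated by the measured error rather than a fixed schedule.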

📢 News

  • [2024/09/16] We released our paper on arXiv.

📌 TODO list

  • Training script for visual planner
  • Checkpoints release (Scheduled Release Date: Mid-October, 2024)
  • Evaluation codes on CALVIN (Scheduled Release Date: Mid-October, 2024)
  • Policy training codes on CALVIN (Estimated Release Period: November, 2024)

🎮 Getting started

Our training is conducted with PyTorch 1.13.1, CUDA 11.7, and Ubuntu 22.04 on NVIDIA A100 (80 GB) GPUs. The closed-loop evaluation on CALVIN runs on a system with an NVIDIA RTX 3090.

We also tested PyTorch 2.2.0 with CUDA 11.8, and training runs fine there as well.

  1. (Optional) We use conda to manage the environment.
conda create -n clover python=3.8
conda activate clover
  2. Install dependencies.
cd visual_planner
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install git+https://github.com/hassony2/torch_videovision
pip install -e .
  3. Install the CALVIN simulator.
git clone --recurse-submodules https://github.com/mees/calvin.git
export CALVIN_ROOT=$(pwd)/calvin
cd $CALVIN_ROOT
sh install.sh

💿 Checkpoints

We release the model weights of our Visual Planner and Feedback-driven Policy on Hugging Face.

Training: Visual Planner

  • Requirement

    The visual planner requires 24 GB of GPU VRAM with a per-GPU batch size of 4, a video length of 8, and an image size of 128.

  • Preparation

    • We use OpenAI-CLIP to encode task instructions for conditioning.
  • Initiate training of the visual planner (video diffusion model) on CALVIN

    Please modify accelerate_cfg.yaml first according to your setup.

accelerate launch --config_file accelerate_cfg.yaml train.py \
    --learning_rate 1e-4 \
    --train_num_steps 300000 \
    --save_and_sample_every 10000 \
    --train_batch_size 32 \
    --sample_per_seq 8 \
    --sampling_step 5 \
    --with_text_conditioning \
    --diffusion_steps 100 \
    --sample_steps 10 \
    --with_depth \
    --flow_reg \
    --results_folder *path_to_save_your_ckpts*
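For reference, a multi-GPU accelerate_cfg.yaml for a single node might look like the following sketch. The field values are illustrative assumptions (one machine with 8 GPUs, fp16), not the repository's actual file; adapt them to your setup.

```yaml
# Illustrative accelerate config for a single node with 8 GPUs.
# All values here are assumptions; adjust to your hardware.
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
machine_rank: 0
num_machines: 1
num_processes: 8   # number of GPUs
mixed_precision: fp16
```

You can also generate this file interactively with `accelerate config`.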

Evaluation

  • Preparation

    1. Set your CALVIN and checkpoint paths in FeedbackPolicy/eval_calvin.sh.
    2. We train our policy with an input size of 192×192, so please modify the VC-1 config accordingly, setting img_size: 192 and use_cls: False.
  • Launch the evaluation on CALVIN with

cd ./FeedbackPolicy
bash eval_calvin.sh
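The VC-1 config change from the preparation step might look like the fragment below. Only the img_size and use_cls values come from the instructions above; the surrounding structure of the config file is an assumption.

```yaml
# Illustrative VC-1 config fragment; only img_size and use_cls are taken
# from the instructions above, the rest of the file is unchanged.
img_size: 192
use_cls: False
```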

📝 Citation

If you find the project helpful for your research, please consider citing our paper:

@article{bu2024clover,
  title={Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation},
  author={Bu, Qingwen and Zeng, Jia and Chen, Li and Yang, Yanchao and Zhou, Guyue and Yan, Junchi and Luo, Ping and Cui, Heming and Ma, Yi and Li, Hongyang},
  journal={arXiv preprint arXiv:2409.09016},
  year={2024}
}

Acknowledgements

We thank AVDC and RoboFlamingo for their open-source work!