ReflectVLM

Official implementation of "Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation".

Paper | Website | Video | Hugging Face

Contents

  - Installation
  - Simulation environment
  - Policy evaluation
  - Policy training
  - Diffusion model
  - Citation
  - License & Acknowledgements

Installation

  1. Clone this repository
git clone git@github.com:yunhaif/reflect-vlm.git
cd reflect-vlm
  2. Install packages
conda create -n reflectvlm python=3.9 -y
conda activate reflectvlm
pip install -e .
  3. (Optional) Install additional packages if you want to train VLM policies.
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
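
As a quick post-install sanity check, you can confirm that the MuJoCo Python bindings import correctly. This is only a sketch; it assumes MuJoCo is pulled in as a dependency by pip install -e .

# Sanity check: the simulation environment is built on MuJoCo,
# so its Python bindings should be importable after installation.
import mujoco

print("MuJoCo version:", mujoco.__version__)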

Simulation environment

Interacting with the environment

We provide a simple script to play with the simulation environment.

python scripts/interact.py

This will generate a task in MuJoCo with interactive visualization. You can interact with the environment by typing actions at the prompt. Just launch the script and follow the instructions; it works on Mac too!

Generating as many tasks as you want

Each task produced by our procedural task generator is determined by a seed, so you can generate as many tasks as you want simply by changing the environment seed!

python scripts/interact.py --env_seed 1000001
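
If you want to spin up several tasks in a row, a minimal Python sketch like the one below works. The seed values are arbitrary examples, and each invocation opens its own interactive session.

import subprocess

# Launch the interactive script on a few different environment seeds.
# Each seed yields a different procedurally-generated task.
for seed in [1000001, 1000002, 1000003]:
    subprocess.run(
        ["python", "scripts/interact.py", "--env_seed", str(seed)],
        check=True,
    )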

Policy evaluation

Evaluating our checkpoints

Models are available on Hugging Face, including:

  - the base VLM policy
  - the post-trained policy with reflection
  - the diffusion dynamics model used in the reflection mechanism

We provide scripts to run evaluation on the 100 procedurally-generated test tasks. Models will be automatically downloaded from Hugging Face.
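
The evaluation scripts handle the download automatically, but if you prefer to pre-fetch a checkpoint into your local Hugging Face cache, a sketch with huggingface_hub looks like this. The repo id below is a placeholder, not a real model name; see the Hugging Face link above for the released checkpoints.

from huggingface_hub import snapshot_download

# Pre-download a checkpoint into the local Hugging Face cache.
# "yunhaif/ReflectVLM-placeholder" is a hypothetical repo id used for illustration.
snapshot_download(repo_id="yunhaif/ReflectVLM-placeholder")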

To evaluate the base policy:

bash scripts/eval_base_vlm.sh

To evaluate our post-trained policy with reflection:

bash scripts/eval_reflect_vlm.sh {sim|diffusion}

Choose either sim or diffusion as the dynamics model used in the reflection mechanism: sim uses the simulator as the dynamics model, while diffusion uses the learned diffusion dynamics model. For example, bash scripts/eval_reflect_vlm.sh diffusion evaluates the reflective policy with the diffusion dynamics model.

Building your own agent

You can add your own agent under the agent folder. Create a new class and implement the act() method, which takes the current observation image, the goal image, and a text prompt, and returns an action string.

class MyAgent:
    def __init__(self, *args, **kwargs):
        ...  # initialize your model here (load checkpoint, set device, etc.)

    def act(self, img, goal_img, inp):
        """
        Args:
            img: the current observation image
            goal_img: the goal image
            inp: the input text prompt
        Returns:
            action: the predicted action, as a string
        """
        action = ...  # query your model with the images and prompt
        return action
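
For instance, here is a minimal illustrative sketch of an agent that simply asks a human to type the action; the class name and behavior are placeholders, and a learned agent would instead feed the images and prompt to a model.

class HumanAgent:
    """Illustrative agent that defers to a human typing actions.

    Useful for sanity-checking an evaluation loop before plugging in a
    learned model. The returned string must follow whatever action format
    the environment's prompt asks for.
    """

    def act(self, img, goal_img, inp):
        # A learned agent would pass img, goal_img, and inp to a
        # vision-language model; here we just show the prompt and read
        # the action from stdin.
        print(inp)
        return input("action> ").strip()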

Policy training

Coming soon...

Diffusion model

The script scripts/diffusion_demo.py can be used to test diffusion generation:

python scripts/diffusion_demo.py 

We provide some sample images under assets/images/diffusion_examples.

Citation

If you find our work useful in your research, please consider citing it with the following BibTeX entry:

@misc{feng2025reflective,
  title={Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation}, 
  author={Yunhai Feng and Jiaming Han and Zhuoran Yang and Xiangyu Yue and Sergey Levine and Jianlan Luo},
  year={2025},
  eprint={2502.16707},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2502.16707}, 
}

License & Acknowledgements

This repository is licensed under the MIT license. LLaVA is licensed under the Apache 2.0 license.
Part of the simulation environment is adapted from Metaworld and mjctrl.
