Official implementation of "Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation".
Paper | Website | Video | Hugging Face
- Installation
- Simulation environment
- Policy evaluation
- Policy training
- Citation
- License & Acknowledgements
- Clone this repository
git clone git@github.com:yunhaif/reflect-vlm.git
cd reflect-vlm
- Install packages
conda create -n reflectvlm python=3.9 -y
conda activate reflectvlm
pip install -e .
- (Optional) Install additional packages if you want to train VLM policies.
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
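If you installed the training extras, a quick sanity check (not part of the repo) can confirm that PyTorch sees a GPU and that flash-attn imports cleanly:

```python
# Sanity check for the optional training setup: verifies a CUDA device is
# visible and that flash-attn was built correctly. Not part of the repo.
import torch
import flash_attn

print("CUDA available:", torch.cuda.is_available())
print("flash-attn version:", flash_attn.__version__)
```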
We provide a simple script to play with the simulation environment.
python scripts/interact.py
This will generate a task in MuJoCo with an interactive visualization. You can interact with the environment by typing actions at the prompt. Just launch the script and follow the instructions; it works on Mac too!
The task generated by our procedural task generator is controlled by a seed. You can generate as many tasks as you want by simply changing the environment seed!
python scripts/interact.py --env_seed 1000001
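If you want to script this, here is a minimal sketch (not part of the repo) that picks a seed and hands it to the interactive script. Only the documented `--env_seed` flag is used; the seed range below is purely illustrative.

```python
# Minimal sketch: launch the interactive demo with a randomly chosen task seed.
# Only the documented --env_seed flag is assumed; the seed range is illustrative.
import random
import subprocess

seed = random.randint(1_000_000, 1_999_999)
print(f"Launching task with env_seed={seed}")
subprocess.run(["python", "scripts/interact.py", "--env_seed", str(seed)], check=True)
```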
Models are available on Hugging Face, including:
- ReflectVLM-llava-v1.5-13b-base: a base VLM policy trained on a fixed expert dataset.
- ReflectVLM-llava-v1.5-13b-post-trained: the VLM policy trained using our post-training strategy with the reflection mechanism.
- ReflectVLM-diffusion: the diffusion dynamics model.
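If you want to fetch a checkpoint manually (the evaluation scripts below also download them automatically), a sketch with `huggingface_hub` looks like this; the repo id is an assumption inferred from the model name above.

```python
# Sketch: manually download a checkpoint from Hugging Face.
# The repo id is an assumption inferred from the model name above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="yunhaif/ReflectVLM-llava-v1.5-13b-base")
print("Checkpoint downloaded to:", local_dir)
```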
We provide scripts to run evaluation on the 100 procedurally-generated test tasks. Models will be automatically downloaded from Hugging Face.
To evaluate the base policy:
bash scripts/eval_base_vlm.sh
To evaluate our post-trained policy with reflection:
bash scripts/eval_reflect_vlm.sh {sim|diffusion}
Choose either `sim` or `diffusion` as the dynamics model used in the reflection mechanism.
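To compare both dynamics models back to back, a small wrapper (not part of the repo) can simply invoke the same script twice:

```python
# Sketch: evaluate the reflection policy with both dynamics models in sequence.
import subprocess

for dynamics in ("sim", "diffusion"):
    print(f"=== Evaluating with {dynamics} dynamics ===")
    subprocess.run(["bash", "scripts/eval_reflect_vlm.sh", dynamics], check=True)
```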
You can add your own agent under the `agent` folder. Create a new class and implement the `act()` method, which processes the observations and returns an action.
class MyAgent:
    def __init__(self, ...):
        ...  # initialize model etc.

    def act(self, img, goal_img, inp):
        """
        Args:
            img: the current image
            goal_img: the goal image
            inp: the input prompt
        Returns:
            action: the action as a string
        """
        action = ...  # get action from model
        return action
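For reference, here is a minimal sketch of an agent that conforms to this interface. The action strings and the absence of a real model are placeholders; the actual prompt and action formats are defined by the evaluation code.

```python
# Minimal sketch of a custom agent conforming to the interface above.
# The action strings below are placeholders, not the repo's actual
# action vocabulary.
import random


class RandomAgent:
    """Ignores the observations and returns a placeholder action string."""

    def __init__(self, actions=None):
        # Hypothetical action strings, for illustration only.
        self.actions = actions or ["pick up object A", "insert object A"]

    def act(self, img, goal_img, inp):
        # A real agent would feed img, goal_img, and the prompt inp to a
        # VLM and decode its answer into an action string.
        return random.choice(self.actions)
```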
Coming soon...
The script `scripts/diffusion_demo.py` can be used to test diffusion generation:
python scripts/diffusion_demo.py
We provide some sample images under `assets/images/diffusion_examples`.
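To see which sample images are bundled, a tiny snippet (assuming you run it from the repo root) just lists that folder:

```python
# List the bundled diffusion example images (run from the repo root).
from pathlib import Path

for path in sorted(Path("assets/images/diffusion_examples").iterdir()):
    print(path.name)
```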
If you find our work useful in your research, please consider citing with the following BibTeX:
@misc{feng2025reflective,
title={Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation},
author={Yunhai Feng and Jiaming Han and Zhuoran Yang and Xiangyu Yue and Sergey Levine and Jianlan Luo},
year={2025},
eprint={2502.16707},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2502.16707},
}
This repository is licensed under the MIT license. LLaVA is licensed under the Apache 2.0 license.
Part of the simulation environment is adapted from Metaworld and mjctrl.